

R forcats fct_lump: Lump uncommon factor levels together into "other"


A family of functions for lumping together factor levels that meet some criterion.

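  • fct_lump_min(): lumps levels that appear fewer than min times.

  • fct_lump_prop(): lumps levels that appear fewer than (or equal to) prop * n times, where n here is the total number of values (not the n argument).

  • fct_lump_n(): lumps all levels except the n most frequent (or the n least frequent, if n < 0).

  • fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.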
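The least obvious of these rules is the one behind fct_lump_lowfreq(). The base-R sketch below is our own illustration, not the forcats source, and lowfreq_levels_to_lump() is a hypothetical helper. It walks the level counts from largest to smallest and starts lumping at the first level that is larger than all the remaining levels combined, so the "other" total ends up below every level that is kept. On the counts from the first example in the Examples section it selects the same levels that fct_lump_lowfreq() lumps there.

```r
# Hypothetical helper (not part of forcats): a base-R sketch of the
# "lump the least frequent levels while Other stays the smallest" rule.
lowfreq_levels_to_lump <- function(counts) {
  sorted <- counts[order(counts, decreasing = TRUE)]  # largest level first
  remaining <- sum(sorted)
  for (i in seq_along(sorted)) {
    remaining <- remaining - sorted[[i]]  # total of all smaller levels
    if (sorted[[i]] > remaining) {
      # Everything after position i is lumped; its total is below sorted[[i]].
      if (i == length(sorted)) return(character(0))
      return(names(sorted)[(i + 1):length(sorted)])
    }
  }
  character(0)
}

# Level counts from the first example (90 values in total)
counts <- c(A = 40, B = 10, C = 5, D = 27, E = 1, F = 1, G = 1, H = 1, I = 1)
lowfreq_levels_to_lump(counts)
#> [1] "B" "C" "E" "F" "G" "H" "I"
```

The lumped levels add up to 20, which is smaller than D (27), the smallest level that survives.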

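fct_lump() exists primarily for historical reasons: it automatically picks between these methods depending on which arguments you supply. We no longer recommend using it.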
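To make "picks a method depending on its arguments" concrete, here is a hypothetical dispatcher, which_lump(), that only reports which of the newer functions a call to fct_lump() corresponds to. The mapping is our reading of forcats (neither n nor prop: fct_lump_lowfreq(); only n: fct_lump_n(); only prop: fct_lump_prop()) and should be treated as an assumption. In new code, call the specific function directly.

```r
# Hypothetical helper: reports which fct_lump_*() function fct_lump() is
# assumed to delegate to, based only on which arguments were supplied.
which_lump <- function(n, prop) {
  if (missing(n) && missing(prop)) {
    "fct_lump_lowfreq"
  } else if (missing(prop)) {
    "fct_lump_n"
  } else if (missing(n)) {
    "fct_lump_prop"
  } else {
    stop("Supply only one of `n` and `prop`.")
  }
}

which_lump()            # "fct_lump_lowfreq"
which_lump(n = 3)       # "fct_lump_n"
which_lump(prop = 0.1)  # "fct_lump_prop"
```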

Usage

fct_lump(
  f,
  n,
  prop,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_min(f, min, w = NULL, other_level = "Other")

fct_lump_prop(f, prop, w = NULL, other_level = "Other")

fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_lowfreq(f, w = NULL, other_level = "Other")

Arguments

f

A factor (or character vector).

n

A positive n preserves the n most common values. A negative n preserves the -n least common values. If there are ties, you will get at least abs(n) values.
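A minimal base-R sketch of the n rule, assuming levels are ranked by count with rank() (default ties.method = "min") and kept when their rank is at most abs(n). top_n_levels() is a hypothetical helper, not a forcats function; the toy counts show how a tie at the cutoff leaves more than abs(n) levels.

```r
# Hypothetical helper (not the forcats source): keep the n most common levels,
# or the -n least common ones when n < 0, ranking ties with ties.method.
top_n_levels <- function(counts, n, ties.method = "min") {
  r <- if (n < 0) {
    rank(counts, ties.method = ties.method)   # rank 1 = rarest level
  } else {
    rank(-counts, ties.method = ties.method)  # rank 1 = most common level
  }
  names(counts)[r <= abs(n)]
}

counts <- c(a = 5, b = 3, c = 3, d = 1)
top_n_levels(counts, 2)   # "a" "b" "c": b and c tie for second place
top_n_levels(counts, -1)  # "d"
```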

prop

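A positive prop lumps levels that make up no more than prop of the values (it keeps levels whose share is greater than prop). A negative prop lumps levels that make up more than -prop of the values (it keeps levels whose share is at most -prop).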
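The prop rule compares each level's share of all values with prop. The sketch below assumes the comparison is strict for a positive prop (a share exactly equal to prop is lumped) and inclusive for a negative prop, which agrees with the fct_lump_prop() results in the Examples section; prop_levels_kept() is a hypothetical helper written for illustration.

```r
# Hypothetical helper: names of the levels that survive a given prop.
prop_levels_kept <- function(counts, prop) {
  share <- counts / sum(counts)
  if (prop < 0) {
    names(counts)[share <= -prop]  # negative prop keeps the rare levels
  } else {
    names(counts)[share > prop]    # positive prop keeps the common levels
  }
}

counts <- c(A = 40, B = 10, C = 5, D = 27, E = 1, F = 1, G = 1, H = 1, I = 1)
prop_levels_kept(counts, 0.10)   # "A" "B" "D"  (C: 5/90 < 0.10, so lumped)
prop_levels_kept(counts, -0.10)  # "C" "E" "F" "G" "H" "I"
```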

w

An optional numeric vector giving frequency weights for each value (not level) in f.
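With w, a level's frequency is the sum of the weights of its values rather than a plain count. A base-R illustration with made-up data, using tapply() to show the idea (this is not taken from the forcats source):

```r
f <- factor(c("a", "b", "b", "c", "c", "c"))
w <- c(10, 1, 1, 1, 1, 1)  # one weight per value, not per level

table(f)           # unweighted counts: a = 1, b = 2, c = 3
tapply(w, f, sum)  # weighted counts:   a = 10, b = 2, c = 3
```

Under these weights "a" is the most frequent level even though it occurs only once.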

other_level

Value of the level used for "other" values. Always placed at the end of the levels.
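In base R terms, putting the "other" level last amounts to rebuilding the factor with an explicit levels vector. A sketch in which the levels to lump and the label "Misc" (standing in for a custom other_level) are picked by hand for illustration:

```r
f <- factor(c("a", "b", "c", "c", "d"))
to_lump <- c("a", "b")  # illustrative choice of levels to lump
other_level <- "Misc"   # e.g. a custom other_level value

values <- as.character(f)
values[values %in% to_lump] <- other_level
g <- factor(values, levels = c(setdiff(levels(f), to_lump), other_level))
levels(g)  # "c" "d" "Misc": the other level comes last
```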

ties.method

A character string specifying how ties are treated. See rank() for details.
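The effect of ties.method is easiest to see on rank() itself. Assuming a level is kept when its rank by count is at most n, the choice decides how many tied levels survive at the cutoff; note that "max" can leave fewer than n. The counts are a made-up example.

```r
counts <- c(a = 5, b = 3, c = 3, d = 1)

rank(-counts, ties.method = "min")    # a 1, b 2, c 2, d 4 -> n = 2 keeps a, b, c
rank(-counts, ties.method = "max")    # a 1, b 3, c 3, d 4 -> n = 2 keeps only a
rank(-counts, ties.method = "first")  # a 1, b 2, c 3, d 4 -> n = 2 keeps a, b
```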

min

Preserve levels that appear at least min times.
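The min rule is a plain threshold on counts. On the counts from the first example in the Examples section, this base-R check reproduces what fct_lump_min(5) does there (an illustration, not the package code):

```r
counts <- c(A = 40, B = 10, C = 5, D = 27, E = 1, F = 1, G = 1, H = 1, I = 1)
kept <- counts >= 5  # "at least min times" includes C, which has exactly 5

names(counts)[kept]  # "A" "B" "C" "D"
sum(counts[!kept])   # 5: the size of the "Other" level
```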

See also

fct_other() to convert specific levels to "other" yourself.

Examples

library(forcats)
library(magrittr)  # provides the %>% pipe used below
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#>  A  B  C  D  E  F  G  H  I 
#> 40 10  5 27  1  1  1  1  1 
x %>%
  fct_lump_n(3) %>%
  table()
#> .
#>     A     B     D Other 
#>    40    10    27    10 
x %>%
  fct_lump_prop(0.10) %>%
  table()
#> .
#>     A     B     D Other 
#>    40    10    27    10 
x %>%
  fct_lump_min(5) %>%
  table()
#> .
#>     A     B     C     D Other 
#>    40    10     5    27     5 
x %>%
  fct_lump_lowfreq() %>%
  table()
#> .
#>     A     D Other 
#>    40    27    20 

x <- factor(letters[rpois(100, 5)])  # random draw: your counts will differ from those shown
x
#>   [1] b e d f f g e e a e c e b g c d d b i c d d f b d c g g h e d g b i
#>  [35] c h j d d f g c d c h h g d d c b a e e e e g a f c b d b f c d g i
#>  [69] b f d d d b e e c a e h e k d g e g d g h d f g a e i g k g l e
#> Levels: a b c d e f g h i j k l
table(x)
#> x
#>  a  b  c  d  e  f  g  h  i  j  k  l 
#>  5 10 11 20 17  8 15  6  4  1  2  1 
table(fct_lump_lowfreq(x))
#> 
#>  a  b  c  d  e  f  g  h  i  j  k  l 
#>  5 10 11 20 17  8 15  6  4  1  2  1 

# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
#>   [1] Other e     d     Other Other g     e     e     Other e     Other
#>  [12] e     Other g     Other d     d     Other Other Other d     d    
#>  [23] Other Other d     Other g     g     Other e     d     g     Other
#>  [34] Other Other Other Other d     d     Other g     Other d     Other
#>  [45] Other Other g     d     d     Other Other Other e     e     e    
#>  [56] e     g     Other Other Other Other d     Other Other Other d    
#>  [67] g     Other Other Other d     d     d     Other e     e     Other
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: d e g Other
fct_lump_prop(x, prop = 0.1)
#>   [1] Other e     d     Other Other g     e     e     Other e     c    
#>  [12] e     Other g     c     d     d     Other Other c     d     d    
#>  [23] Other Other d     c     g     g     Other e     d     g     Other
#>  [34] Other c     Other Other d     d     Other g     c     d     c    
#>  [45] Other Other g     d     d     c     Other Other e     e     e    
#>  [56] e     g     Other Other c     Other d     Other Other c     d    
#>  [67] g     Other Other Other d     d     d     Other e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: c d e g Other

# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
#>   [1] Other Other Other Other Other Other Other Other Other Other Other
#>  [12] Other Other Other Other Other Other Other Other Other Other Other
#>  [23] Other Other Other Other Other Other Other Other Other Other Other
#>  [34] Other Other Other j     Other Other Other Other Other Other Other
#>  [45] Other Other Other Other Other Other Other Other Other Other Other
#>  [56] Other Other Other Other Other Other Other Other Other Other Other
#>  [67] Other Other Other Other Other Other Other Other Other Other Other
#>  [78] Other Other Other Other k     Other Other Other Other Other Other
#>  [89] Other Other Other Other Other Other Other Other k     Other l    
#> [100] Other
#> Levels: j k l Other
fct_lump_prop(x, prop = -0.1)
#>   [1] b     Other Other f     f     Other Other Other a     Other Other
#>  [12] Other b     Other Other Other Other b     i     Other Other Other
#>  [23] f     b     Other Other Other Other h     Other Other Other b    
#>  [34] i     Other h     j     Other Other f     Other Other Other Other
#>  [45] h     h     Other Other Other Other b     a     Other Other Other
#>  [56] Other Other a     f     Other b     Other b     f     Other Other
#>  [67] Other i     b     f     Other Other Other b     Other Other Other
#>  [78] a     Other h     Other k     Other Other Other Other Other Other
#>  [89] h     Other f     Other a     Other i     Other k     Other l    
#> [100] Other
#> Levels: a b f h i j k l Other

# Use weighted frequencies
# rpois() can return 0 and letters[0] is dropped, so x may have fewer than 100 values
w <- c(rep(2, 50), rep(1, length(x) - 50))
fct_lump_n(x, n = 5, w = w)
#>   [1] b     e     d     Other Other g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] Other b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     Other g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other Other c     b     d     b     Other c     d    
#>  [67] g     Other b     Other d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e g Other

# Use ties.method to control how levels tied at the cutoff are handled
fct_lump_n(x, n = 6)
#>   [1] b     e     d     f     f     g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] f     b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     f     g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other f     c     b     d     b     f     c     d    
#>  [67] g     Other b     f     d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     f     g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e f g Other
fct_lump_n(x, n = 6, ties.method = "max")
#>   [1] b     e     d     f     f     g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] f     b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     f     g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other f     c     b     d     b     f     c     d    
#>  [67] g     Other b     f     d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     f     g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e f g Other

# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
#> 
#>     b     c     d     e     g Other 
#>    10    11    20    17    15    27 
table(fct_lump_min(x, min = 15))
#> 
#>     d     e     g Other 
#>    20    17    15    48 
Source code: R/lump.R



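Note: This article was compiled by 純淨天空 from the original English work by Hadley Wickham et al., Lump uncommon factor together levels into "other". Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission.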