R forcats fct_lump: Lump uncommon factor levels together into "other"


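A family of functions for lumping together levels that meet some criteria.

  • fct_lump_min(): lumps levels that appear fewer than min times.

  • fct_lump_prop(): lumps levels that appear fewer than (or equal to) prop * n times.

  • fct_lump_n(): lumps all levels except the n most frequent (or the abs(n) least frequent if n < 0).

  • fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.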
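To make the idea of lumping concrete, here is a minimal base-R sketch of a minimum-count rule. The helper lump_min_sketch() is hypothetical and written only for illustration. It is not the forcats implementation, and it ignores weights and NA handling.

```r
# Hypothetical helper for illustration only; not part of forcats.
lump_min_sketch <- function(f, min, other = "Other") {
  f <- factor(f)
  counts <- table(f)

  # Levels that appear fewer than `min` times are the ones to lump.
  rare <- names(counts)[counts < min]

  new_levels <- levels(f)
  new_levels[new_levels %in% rare] <- other

  # Assigning duplicated level names merges those levels into one.
  levels(f) <- new_levels

  # Move the "other" level to the end, if it was created.
  if (other %in% levels(f)) {
    f <- factor(f, levels = c(setdiff(levels(f), other), other))
  }
  f
}

x <- c("a", "a", "a", "b", "c")
lumped <- lump_min_sketch(x, min = 2)
table(lumped)
```

The other functions in the family differ only in how they decide which levels count as rare.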

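fct_lump() exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it.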
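The following sketch shows how calls to fct_lump() might be rewritten with the explicit functions. It assumes fct_lump() dispatches on which of n and prop is supplied: n only goes to fct_lump_n(), prop only to fct_lump_prop(), and neither to fct_lump_lowfreq(). The data is the vector used in the examples on this page.

```r
library(forcats)

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Only `n` supplied: the explicit equivalent is fct_lump_n().
by_n <- fct_lump_n(x, n = 3)

# Only `prop` supplied: the explicit equivalent is fct_lump_prop().
by_prop <- fct_lump_prop(x, prop = 0.10)

# Neither supplied: the explicit equivalent is fct_lump_lowfreq().
by_lowfreq <- fct_lump_lowfreq(x)

levels(by_n)
levels(by_prop)
levels(by_lowfreq)
```

Naming the method explicitly makes the lumping rule visible at the call site, so it does not have to be inferred from the arguments.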

Usage

fct_lump(
  f,
  n,
  prop,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_min(f, min, w = NULL, other_level = "Other")

fct_lump_prop(f, prop, w = NULL, other_level = "Other")

fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_lowfreq(f, w = NULL, other_level = "Other")

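Arguments

f

A factor (or character vector).

n

A positive n preserves the n most common values. A negative n preserves the -n least common values. If there are ties, you will get at least abs(n) values.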

prop

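A positive prop lumps values that do not appear at least prop of the time. A negative prop lumps values that do not appear at most -prop of the time, so only the rare values are kept.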
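A small sketch of how the sign of prop changes which side is lumped. The data is made up for illustration, with proportions chosen so that two levels sit exactly on the 10% threshold. It assumes that a level exactly at the threshold is lumped for a positive prop and kept for a negative one.

```r
library(forcats)

# Proportions: a = 0.5, b = 0.3, c = 0.1, d = 0.1
x <- factor(rep(c("a", "b", "c", "d"), times = c(50, 30, 10, 10)))

# Positive prop: levels at or below 10% are lumped, so c and d become "Other".
common <- fct_lump_prop(x, prop = 0.1)

# Negative prop: levels above 10% are lumped, so a and b become "Other".
rare <- fct_lump_prop(x, prop = -0.1)

table(common)
table(rare)
```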

w

An optional numeric vector giving frequency weights for each value (not level) in f.
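One situation where w can help is pre-aggregated data, where each value appears once and its count is stored separately. The sketch below uses made-up vectors named category and n_obs.

```r
library(forcats)

# Pre-aggregated data: one entry per category, with its count in `n_obs`.
category <- c("a", "b", "c", "d")
n_obs <- c(10, 5, 1, 1)

# Unweighted, every category appears exactly once.
# With `w = n_obs`, the stored counts decide which levels are kept.
lumped <- fct_lump_n(category, n = 2, w = n_obs)
lumped
```

Note that w has one entry per element of f, not one entry per level.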

other_level

Value of the level used for "other" values. Always placed at the end of the levels.
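A brief sketch of a custom other_level and where it is placed. The data and the label "Misc" are arbitrary choices for illustration.

```r
library(forcats)

x <- factor(c("x", "x", "y", "y", "a", "b"))

# "a" and "b" appear once each, so they fall below min = 2.
lumped <- fct_lump_min(x, min = 2, other_level = "Misc")

# "Misc" is placed last, even though "a" and "b" originally sorted first.
levels(lumped)
```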

ties.method

A character string specifying how ties are treated. See rank() for details.
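The effect of ties.method is only visible when levels are tied at the cutoff. In this sketch the data is made up so that two levels tie for second place. It assumes ranking follows base rank() semantics, as the description above suggests.

```r
library(forcats)

# Counts: a = 4, b = 2, c = 2, d = 1, e = 1 (b and c tie for second place).
x <- factor(rep(c("a", "b", "c", "d", "e"), times = c(4, 2, 2, 1, 1)))

# Default ties.method = "min": both tied levels share rank 2, so both are kept.
keep_ties <- fct_lump_n(x, n = 2)

# ties.method = "first": the tie is broken by level order, so only b is kept.
break_ties <- fct_lump_n(x, n = 2, ties.method = "first")

levels(keep_ties)
levels(break_ties)
```

With the default method the result has three kept levels rather than two, which matches the note that ties yield at least abs(n) values.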

min

Preserve levels that appear at least min times.

See also

fct_other() to convert specified levels to other.
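For comparison, a short sketch of fct_other(), which lumps levels chosen by name rather than by frequency. The data is made up for illustration.

```r
library(forcats)

x <- factor(c("a", "b", "c", "a"))

# Keep only "a"; every other level becomes "Other", regardless of frequency.
kept <- fct_other(x, keep = "a")
levels(kept)
```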

Examples

library(forcats) # also makes the %>% pipe available

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#>  A  B  C  D  E  F  G  H  I 
#> 40 10  5 27  1  1  1  1  1 
x %>%
  fct_lump_n(3) %>%
  table()
#> .
#>     A     B     D Other 
#>    40    10    27    10 
x %>%
  fct_lump_prop(0.10) %>%
  table()
#> .
#>     A     B     D Other 
#>    40    10    27    10 
x %>%
  fct_lump_min(5) %>%
  table()
#> .
#>     A     B     C     D Other 
#>    40    10     5    27     5 
x %>%
  fct_lump_lowfreq() %>%
  table()
#> .
#>     A     D Other 
#>    40    27    20 

x <- factor(letters[rpois(100, 5)])
x
#>   [1] b e d f f g e e a e c e b g c d d b i c d d f b d c g g h e d g b i
#>  [35] c h j d d f g c d c h h g d d c b a e e e e g a f c b d b f c d g i
#>  [69] b f d d d b e e c a e h e k d g e g d g h d f g a e i g k g l e
#> Levels: a b c d e f g h i j k l
table(x)
#> x
#>  a  b  c  d  e  f  g  h  i  j  k  l 
#>  5 10 11 20 17  8 15  6  4  1  2  1 
table(fct_lump_lowfreq(x))
#> 
#>  a  b  c  d  e  f  g  h  i  j  k  l 
#>  5 10 11 20 17  8 15  6  4  1  2  1 

# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
#>   [1] Other e     d     Other Other g     e     e     Other e     Other
#>  [12] e     Other g     Other d     d     Other Other Other d     d    
#>  [23] Other Other d     Other g     g     Other e     d     g     Other
#>  [34] Other Other Other Other d     d     Other g     Other d     Other
#>  [45] Other Other g     d     d     Other Other Other e     e     e    
#>  [56] e     g     Other Other Other Other d     Other Other Other d    
#>  [67] g     Other Other Other d     d     d     Other e     e     Other
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: d e g Other
fct_lump_prop(x, prop = 0.1)
#>   [1] Other e     d     Other Other g     e     e     Other e     c    
#>  [12] e     Other g     c     d     d     Other Other c     d     d    
#>  [23] Other Other d     c     g     g     Other e     d     g     Other
#>  [34] Other c     Other Other d     d     Other g     c     d     c    
#>  [45] Other Other g     d     d     c     Other Other e     e     e    
#>  [56] e     g     Other Other c     Other d     Other Other c     d    
#>  [67] g     Other Other Other d     d     d     Other e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: c d e g Other

# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
#>   [1] Other Other Other Other Other Other Other Other Other Other Other
#>  [12] Other Other Other Other Other Other Other Other Other Other Other
#>  [23] Other Other Other Other Other Other Other Other Other Other Other
#>  [34] Other Other Other j     Other Other Other Other Other Other Other
#>  [45] Other Other Other Other Other Other Other Other Other Other Other
#>  [56] Other Other Other Other Other Other Other Other Other Other Other
#>  [67] Other Other Other Other Other Other Other Other Other Other Other
#>  [78] Other Other Other Other k     Other Other Other Other Other Other
#>  [89] Other Other Other Other Other Other Other Other k     Other l    
#> [100] Other
#> Levels: j k l Other
fct_lump_prop(x, prop = -0.1)
#>   [1] b     Other Other f     f     Other Other Other a     Other Other
#>  [12] Other b     Other Other Other Other b     i     Other Other Other
#>  [23] f     b     Other Other Other Other h     Other Other Other b    
#>  [34] i     Other h     j     Other Other f     Other Other Other Other
#>  [45] h     h     Other Other Other Other b     a     Other Other Other
#>  [56] Other Other a     f     Other b     Other b     f     Other Other
#>  [67] Other i     b     f     Other Other Other b     Other Other Other
#>  [78] a     Other h     Other k     Other Other Other Other Other Other
#>  [89] h     Other f     Other a     Other i     Other k     Other l    
#> [100] Other
#> Levels: a b f h i j k l Other

# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump_n(x, n = 5, w = w)
#>   [1] b     e     d     Other Other g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] Other b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     Other g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other Other c     b     d     b     Other c     d    
#>  [67] g     Other b     Other d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     Other g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e g Other

# Use ties.method to control how tied factors are collapsed
fct_lump_n(x, n = 6)
#>   [1] b     e     d     f     f     g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] f     b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     f     g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other f     c     b     d     b     f     c     d    
#>  [67] g     Other b     f     d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     f     g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e f g Other
fct_lump_n(x, n = 6, ties.method = "max")
#>   [1] b     e     d     f     f     g     e     e     Other e     c    
#>  [12] e     b     g     c     d     d     b     Other c     d     d    
#>  [23] f     b     d     c     g     g     Other e     d     g     b    
#>  [34] Other c     Other Other d     d     f     g     c     d     c    
#>  [45] Other Other g     d     d     c     b     Other e     e     e    
#>  [56] e     g     Other f     c     b     d     b     f     c     d    
#>  [67] g     Other b     f     d     d     d     b     e     e     c    
#>  [78] Other e     Other e     Other d     g     e     g     d     g    
#>  [89] Other d     f     g     Other e     Other g     Other g     Other
#> [100] e    
#> Levels: b c d e f g Other

# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
#> 
#>     b     c     d     e     g Other 
#>    10    11    20    17    15    27 
table(fct_lump_min(x, min = 15))
#> 
#>     d     e     g Other 
#>    20    17    15    48 
Source: R/lump.R

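Note: This article was selected and compiled by 纯净天空 from the original English work Lump uncommon factor together levels into "other" by Hadley Wickham and others. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.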