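A family of functions for lumping together levels that meet some criteria.
- fct_lump_min(): lumps levels that appear fewer than min times.
- fct_lump_prop(): lumps levels that appear fewer than (or equal to) prop * n times.
- fct_lump_n(): lumps all levels except the n most frequent (or the least frequent if n < 0).
- fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.
fct_lump() exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it.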
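The selection rules above can be reproduced by hand from a frequency table. The sketch below uses only base R on a made-up vector `x`; the helper `lump()` and all variable names are hypothetical, and the code illustrates the criteria as described here rather than the forcats implementation.

```r
# Hypothetical data: "a" x6, "b" x3, "c" x2, "d" x1 (12 values in total)
x <- factor(rep(c("a", "b", "c", "d"), times = c(6, 3, 2, 1)))
counts <- table(x)

# fct_lump_min(x, min = 3) criterion: keep levels seen at least `min` times
keep_min <- names(counts)[counts >= 3]

# fct_lump_prop(x, prop = 0.2) criterion: keep levels seen in more than
# a proportion `prop` of the values
keep_prop <- names(counts)[counts / length(x) > 0.2]

# fct_lump_n(x, n = 2) criterion: keep the n most frequent levels
keep_n <- names(sort(counts, decreasing = TRUE))[1:2]

# Hypothetical helper: relabel everything not kept as "Other", placed last
lump <- function(f, keep) {
  out <- ifelse(as.character(f) %in% keep, as.character(f), "Other")
  factor(out, levels = c(keep, "Other"))
}

table(lump(x, keep_min))

# Optional comparison, only if forcats happens to be installed
if (requireNamespace("forcats", quietly = TRUE)) {
  print(table(forcats::fct_lump_min(x, min = 3)))
}
```

For this particular vector all three criteria keep the same two levels, "a" and "b"; on real data they generally differ, which is why the family offers separate functions.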
Usage
fct_lump(
f,
n,
prop,
w = NULL,
other_level = "Other",
ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_min(f, min, w = NULL, other_level = "Other")
fct_lump_prop(f, prop, w = NULL, other_level = "Other")
fct_lump_n(
f,
n,
w = NULL,
other_level = "Other",
ties.method = c("min", "average", "first", "last", "random", "max")
)
fct_lump_lowfreq(f, w = NULL, other_level = "Other")
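Arguments
- f: A factor (or character vector).
- n: Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.
- prop: Positive prop lumps values that appear in at most a proportion prop of the values. Negative prop lumps values that appear in more than a proportion -prop of the values.
- w: An optional numeric vector giving weights for the frequency of each value (not level) in f.
- other_level: Value of the level used for "other" values. Always placed at the end of the levels.
- ties.method: A character string specifying how ties are treated. See rank() for details.
- min: Preserve levels that appear at least min times.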
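The `w` argument is easiest to read as replacing plain counts with sums of weights. The following base R sketch, on a made-up vector and made-up weights, computes the weighted frequencies that lumping would then compare. It illustrates the idea only and makes no claim about how forcats computes them internally.

```r
# Hypothetical data: one weight per value of x (not one per level)
x <- factor(c("a", "a", "a", "b", "b", "c"))
w <- c(1, 1, 1, 5, 5, 2)

# Unweighted frequency: "a" is the most common level
table(x)

# Weighted frequency: sum the weights within each level
weighted <- tapply(w, x, sum)
weighted

# By weight, "b" is now the top level even though "a" has more values
top_by_weight <- names(weighted)[which.max(weighted)]
top_by_weight

# Optional comparison, only if forcats happens to be installed
if (requireNamespace("forcats", quietly = TRUE)) {
  print(levels(forcats::fct_lump_n(x, n = 1, w = w)))
}
```

Because the weights attach to values, `w` must have the same length as `f`, as in the `rep(2, 50), rep(1, 50)` example further down this page.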
See also
fct_other() to convert specified levels to other.
Examples
library(forcats)
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#> A B C D E F G H I
#> 40 10 5 27 1 1 1 1 1
x %>%
fct_lump_n(3) %>%
table()
#> .
#> A B D Other
#> 40 10 27 10
x %>%
fct_lump_prop(0.10) %>%
table()
#> .
#> A B D Other
#> 40 10 27 10
x %>%
fct_lump_min(5) %>%
table()
#> .
#> A B C D Other
#> 40 10 5 27 5
x %>%
fct_lump_lowfreq() %>%
table()
#> .
#> A D Other
#> 40 27 20
x <- factor(letters[rpois(100, 5)])
x
#> [1] b e d f f g e e a e c e b g c d d b i c d d f b d c g g h e d g b i
#> [35] c h j d d f g c d c h h g d d c b a e e e e g a f c b d b f c d g i
#> [69] b f d d d b e e c a e h e k d g e g d g h d f g a e i g k g l e
#> Levels: a b c d e f g h i j k l
table(x)
#> x
#> a b c d e f g h i j k l
#> 5 10 11 20 17 8 15 6 4 1 2 1
table(fct_lump_lowfreq(x))
#>
#> a b c d e f g h i j k l
#> 5 10 11 20 17 8 15 6 4 1 2 1
# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
#> [1] Other e d Other Other g e e Other e Other
#> [12] e Other g Other d d Other Other Other d d
#> [23] Other Other d Other g g Other e d g Other
#> [34] Other Other Other Other d d Other g Other d Other
#> [45] Other Other g d d Other Other Other e e e
#> [56] e g Other Other Other Other d Other Other Other d
#> [67] g Other Other Other d d d Other e e Other
#> [78] Other e Other e Other d g e g d g
#> [89] Other d Other g Other e Other g Other g Other
#> [100] e
#> Levels: d e g Other
fct_lump_prop(x, prop = 0.1)
#> [1] Other e d Other Other g e e Other e c
#> [12] e Other g c d d Other Other c d d
#> [23] Other Other d c g g Other e d g Other
#> [34] Other c Other Other d d Other g c d c
#> [45] Other Other g d d c Other Other e e e
#> [56] e g Other Other c Other d Other Other c d
#> [67] g Other Other Other d d d Other e e c
#> [78] Other e Other e Other d g e g d g
#> [89] Other d Other g Other e Other g Other g Other
#> [100] e
#> Levels: c d e g Other
# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
#> [1] Other Other Other Other Other Other Other Other Other Other Other
#> [12] Other Other Other Other Other Other Other Other Other Other Other
#> [23] Other Other Other Other Other Other Other Other Other Other Other
#> [34] Other Other Other j Other Other Other Other Other Other Other
#> [45] Other Other Other Other Other Other Other Other Other Other Other
#> [56] Other Other Other Other Other Other Other Other Other Other Other
#> [67] Other Other Other Other Other Other Other Other Other Other Other
#> [78] Other Other Other Other k Other Other Other Other Other Other
#> [89] Other Other Other Other Other Other Other Other k Other l
#> [100] Other
#> Levels: j k l Other
fct_lump_prop(x, prop = -0.1)
#> [1] b Other Other f f Other Other Other a Other Other
#> [12] Other b Other Other Other Other b i Other Other Other
#> [23] f b Other Other Other Other h Other Other Other b
#> [34] i Other h j Other Other f Other Other Other Other
#> [45] h h Other Other Other Other b a Other Other Other
#> [56] Other Other a f Other b Other b f Other Other
#> [67] Other i b f Other Other Other b Other Other Other
#> [78] a Other h Other k Other Other Other Other Other Other
#> [89] h Other f Other a Other i Other k Other l
#> [100] Other
#> Levels: a b f h i j k l Other
# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump_n(x, n = 5, w = w)
#> [1] b e d Other Other g e e Other e c
#> [12] e b g c d d b Other c d d
#> [23] Other b d c g g Other e d g b
#> [34] Other c Other Other d d Other g c d c
#> [45] Other Other g d d c b Other e e e
#> [56] e g Other Other c b d b Other c d
#> [67] g Other b Other d d d b e e c
#> [78] Other e Other e Other d g e g d g
#> [89] Other d Other g Other e Other g Other g Other
#> [100] e
#> Levels: b c d e g Other
# Use ties.method to control how tied factors are collapsed
fct_lump_n(x, n = 6)
#> [1] b e d f f g e e Other e c
#> [12] e b g c d d b Other c d d
#> [23] f b d c g g Other e d g b
#> [34] Other c Other Other d d f g c d c
#> [45] Other Other g d d c b Other e e e
#> [56] e g Other f c b d b f c d
#> [67] g Other b f d d d b e e c
#> [78] Other e Other e Other d g e g d g
#> [89] Other d f g Other e Other g Other g Other
#> [100] e
#> Levels: b c d e f g Other
fct_lump_n(x, n = 6, ties.method = "max")
#> [1] b e d f f g e e Other e c
#> [12] e b g c d d b Other c d d
#> [23] f b d c g g Other e d g b
#> [34] Other c Other Other d d f g c d c
#> [45] Other Other g d d c b Other e e e
#> [56] e g Other f c b d b f c d
#> [67] g Other b f d d d b e e c
#> [78] Other e Other e Other d g e g d g
#> [89] Other d f g Other e Other g Other g Other
#> [100] e
#> Levels: b c d e f g Other
# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
#>
#> b c d e g Other
#> 10 11 20 17 15 27
table(fct_lump_min(x, min = 15))
#>
#> d e g Other
#> 20 17 15 48
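In the `ties.method` example above, no levels are tied at the cutoff, so both calls return the same result. The sketch below uses base R's `rank()` on a made-up vector of counts to show where the choice matters. The keep rule `rank <= n` is an assumption about how the ranks are used, chosen to match the documented behaviour that ties give at least `abs(n)` values under the default method.

```r
# Hypothetical counts: "b" and "c" are tied for second place
counts <- c(a = 10, b = 4, c = 4, d = 1)
n <- 2

# "min" gives tied levels the lowest rank in their group, so both survive
rank_min <- rank(-counts, ties.method = "min")
kept_min <- names(counts)[rank_min <= n]      # a, b and c

# "first" breaks ties by position, so exactly n levels survive
rank_first <- rank(-counts, ties.method = "first")
kept_first <- names(counts)[rank_first <= n]  # a and b

# "max" gives tied levels the highest rank in their group, so both are dropped
rank_max <- rank(-counts, ties.method = "max")
kept_max <- names(counts)[rank_max <= n]      # only a
```

The default, `"min"`, is the most permissive choice: it never splits a group of tied levels, at the cost of sometimes keeping more than `n` of them.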
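Note: This article was selected and compiled by 純淨天空 from the original English work by Hadley Wickham et al., Lump uncommon factor together levels into "other". Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.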