R ggplot2 cut_interval: Discretise numeric data into categorical

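cut_interval() makes n groups with equal range, cut_number() makes n groups with (approximately) equal numbers of observations, and cut_width() makes groups of a fixed width, given by the width argument.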
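
The difference between equal range and equal count is easiest to see on skewed data. The following is a minimal base-R sketch of the two ideas, not ggplot2's implementation: it assumes equal-range breaks are evenly spaced over range(x) and equal-count breaks sit at evenly spaced quantiles. The vector x and all variable names are made up for illustration.

```r
# Illustrative data: a small numeric vector with one large outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
n <- 4

# Equal range: n + 1 evenly spaced break points across range(x).
range_breaks <- seq(min(x), max(x), length.out = n + 1)
by_range <- cut(x, range_breaks, include.lowest = TRUE)

# Equal count: break points at evenly spaced quantiles of x.
count_breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
by_count <- cut(x, count_breaks, include.lowest = TRUE)

table(by_range)  # almost everything lands in the first interval
table(by_count)  # observations are spread roughly evenly
```

With equal ranges the outlier stretches the intervals so that most of them are empty, while quantile-based breaks keep the group sizes close to each other.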

Usage

cut_interval(x, n = NULL, length = NULL, ...)

cut_number(x, n = NULL, ...)

cut_width(x, width, center = NULL, boundary = NULL, closed = "right", ...)

Arguments

x

A numeric vector.

n

The number of intervals to create, or

length

the length of each interval.
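
As a short sketch of the two ways to call cut_interval(), the example below bins the same vector once by the number of intervals and once by the interval length. It assumes ggplot2 is installed; the vector x is illustrative.

```r
library(ggplot2)

x <- 1:100

# Ask for a fixed number of equal-range intervals...
by_n <- cut_interval(x, n = 4)

# ...or for a fixed interval length instead (supply one of the two).
by_length <- cut_interval(x, length = 25)

nlevels(by_n)
nlevels(by_length)
table(by_length)
```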

...

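Arguments passed on to base::cut.default

breaks: either a numeric vector of two or more unique cut points, or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
labels: labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
right: logical, indicating whether the intervals should be closed on the right (and open on the left) or vice versa.
dig.lab: integer used when labels are not given. It determines the number of digits used in formatting the break numbers.
ordered_result: logical: should the result be an ordered factor?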
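
Because these arguments belong to base::cut, their effect can be seen without ggplot2. The sketch below calls cut() directly on a made-up vector; the names x, brk, and the label strings are illustrative only.

```r
x <- c(0.5, 1.5, 2.5, 3.5)
brk <- c(0, 1, 2, 3, 4)

# Default: a factor labelled with "(a,b]" interval notation.
default_labels <- levels(cut(x, brk))

# Custom labels, one per interval.
named <- cut(x, brk, labels = c("q1", "q2", "q3", "q4"))

# labels = FALSE returns integer codes instead of a factor.
codes <- cut(x, brk, labels = FALSE)

# ordered_result = TRUE returns an ordered factor, so levels can be compared.
ord <- cut(x, brk, ordered_result = TRUE)
ord[1] < ord[4]
```

The same arguments can be supplied through ... when calling cut_interval(), cut_number(), or cut_width().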

width

The bin width.

center, boundary

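Specify either the position of a bin edge (boundary) or of a bin center (center). Since all bins are aligned, specifying the position of a single bin (which need not lie within the range of the data) fixes the location of all bins. If neither is specified, the "tile layers algorithm" is used and the boundary is set to half of the bin width.

To center bins on integers, use width = 1 and center = 0, or equivalently boundary = 0.5.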
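
To make the alignment rule concrete, here is a base-R sketch of one way to derive every bin edge from a single known boundary. It is an illustration under stated assumptions, not necessarily ggplot2's exact code: the helper first_edge() is a hypothetical name, and the data are made up.

```r
# Hypothetical helper: find the left edge of the first bin, given a bin
# width and the position of any one bin boundary.
first_edge <- function(x, width, boundary) {
  shift <- floor((min(x) - boundary) / width)
  boundary + shift * width
}

x <- c(1.2, 2.7, 3.1)
width <- 1

# center = 0 means one bin spans [-0.5, 0.5], so -0.5 is a boundary.
edge_from_center <- first_edge(x, width, boundary = 0 - width / 2)
# boundary = 0.5 names the other edge of that same bin.
edge_from_boundary <- first_edge(x, width, boundary = 0.5)

edge_from_center == edge_from_boundary  # both describe the same alignment

# Build the break points and bin the data; the bins are centered on 1, 2, 3.
brk <- seq(edge_from_center, max(x) + width, by = width)
binned <- cut(x, brk, include.lowest = TRUE)
levels(binned)
```

Either specification pins the whole grid of bins, which is why the reference bin itself does not have to contain any data.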

closed

One of "right" or "left", indicating whether the right or left edge of each bin is included in that bin.
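
The effect of closed only shows for values that fall exactly on a bin edge. The sketch below, which assumes ggplot2 is installed and uses a made-up vector, bins the same data both ways so the two assignments can be compared side by side.

```r
library(ggplot2)

x <- c(0, 1, 2)

# Default: bins are closed on the right, so 1 belongs to the bin ending at 1.
right_closed <- cut_width(x, width = 1, boundary = 0)

# closed = "left": bins are closed on the left, so 1 starts a new bin.
left_closed <- cut_width(x, width = 1, boundary = 0, closed = "left")

data.frame(x, right_closed, left_closed)
```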

Author

Randall Prium contributed most of the implementation of cut_width().

Examples

table(cut_interval(1:100, 10))
#> 
#>    [1,10.9] (10.9,20.8] (20.8,30.7] (30.7,40.6] (40.6,50.5] (50.5,60.4] 
#>          10          10          10          10          10          10 
#> (60.4,70.3] (70.3,80.2] (80.2,90.1]  (90.1,100] 
#>          10          10          10          10 
table(cut_interval(1:100, 11))
#> 
#>   [1,10]  (10,19]  (19,28]  (28,37]  (37,46]  (46,55]  (55,64]  (64,73] 
#>       10        9        9        9        9        9        9        9 
#>  (73,82]  (82,91] (91,100] 
#>        9        9        9 

set.seed(1)

table(cut_number(runif(1000), 10))
#> 
#> [0.00131,0.105]   (0.105,0.201]   (0.201,0.312]   (0.312,0.398] 
#>             100             100             100             100 
#>   (0.398,0.483]   (0.483,0.596]   (0.596,0.706]   (0.706,0.797] 
#>             100             100             100             100 
#>    (0.797,0.91]        (0.91,1] 
#>             100             100 

table(cut_width(runif(1000), 0.1))
#> 
#> [-0.05,0.05]  (0.05,0.15]  (0.15,0.25]  (0.25,0.35]  (0.35,0.45] 
#>           59          109          103           96          110 
#>  (0.45,0.55]  (0.55,0.65]  (0.65,0.75]  (0.75,0.85]  (0.85,0.95] 
#>           85           89           86          113           97 
#>  (0.95,1.05] 
#>           53 
table(cut_width(runif(1000), 0.1, boundary = 0))
#> 
#>   [0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] 
#>       106       106       108       100        99       107        84 
#> (0.7,0.8] (0.8,0.9]   (0.9,1] 
#>        96        95        99 
table(cut_width(runif(1000), 0.1, center = 0))
#> 
#> [-0.05,0.05]  (0.05,0.15]  (0.15,0.25]  (0.25,0.35]  (0.35,0.45] 
#>           72          104           80          104          100 
#>  (0.45,0.55]  (0.55,0.65]  (0.65,0.75]  (0.75,0.85]  (0.85,0.95] 
#>           91           94           75          115          110 
#>  (0.95,1.05] 
#>           55 
table(cut_width(runif(1000), 0.1, labels = FALSE))
#> 
#>   1   2   3   4   5   6   7   8   9  10  11 
#>  49  92 100  98 112 102  88  89  97 116  57

Source: R/utilities-break.R

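Note: This article was compiled by 純淨天空 from the original English work "Discretise numeric data into categorical" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; this translation may not be reproduced or copied without permission.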