R dplyr count: Count the observations in each group

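count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform a weighted count, which switches the summary from n = n() to n = sum(wt).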
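The equivalence described above can be checked on a small table. This is a minimal sketch: the toy data frame and its column names a and b are invented for illustration, and .groups = "drop" is added to the long form so that both results come back ungrouped.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
df <- tibble(
  a = c("x", "x", "y", "y", "y"),
  b = c("p", "q", "p", "p", "q")
)

# Shorthand form
counted <- df %>% count(a, b)

# Roughly equivalent long form; .groups = "drop" removes the leftover grouping
long_form <- df %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

# Both approaches should produce the same counts per (a, b) combination
identical(counted$n, long_form$n)

# tally() counts rows without adding any grouping of its own
total <- df %>% tally()
total$n
```

Here counted has one row per combination of a and b, while total has a single row because df is ungrouped.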

add_count() and add_tally() are the equivalents of count() and tally(), but they use mutate() instead of summarise(), so they add a new column of group-wise counts and keep every input row.
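As a rough illustration of the mutate()-style behaviour, the sketch below compares add_count() with an explicit group_by() %>% mutate() %>% ungroup() pipeline. The data frame and the column names g and v are assumptions made up for this example.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
df <- tibble(g = c("a", "a", "b"), v = 1:3)

# add_count() keeps every row and appends the per-group count
with_counts <- df %>% add_count(g)

# Roughly equivalent long form using mutate()
long_form <- df %>%
  group_by(g) %>%
  mutate(n = n()) %>%
  ungroup()

# Same number of rows as the input, and the same counts either way
nrow(with_counts) == nrow(df)
identical(with_counts$n, long_form$n)
```

Unlike count(), the result still contains the v column, because no rows are collapsed.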

Usage

count(x, ..., wt = NULL, sort = FALSE, name = NULL)

# S3 method for data.frame
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

Arguments

x

A data frame, a data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

<data-masking> Variables to group by.

wt

<data-masking> Frequency weights. Can be NULL or a variable:

If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.

sort

If TRUE, shows the largest groups at the top.
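One way to picture sort = TRUE is as a descending sort on the count column. The sketch below assumes a made-up data frame whose group sizes are all different, so that ties do not affect the comparison.

```r
library(dplyr)

# Hypothetical toy data with group sizes 1, 2 and 3 (no ties)
df <- tibble(g = c("a", "b", "b", "c", "c", "c"))

# Largest groups first
sorted <- df %>% count(g, sort = TRUE)

# Comparable manual version: count, then order by descending n
manual <- df %>% count(g) %>% arrange(desc(n))

sorted$g
identical(sorted$g, manual$g)
```

With this data the largest group, "c", comes first in both results.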

name

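The name of the new column in the output.

If omitted, it defaults to n. If there is already a column called n, it uses nn. If there are columns called both n and nn, it uses nnn, and so on, adding n's until it gets a new name.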
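The naming rule is easiest to see when the variable being counted is itself called n. The following sketch uses an invented single-column data frame; the name "total" is just an example value for the name argument.

```r
library(dplyr)

# Hypothetical data whose only column is already called `n`
df <- tibble(n = c(1, 1, 2))

# The default name `n` is taken, so the counts are stored in `nn`
# (dplyr prints a message about this)
auto_named <- df %>% count(n)
names(auto_named)

# Supplying `name` avoids the automatic renaming
explicit <- df %>% count(n, name = "total")
names(explicit)
```

Setting name explicitly is usually clearer than relying on the nn fallback when column names may clash.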

.drop

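Handling of factor levels that don't appear in the data, passed on to group_by().

For count(): if FALSE, the output includes counts for empty groups (i.e. for factor levels that don't exist in the data).

For add_count(): deprecated, since it can't actually affect the output.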

Value

An object of the same type as the input (x). count() and add_count() group transiently, so the output has the same groups as the input.
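A small sketch of what transient grouping means in practice: the variables passed to count() are used for grouping only while counting, and the result carries the grouping the input already had. The data frame and the column names a and b are assumptions made up for this example.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
df <- tibble(
  a = c("x", "x", "y"),
  b = c("p", "q", "p")
)

# Input grouped by `a`; count() groups by `b` only while counting
grouped_out <- df %>% group_by(a) %>% count(b)
group_vars(grouped_out)

# Ungrouped input gives ungrouped output
ungrouped_out <- df %>% count(a, b)
group_vars(ungrouped_out)
```

The first result is still grouped by a only, and the second has no grouping variables at all.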

Examples

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars %>% count(species)
#> # A tibble: 38 × 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # ℹ 28 more rows
starwars %>% count(species, sort = TRUE)
#> # A tibble: 38 × 2
#>    species      n
#>    <chr>    <int>
#>  1 Human       35
#>  2 Droid        6
#>  3 NA           4
#>  4 Gungan       3
#>  5 Kaminoan     2
#>  6 Mirialan     2
#>  7 Twi'lek      2
#>  8 Wookiee      2
#>  9 Zabrak       2
#> 10 Aleena       1
#> # ℹ 28 more rows
starwars %>% count(sex, gender, sort = TRUE)
#> # A tibble: 6 × 3
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 male           masculine    60
#> 2 female         feminine     16
#> 3 none           masculine     5
#> 4 NA             NA            4
#> 5 hermaphroditic masculine     1
#> 6 none           feminine      1
starwars %>% count(birth_decade = round(birth_year, -1))
#> # A tibble: 15 × 2
#>    birth_decade     n
#>           <dbl> <int>
#>  1           10     1
#>  2           20     6
#>  3           30     4
#>  4           40     6
#>  5           50     8
#>  6           60     4
#>  7           70     4
#>  8           80     2
#>  9           90     3
#> 10          100     1
#> 11          110     1
#> 12          200     1
#> 13          600     1
#> 14          900     1
#> 15           NA    44

# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,   ~runs,
  "Max",    "male",       10,
  "Sandra", "female",      1,
  "Susan",  "female",      4
)
# counts rows:
df %>% count(gender)
#> # A tibble: 2 × 2
#>   gender     n
#>   <chr>  <int>
#> 1 female     2
#> 2 male       1
# counts runs:
df %>% count(gender, wt = runs)
#> # A tibble: 2 × 2
#>   gender     n
#>   <chr>  <dbl>
#> 1 female     5
#> 2 male      10

# When factors are involved, `.drop = FALSE` can be used to retain factor
# levels that don't appear in the data
df2 <- tibble(
  id = 1:5,
  type = factor(c("a", "c", "a", NA, "a"), levels = c("a", "b", "c"))
)
df2 %>% count(type)
#> # A tibble: 3 × 2
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 c         1
#> 3 NA        1
df2 %>% count(type, .drop = FALSE)
#> # A tibble: 4 × 2
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 b         0
#> 3 c         1
#> 4 NA        1

# Or, using `group_by()`:
df2 %>% group_by(type, .drop = FALSE) %>% count()
#> # A tibble: 4 × 2
#> # Groups:   type [4]
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 b         0
#> 3 c         1
#> 4 NA        1

# tally() is a lower-level function that assumes you've done the grouping
starwars %>% tally()
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    87
starwars %>% group_by(species) %>% tally()
#> # A tibble: 38 × 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # ℹ 28 more rows

# both count() and tally() have add_ variants that work like
# mutate() instead of summarise()
df %>% add_count(gender, wt = runs)
#> # A tibble: 3 × 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    10
#> 2 Sandra female     1     5
#> 3 Susan  female     4     5
df %>% add_tally(wt = runs)
#> # A tibble: 3 × 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    15
#> 2 Sandra female     1    15
#> 3 Susan  female     4    15

Source code: R/count-tally.R


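Note: This article was selected and compiled by 純淨天空 from the original English work "Count the observations in each group" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.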