R dplyr count: Count the observations in each group

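count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform a weighted count, which switches the summary from n = n() to n = sum(wt).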
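The equivalence described above can be checked directly. The following is a minimal sketch on a made-up tibble; the data and the column names a, b, and w are assumptions for illustration and do not come from the dplyr documentation.

```r
library(dplyr)

# Hypothetical toy data; a, b and w are illustrative column names
df <- tibble(
  a = c("x", "x", "y", "y", "y"),
  b = c(1, 1, 1, 2, 2),
  w = c(2, 3, 1, 4, 5)
)

# Shorthand: one row per unique (a, b) combination
counted <- df %>% count(a, b)

# Longhand: .groups = "drop" discards the grouping summarise() would keep
manual <- df %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

# Weighted count: sums w within each level of a instead of counting rows
weighted <- df %>% count(a, wt = w)
manual_weighted <- df %>%
  group_by(a) %>%
  summarise(n = sum(w))

all.equal(counted, manual)
all.equal(weighted, manual_weighted)
```

The explicit `.groups = "drop"` is what makes the longhand result match: a plain `summarise()` after grouping by two variables would leave the output grouped by a, which is one reason the two forms are only "roughly" equivalent.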

add_count() and add_tally() are equivalents of count() and tally() that use mutate() instead of summarise(), so they add a new column with group-wise counts rather than collapsing the data.
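As a rough illustration of the mutate()-style behaviour, the sketch below compares add_count() with a hand-written group_by() and mutate() pipeline. The tibble and its columns g and v are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; g and v are illustrative column names
df <- tibble(
  g = c("a", "b", "a"),
  v = 1:3
)

# Keeps all three rows and appends the size of each row's group
with_count <- df %>% add_count(g)

# Hand-written counterpart: group, add the count, then remove the grouping
manual <- df %>%
  group_by(g) %>%
  mutate(n = n()) %>%
  ungroup()

all.equal(with_count, manual)
```

Unlike count(), the result has the same number of rows and the same row order as the input, which makes it convenient for filtering on group size afterwards.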

Usage

count(x, ..., wt = NULL, sort = FALSE, name = NULL)

# S3 method for data.frame
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

Arguments

x

A data frame, a data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

<data-masking> Variables to group by.

wt

<data-masking> Frequency weights. Can be NULL or a variable:

If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.

sort

If TRUE, will show the largest groups at the top.

name

The name of the new column in the output.

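If omitted, it defaults to n. If there is already a column called n, it will use nn. If there are columns called n and nn, it will use nnn, and so on, adding another n until it gets a new name.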
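The naming rule is easiest to see when the counted variable is itself called n. This is a minimal sketch with a hypothetical one-column tibble; the name "freq" is an arbitrary choice for illustration.

```r
library(dplyr)

# Hypothetical data whose only column is already named n
df <- tibble(n = c(1, 1, 2))

# The default name n is taken, so the count column falls back to nn
# (dplyr also emits a message about the chosen name)
auto_named <- df %>% count(n)
names(auto_named)

# Supplying name avoids the fallback and the message
explicit <- df %>% count(n, name = "freq")
names(explicit)
```

Setting name explicitly is usually the safer choice in scripts, since the output column then does not depend on which columns happen to exist in the input.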

.drop

Handling of factor levels that don't appear in the data, passed on to group_by().

For count(): if FALSE, will include counts for empty groups (i.e. for levels of factors that don't exist in the data).

For add_count(): deprecated, since it can't actually affect the output.

Value

An object of the same type as .data. count() and add_count() group transiently, so the output has the same groups as the input.
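"Grouping transiently" means the variables passed to count() or add_count() are used as groups only for the duration of the call. The sketch below, on a hypothetical tibble with columns a and b, inspects the grouping of the results with group_vars().

```r
library(dplyr)

# Hypothetical toy data; a and b are illustrative column names
df <- tibble(
  a = c("x", "x", "y"),
  b = c(1, 2, 2)
)

# Ungrouped input: the result is ungrouped as well
ungrouped_out <- df %>% count(a)
group_vars(ungrouped_out)

# Input grouped by a: counting by b keeps a as the only grouping variable
grouped_out <- df %>% group_by(a) %>% count(b)
group_vars(grouped_out)

# add_count() restores the input grouping in the same way
added_out <- df %>% group_by(a) %>% add_count(b)
group_vars(added_out)
```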

Examples

library(dplyr)

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars %>% count(species)
#> # A tibble: 38 × 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # ℹ 28 more rows
starwars %>% count(species, sort = TRUE)
#> # A tibble: 38 × 2
#>    species      n
#>    <chr>    <int>
#>  1 Human       35
#>  2 Droid        6
#>  3 NA           4
#>  4 Gungan       3
#>  5 Kaminoan     2
#>  6 Mirialan     2
#>  7 Twi'lek      2
#>  8 Wookiee      2
#>  9 Zabrak       2
#> 10 Aleena       1
#> # ℹ 28 more rows
starwars %>% count(sex, gender, sort = TRUE)
#> # A tibble: 6 × 3
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 male           masculine    60
#> 2 female         feminine     16
#> 3 none           masculine     5
#> 4 NA             NA            4
#> 5 hermaphroditic masculine     1
#> 6 none           feminine      1
starwars %>% count(birth_decade = round(birth_year, -1))
#> # A tibble: 15 × 2
#>    birth_decade     n
#>           <dbl> <int>
#>  1           10     1
#>  2           20     6
#>  3           30     4
#>  4           40     6
#>  5           50     8
#>  6           60     4
#>  7           70     4
#>  8           80     2
#>  9           90     3
#> 10          100     1
#> 11          110     1
#> 12          200     1
#> 13          600     1
#> 14          900     1
#> 15           NA    44

# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,   ~runs,
  "Max",    "male",       10,
  "Sandra", "female",      1,
  "Susan",  "female",      4
)
# counts rows:
df %>% count(gender)
#> # A tibble: 2 × 2
#>   gender     n
#>   <chr>  <int>
#> 1 female     2
#> 2 male       1
# counts runs:
df %>% count(gender, wt = runs)
#> # A tibble: 2 × 2
#>   gender     n
#>   <chr>  <dbl>
#> 1 female     5
#> 2 male      10

# When factors are involved, `.drop = FALSE` can be used to retain factor
# levels that don't appear in the data
df2 <- tibble(
  id = 1:5,
  type = factor(c("a", "c", "a", NA, "a"), levels = c("a", "b", "c"))
)
df2 %>% count(type)
#> # A tibble: 3 × 2
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 c         1
#> 3 NA        1
df2 %>% count(type, .drop = FALSE)
#> # A tibble: 4 × 2
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 b         0
#> 3 c         1
#> 4 NA        1

# Or, using `group_by()`:
df2 %>% group_by(type, .drop = FALSE) %>% count()
#> # A tibble: 4 × 2
#> # Groups:   type [4]
#>   type      n
#>   <fct> <int>
#> 1 a         3
#> 2 b         0
#> 3 c         1
#> 4 NA        1

# tally() is a lower-level function that assumes you've done the grouping
starwars %>% tally()
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    87
starwars %>% group_by(species) %>% tally()
#> # A tibble: 38 × 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # ℹ 28 more rows

# both count() and tally() have add_ variants that work like
# mutate() instead of summarise
df %>% add_count(gender, wt = runs)
#> # A tibble: 3 × 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    10
#> 2 Sandra female     1     5
#> 3 Susan  female     4     5
df %>% add_tally(wt = runs)
#> # A tibble: 3 × 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    15
#> 2 Sandra female     1    15
#> 3 Susan  female     4    15

Source: R/count-tally.R

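Note: this article was selected and compiled by 纯净天空 from the original English work "Count the observations in each group" by Hadley Wickham and others. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.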