R dplyr summarise_all: Summarise multiple columns

Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb. See vignette("colwise") for details.

The scoped variants of summarise() make it easy to apply the same transformation to multiple variables. There are three variants.

summarise_all() affects every variable
summarise_at() affects variables selected with a character vector or vars()
summarise_if() affects variables selected with a predicate function
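
As a minimal sketch of how the three variants differ only in how columns are chosen, the block below applies mean() to a small data frame. The data frame `df` and its columns are hypothetical, invented for illustration.

```r
library(dplyr)

# Hypothetical toy data, not part of the original documentation.
df <- tibble(
  id = c("a", "b", "c"),
  x  = c(1, 2, 3),
  y  = c(10, 20, 30)
)

# summarise_all(): every column, so the non-numeric column is dropped first.
all_res <- df %>% select(-id) %>% summarise_all(mean)

# summarise_at(): columns named explicitly, here as a character vector.
at_res <- df %>% summarise_at(c("x", "y"), mean)

# summarise_if(): columns for which the predicate returns TRUE.
if_res <- df %>% summarise_if(is.numeric, mean)
```

On this data all three calls should summarise the same two columns, x and y.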

Usage

summarise_all(.tbl, .funs, ...)

summarise_if(.tbl, .predicate, .funs, ...)

summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)

summarize_all(.tbl, .funs, ...)

summarize_if(.tbl, .predicate, .funs, ...)

summarize_at(.tbl, .vars, .funs, ..., .cols = NULL)

Arguments

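.tbl: A tbl object.
.funs: A function fun, a quosure style lambda ~ fun(.), or a list of either form.
...: Additional arguments for the function calls in .funs. These are evaluated only once, with tidy dots support.
.predicate: A predicate function to be applied to the columns, or a logical vector. The variables for which .predicate is or returns TRUE are selected. This argument is passed to rlang::as_function() and thus supports quosure-style lambda functions and strings representing function names.
.vars: A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.
.cols: This argument has been renamed to .vars to fit dplyr's terminology and is deprecated.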
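
The argument forms described above can be sketched as follows. This is an illustrative example on a hypothetical data frame `df`, not taken from the original page; it shows a bare function with extra arguments passed through `...`, a formula-style lambda, and a lambda used as the predicate.

```r
library(dplyr)

# Hypothetical toy data with missing values.
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, NA, 3),
  y = c(4, 5, NA)
)

# .funs as a bare function; na.rm = TRUE is forwarded through `...`.
r1 <- df %>% summarise_at(vars(x, y), mean, na.rm = TRUE)

# .funs as a formula-style lambda, where `.` stands for the current column.
r2 <- df %>% summarise_at(vars(x, y), ~ mean(., na.rm = TRUE))

# .predicate as a lambda; only columns for which it returns TRUE are summarised.
r3 <- df %>% summarise_if(~ is.numeric(.), mean, na.rm = TRUE)
```

The first two calls are intended to be equivalent; the lambda form is convenient when the extra arguments differ from one function to another.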

Value

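A data frame. By default, the newly created columns have the shortest names needed to uniquely identify the output. To force inclusion of a name, even when not needed, name the input (see examples for details).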

Grouping variables

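If applied on a grouped tibble, these operations are not applied to the grouping variables. The behaviour depends on whether the selection is implicit (all and if selections) or explicit (at selections).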

Grouping variables covered by explicit selections in summarise_at() are always an error. Add -group_cols() to the vars() selection to avoid this:
```
data %>%
  summarise_at(vars(-group_cols(), ...), myoperation)
```
Or remove group_vars() from the character vector of column names:
```
nms <- setdiff(nms, group_vars(data))
data %>% summarise_at(nms, myoperation)
```
Grouping variables covered by implicit selections are silently ignored by summarise_all() and summarise_if().
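
The difference between implicit and explicit selection on a grouped tibble can be sketched as below. The data frame `df` is hypothetical; the grouping column is deliberately a character column, so the failing call is captured with tryCatch() rather than left to abort the script.

```r
library(dplyr)

# Hypothetical toy data, grouped by g.
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(4, 5, 6)
)
by_g <- df %>% group_by(g)

# Implicit selection: the grouping column g is skipped silently.
implicit <- by_g %>% summarise_all(sum)

# Explicit selection that covers the grouping column: record whether it errors.
explicit_failed <- tryCatch(
  {
    by_g %>% summarise_at(vars(g, x), sum)
    FALSE
  },
  error = function(e) TRUE
)

# Explicit selection with the grouping column excluded.
explicit_ok <- by_g %>% summarise_at(vars(-group_cols()), sum)
```

Here `implicit` and `explicit_ok` should hold the same per-group sums of x and y, with one row per group.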

Naming

The names of the new columns are derived from the names of the input variables and the names of the functions.

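if there is only one unnamed function (i.e. if .funs is an unnamed list of length one), the names of the input variables are used to name the new columns;
for _at functions, if there is only one unnamed variable (i.e., if .vars is of the form vars(a_single_column)) and .funs has length greater than one, the names of the functions are used to name the new columns;
otherwise, the new names are created by concatenating the names of the input variables and the names of the functions, separated with an underscore "_".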

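The .funs argument can be a named or unnamed list. If a function is unnamed and the name cannot be derived automatically, a name of the form "fn#" is used. Similarly, vars() accepts named and unnamed arguments. If a variable in .vars is named, a new column by that name will be created.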

Name collisions in the new columns are disambiguated using a unique suffix.
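
The naming rules above can be checked with a short sketch. The data frame `df` and the function names `lo`, `hi` and `avg` are hypothetical, chosen only for illustration; each call inspects just the resulting column names.

```r
library(dplyr)

# Hypothetical toy data.
df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# One unnamed function: columns keep the input variable names.
n1 <- names(summarise_at(df, vars(x, y), mean))

# One unnamed variable, several named functions: columns take the function names.
n2 <- names(summarise_at(df, vars(x), list(lo = min, hi = max)))

# Several variables and several functions: variable and function name joined by "_".
n3 <- names(summarise_at(df, vars(x, y), list(lo = min, hi = max)))

# Naming a single function forces its name into the output.
n4 <- names(summarise_at(df, vars(x, y), list(avg = mean)))

# Unnamed functions in a list fall back to names of the form "fn#".
n5 <- names(summarise_at(df, vars(x, y), list(min, max)))
```

Naming the functions in the list, as in the third and fourth calls, is usually preferable to relying on the "fn#" fallback, because the output columns then say which summary they hold.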

See also

The other scoped verbs, vars()

Examples

# The _at() variants directly support strings:
starwars %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE)
#> # A tibble: 1 × 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3
# ->
starwars %>% summarise(across(c("height", "mass"), ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 1 × 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3

# You can also supply selection helpers to _at() functions but you have
# to quote them with vars():
starwars %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
#> # A tibble: 1 × 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3
# ->
starwars %>%
  summarise(across(height:mass, ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 1 × 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3

# The _if() variants apply a predicate function (a function that
# returns TRUE or FALSE) to determine the relevant subset of
# columns. Here we apply mean() to the numeric columns:
starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
#> # A tibble: 1 × 3
#>   height  mass birth_year
#>    <dbl> <dbl>      <dbl>
#> 1   174.  97.3       87.6
# ->
starwars %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 1 × 3
#>   height  mass birth_year
#>    <dbl> <dbl>      <dbl>
#> 1   174.  97.3       87.6

by_species <- iris %>%
  group_by(Species)

# If you want to apply multiple transformations, pass a list of
# functions. When there are multiple functions, they create new
# variables instead of modifying the variables in place:
by_species %>%
  summarise_all(list(min, max))
#> # A tibble: 3 × 9
#>   Species    Sepal.Length_fn1 Sepal.Width_fn1 Petal.Length_fn1
#>   <fct>                 <dbl>           <dbl>            <dbl>
#> 1 setosa                  4.3             2.3              1  
#> 2 versicolor              4.9             2                3  
#> 3 virginica               4.9             2.2              4.5
#> # ℹ 5 more variables: Petal.Width_fn1 <dbl>, Sepal.Length_fn2 <dbl>,
#> #   Sepal.Width_fn2 <dbl>, Petal.Length_fn2 <dbl>, Petal.Width_fn2 <dbl>
# ->
by_species %>%
  summarise(across(everything(), list(min = min, max = max)))
#> # A tibble: 3 × 9
#>   Species    Sepal.Length_min Sepal.Length_max Sepal.Width_min
#>   <fct>                 <dbl>            <dbl>           <dbl>
#> 1 setosa                  4.3              5.8             2.3
#> 2 versicolor              4.9              7               2  
#> 3 virginica               4.9              7.9             2.2
#> # ℹ 5 more variables: Sepal.Width_max <dbl>, Petal.Length_min <dbl>,
#> #   Petal.Length_max <dbl>, Petal.Width_min <dbl>, Petal.Width_max <dbl>

Source: R/colwise-mutate.R


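Note: This article was selected and compiled by 纯净天空 from the original English work "Summarise multiple columns" by Hadley Wickham and others. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.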