R dplyr pick: Select a subset of columns

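pick() lets you select a subset of columns from your data using select() semantics while inside a "data-masking" function such as mutate() or summarise(). pick() returns a data frame containing the selected columns for the current group.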

pick() is complementary to across():

With pick(), you typically apply a function to the full data frame.
With across(), you typically apply a function to every column.
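A rough illustration of this contrast, assuming dplyr >= 1.1.0 (the release that introduced pick()). The small data frame and the use of rowSums() are made up for this sketch. pick() hands the selected columns to a function as one data frame, while across() calls its function once per column:

```r
library(dplyr)

df <- tibble(x = c(3, 2, 1), y = c(0, 2, 4))

# pick(): rowSums() is called once and receives x and y together as a data frame
with_pick <- df %>% mutate(total = rowSums(pick(x, y)))

# across(): the anonymous function is called separately for x and for y
with_across <- df %>% mutate(across(c(x, y), function(col) col * 10))

with_pick
with_across
```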

Usage

pick(...)

Arguments

...

<tidy-select>

Columns to pick.

You can't pick grouping columns because the verb (i.e. summarise() or mutate()) already handles them automatically.
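A small sketch of this rule, assuming dplyr >= 1.1.0 and a made-up data frame with a grouping column g. Inside a grouped summarise(), pick(everything()) returns only the current group's rows and leaves out g, because the verb already tracks it:

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6))

# For each group, count the columns and rows that pick(everything()) returns.
# The grouping column `g` is not part of the selection, so only x and y are counted.
per_group <- df %>%
  group_by(g) %>%
  summarise(
    n_cols = ncol(pick(everything())),
    n_rows = nrow(pick(everything()))
  )

per_group
```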

Value

A tibble containing the selected columns for the current group.

Details

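In theory, pick() is intended to be replaceable with an equivalent call to tibble(). For example, pick(a, c) could be replaced with tibble(a = a, c = c), and pick(everything()) on a data frame with columns a, b, and c could be replaced with tibble(a = a, b = b, c = c). pick() handles an empty selection specially by returning a 1-row, 0-column tibble, so an exact replacement is closer to: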

size <- vctrs::vec_size_common(..., .absent = 1L)
out <- vctrs::vec_recycle_common(..., .size = size)
tibble::new_tibble(out, nrow = size)
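The equivalence can be checked directly. This is a sketch assuming dplyr >= 1.1.0. The data frame, the use of rowSums(), and the deliberately non-matching starts_with("zzz") selection are illustrative choices. It shows that pick(a, c) gives the same result as the hand-written tibble(a = a, c = c), and that a selection matching no columns yields a 1-row, 0-column tibble:

```r
library(dplyr)

df <- tibble(a = 1:3, b = 4:6, c = 7:9)

# pick(a, c) versus the equivalent hand-written tibble() call
via_pick   <- df %>% mutate(s = rowSums(pick(a, c)))
via_tibble <- df %>% mutate(s = rowSums(tibble(a = a, c = c)))

# starts_with("zzz") matches no columns, so the selection is empty
empty <- df %>%
  summarise(
    n_rows = nrow(pick(starts_with("zzz"))),
    n_cols = ncol(pick(starts_with("zzz")))
  )

via_pick
empty
```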

See also

across()

Examples

df <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4),
  z1 = c("a", "a", "a", "b", "a"),
  z2 = c("c", "d", "d", "a", "c")
)
df
#> # A tibble: 5 × 4
#>       x     y z1    z2   
#>   <dbl> <dbl> <chr> <chr>
#> 1     3     0 a     c    
#> 2     2     2 a     d    
#> 3     2     1 a     d    
#> 4     2     1 b     a    
#> 5     1     4 a     c    

# `pick()` provides a way to select a subset of your columns using
# tidyselect. It returns a data frame.
df %>% mutate(cols = pick(x, y))
#> # A tibble: 5 × 5
#>       x     y z1    z2    cols$x    $y
#>   <dbl> <dbl> <chr> <chr>  <dbl> <dbl>
#> 1     3     0 a     c          3     0
#> 2     2     2 a     d          2     2
#> 3     2     1 a     d          2     1
#> 4     2     1 b     a          2     1
#> 5     1     4 a     c          1     4

# This is useful for functions that take data frames as inputs.
# For example, you can compute a joint rank between `x` and `y`.
df %>% mutate(rank = dense_rank(pick(x, y)))
#> # A tibble: 5 × 5
#>       x     y z1    z2     rank
#>   <dbl> <dbl> <chr> <chr> <int>
#> 1     3     0 a     c         4
#> 2     2     2 a     d         3
#> 3     2     1 a     d         2
#> 4     2     1 b     a         2
#> 5     1     4 a     c         1

# `pick()` is also useful as a bridge between data-masking functions (like
# `mutate()` or `group_by()`) and functions with tidy-select behavior (like
# `select()`). For example, you can use `pick()` to create a wrapper around
# `group_by()` that takes a tidy-selection of columns to group on. For more
# bridge patterns, see
# https://rlang.r-lib.org/reference/topic-data-mask-programming.html#bridge-patterns.
my_group_by <- function(data, cols) {
  group_by(data, pick({{ cols }}))
}

df %>% my_group_by(c(x, starts_with("z")))
#> # A tibble: 5 × 4
#> # Groups:   x, z1, z2 [4]
#>       x     y z1    z2   
#>   <dbl> <dbl> <chr> <chr>
#> 1     3     0 a     c    
#> 2     2     2 a     d    
#> 3     2     1 a     d    
#> 4     2     1 b     a    
#> 5     1     4 a     c    

# Or you can use it to dynamically select columns to `count()` by
df %>% count(pick(starts_with("z")))
#> # A tibble: 3 × 3
#>   z1    z2        n
#>   <chr> <chr> <int>
#> 1 a     c         2
#> 2 a     d         2
#> 3 b     a         1

Source code: R/pick.R


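Note: This article was selected and translated by 纯净天空 from the English original "Select a subset of columns" by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission.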