R dplyr rowwise: Group input by rows

rowwise() allows you to compute on a data frame a row at a time. This is most useful when a vectorised function doesn't exist.
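As a minimal sketch, assume a hypothetical helper `bigger_of()` that uses `if` and therefore only works on one value per argument. Called on whole columns it would fail, but after rowwise() mutate() calls it once per row with scalar inputs. The helper name and the toy data are illustrative assumptions, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: `if` needs a single TRUE/FALSE, so this is not vectorised
bigger_of <- function(a, b) if (a > b) "a" else "b"

df <- tibble(a = c(1, 5, 2), b = c(3, 4, 8))

# rowwise() makes mutate() evaluate bigger_of() separately for each row
res <- df %>%
  rowwise() %>%
  mutate(bigger = bigger_of(a, b)) %>%
  ungroup()

res$bigger
```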

Most dplyr verbs preserve row-wise grouping. The exception is summarise(), which returns a grouped_df. You can explicitly ungroup with ungroup() or as_tibble(), or convert to a grouped_df with group_by().
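The sketch below checks those class changes on a small made-up tibble. It passes `.groups = "keep"` explicitly to summarise() so the result is grouped by the row-wise variable `id`, mirroring the last example on this page; the data are assumptions for illustration.

```r
library(dplyr)

df <- tibble(id = 1:3, x = c(1, 5, 3), y = c(4, 2, 6))
rw <- df %>% rowwise(id)

# mutate() keeps the row-wise grouping
m <- rw %>% mutate(top = max(x, y))

# summarise() returns a grouped_df (grouped by id) rather than a rowwise_df
s <- rw %>% summarise(top = max(x, y), .groups = "keep")

# Drop the grouping explicitly, or convert it to an ordinary grouped_df
plain <- ungroup(rw)
grouped <- group_by(rw, id)

class(m)
class(s)
```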

Usage

rowwise(data, ...)

Arguments

data

Input data frame.

...

<tidy-select> Variables to be preserved when calling summarise(). This is typically a set of variables whose combination uniquely identifies each row.

NB: unlike group_by(), you cannot create new variables here, but you can select multiple variables with (for example) everything().
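A minimal sketch of the `...` argument, assuming a toy `scores` table in which `name` identifies each row: only the selected variable survives summarise(), alongside the new summary column. `.groups = "drop"` is used here just to return a plain tibble.

```r
library(dplyr)

scores <- tibble(name = c("ann", "bob"), q1 = c(7, 9), q2 = c(8, 4))

# Keep `name` through summarise(); q1 and q2 are consumed by the computation
totals <- scores %>%
  rowwise(name) %>%
  summarise(total = q1 + q2, .groups = "drop")

totals
```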

Value

A row-wise data frame with class rowwise_df. Note that a rowwise_df is implicitly grouped by row, but is not a grouped_df.
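One way to see this, sketched on an assumed four-row tibble: the object carries the rowwise_df class, is_grouped_df() reports FALSE, and n_groups() still counts one implicit group per row.

```r
library(dplyr)

rw <- tibble(x = 1:4) %>% rowwise()

class(rw)           # contains "rowwise_df" but not "grouped_df"
is_grouped_df(rw)   # tests for the grouped_df class specifically
n_groups(rw)        # one implicit group per row
```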

List-columns

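Because a rowwise data frame has exactly one row per group, it offers a small convenience for working with list-columns. Normally, summarise() and mutate() extract a group's worth of data with [. But when you index a list this way, you get back another list. When you're working with a rowwise tibble, dplyr uses [[ instead of [ to make your life a little easier.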
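The difference shows up with length() on a list-column. In this sketch, with made-up data, the row-wise version sees each element (extracted with [[), while the plain version sees the whole list column and recycles its length.

```r
library(dplyr)

df <- tibble(id = 1:2, vals = list(1:3, 4:10))

# Row-wise: `vals` is the element itself, so length() counts its values
rowwise_len <- df %>% rowwise() %>% mutate(n = length(vals)) %>% ungroup()

# Not row-wise: `vals` is the whole list, so length() is the number of rows
plain_len <- df %>% mutate(n = length(vals))

rowwise_len$n
plain_len$n
```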

See also

nest_by() for a convenient way of creating rowwise data frames with nested data.
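As a rough sketch using the built-in mtcars data: nest_by(cyl) yields a rowwise_df with one row per cylinder count and a list-column `data`, so each row's subset can be used directly, for example to count rows or fit one model per group.

```r
library(dplyr)

# One row per value of cyl; `data` holds the remaining columns for that group
by_cyl <- mtcars %>% nest_by(cyl)

# Because by_cyl is row-wise, `data` is a single tibble in each row
sizes <- by_cyl %>% mutate(n = nrow(data)) %>% ungroup()

# Fit one linear model per group, stored in a list-column
models <- by_cyl %>% mutate(fit = list(lm(mpg ~ wt, data = data)))
```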

Examples

library(dplyr)
df <- tibble(x = runif(6), y = runif(6), z = runif(6))
# Compute the mean of x, y, z in each row
df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))
#> # A tibble: 6 × 4
#> # Rowwise: 
#>       x      y      z     m
#>   <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.922 0.476  0.211  0.536
#> 2 0.139 0.552  0.0723 0.254
#> 3 0.197 0.879  0.611  0.563
#> 4 0.228 0.778  0.251  0.419
#> 5 0.960 0.0823 0.401  0.481
#> 6 0.283 0.968  0.551  0.601
# use c_across() to more easily select many variables
df %>% rowwise() %>% mutate(m = mean(c_across(x:z)))
#> # A tibble: 6 × 4
#> # Rowwise: 
#>       x      y      z     m
#>   <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.922 0.476  0.211  0.536
#> 2 0.139 0.552  0.0723 0.254
#> 3 0.197 0.879  0.611  0.563
#> 4 0.228 0.778  0.251  0.419
#> 5 0.960 0.0823 0.401  0.481
#> 6 0.283 0.968  0.551  0.601

# Compute the minimum of x, y, and z in each row
df %>% rowwise() %>% mutate(m = min(c(x, y, z)))
#> # A tibble: 6 × 4
#> # Rowwise: 
#>       x      y      z      m
#>   <dbl>  <dbl>  <dbl>  <dbl>
#> 1 0.922 0.476  0.211  0.211 
#> 2 0.139 0.552  0.0723 0.0723
#> 3 0.197 0.879  0.611  0.197 
#> 4 0.228 0.778  0.251  0.228 
#> 5 0.960 0.0823 0.401  0.0823
#> 6 0.283 0.968  0.551  0.283 
# In this case you can use an existing vectorised function:
df %>% mutate(m = pmin(x, y, z))
#> # A tibble: 6 × 4
#>       x      y      z      m
#>   <dbl>  <dbl>  <dbl>  <dbl>
#> 1 0.922 0.476  0.211  0.211 
#> 2 0.139 0.552  0.0723 0.0723
#> 3 0.197 0.879  0.611  0.197 
#> 4 0.228 0.778  0.251  0.228 
#> 5 0.960 0.0823 0.401  0.0823
#> 6 0.283 0.968  0.551  0.283 
# Where these functions exist they'll be much faster than rowwise
# so be on the lookout for them.

# rowwise() is also useful when doing simulations
params <- tribble(
 ~sim, ~n, ~mean, ~sd,
    1,  1,     1,   1,
    2,  2,     2,   4,
    3,  3,    -1,   2
)
# Here I supply variables to preserve after the computation
params %>%
  rowwise(sim) %>%
  reframe(z = rnorm(n, mean, sd))
#> # A tibble: 6 × 2
#>     sim      z
#>   <dbl>  <dbl>
#> 1     1  2.34 
#> 2     2 -1.41 
#> 3     2 -2.60 
#> 4     3  0.983
#> 5     3  2.00 
#> 6     3  0.394

# If you want one row per simulation, put the results in a list()
params %>%
  rowwise(sim) %>%
  summarise(z = list(rnorm(n, mean, sd)), .groups = "keep")
#> # A tibble: 3 × 2
#> # Groups:   sim [3]
#>     sim z        
#>   <dbl> <list>   
#> 1     1 <dbl [1]>
#> 2     2 <dbl [2]>
#> 3     3 <dbl [3]>

Source code: R/rowwise.R


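Note: This article was selected and compiled by 纯净天空 from the original English work Group input by rows by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors. This translation may not be reproduced or copied without permission.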