R dplyr group_map: Apply a function to each group

group_map(), group_modify() and group_walk() are purrr-style functions that can be used to iterate over grouped tibbles.

Usage

group_map(.data, .f, ..., .keep = FALSE)

group_modify(.data, .f, ..., .keep = FALSE)

group_walk(.data, .f, ..., .keep = FALSE)

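Arguments

.data

A grouped tibble.

.f

A function or formula to apply to each group.

If a function, it is used as is. It should have at least 2 formal arguments: the first receives the group's data, the second receives the group's key.

If a formula, e.g. ~ head(.x), it is converted to a function.

In the formula, you can use

. or .x to refer to the subset of rows of .data for the given group
.y to refer to the key, a one-row tibble with one column per grouping variable, that identifies the group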
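
To make the two forms of .f concrete, here is a minimal sketch that applies the same operation once as a plain two-argument function and once as a formula. The helper name first_rows is hypothetical and only used for illustration.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# Function form: the first argument receives the group's rows,
# the second receives the one-row key tibble (unused here).
# `first_rows` is a made-up name for this sketch.
first_rows <- function(.x, .y) head(.x, 2L)
res_fun <- group_map(by_cyl, first_rows)

# Formula form: .x and .y are available without declaring them.
res_formula <- group_map(by_cyl, ~ head(.x, 2L))

# Both calls should produce the same list of per-group tibbles.
identical(res_fun, res_formula)
```

The formula form is shorter for one-liners; a named function is usually easier to test and reuse.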

...

Additional arguments passed on to .f.
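
As an illustration of how ... is forwarded, the sketch below passes an extra named argument n through group_map() to a plain function. The helper top_rows and its n parameter are assumptions made for this example, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: besides the group data (.x) and key (.y),
# it takes an extra argument `n` that arrives via `...`.
top_rows <- function(.x, .y, n) head(.x, n)

res <- mtcars %>%
  group_by(cyl) %>%
  group_map(top_rows, n = 3L)

# Number of rows kept in each group
vapply(res, nrow, integer(1))
```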

.keep

Whether the grouping variables are kept in .x. Defaults to FALSE.
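
The effect of .keep can be checked by inspecting the column names that .f sees. This is a small sketch using group_map() on mtcars; the variable names dropped and kept are only illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# Default (.keep = FALSE): the grouping column is removed from .x
dropped <- group_map(by_cyl, ~ names(.x))

# .keep = TRUE: the grouping column stays in .x
kept <- group_map(by_cyl, ~ names(.x), .keep = TRUE)

"cyl" %in% dropped[[1]]  # FALSE
"cyl" %in% kept[[1]]     # TRUE
```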

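Value

group_modify() returns a grouped tibble. In that case, .f must return a data frame.
group_map() returns a list of the results of calling .f on each group.
group_walk() calls .f for its side effects and returns the input .data invisibly.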
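
The three return types can be compared side by side. The following sketch runs each function on the same grouped iris data and inspects what comes back; withVisible() from base R is used here only to show that group_walk() returns its input invisibly.

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# group_map(): a plain list, one element per group
as_list <- group_map(by_species, ~ nrow(.x))

# group_modify(): a grouped tibble; .f must return a data frame
as_tbl <- group_modify(by_species, ~ tibble(n = nrow(.x)))

# group_walk(): called for side effects, returns the input invisibly
walked <- withVisible(
  group_walk(by_species, ~ message(as.character(.y$Species), ": ", nrow(.x), " rows"))
)

is.list(as_list)
is_grouped_df(as_tbl)
walked$visible
```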

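Details

Use group_modify() when summarize() is too limited in terms of what you need to do and return for each group. group_modify() is a good fit for "data frame in, data frame out". If that is still too limited, you need a nested or split workflow. group_modify() is an evolution of do(), if you have used that before.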
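
For cases where the per-group result is not a data frame, one possible split workflow is sketched below: group_split() produces a list of data frames, and base R lapply() and vapply() do the rest. This is only one way to arrange it under these assumptions; the variable names pieces, models and slopes are illustrative.

```r
library(dplyr)

# Split iris into a list of data frames, one per species
pieces <- iris %>% group_split(Species)

# Fit one model per piece; the results are lm objects, not data frames
models <- lapply(pieces, function(d) lm(Petal.Length ~ Sepal.Length, data = d))

# Pull out the slope of each fit as a plain numeric vector
slopes <- vapply(models, function(m) coef(m)[["Sepal.Length"]], numeric(1))
slopes
```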

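Each conceptual group of the data frame is exposed to the function .f with two pieces of information:

The subset of the data for the group, exposed as .x.
The key, a tibble with exactly one row and one column for each grouping variable, exposed as .y.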
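
The following sketch shows both pieces of information in use, grouping mtcars by two variables so that the key has two columns. The label format is arbitrary and chosen only for this example.

```r
library(dplyr)

by_cyl_am <- mtcars %>% group_by(cyl, am)

# .y is a one-row tibble with one column per grouping variable
key_dims <- group_map(by_cyl_am, ~ dim(.y))

# Combine values from the key (.y) with a summary of the data (.x)
labels <- group_map(
  by_cyl_am,
  ~ paste0("cyl=", .y$cyl, ", am=", .y$am, ": ", nrow(.x), " rows")
)

key_dims[[1]]
unlist(labels)
```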

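For completeness, group_modify(), group_map() and group_walk() also work on ungrouped data frames. In that case, the function is applied to the entire data frame (exposed as .x), and .y is a one-row tibble with no columns, consistent with group_keys().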
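
The ungrouped case can be verified directly. This short sketch returns .x and .y from group_map() on the ungrouped mtcars data frame and compares the key's shape with that of group_keys().

```r
library(dplyr)

# On an ungrouped data frame there is a single "group": the whole data
keys <- mtcars %>% group_map(~ .y)
sizes <- mtcars %>% group_map(~ nrow(.x))

length(keys)             # one result, for the single implicit group
dim(keys[[1]])           # one row, no columns
dim(group_keys(mtcars))  # same shape as group_keys()
sizes[[1]] == nrow(mtcars)
```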

See also

Other grouping functions: group_by(), group_nest(), group_split(), group_trim()

Examples


# return a list
mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))
#> [[1]]
#> # A tibble: 2 × 10
#>     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  22.8  108     93  3.85  2.32  18.6     1     1     4     1
#> 2  24.4  147.    62  3.69  3.19  20       1     0     4     2
#> 
#> [[2]]
#> # A tibble: 2 × 10
#>     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21   160   110   3.9  2.62  16.5     0     1     4     4
#> 2    21   160   110   3.9  2.88  17.0     0     1     4     4
#> 
#> [[3]]
#> # A tibble: 2 × 10
#>     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  18.7   360   175  3.15  3.44  17.0     0     0     3     2
#> 2  14.3   360   245  3.21  3.57  15.8     0     0     3     4
#> 

# return a tibble grouped by `cyl` with 2 rows per group
# the grouping data is recalculated
mtcars %>%
  group_by(cyl) %>%
  group_modify(~ head(.x, 2L))
#> # A tibble: 6 × 11
#> # Groups:   cyl [3]
#>     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4  22.8  108     93  3.85  2.32  18.6     1     1     4     1
#> 2     4  24.4  147.    62  3.69  3.19  20       1     0     4     2
#> 3     6  21    160    110  3.9   2.62  16.5     0     1     4     4
#> 4     6  21    160    110  3.9   2.88  17.0     0     1     4     4
#> 5     8  18.7  360    175  3.15  3.44  17.0     0     0     3     2
#> 6     8  14.3  360    245  3.21  3.57  15.8     0     0     3     4

# a list of tibbles
iris %>%
  group_by(Species) %>%
  group_map(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))
#> [[1]]
#> # A tibble: 2 × 5
#>   term         estimate std.error statistic p.value
#>   <chr>           <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)     0.803    0.344       2.34  0.0238
#> 2 Sepal.Length    0.132    0.0685      1.92  0.0607
#> 
#> [[2]]
#> # A tibble: 2 × 5
#>   term         estimate std.error statistic  p.value
#>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     0.185    0.514      0.360 7.20e- 1
#> 2 Sepal.Length    0.686    0.0863     7.95  2.59e-10
#> 
#> [[3]]
#> # A tibble: 2 × 5
#>   term         estimate std.error statistic  p.value
#>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     0.610    0.417       1.46 1.50e- 1
#> 2 Sepal.Length    0.750    0.0630     11.9  6.30e-16
#> 

# a restructured grouped tibble
iris %>%
  group_by(Species) %>%
  group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))
#> # A tibble: 6 × 6
#> # Groups:   Species [3]
#>   Species    term         estimate std.error statistic  p.value
#>   <fct>      <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 setosa     (Intercept)     0.803    0.344      2.34  2.38e- 2
#> 2 setosa     Sepal.Length    0.132    0.0685     1.92  6.07e- 2
#> 3 versicolor (Intercept)     0.185    0.514      0.360 7.20e- 1
#> 4 versicolor Sepal.Length    0.686    0.0863     7.95  2.59e-10
#> 5 virginica  (Intercept)     0.610    0.417      1.46  1.50e- 1
#> 6 virginica  Sepal.Length    0.750    0.0630    11.9   6.30e-16

# a list of vectors
iris %>%
  group_by(Species) %>%
  group_map(~ quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)))
#> [[1]]
#>   25%   50%   75% 
#> 1.400 1.500 1.575 
#> 
#> [[2]]
#>  25%  50%  75% 
#> 4.00 4.35 4.60 
#> 
#> [[3]]
#>   25%   50%   75% 
#> 5.100 5.550 5.875 
#> 

# to use group_modify() the lambda must return a data frame
iris %>%
  group_by(Species) %>%
  group_modify(~ {
     quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)) %>%
     tibble::enframe(name = "prob", value = "quantile")
  })
#> # A tibble: 9 × 3
#> # Groups:   Species [3]
#>   Species    prob  quantile
#>   <fct>      <chr>    <dbl>
#> 1 setosa     25%       1.4 
#> 2 setosa     50%       1.5 
#> 3 setosa     75%       1.58
#> 4 versicolor 25%       4   
#> 5 versicolor 50%       4.35
#> 6 versicolor 75%       4.6 
#> 7 virginica  25%       5.1 
#> 8 virginica  50%       5.55
#> 9 virginica  75%       5.88

iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      purrr::map_dfc(fivenum) %>%
      mutate(nms = c("min", "Q1", "median", "Q3", "max"))
  })
#> # A tibble: 15 × 6
#> # Groups:   Species [3]
#>    Species    Sepal.Length Sepal.Width Petal.Length Petal.Width nms   
#>    <fct>             <dbl>       <dbl>        <dbl>       <dbl> <chr> 
#>  1 setosa              4.3         2.3         1            0.1 min   
#>  2 setosa              4.8         3.2         1.4          0.2 Q1    
#>  3 setosa              5           3.4         1.5          0.2 median
#>  4 setosa              5.2         3.7         1.6          0.3 Q3    
#>  5 setosa              5.8         4.4         1.9          0.6 max   
#>  6 versicolor          4.9         2           3            1   min   
#>  7 versicolor          5.6         2.5         4            1.2 Q1    
#>  8 versicolor          5.9         2.8         4.35         1.3 median
#>  9 versicolor          6.3         3           4.6          1.5 Q3    
#> 10 versicolor          7           3.4         5.1          1.8 max   
#> 11 virginica           4.9         2.2         4.5          1.4 min   
#> 12 virginica           6.2         2.8         5.1          1.8 Q1    
#> 13 virginica           6.5         3           5.55         2   median
#> 14 virginica           6.9         3.2         5.9          2.3 Q3    
#> 15 virginica           7.9         3.8         6.9          2.5 max   

# group_walk() is for side effects
dir.create(temp <- tempfile())
iris %>%
  group_by(Species) %>%
  group_walk(~ write.csv(.x, file = file.path(temp, paste0(.y$Species, ".csv"))))
list.files(temp, pattern = "csv$")
#> [1] "setosa.csv"     "versicolor.csv" "virginica.csv" 
unlink(temp, recursive = TRUE)

# group_modify() and ungrouped data frames
mtcars %>%
  group_modify(~ head(.x, 2L))
#>               mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Source: R/group-map.R


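Note: This article was selected and compiled by 纯净天空 from the original English work "Apply a function to each group" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.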