R dplyr summarise: Summarise each group down to one row

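summarise() creates a new data frame. It returns one row for each combination of grouping variables; if there are no grouping variables, the output has a single row summarising all observations in the input. It contains one column for each grouping variable and one column for each of the summary statistics that you have specified.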

summarise() and summarize() are synonyms.
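A quick check that the two spellings are interchangeable. This is a minimal sketch that assumes dplyr is installed and uses R's built-in mtcars data set; the object names are arbitrary.

```r
library(dplyr)

# Both spellings call the same function, so the results are identical.
uk <- summarise(mtcars, mean_mpg = mean(mpg))
us <- summarize(mtcars, mean_mpg = mean(mpg))
identical(uk, us)
```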

Usage

summarise(.data, ..., .by = NULL, .groups = NULL)

summarize(.data, ..., .by = NULL, .groups = NULL)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).
A data frame, to add multiple columns from a single expression.

Returning values with size 0 or >1 was deprecated as of 1.1.0. Please use reframe() for this instead.
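To illustrate the data frame case, the sketch below (assuming dplyr is installed; the column names min_disp and max_disp are arbitrary) leaves the expression unnamed, so the columns of the one-row tibble it returns are unpacked into separate output columns, still one row per group.

```r
library(dplyr)

# An unnamed expression returning a one-row data frame is spread into
# several output columns (min_disp and max_disp), one row per cyl group.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(tibble(min_disp = min(disp), max_disp = max(disp)))
res
```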

.by

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
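A minimal sketch of .by, assuming dplyr 1.1.0 or later (the version that introduced the argument). The grouping applies only to this call, so the result comes back ungrouped, and groups keep their order of first appearance in the data rather than being sorted as group_by() would sort them.

```r
library(dplyr)

# Per-call grouping: no group_by() and no ungroup() needed afterwards.
# In mtcars the first rows have cyl 6, then 4, then 8, so that is the order here.
res <- mtcars %>%
  summarise(mean_disp = mean(disp), n = n(), .by = cyl)
res
is_grouped_df(res)
```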

.groups

Grouping structure of the result.

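"drop_last": drops the last level of grouping. This was the only supported option before version 1.0.0.
"drop": all levels of grouping are dropped.
"keep": same grouping structure as .data.
"rowwise": each row is its own group.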
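The four options can be compared on the same two-variable grouping. This sketch assumes dplyr is installed; because .groups is supplied explicitly, no grouping message is printed.

```r
library(dplyr)

by_cyl_vs <- group_by(mtcars, cyl, vs)

# Request each grouping structure explicitly.
drop_last <- summarise(by_cyl_vs, n = n(), .groups = "drop_last")
dropped   <- summarise(by_cyl_vs, n = n(), .groups = "drop")
kept      <- summarise(by_cyl_vs, n = n(), .groups = "keep")
by_row    <- summarise(by_cyl_vs, n = n(), .groups = "rowwise")

group_vars(drop_last)        # "cyl"
group_vars(dropped)          # character(0)
group_vars(kept)             # c("cyl", "vs")
inherits(by_row, "rowwise_df")
```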

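When .groups is not specified, it is chosen based on the number of rows of the results:

If all the results have 1 row, you get "drop_last".
If the number of rows varies, you get "keep" (note that returning a variable number of rows is deprecated in favour of reframe(), which also unconditionally drops all levels of grouping).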

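In addition, a message informs you of that choice, unless the result is ungrouped, the option "dplyr.summarise.inform" is set to FALSE, or summarise() is called from a function in a package.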
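A sketch of the default and of silencing its message, assuming dplyr is installed. With one row per group and .groups left unset, "drop_last" is chosen; setting the "dplyr.summarise.inform" option to FALSE suppresses the message, and the previous option value is restored afterwards so the setting does not leak into the rest of the session.

```r
library(dplyr)

by_cyl_vs <- group_by(mtcars, cyl, vs)

# .groups unset, one row per group: "drop_last", reported by a message.
res <- summarise(by_cyl_vs, n = n())
group_vars(res)

# Same result, but without the message.
old <- options(dplyr.summarise.inform = FALSE)
quiet <- summarise(by_cyl_vs, n = n())
options(old)
group_vars(quiet)
```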

Value

An object usually of the same type as .data.

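The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups= argument; the output may be another grouped_df, a tibble or a rowwise data frame.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.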
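The points above can be checked directly. This is a sketch assuming dplyr is installed; the attribute name "note" is hypothetical and exists only for this example.

```r
library(dplyr)

# A plain data.frame in gives a plain data.frame out (not a tibble).
ungrouped <- summarise(mtcars, n = n())
class(ungrouped)

# Rows of a grouped summary line up with group_keys() of the input.
by_cyl <- group_by(mtcars, cyl)
res <- summarise(by_cyl, mean_disp = mean(disp))
identical(res$cyl, group_keys(by_cyl)$cyl)

# Custom attributes on the input are not carried over.
# "note" is a hypothetical attribute added only for this sketch.
annotated <- mtcars
attr(annotated, "note") <- "example note"
is.null(attr(summarise(annotated, n = n()), "note"))
```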

Useful functions

Center: mean(), median()
Spread: sd(), IQR(), mad()
Range: min(), max()
Position: first(), last(), nth()
Count: n(), n_distinct()
Logical: any(), all()
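One helper from each family can be combined in a single call. A sketch assuming dplyr is installed; the output column names are arbitrary.

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    median_mpg = median(mpg),        # center
    iqr_mpg    = IQR(mpg),           # spread
    max_hp     = max(hp),            # range
    first_mpg  = first(mpg),         # position (row order within the group)
    n          = n(),                # count
    n_gears    = n_distinct(gear),   # count of distinct values
    any_manual = any(am == 1)        # logical
  )
res
```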

Backend variations

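The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.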

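This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.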
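A sketch of the recommended naming practice on the data frame backend, assuming dplyr is installed. Because the mean is stored under a new name, sd() still sees the original disp column, and a later summary can combine earlier ones (cv_disp is an arbitrary name for the coefficient of variation).

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_disp = mean(disp),  # new name: disp itself is not overwritten
    sd_disp   = sd(disp),    # so sd() sees all rows of the group
    # Reusing earlier summaries works on the data frame backend;
    # other backends may not allow it.
    cv_disp   = sd_disp / mean_disp
  )
res
```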

Methods

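This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.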

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, grouped_df, rowwise_df).
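To make the generic/method idea concrete, here is a hedged sketch of an S3 method for an invented class. Both the class "logged_df" and the method summarise.logged_df are hypothetical and not part of dplyr; the method prints a message, removes its own class, and calls summarise() again so that dplyr's data.frame method does the actual work. A method meant for a package would normally also declare .by and .groups and be registered in the package NAMESPACE.

```r
library(dplyr)

# Hypothetical S3 method for the invented class "logged_df".
summarise.logged_df <- function(.data, ...) {
  message("summarise() on a logged_df with ", nrow(.data), " rows")
  # Strip our class and re-dispatch to dplyr's data.frame method;
  # the name-value pairs (and .by/.groups, if given) pass through `...`.
  class(.data) <- setdiff(class(.data), "logged_df")
  summarise(.data, ...)
}

logged <- structure(mtcars, class = c("logged_df", "data.frame"))
res <- summarise(logged, n = n())
res
```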

See also

Other single table verbs: arrange(), filter(), mutate(), reframe(), rename(), select(), slice()

Examples

library(dplyr)

# A summary applied to ungrouped tbl returns a single row
mtcars %>%
  summarise(mean = mean(disp), n = n())
#>       mean  n
#> 1 230.7219 32

# Usually, you'll want to group first
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())
#> # A tibble: 3 × 3
#>     cyl  mean     n
#>   <dbl> <dbl> <int>
#> 1     4  105.    11
#> 2     6  183.     7
#> 3     8  353.    14

# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> [1] "cyl"

# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp))
#> # A tibble: 3 × 3
#>     cyl  disp    sd
#>   <dbl> <dbl> <dbl>
#> 1     4  105.    NA
#> 2     6  183.    NA
#> 3     8  353.    NA

# Refer to column names stored as strings with the `.data` pronoun:
var <- "mass"
summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE))
#> # A tibble: 1 × 1
#>     avg
#>   <dbl>
#> 1  97.3
# Learn more in ?rlang::args_data_masking

# In dplyr 1.1.0, returning multiple rows per group was deprecated in favor
# of `reframe()`, which never messages and always returns an ungrouped
# result:
mtcars %>%
  group_by(cyl) %>%
  summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
#> Warning: Returning more (or less) than 1 row per `summarise()` group was
#> deprecated in dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that
#>   `reframe()` always returns an ungrouped data frame and adjust
#>   accordingly.
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   cyl [3]
#>     cyl    qs  prob
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75
# ->
mtcars %>%
  group_by(cyl) %>%
  reframe(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
#> # A tibble: 6 × 3
#>     cyl    qs  prob
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75
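The .data[[var]] example above covers column names stored as strings. When writing your own function that accepts a bare column name, the embracing operator {{ }} (described in ?rlang::args_data_masking) forwards it into summarise(). The helper mean_of() below is hypothetical, written only for this sketch, and assumes dplyr is installed (starwars ships with dplyr).

```r
library(dplyr)

# Hypothetical helper: `var` is a bare column name forwarded with {{ }}.
mean_of <- function(data, var) {
  data %>%
    summarise(avg = mean({{ var }}, na.rm = TRUE))
}

mean_of(starwars, mass)
mean_of(group_by(starwars, gender), height)
```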

Source: R/summarise.R


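Note: This article was selected and compiled by 純淨天空 from the original English work Summarise each group down to one row by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reprint or copy this translation without permission or authorization.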