R tidyr gather: Gather columns into key-value pairs

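Development on gather() is complete. For new code we recommend switching to pivot_longer(), which is easier to use, more featureful, and still under active development. df %>% gather("key", "value", x, y, z) is equivalent to df %>% pivot_longer(c(x, y, z), names_to = "key", values_to = "value").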

See vignette("pivot") for more details.
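The equivalence above can be checked with a minimal sketch. The data frame `df` and the helper `sort_rows()` are made up for illustration. The two calls are assumed to differ only in row order and in the class of the result (data frame versus tibble), so both results are normalised before comparing.

```r
library(tidyr)

# Hypothetical wide data: one id column and three measurement columns.
df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

long_gather <- gather(df, "key", "value", x, y, z)
long_pivot <- pivot_longer(df, c(x, y, z), names_to = "key", values_to = "value")

# Illustrative helper: drop the tibble class and sort rows so that only
# the contents are compared.
sort_rows <- function(d) {
  d <- as.data.frame(d)
  d <- d[order(d$id, d$key), ]
  rownames(d) <- NULL
  d
}

all.equal(sort_rows(long_gather), sort_rows(long_pivot))
```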

Usage

gather(
  data,
  key = "key",
  value = "value",
  ...,
  na.rm = FALSE,
  convert = FALSE,
  factor_key = FALSE
)

Arguments

data

A data frame.

key, value

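Names of the new key and value columns, as strings or symbols.

This argument is passed by expression and supports quasiquotation (you can unquote strings and symbols). The name is captured from the expression with rlang::ensym(). Note that this kind of interface, where symbols do not represent actual objects, is now discouraged in the tidyverse; it is supported here for backward compatibility.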
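As a small illustration of the three ways to name the new columns, the sketch below uses a made-up data frame `df` and made-up variables `key_name` and `value_name`. All three calls are expected to give the same result.

```r
library(tidyr)

df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4))

# Names supplied as strings ...
by_string <- gather(df, "variable", "amount", x, y)

# ... or as bare symbols.
by_symbol <- gather(df, variable, amount, x, y)

# Names stored in ordinary variables (hypothetical) are unquoted with !!.
key_name <- "variable"
value_name <- "amount"
by_unquote <- gather(df, !!key_name, !!value_name, x, y)

identical(by_string, by_symbol)
identical(by_string, by_unquote)
```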

...

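A selection of columns. If empty, all variables are selected. You can supply bare variable names, select all variables between x and z with x:z, or exclude y with -y. For more options, see the dplyr::select() documentation. See also the section on selection rules below.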
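The selection forms described above can be sketched as follows. The data frame `df` is invented for the example. Note that an empty selection also gathers the `id` column.

```r
library(tidyr)

df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Range selection: every column from x to z.
range_sel <- gather(df, "key", "value", x:z)

# Exclusion: every column except id.
exclude_sel <- gather(df, "key", "value", -id)

# Empty selection: all columns are gathered, id included.
all_sel <- gather(df, "key", "value")

identical(range_sel, exclude_sel)
nrow(all_sel)
```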

na.rm

If TRUE, rows where the value column is NA are removed from the output.
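A minimal sketch of the effect of na.rm, using an invented data frame `df` with two missing values:

```r
library(tidyr)

df <- data.frame(id = 1:3, x = c(1, NA, 3), y = c(NA, 5, 6))

kept <- gather(df, "key", "value", x, y)
dropped <- gather(df, "key", "value", x, y, na.rm = TRUE)

nrow(kept)     # 6: the two NA rows are still present
nrow(dropped)  # 4: rows with an NA value are gone
```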

convert

If TRUE, type.convert() is run automatically on the key column. This is useful when the column names are really numeric, integer, or logical values.
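For instance, when the gathered column names are years, convert = TRUE turns the key column into integers. The data frame `df` below is made up, and check.names = FALSE is used only to keep the numeric-looking column names intact.

```r
library(tidyr)

df <- data.frame(
  id = 1:2,
  `2019` = c(10, 20),
  `2020` = c(30, 40),
  check.names = FALSE
)

as_chr <- gather(df, "year", "value", -id)
as_int <- gather(df, "year", "value", -id, convert = TRUE)

class(as_chr$year)
class(as_int$year)
```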

factor_key

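If FALSE, the default, the key values are stored as a character vector. If TRUE, they are stored as a factor, which preserves the original ordering of the columns.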
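The difference matters when column order carries meaning. In this sketch the invented data frame `df` deliberately has column b before column a, so the factor levels follow column order rather than alphabetical order.

```r
library(tidyr)

df <- data.frame(id = 1:2, b = c(1, 2), a = c(3, 4))

chr_key <- gather(df, "key", "value", b, a)
fct_key <- gather(df, "key", "value", b, a, factor_key = TRUE)

class(chr_key$key)
levels(fct_key$key)  # levels follow column order: b, then a
```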

Rules for selection

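Arguments for selecting columns are passed to tidyselect::vars_select() and are treated specially. Unlike other verbs, selecting functions make a strict distinction between data expressions and context expressions.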

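A data expression is either a bare name like x or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame.
Everything else is a context expression, in which you can only refer to objects that you have defined with <-.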

For instance, col1:col3 is a data expression that refers to data columns, while seq(start, end) is a context expression that refers to objects from the context.

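If you need to refer to contextual objects from a data expression, you can use all_of() or any_of(). These functions select data-variables whose names are stored in an env-variable. For instance, all_of(a) selects the variables listed in the character vector a. For more details, see the tidyselect::select_helpers() documentation.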
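A short sketch of selecting through a character vector. The data frame `df`, the vector `cols`, and the column name "not_a_column" are made up for the example. It assumes the all_of() and any_of() helpers that tidyr re-exports from tidyselect.

```r
library(tidyr)

df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Column names stored in an ordinary (environment) variable.
cols <- c("x", "y")

# all_of() requires every listed name to exist in the data.
picked <- gather(df, "key", "value", all_of(cols))

# any_of() skips names that are not present.
lenient <- gather(df, "key", "value", any_of(c("x", "not_a_column")))

unique(picked$key)
unique(lenient$key)
```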

Examples

library(tidyr)

# From https://stackoverflow.com/questions/1181060
stocks <- tibble(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, "stock", "price", -time)
#> # A tibble: 30 × 3
#>    time       stock   price
#>    <date>     <chr>   <dbl>
#>  1 2009-01-01 X      0.0930
#>  2 2009-01-02 X      1.56  
#>  3 2009-01-03 X      0.169 
#>  4 2009-01-04 X     -0.587 
#>  5 2009-01-05 X     -0.356 
#>  6 2009-01-06 X      3.32  
#>  7 2009-01-07 X     -1.10  
#>  8 2009-01-08 X     -0.127 
#>  9 2009-01-09 X     -0.434 
#> 10 2009-01-10 X     -0.0206
#> # … with 20 more rows
stocks %>% gather("stock", "price", -time)
#> # A tibble: 30 × 3
#>    time       stock   price
#>    <date>     <chr>   <dbl>
#>  1 2009-01-01 X      0.0930
#>  2 2009-01-02 X      1.56  
#>  3 2009-01-03 X      0.169 
#>  4 2009-01-04 X     -0.587 
#>  5 2009-01-05 X     -0.356 
#>  6 2009-01-06 X      3.32  
#>  7 2009-01-07 X     -1.10  
#>  8 2009-01-08 X     -0.127 
#>  9 2009-01-09 X     -0.434 
#> 10 2009-01-10 X     -0.0206
#> # … with 20 more rows

# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = "flower_att", value = "measurement",
       Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
#>       Species   flower_att measurement
#> 1      setosa Sepal.Length         5.1
#> 2  versicolor Sepal.Length         7.0
#> 3   virginica Sepal.Length         6.3
#> 4      setosa  Sepal.Width         3.5
#> 5  versicolor  Sepal.Width         3.2
#> 6   virginica  Sepal.Width         3.3
#> 7      setosa Petal.Length         1.4
#> 8  versicolor Petal.Length         4.7
#> 9   virginica Petal.Length         6.0
#> 10     setosa  Petal.Width         0.2
#> 11 versicolor  Petal.Width         1.4
#> 12  virginica  Petal.Width         2.5
# same result but less verbose
gather(mini_iris, key = "flower_att", value = "measurement", -Species)
#>       Species   flower_att measurement
#> 1      setosa Sepal.Length         5.1
#> 2  versicolor Sepal.Length         7.0
#> 3   virginica Sepal.Length         6.3
#> 4      setosa  Sepal.Width         3.5
#> 5  versicolor  Sepal.Width         3.2
#> 6   virginica  Sepal.Width         3.3
#> 7      setosa Petal.Length         1.4
#> 8  versicolor Petal.Length         4.7
#> 9   virginica Petal.Length         6.0
#> 10     setosa  Petal.Width         0.2
#> 11 versicolor  Petal.Width         1.4
#> 12  virginica  Petal.Width         2.5

Source: R/gather.R


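Note: This article was selected and compiled by 純淨天空 from the original English work "Gather columns into key-value pairs" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.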