R dtplyr pivot_wider.dtplyr_step: Pivot data from long to wide

This is a method for the tidyr pivot_wider() generic. It is translated to data.table::dcast().
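A minimal sketch of that translation. It assumes dtplyr, data.table, tidyr and dplyr are installed, and the table `long` with its columns is made up for illustration. show_query() prints the recorded data.table call, whose exact text may differ between dtplyr versions. The collected result is then compared with a hand-written dcast() call.

```r
library(data.table)  # as.data.table(), dcast()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() generic and the %>% pipe
library(dplyr)       # show_query()

# Made-up long table: one row per (id, key) pair
long <- data.frame(
  id  = c(1, 1, 2, 2),
  key = c("a", "b", "a", "b"),
  val = c(10, 20, 30, 40)
)

# Lazy pivot: nothing is computed yet, dtplyr only records a dcast() call
lazy_wide <- lazy_dt(long) %>%
  pivot_wider(names_from = key, values_from = val)
show_query(lazy_wide)

# Force evaluation of the recorded call
wide_dtplyr <- as.data.table(lazy_wide)

# Hand-written data.table equivalent of the translated call
wide_dcast <- dcast(as.data.table(long), id ~ key, value.var = "val")
```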

Usage

# S3 method for dtplyr_step
pivot_wider(
  data,
  id_cols = NULL,
  names_from = name,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_repair = "check_unique",
  values_from = value,
  values_fill = NULL,
  values_fn = NULL,
  ...
)

Arguments

data

A lazy_dt().

id_cols

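<tidy-select> A set of columns that uniquely identifies each observation. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.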

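Defaults to all columns in data except for the columns specified through names_from and values_from. If a tidyselect expression is supplied, it is evaluated on data after the columns specified through names_from and values_from have been removed.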
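A small sketch of the difference, using a made-up table in which `label` is redundant because it is fully determined by `id`. With the default id_cols, both `id` and `label` end up on the left-hand side of the dcast() formula. Selecting only `id` drops `label` from the output.

```r
library(data.table)  # as.data.table()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() and %>%

# Made-up data: `label` carries no information beyond `id`
long <- data.frame(
  id    = c(1, 1, 2, 2),
  label = c("x", "x", "y", "y"),
  key   = c("a", "b", "a", "b"),
  val   = c(10, 20, 30, 40)
)

# Default id_cols: every column except `key` and `val` (here id and label)
wide_default <- lazy_dt(long) %>%
  pivot_wider(names_from = key, values_from = val) %>%
  as.data.table()

# Explicit id_cols: only `id` identifies a row, so `label` is not kept
wide_id <- lazy_dt(long) %>%
  pivot_wider(id_cols = id, names_from = key, values_from = val) %>%
  as.data.table()
```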

names_from, values_from

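<tidy-select> A pair of arguments describing which column (or columns) to take the output column names from (names_from), and which column (or columns) to take the cell values from (values_from).

If values_from contains more than one column, the name of the value column is prepended to each output column name (e.g. estimate_income).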

names_prefix

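A string added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.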
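For example, assuming a made-up table whose names_from column `week` is numeric, the prefix turns the non-syntactic names "1" and "2" into wk1 and wk2:

```r
library(data.table)  # as.data.table()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() and %>%

# Made-up data with a numeric names_from column
long <- data.frame(
  id   = c(1, 1, 2, 2),
  week = c(1, 2, 1, 2),
  val  = c(5, 6, 7, 8)
)

# Without names_prefix the new columns would be called "1" and "2"
wide <- lazy_dt(long) %>%
  pivot_wider(names_from = week, values_from = val, names_prefix = "wk") %>%
  as.data.table()
```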

names_sep

If names_from or values_from contains multiple variables, this is used to join their values together into a single string to use as a column name.
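A sketch with two names_from columns (`site` and `year`, both made up) joined with a dot. This separator is also what gets passed on to dcast() as its `sep` argument:

```r
library(data.table)  # as.data.table()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() and %>%

# Made-up data: each (site, year) combination becomes one output column
long <- data.frame(
  id   = c(1, 1, 1, 1),
  site = c("n", "n", "s", "s"),
  year = c("y1", "y2", "y1", "y2"),
  val  = c(1, 2, 3, 4)
)

# site and year values are pasted together with "." to form column names
wide <- lazy_dt(long) %>%
  pivot_wider(names_from = c(site, year), values_from = val, names_sep = ".") %>%
  as.data.table()
```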

names_glue

Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and the special .value) to create custom column names.
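A sketch assuming a made-up table with two value columns, `v1` and `v2`. The glue template puts the names_from value first and the value column name (`.value`) second, which reverses the default v1_a style:

```r
library(data.table)  # as.data.table()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() and %>%

# Made-up data with two value columns
long <- data.frame(
  id  = c(1, 1, 2, 2),
  key = c("a", "b", "a", "b"),
  v1  = c(10, 20, 30, 40),
  v2  = c(0.1, 0.2, 0.3, 0.4)
)

# {key} is the names_from value, {.value} is the source column name
wide <- lazy_dt(long) %>%
  pivot_wider(
    names_from  = key,
    values_from = c(v1, v2),
    names_glue  = "{key}_{.value}"
  ) %>%
  as.data.table()
```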

names_sort

Should the column names be sorted? If FALSE (the default), column names are ordered by first appearance.

names_repair

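What happens if the output has invalid column names? The default, "check_unique", raises an error if any columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate them by adding numeric suffixes. See vctrs::vec_as_names() for more options.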
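The repair strategies themselves come from vctrs. The sketch below calls vctrs::vec_as_names() directly on a duplicated name vector (not through dtplyr) to show what each option does:

```r
library(vctrs)  # vec_as_names()

dup <- c("x", "x")

# "minimal": duplicates are left as they are
minimal_names <- vec_as_names(dup, repair = "minimal")

# "unique": duplicates get numeric suffixes
# (quiet = TRUE suppresses the "New names:" message)
unique_names <- vec_as_names(dup, repair = "unique", quiet = TRUE)

# "check_unique" (the pivot_wider() default): duplicates are an error
checked <- try(vec_as_names(dup, repair = "check_unique"), silent = TRUE)
```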

values_fill

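Optionally, a (scalar) value that specifies what each value should be filled in with when it is missing.

This can be a named list if you want to apply different fill values to different value columns.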

values_fn

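A function used to summarise multiple values that fall into the same output cell. The default is length(): when values_fn is NULL, dcast() falls back to length() whenever aggregation is needed. Note that this differs from tidyr::pivot_wider(), which returns a list-column by default.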
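A sketch of both behaviours on a made-up table where id 1 has two values for key `a`. Without values_fn, dcast() counts the values, and data.table prints a message about defaulting to length(). With values_fn = sum, the values are added up instead. Only cells that receive at least one value are checked here, since the fill used for empty cells depends on the aggregation function.

```r
library(data.table)  # as.data.table()
library(dtplyr)      # lazy_dt()
library(tidyr)       # pivot_wider() and %>%

# Made-up data: (id = 1, key = "a") occurs twice
long <- data.frame(
  id  = c(1, 1, 1, 2),
  key = c("a", "a", "b", "a"),
  val = c(1, 2, 3, 4)
)

# No values_fn: duplicated cells are counted with length()
counts <- lazy_dt(long) %>%
  pivot_wider(names_from = key, values_from = val) %>%
  as.data.table()

# values_fn = sum: duplicated cells are summed
sums <- lazy_dt(long) %>%
  pivot_wider(names_from = key, values_from = val, values_fn = sum) %>%
  as.data.table()
```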

...

Additional arguments passed on to methods.

Examples

library(dtplyr)  # provides lazy_dt()
library(tidyr)

fish_encounters_dt <- lazy_dt(fish_encounters)
fish_encounters_dt
#> Source: local data table [114 x 3]
#> Call:   `_DT30`
#> 
#>   fish  station  seen
#>   <fct> <fct>   <int>
#> 1 4842  Release     1
#> 2 4842  I80_1       1
#> 3 4842  Lisbon      1
#> 4 4842  Rstr        1
#> 5 4842  Base_TD     1
#> 6 4842  BCE         1
#> # … with 108 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
fish_encounters_dt %>%
  pivot_wider(names_from = station, values_from = seen)
#> Source: local data table [19 x 12]
#> Call:   dcast(`_DT30`, formula = fish ~ station, value.var = "seen")
#> 
#>   fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>   <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#> 1 4842        1     1      1     1       1     1     1     1     1     1
#> 2 4843        1     1      1     1       1     1     1     1     1     1
#> 3 4844        1     1      1     1       1     1     1     1     1     1
#> 4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#> 5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#> 6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#> # … with 13 more rows, and 1 more variable: MAW <int>
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# Fill in missing values
fish_encounters_dt %>%
  pivot_wider(names_from = station, values_from = seen, values_fill = 0)
#> Source: local data table [19 x 12]
#> Call:   dcast(`_DT30`, formula = fish ~ station, value.var = "seen", 
#>     fill = 0)
#> 
#>   fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>   <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#> 1 4842        1     1      1     1       1     1     1     1     1     1
#> 2 4843        1     1      1     1       1     1     1     1     1     1
#> 3 4844        1     1      1     1       1     1     1     1     1     1
#> 4 4845        1     1      1     1       1     0     0     0     0     0
#> 5 4847        1     1      1     0       0     0     0     0     0     0
#> 6 4848        1     1      1     1       0     0     0     0     0     0
#> # … with 13 more rows, and 1 more variable: MAW <int>
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Generate column names from multiple variables
us_rent_income_dt <- lazy_dt(us_rent_income)
us_rent_income_dt
#> Source: local data table [104 x 5]
#> Call:   `_DT31`
#> 
#>   GEOID NAME    variable estimate   moe
#>   <chr> <chr>   <chr>       <dbl> <dbl>
#> 1 01    Alabama income      24476   136
#> 2 01    Alabama rent          747     3
#> 3 02    Alaska  income      32940   508
#> 4 02    Alaska  rent         1200    13
#> 5 04    Arizona income      27517   148
#> 6 04    Arizona rent          972     4
#> # … with 98 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
us_rent_income_dt %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))
#> Source: local data table [52 x 6]
#> Call:   dcast(`_DT31`, formula = GEOID + NAME ~ variable, value.var = c("estimate", 
#> "moe"))
#> 
#>   GEOID NAME       estimate_income estimate_rent moe_income moe_rent
#>   <chr> <chr>                <dbl>         <dbl>      <dbl>    <dbl>
#> 1 01    Alabama              24476           747        136        3
#> 2 02    Alaska               32940          1200        508       13
#> 3 04    Arizona              27517           972        148        4
#> 4 05    Arkansas             23789           709        165        5
#> 5 06    California           29454          1358        109        3
#> 6 08    Colorado             32401          1125        109        5
#> # … with 46 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# When there are multiple `names_from` or `values_from`, you can
# use `names_sep` or `names_glue` to control the output variable names
us_rent_income_dt %>%
  pivot_wider(
    names_from = variable,
    names_sep = ".",
    values_from = c(estimate, moe)
  )
#> Source: local data table [52 x 6]
#> Call:   dcast(`_DT31`, formula = GEOID + NAME ~ variable, value.var = c("estimate", 
#> "moe"), sep = ".")
#> 
#>   GEOID NAME       estimate.income estimate.rent moe.income moe.rent
#>   <chr> <chr>                <dbl>         <dbl>      <dbl>    <dbl>
#> 1 01    Alabama              24476           747        136        3
#> 2 02    Alaska               32940          1200        508       13
#> 3 04    Arizona              27517           972        148        4
#> 4 05    Arkansas             23789           709        165        5
#> 5 06    California           29454          1358        109        3
#> 6 08    Colorado             32401          1125        109        5
#> # … with 46 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Can perform aggregation with values_fn
warpbreaks_dt <- lazy_dt(as_tibble(warpbreaks[c("wool", "tension", "breaks")]))
warpbreaks_dt
#> Source: local data table [54 x 3]
#> Call:   `_DT32`
#> 
#>   wool  tension breaks
#>   <fct> <fct>    <dbl>
#> 1 A     L           26
#> 2 A     L           30
#> 3 A     L           54
#> 4 A     L           25
#> 5 A     L           70
#> 6 A     L           52
#> # … with 48 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
warpbreaks_dt %>%
  pivot_wider(
    names_from = wool,
    values_from = breaks,
    values_fn = mean
  )
#> Source: local data table [3 x 3]
#> Call:   dcast(`_DT32`, formula = tension ~ wool, value.var = "breaks", 
#>     fun.aggregate = function (x, ...) 
#>     UseMethod("mean"))
#> 
#>   tension     A     B
#>   <fct>   <dbl> <dbl>
#> 1 L        44.6  28.2
#> 2 M        24    28.8
#> 3 H        24.6  18.8
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Source code: R/step-call-pivot_wider.R


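Note: This page was compiled by 純淨天空 from the original English work "Pivot data from long to wide" by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors; this translation may not be reproduced or copied without permission.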