R tidyr unnest: Unnest a list-column of data frames into rows and columns

unnest() expands a list-column containing data frames into rows and columns.

Usage

unnest(
  data,
  cols,
  ...,
  keep_empty = FALSE,
  ptype = NULL,
  names_sep = NULL,
  names_repair = "check_unique",
  .drop = deprecated(),
  .id = deprecated(),
  .sep = deprecated(),
  .preserve = deprecated()
)

Arguments

data

A data frame.

cols

<tidy-select> List-columns to unnest.

When selecting multiple columns, values from the same row will be recycled to their common size.

...

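Deprecated: previously you could write df %>% unnest(x, y, z). Convert to df %>% unnest(c(x, y, z)). If you previously created a new variable in unnest(), you now need to do it explicitly with mutate(). Convert df %>% unnest(y = fun(x, y, z)) to df %>% mutate(y = fun(x, y, z)) %>% unnest(y).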
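The migration described for ... can be sketched as follows. This is an illustrative example, not taken from the tidyr documentation: the data frame df, its columns g, x and y, and the derived column z are made up, and dplyr is assumed to be installed to provide mutate().

```r
library(tidyr)
library(dplyr)  # assumed available; provides mutate()

# Hypothetical data: two list-columns whose elements have matching sizes
df <- tibble(
  g = 1:2,
  x = list(1:2, 3L),
  y = list(c("a", "b"), "c")
)

# Deprecated style: df %>% unnest(x, y)
# Current style: wrap the columns in c()
both <- df %>% unnest(c(x, y))

# Deprecated style: creating a new variable inside unnest()
# Current style: build the list-column with mutate(), then unnest it
derived <- df %>%
  mutate(z = lapply(x, function(v) v * 10L)) %>%
  unnest(z)
```

In the second pipeline, x and y are left as list-columns and their values are repeated for each row produced from z.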

keep_empty

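By default, you get one row of output for each element of the list that you are unchopping/unnesting. This means that if there is a size-0 element (such as NULL or an empty data frame or vector), the entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.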

ptype

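Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that would be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols.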
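As a minimal sketch of ptype, the example below coerces an unnested column to double. The data frame and column names are invented for illustration. The named-list form is the one described above, and it is assumed here that the installed tidyr is recent enough to accept it (tidyr 1.2.0 or later).

```r
library(tidyr)

# Hypothetical data: a list-column of integer vectors
df <- tibble(
  id = 1:2,
  y = list(1:2, 3L)
)

# Without ptype, the combined type is guessed from the elements (integer here)
default <- df %>% unnest(y)

# With ptype, the column named y is coerced to the supplied prototype
as_double <- df %>% unnest(y, ptype = list(y = double()))
```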

names_sep

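If NULL, the default, the outer names will come from the inner names. If a string, the outer names will be formed by pasting together the outer and the inner column names, separated by names_sep.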
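The effect of names_sep on the output column names can be illustrated with a small sketch. The data frame is made up for this example.

```r
library(tidyr)

# Hypothetical data: list-column y holds data frames with columns a and b
df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2),
    tibble(a = 3:4, b = 5:6)
  )
)

# Default (names_sep = NULL): the inner names are used as-is
plain <- df %>% unnest(y)
names(plain)

# names_sep = "_": outer and inner names are pasted together
prefixed <- df %>% unnest(y, names_sep = "_")
names(prefixed)
```

Prefixing with the outer name is also one way to avoid clashes when an inner column shares a name with an existing column.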

names_repair

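Used to check that the output data frame has valid names. Must be one of the following options:

"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default) no name repair, but check they are unique,
"universal": make the names unique and syntactic,
a function: apply custom name repair,
tidyr_legacy: use the name repair from tidyr 0.8,
a formula: a purrr-style anonymous function (see rlang::as_function()).

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.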
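The difference between the default "check_unique" and "unique" shows up when an inner column has the same name as an outer one. The following is a sketch with invented data; the tryCatch() wrapper is only there so the script keeps running after the expected error.

```r
library(tidyr)

# Hypothetical data: the inner data frames also contain a column named x
df <- tibble(
  x = 1:2,
  y = list(
    tibble(x = 10),
    tibble(x = c(20, 21))
  )
)

# Default "check_unique": duplicated names are reported as an error
default_result <- tryCatch(
  df %>% unnest(y),
  error = function(e) "error"
)

# "unique": duplicated names are repaired by adding a positional suffix
repaired <- df %>% unnest(y, names_repair = "unique")
names(repaired)
```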

.drop, .preserve

Deprecated: all list-columns are now preserved. If there are any that you do not want in the output, use select() to remove them prior to unnesting.
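A short sketch of removing an unwanted list-column before unnesting, in place of the old .drop argument. The column names keep and extra are hypothetical, and dplyr is assumed to be installed to provide select().

```r
library(tidyr)
library(dplyr)  # assumed available; provides select()

# Hypothetical data: one list-column to unnest and one we do not want
df <- tibble(
  id = 1:2,
  keep = list(tibble(a = 1:2), tibble(a = 3L)),
  extra = list(letters[1:3], letters[4:5])
)

# All other list-columns are preserved in the output
kept <- df %>% unnest(keep)

# Remove the unwanted list-column first, then unnest
dropped <- df %>%
  select(-extra) %>%
  unnest(keep)
```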

.id

Deprecated: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x).
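The replacement for .id relies on the list-column carrying names. A minimal sketch, with made-up element names first and second and dplyr assumed to be installed for mutate():

```r
library(tidyr)
library(dplyr)  # assumed available; provides mutate()

# Hypothetical data: a named list-column of data frames
df <- tibble(
  x = list(
    first = tibble(v = 1:2),
    second = tibble(v = 3L)
  )
)

# Copy the element names into an ordinary column, then unnest
out <- df %>%
  mutate(id = names(x)) %>%
  unnest(x)
```

Each output row keeps the name of the list element it came from in the id column.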

.sep

Deprecated: use names_sep instead.

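New syntax

tidyr 1.0.0 introduced a new syntax for nest() and unnest() that is designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you will receive), but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows: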

library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy

See also

Other rectangling: hoist(), unnest_longer(), unnest_wider()

Examples

library(tidyr)

# unnest() is designed to work with lists of data frames
df <- tibble(
  x = 1:3,
  y = list(
    NULL,
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1, c = 4)
  )
)
# unnest() recycles input rows for each row of the list-column
# and adds a column for each column
df %>% unnest(y)
#> # A tibble: 4 × 4
#>       x     a     b     c
#>   <int> <dbl> <dbl> <dbl>
#> 1     2     1     2    NA
#> 2     3     1     3     4
#> 3     3     2     2     4
#> 4     3     3     1     4

# input rows with 0 rows in the list-column will usually disappear,
# but you can keep them (generating NAs) with keep_empty = TRUE:
df %>% unnest(y, keep_empty = TRUE)
#> # A tibble: 5 × 4
#>       x     a     b     c
#>   <int> <dbl> <dbl> <dbl>
#> 1     1    NA    NA    NA
#> 2     2     1     2    NA
#> 3     3     1     3     4
#> 4     3     2     2     4
#> 5     3     3     1     4

# Multiple columns ----------------------------------------------------------
# You can unnest multiple columns simultaneously
df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2),
    tibble(a = 3:4, b = 5:6)
  ),
  z = list(
    tibble(c = 1, d = 2),
    tibble(c = 3:4, d = 5:6)
  )
)
df %>% unnest(c(y, z))
#> # A tibble: 3 × 5
#>       x     a     b     c     d
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1     1     2     1     2
#> 2     2     3     5     3     5
#> 3     2     4     6     4     6

# Compare with unnesting one column at a time, which generates
# the Cartesian product
df %>%
  unnest(y) %>%
  unnest(z)
#> # A tibble: 5 × 5
#>       x     a     b     c     d
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1     1     2     1     2
#> 2     2     3     5     3     5
#> 3     2     3     5     4     6
#> 4     2     4     6     3     5
#> 5     2     4     6     4     6

Source: R/unnest.R

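Note: this article was compiled by 純淨天空 from the original English work by Hadley Wickham and others, "Unnest a list-column of data frames into rows and columns". Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.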