R dtplyr expand.dtplyr_step: Expand data frame to include all possible combinations of values

This is a method for the tidyr expand() generic. It is translated to data.table::CJ().
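To make the translation concrete, here is a minimal sketch of what data.table::CJ() does when called directly with unique = TRUE, which is the form that appears in the generated calls in the examples. The vectors `type` and `year` are illustrative values and are not part of the documented example data.

```r
library(data.table)

# Illustrative input vectors (assumed for this sketch)
type <- c("apple", "orange", "apple")
year <- c(2010, 2012, 2010)

# CJ() builds the cross join of its arguments; unique = TRUE
# de-duplicates each input before crossing, so 2 types x 2 years
# yields 4 rows.
combos <- CJ(type = type, year = year, unique = TRUE)
print(combos)
```

Because each argument is de-duplicated independently, the result contains every pairing of the distinct values, whether or not that pairing occurs in the inputs.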

Usage

# S3 method for dtplyr_step
expand(data, ..., .name_repair = "check_unique")

Arguments

data

A lazy_dt().

...

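Specification of columns to expand. Columns can be atomic vectors or lists.

To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.

Unlike the data frame method, this method does not use the full set of factor levels, only those that appear in the data.

When used with continuous variables, you may need to fill in values that do not appear in the data: to do so, use expressions like year = 2010:2020 or year = full_seq(year, 1).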
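As a rough illustration of filling in unobserved values, the sketch below first shows what tidyr::full_seq() returns for a vector with a gap, then passes an explicit range to expand() on a lazy_dt(). The table `obs` and its contents are made up for this sketch.

```r
library(dtplyr)
library(tidyr)

# Hypothetical data: the year 2011 is never observed
obs <- lazy_dt(tibble(
  type = c("apple", "orange", "apple"),
  year = c(2010, 2010, 2012)
))

# full_seq() fills the gaps in a numeric vector using a fixed step
years <- full_seq(c(2010, 2010, 2012), 1)
print(years)

# Supplying an explicit range adds the missing year to the grid:
# 2 types x 3 years gives 6 rows
filled <- obs %>%
  expand(type, year = 2010:2012) %>%
  as_tibble()
print(filled)
```

Naming the argument (year = ...) keeps a meaningful column name; an unnamed expression would receive a generated name such as V3.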

.name_repair

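Treatment of problematic column names:

"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default) no name repair, but check that names are unique,
"universal": make the names unique and syntactic,
a function: apply custom name repair (e.g., .name_repair = make.names for names in the style of base R),
a purrr-style anonymous function, see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See the vctrs::vec_as_names() documentation for more details on these terms and the strategies used to enforce them.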
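Since the argument is forwarded to vctrs::vec_as_names(), the repair strategies can be explored on a plain character vector. The following is a small sketch using a made-up vector of duplicated names; it is not taken from the dtplyr source.

```r
library(vctrs)

# Hypothetical duplicated names
dup <- c("x", "x")

# "minimal" leaves the names untouched
minimal <- vec_as_names(dup, repair = "minimal")

# "unique" appends positional suffixes to duplicated names
# (quiet = TRUE suppresses the informational message)
uniq <- vec_as_names(dup, repair = "unique", quiet = TRUE)

# "check_unique" performs no repair and signals an error on duplicates
checked <- tryCatch(
  vec_as_names(dup, repair = "check_unique"),
  error = function(e) "error"
)

print(minimal)
print(uniq)
print(checked)
```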

Examples

library(dtplyr)  # provides lazy_dt()
library(tidyr)   # provides expand(), tibble() and %>%

fruits <- lazy_dt(tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size  =  factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
))

# All possible combinations ---------------------------------------
# Note that only present levels of the factor variable `size` are retained.
fruits %>% expand(type)
#> Source: local data table [2 x 1]
#> Call:   `_DT9`[, CJ(type = type, unique = TRUE)]
#> 
#>   type  
#>   <chr> 
#> 1 apple 
#> 2 orange
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
fruits %>% expand(type, size)
#> Source: local data table [6 x 2]
#> Call:   `_DT9`[, CJ(type = type, size = size, unique = TRUE)]
#> 
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 apple  S    
#> 3 apple  M    
#> 4 orange XS   
#> 5 orange S    
#> 6 orange M    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# This is different from the data frame behaviour:
fruits %>% dplyr::collect() %>% expand(type, size)
#> # A tibble: 8 × 2
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 apple  S    
#> 3 apple  M    
#> 4 apple  L    
#> 5 orange XS   
#> 6 orange S    
#> 7 orange M    
#> 8 orange L    

# Other uses -------------------------------------------------------
fruits %>% expand(type, size, 2010:2012)
#> Source: local data table [18 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, V3 = 2010:2012, unique = TRUE)]
#> 
#>   type  size     V3
#>   <chr> <fct> <int>
#> 1 apple XS     2010
#> 2 apple XS     2011
#> 3 apple XS     2012
#> 4 apple S      2010
#> 5 apple S      2011
#> 6 apple S      2012
#> # … with 12 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
#> Source: local data table [12 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)]
#> 
#>   type  size   year
#>   <chr> <fct> <dbl>
#> 1 apple XS     2010
#> 2 apple XS     2012
#> 3 apple S      2010
#> 4 apple S      2012
#> 5 apple M      2010
#> 6 apple M      2012
#> # … with 6 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
all %>% dplyr::anti_join(fruits)
#> Joining, by = c("type", "size", "year")
#> Source: local data table [8 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)][!`_DT9`, 
#>     on = .(type, size, year)]
#> 
#>   type   size   year
#>   <chr>  <fct> <dbl>
#> 1 apple  XS     2012
#> 2 apple  S      2010
#> 3 apple  S      2012
#> 4 apple  M      2010
#> 5 orange XS     2010
#> 6 orange XS     2012
#> # … with 2 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)
#> Joining, by = c("type", "year", "size")
#> Source: local data table [14 x 4]
#> Call:   `_DT9`[`_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)], 
#>     on = .(type, year, size), allow.cartesian = TRUE]
#> 
#>   type   year size  weights
#>   <chr> <dbl> <fct>   <dbl>
#> 1 apple  2010 XS       1.78
#> 2 apple  2012 XS      NA   
#> 3 apple  2010 S       NA   
#> 4 apple  2012 S       NA   
#> 5 apple  2010 M       NA   
#> 6 apple  2012 M        4.81
#> # … with 8 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Source code: R/step-subset-expand.R

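Note: This article was selected and compiled by 纯净天空 from the original English work "Expand data frame to include all possible combinations of values" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission.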