R dtplyr expand.dtplyr_step: Expand a data frame to include all possible combinations of values

This is a method for the tidyr expand() generic. It is translated to data.table::CJ().

Usage

# S3 method for dtplyr_step
expand(data, ..., .name_repair = "check_unique")

Arguments

data

A lazy_dt().

...

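Specification of the columns to expand. Columns can be atomic vectors or lists.

  • To find all unique combinations of x, y, and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z)

  • To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z))

  • You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) produces a row for each school-student combination present in the data, for every possible date.

Unlike the data.frame method, this method does not use the full set of factor levels, only the levels that appear in the data.

When used with continuous variables, you may need to fill in values that do not appear in the data: to do so, use expressions like year = 2010:2020 or year = full_seq(year, 1).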
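As a minimal sketch of filling in a missing value of a continuous variable, the example below uses a small hypothetical table `obs` (not part of the original page) in which the year 2011 never occurs. It assumes dtplyr, tidyr, and data.table are installed. `full_seq()` is shown on its own to illustrate the sequence it generates, and an explicit range is passed to `expand()` in the same way the examples further down pass `2010:2012`.

```r
library(dtplyr)
library(tidyr)

# Hypothetical data: 2011 is absent from the `year` column
obs <- lazy_dt(tibble(
  type = c("apple", "orange", "apple"),
  year = c(2010, 2010, 2012)
))

# full_seq() fills the gaps in a numeric vector using a fixed step
years <- full_seq(c(2010, 2012), 1)

# Supplying an explicit range makes 2011 part of the expansion
filled <- obs %>%
  expand(type, year = 2010:2012) %>%
  as.data.frame()

# 2 types x 3 years
nrow(filled)
```

Naming the argument (`year = ...`) keeps the column name; an unnamed expression would receive a data.table-style name such as `V3`, as the later example output shows.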

.name_repair

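Treatment of problematic column names:

  • "minimal": no name repair or checks, beyond basic existence,

  • "unique": make sure names are unique and not empty,

  • "check_unique": (the default) no name repair, but check that the names are unique,

  • "universal": make the names unique and syntactic,

  • a function: apply custom name repair (e.g., .name_repair = make.names for names in the style of base R),

  • a purrr-style anonymous function, see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See that function's documentation for more details on these terms and the strategies used to enforce them.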
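Because the argument is forwarded as repair to vctrs::vec_as_names(), the strategies can be tried directly on character vectors. The sketch below assumes the vctrs package is installed; the input names are made up for illustration, and `quiet = TRUE` is used only to suppress the renaming messages.

```r
library(vctrs)

# "minimal" leaves names untouched, duplicates included
minimal_names <- vec_as_names(c("x", "x"), repair = "minimal")

# "unique" disambiguates duplicates and fills in empty names
unique_names <- vec_as_names(c("x", "x", ""), repair = "unique", quiet = TRUE)

# "universal" additionally makes the names syntactic
universal_names <- vec_as_names(c("a b", "x"), repair = "universal", quiet = TRUE)

# "check_unique" repairs nothing and signals an error on duplicates
check_result <- tryCatch(
  vec_as_names(c("x", "x"), repair = "check_unique"),
  error = function(e) "error"
)

# A function is applied to the names to produce the repaired names
custom_names <- vec_as_names(c("a b", "x"), repair = make.names)
```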

Examples

library(dtplyr)  # provides lazy_dt()
library(tidyr)

fruits <- lazy_dt(tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size  =  factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
))

# All possible combinations ---------------------------------------
# Note that only present levels of the factor variable `size` are retained.
fruits %>% expand(type)
#> Source: local data table [2 x 1]
#> Call:   `_DT9`[, CJ(type = type, unique = TRUE)]
#> 
#>   type  
#>   <chr> 
#> 1 apple 
#> 2 orange
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
fruits %>% expand(type, size)
#> Source: local data table [6 x 2]
#> Call:   `_DT9`[, CJ(type = type, size = size, unique = TRUE)]
#> 
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 apple  S    
#> 3 apple  M    
#> 4 orange XS   
#> 5 orange S    
#> 6 orange M    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# This is different from the data frame behaviour:
fruits %>% dplyr::collect() %>% expand(type, size)
#> # A tibble: 8 × 2
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 apple  S    
#> 3 apple  M    
#> 4 apple  L    
#> 5 orange XS   
#> 6 orange S    
#> 7 orange M    
#> 8 orange L    

# Other uses -------------------------------------------------------
fruits %>% expand(type, size, 2010:2012)
#> Source: local data table [18 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, V3 = 2010:2012, unique = TRUE)]
#> 
#>   type  size     V3
#>   <chr> <fct> <int>
#> 1 apple XS     2010
#> 2 apple XS     2011
#> 3 apple XS     2012
#> 4 apple S      2010
#> 5 apple S      2011
#> 6 apple S      2012
#> # … with 12 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
#> Source: local data table [12 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)]
#> 
#>   type  size   year
#>   <chr> <fct> <dbl>
#> 1 apple XS     2010
#> 2 apple XS     2012
#> 3 apple S      2010
#> 4 apple S      2012
#> 5 apple M      2010
#> 6 apple M      2012
#> # … with 6 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
all %>% dplyr::anti_join(fruits)
#> Joining, by = c("type", "size", "year")
#> Source: local data table [8 x 3]
#> Call:   `_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)][!`_DT9`, 
#>     on = .(type, size, year)]
#> 
#>   type   size   year
#>   <chr>  <fct> <dbl>
#> 1 apple  XS     2012
#> 2 apple  S      2010
#> 3 apple  S      2012
#> 4 apple  M      2010
#> 5 orange XS     2010
#> 6 orange XS     2012
#> # … with 2 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)
#> Joining, by = c("type", "year", "size")
#> Source: local data table [14 x 4]
#> Call:   `_DT9`[`_DT9`[, CJ(type = type, size = size, year = year, unique = TRUE)], 
#>     on = .(type, year, size), allow.cartesian = TRUE]
#> 
#>   type   year size  weights
#>   <chr> <dbl> <fct>   <dbl>
#> 1 apple  2010 XS       1.78
#> 2 apple  2012 XS      NA   
#> 3 apple  2010 S       NA   
#> 4 apple  2012 S       NA   
#> 5 apple  2010 M       NA   
#> 6 apple  2012 M        4.81
#> # … with 8 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

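Note: This article was selected and compiled by 純淨天空 from the original English work "Expand data frame to include all possible combinations of values" by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission.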