R dtplyr left_join.dtplyr_step Join data tables

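These are methods for the dplyr generics left_join(), right_join(), inner_join(), full_join(), anti_join(), and semi_join(). Left, right, inner, and anti joins are translated to the [.data.table equivalent; full joins are translated to data.table::merge.data.table(). In some cases, left, right, and full joins are followed by calls to data.table::setcolorder() and data.table::setnames() to ensure that column order and names match dplyr conventions. Semi-joins have no direct data.table equivalent, so they are built from several steps: as the semi_join() example below shows, dtplyr finds the row numbers of x that have a match in y (with which = TRUE), removes duplicates with unique(), and then subsets x by those rows.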
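To see these translations yourself, you can print the data.table code that a lazy join will run with show_query(), without computing the join. This is an illustrative sketch rather than part of the original reference: it reuses the band_members and band_instruments datasets from the Examples section, assumes dplyr and dtplyr are installed, and the exact generated code (including the `_DT` table names) may differ between dtplyr versions.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)

# show_query() prints the translated data.table call; capture.output() keeps
# that printed text so it can be inspected as a single string
inner_q <- paste(capture.output(
  band_dt %>% inner_join(instrument_dt, by = "name") %>% show_query()
), collapse = " ")
full_q <- paste(capture.output(
  band_dt %>% full_join(instrument_dt, by = "name") %>% show_query()
), collapse = " ")
semi_q <- paste(capture.output(
  band_dt %>% semi_join(instrument_dt, by = "name") %>% show_query()
), collapse = " ")

# Inner join: a [.data.table call; full join: merge(); semi join: several steps
cat(inner_q, full_q, semi_q, sep = "\n")
```

Comparing the three printed calls makes the split described above visible: the inner join stays inside `[`, while the full join goes through merge().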

Usage

# S3 method for dtplyr_step
left_join(x, y, ..., by = NULL, copy = FALSE, suffix = c(".x", ".y"))

Arguments

x, y

A pair of lazy_dt() objects.

...

Other arguments passed on to methods.

by

A join specification created with join_by(), or a character vector of variables to join by.

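If NULL (the default), *_join() performs a natural join, using all variables that x and y have in common. A message lists the variables so that you can check they are correct; suppress the message by supplying by explicitly.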

To join on variables with different names in x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.

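To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d. If the column names are the same in x and y, you can shorten this by listing only the variable names, as in join_by(a, c).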

join_by() can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.

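For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").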

To perform a cross join, generating all combinations of x and y, see cross_join().
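The behaviour of by described above can be checked on small tables. This is a sketch, not part of the original reference, assuming dplyr and dtplyr are installed: with_messages() is a hypothetical helper defined only to record the messages a call emits, the sales and targets tables are made-up data, and dplyr::band_instruments2 is used because it stores the key in a column called artist instead of name. Only the presence of the natural-join message is checked, since its wording depends on the dplyr version, and only character-vector forms of by are exercised.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)
# band_instruments2 stores the key in a column called `artist`, not `name`
instrument2_dt <- lazy_dt(dplyr::band_instruments2)

# Hypothetical helper: evaluate `expr`, muffling and recording any messages
with_messages <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, messages = msgs)
}

# by = NULL: natural join on the shared column `name`, reported in a message
natural <- with_messages(band_dt %>% left_join(instrument_dt) %>% as_tibble())
# Explicit by: same result, no message
explicit <- with_messages(
  band_dt %>% left_join(instrument_dt, by = "name") %>% as_tibble()
)

# Named character vector: x$name is matched to y$artist
by_named <- band_dt %>%
  inner_join(instrument2_dt, by = c("name" = "artist")) %>%
  as_tibble()

# Two keys, one of which is named differently in x and y (made-up data)
sales <- lazy_dt(data.frame(
  year = c(2020L, 2020L, 2021L),
  region = c("N", "S", "N"),
  sales = c(10, 20, 30)
))
targets <- lazy_dt(data.frame(
  yr = c(2020L, 2021L),
  region = c("N", "N"),
  target = c(15, 25)
))
both <- sales %>%
  left_join(targets, by = c("year" = "yr", "region" = "region")) %>%
  as_tibble()
```

Following dplyr's convention, the key column in by_named is expected to keep x's name (name), and the unmatched 2020/S row in both gets NA for target.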

copy

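If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same source as x. This allows you to join tables across sources, but it is a potentially expensive operation, so you must opt into it.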

suffix

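If x and y contain duplicate variables that are not used in the join, these suffixes are added to the output to disambiguate them. Should be a character vector of length 2.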
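A minimal sketch of suffix in action, assuming dplyr and dtplyr are installed. The two tables are made-up data that share a non-key column called value, so the joined output needs suffixes to tell the two copies apart.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Both tables carry a non-key column called `value` (made-up data)
x <- lazy_dt(data.frame(id = 1:3, value = c("a", "b", "c")))
y <- lazy_dt(data.frame(id = 2:4, value = c("B", "C", "D")))

# Default suffix = c(".x", ".y")
default_sfx <- x %>% inner_join(y, by = "id") %>% as_tibble()

# Custom suffixes
custom_sfx <- x %>%
  inner_join(y, by = "id", suffix = c("_left", "_right")) %>%
  as_tibble()

print(default_sfx)
print(custom_sfx)
```

Only ids 2 and 3 appear in both tables, so each result has two rows; the join key id itself never receives a suffix.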

Examples

library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)

band_dt %>% left_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call:   setcolorder(`_DT23`[`_DT22`, on = .(name), allow.cartesian = TRUE], 
#>     c(1L, 3L, 2L))
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  NA    
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% right_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call:   `_DT22`[`_DT23`, on = .(name), allow.cartesian = TRUE]
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 3 Keith NA      guitar
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% inner_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 3]
#> Call:   `_DT22`[`_DT23`, on = .(name), nomatch = NULL, allow.cartesian = TRUE]
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% full_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [4 x 3]
#> Call:   merge(`_DT22`, `_DT23`, all = TRUE, by.x = "name", by.y = "name", 
#>     allow.cartesian = TRUE)
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Keith NA      guitar
#> 3 Mick  Stones  NA    
#> 4 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

band_dt %>% semi_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 2]
#> Call:   `_DT22`[unique(`_DT22`[`_DT23`, which = TRUE, nomatch = NULL, 
#>     on = .(name)])]
#> 
#>   name  band   
#>   <chr> <chr>  
#> 1 John  Beatles
#> 2 Paul  Beatles
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% anti_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [1 x 2]
#> Call:   `_DT22`[!`_DT23`, on = .(name)]
#> 
#>   name  band  
#>   <chr> <chr> 
#> 1 Mick  Stones
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
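The printed previews above are lazy: as the last line of each one notes, you need as.data.table(), as.data.frame(), or as_tibble() to get the actual result. A short sketch, assuming dplyr, dtplyr, and data.table are installed, that materializes the left join in all three forms and checks it against dplyr's own left_join() on the original tibbles. data.table is called with the `data.table::` prefix to avoid attaching it and masking dplyr functions.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)

# Lazy query: nothing is computed until a result is requested
lazy_query <- band_dt %>% left_join(instrument_dt, by = "name")

# Materialize the same query in three forms
as_dt <- data.table::as.data.table(lazy_query)
as_df <- as.data.frame(lazy_query)
as_tbl <- as_tibble(lazy_query)

# Reference result computed by dplyr on plain tibbles
reference <- left_join(dplyr::band_members, dplyr::band_instruments, by = "name")
```

Comparing column by column, rather than the whole objects, avoids false mismatches from class or attribute differences between data.table, data.frame, and tibble.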

Source code: R/step-join.R


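Note: This page was compiled by 純淨天空 from the original English documentation "Join data tables" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.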