SQL 表上的所有數據操作都是惰性的:除非您要求,否則它們實際上不會運行查詢或檢索數據:它們都返回一個新的 tbl_dbi
對象。使用 compute()
運行查詢並將結果保存在數據庫中的臨時文件中,或使用 collect()
將結果檢索到 R。您可以使用 show_query()
查看查詢。
參數
- src
-
由
DBI::dbConnect()
生成的DBIConnection
對象。 - from
-
字符串(給出表名)、由
in_schema()
創建的完全限定表名或文字sql()
字符串。 - ...
-
傳遞給
tbl_sql()
例子
library(dplyr)
# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Add some data
copy_to(con, mtcars)
DBI::dbListTables(con)
#> [1] "mtcars" "sqlite_stat1" "sqlite_stat4"
# To retrieve a single table from a source, use `tbl()`
con %>% tbl("mtcars")
#> # Source: table<mtcars> [?? x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ more rows
# Use `in_schema()` for fully qualified table names
con %>% tbl(in_schema("temp", "mtcars")) %>% head(1)
#> # Source: SQL [1 x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# You can also use pass raw SQL if you want a more sophisticated query
con %>% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))
#> # Source: SQL [?? x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 2 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 3 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 4 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 5 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> 6 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 7 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 8 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 9 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
#> 10 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
#> # ℹ more rows
# If you just want a temporary in-memory database, use src_memdb()
src2 <- src_memdb()
# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.
if (requireNamespace("Lahman", quietly = TRUE)) {
batting <- copy_to(con, Lahman::Batting)
batting
# Basic data manipulation verbs work in the same way as with a tibble
batting %>% filter(yearID > 2005, G > 130)
batting %>% select(playerID:lgID)
batting %>% arrange(playerID, desc(yearID))
batting %>% summarise(G = mean(G), n = n())
# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting %>%
select(playerID:lgID, AB, R, G) %>%
mutate(
R_per_game1 = R / G,
R_per_game2 = R * 1.0 / G
)
# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))
# You can see the query that dplyr creates with show_query()
batting %>%
filter(G > 0) %>%
group_by(playerID) %>%
summarise(n = n()) %>%
show_query()
}
#> <SQL>
#> SELECT `playerID`, COUNT(*) AS `n`
#> FROM `Lahman::Batting`
#> WHERE (`G` > 0.0)
#> GROUP BY `playerID`
相關用法
- R dbplyr tbl_lazy 創建本地惰性 tibble
- R dbplyr translate_sql 將表達式轉換為 SQL
- R dbplyr backend-teradata 後端:Teradata
- R dbplyr escape 轉義/引用字符串。
- R dbplyr expand.tbl_lazy 擴展 SQL 表以包含所有可能的值組合
- R dbplyr distinct.tbl_lazy 子集不同/唯一行
- R dbplyr backend-sqlite 後端:SQLite
- R dbplyr pivot_wider.tbl_lazy 將數據從長軸轉向寬軸
- R dbplyr build_sql 構建 SQL 字符串。
- R dbplyr mutate.tbl_lazy 創建、修改和刪除列
- R dbplyr collapse.tbl_sql 計算查詢的結果
- R dbplyr sql_expr 從 R 表達式生成 SQL
- R dbplyr get_returned_rows 提取並檢查返回的行
- R dbplyr dbplyr_uncount “計數”數據庫表
- R dbplyr count.tbl_lazy 按組計數觀察值
- R dbplyr backend-odbc 後端:ODBC
- R dbplyr head.tbl_lazy 對第一行進行子集化
- R dbplyr db-quote SQL 轉義/引用泛型
- R dbplyr copy_inline 在 dbplyr 查詢中使用本地 DataFrame
- R dbplyr backend-oracle 後端:甲骨文
- R dbplyr backend-snowflake 後端:雪花
- R dbplyr lahman 緩存並檢索 Lahman 棒球數據庫的 src_sqlite。
- R dbplyr backend-redshift 後端:紅移
- R dbplyr partial_eval 部分評估表達式。
- R dbplyr group_by.tbl_lazy 按一個或多個變量分組
注:本文由純淨天空篩選整理自Hadley Wickham等大神的英文原創作品 Use dplyr verbs with a remote database table。非經特殊聲明,原始代碼版權歸原作者所有,本譯文未經允許或授權,請勿轉載或複製。