SQL 表上的所有数据操作都是惰性的:除非您要求,否则它们实际上不会运行查询或检索数据:它们都返回一个新的 tbl_dbi
对象。使用 compute()
运行查询并将结果保存在数据库中的临时文件中,或使用 collect()
将结果检索到 R。您可以使用 show_query()
查看查询。
参数
- src
-
由
DBI::dbConnect()
生成的DBIConnection
对象。 - from
-
字符串(给出表名)、由
in_schema()
创建的完全限定表名或文字sql()
字符串。 - ...
-
传递给
tbl_sql()
例子
library(dplyr)
# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Add some data
copy_to(con, mtcars)
DBI::dbListTables(con)
#> [1] "mtcars" "sqlite_stat1" "sqlite_stat4"
# To retrieve a single table from a source, use `tbl()`
con %>% tbl("mtcars")
#> # Source: table<mtcars> [?? x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ more rows
# Use `in_schema()` for fully qualified table names
con %>% tbl(in_schema("temp", "mtcars")) %>% head(1)
#> # Source: SQL [1 x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# You can also use pass raw SQL if you want a more sophisticated query
con %>% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))
#> # Source: SQL [?? x 11]
#> # Database: sqlite 3.41.2 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 2 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 3 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 4 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 5 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> 6 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 7 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 8 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 9 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
#> 10 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
#> # ℹ more rows
# If you just want a temporary in-memory database, use src_memdb()
src2 <- src_memdb()
# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.
if (requireNamespace("Lahman", quietly = TRUE)) {
batting <- copy_to(con, Lahman::Batting)
batting
# Basic data manipulation verbs work in the same way as with a tibble
batting %>% filter(yearID > 2005, G > 130)
batting %>% select(playerID:lgID)
batting %>% arrange(playerID, desc(yearID))
batting %>% summarise(G = mean(G), n = n())
# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting %>%
select(playerID:lgID, AB, R, G) %>%
mutate(
R_per_game1 = R / G,
R_per_game2 = R * 1.0 / G
)
# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))
# You can see the query that dplyr creates with show_query()
batting %>%
filter(G > 0) %>%
group_by(playerID) %>%
summarise(n = n()) %>%
show_query()
}
#> <SQL>
#> SELECT `playerID`, COUNT(*) AS `n`
#> FROM `Lahman::Batting`
#> WHERE (`G` > 0.0)
#> GROUP BY `playerID`
相关用法
- R dbplyr tbl_lazy 创建本地惰性 tibble
- R dbplyr translate_sql 将表达式转换为 SQL
- R dbplyr backend-teradata 后端:Teradata
- R dbplyr escape 转义/引用字符串。
- R dbplyr expand.tbl_lazy 扩展 SQL 表以包含所有可能的值组合
- R dbplyr distinct.tbl_lazy 子集不同/唯一行
- R dbplyr backend-sqlite 后端:SQLite
- R dbplyr pivot_wider.tbl_lazy 将数据从长轴转向宽轴
- R dbplyr build_sql 构建 SQL 字符串。
- R dbplyr mutate.tbl_lazy 创建、修改和删除列
- R dbplyr collapse.tbl_sql 计算查询的结果
- R dbplyr sql_expr 从 R 表达式生成 SQL
- R dbplyr get_returned_rows 提取并检查返回的行
- R dbplyr dbplyr_uncount “计数”数据库表
- R dbplyr count.tbl_lazy 按组计数观察值
- R dbplyr backend-odbc 后端:ODBC
- R dbplyr head.tbl_lazy 对第一行进行子集化
- R dbplyr db-quote SQL 转义/引用泛型
- R dbplyr copy_inline 在 dbplyr 查询中使用本地 DataFrame
- R dbplyr backend-oracle 后端:甲骨文
- R dbplyr backend-snowflake 后端:雪花
- R dbplyr lahman 缓存并检索 Lahman 棒球数据库的 src_sqlite。
- R dbplyr backend-redshift 后端:红移
- R dbplyr partial_eval 部分评估表达式。
- R dbplyr group_by.tbl_lazy 按一个或多个变量分组
注:本文由纯净天空筛选整理自Hadley Wickham等大神的英文原创作品 Use dplyr verbs with a remote database table。非经特殊声明,原始代码版权归原作者所有,本译文未经允许或授权,请勿转载或复制。