R googlesheets4 range_read: Read a Sheet into a data frame

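This is the main "read" function of the googlesheets4 package. It goes by two names, because we want it to make sense in two contexts:

read_sheet() evokes other table-reading functions, such as readr::read_csv() and readxl::read_excel(). The "sheet" in this case refers to a Google (spread)Sheet.
range_read() is the right name according to the naming convention used throughout the googlesheets4 package.

read_sheet() and range_read() are synonyms and you can use either one.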

Usage

range_read(
  ss,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  .name_repair = "unique"
)

read_sheet(
  ss,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  .name_repair = "unique"
)

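Arguments

ss

Something that identifies a Google Sheet:

its file ID as a string or drive_id
a URL from which we can recover the ID
a one-row dribble, which is how googledrive represents Drive files
an instance of googlesheets4_spreadsheet, which is what gs4_get() returns

Processed through as_sheets_id().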
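
As a rough illustration of the forms listed above, the sketch below identifies the same example Sheet in three ways. It assumes network access and calls gs4_deauth(), which should suffice here only because the official example Sheets are publicly readable. The variable names are illustrative.

```r
library(googlesheets4)

# Assumption: the example Sheets are world-readable, so no token is needed.
gs4_deauth()

ss <- gs4_example("deaths")      # a sheets_id, a special kind of drive_id
id_string <- as.character(ss)    # the bare file ID as a plain string
meta <- gs4_get(ss)              # a googlesheets4_spreadsheet object

# Each call below identifies the same Sheet in a different way.
by_id   <- read_sheet(id_string, range = "A5:F15")
by_url  <- read_sheet(meta$spreadsheet_url, range = "A5:F15")
by_meta <- read_sheet(meta, range = "A5:F15")
```

All three inputs pass through as_sheets_id(), so the results should not depend on which form is used.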

sheet

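Sheet to read, in the sense of "worksheet" or "tab". You can identify a sheet by name, with a string, or by position, with a number. Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first visible sheet.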

range

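A cell range to read from. If NULL, all non-empty cells are read. Otherwise, specify range as described in Sheets A1 notation or using the helpers documented in cell-specification. Sheets uses fairly standard spreadsheet range notation, although it differs slightly from Excel. Examples of valid ranges: "Sheet1!A1:B2", "Sheet1!A:A", "Sheet1!1:2", "Sheet1!A5:A", "A1:B2", "Sheet1". The range is interpreted strictly, even if it forces the inclusion of leading, trailing, or embedded empty rows or columns. Takes precedence over skip, n_max, and sheet. Note that range can be a named range, like "sales_data", without any cell reference.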
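
A short sketch of the two ways to specify range described above: an A1-notation string and the cell-specification helpers cell_rows() and cell_cols(). It assumes network access and the public "mini-gap" example Sheet, with gs4_deauth() used because no token should be needed for it.

```r
library(googlesheets4)

# Assumption: the example Sheets are world-readable, so no token is needed.
gs4_deauth()
ss <- gs4_example("mini-gap")

# A1 notation with the worksheet name embedded in the range.
a1 <- read_sheet(ss, range = "Europe!A1:D3")

# Row-only and column-only limits via cell-specification helpers;
# the worksheet is then named separately through `sheet`.
first_rows <- read_sheet(ss, sheet = "Europe", range = cell_rows(1:3))
first_cols <- read_sheet(ss, sheet = "Europe", range = cell_cols("A:D"))
```

Because the first row of the range supplies the column names by default, a three-row range yields two data rows.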

col_names

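TRUE to use the first row as column names, FALSE to get default names, or a character vector to provide column names directly. If the user provides col_types, col_names can have one entry per column or one entry per unskipped column.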

col_types

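Column types. Either NULL, to guess all types from the spreadsheet, or a string of readr-style shortcodes, with one character or code per column. If exactly one col_type is specified, it is recycled. See the column specification section below for more.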
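
The sketch below illustrates recycling of a single shortcode, and col_names given only for unskipped columns. It assumes network access and that the first worksheet of the public "mini-gap" example holds its header in row 1 and five data rows in rows 2 to 6. The column names supplied in the second call are made up for illustration.

```r
library(googlesheets4)

# Assumption: the example Sheets are world-readable, so no token is needed.
gs4_deauth()
ss <- gs4_example("mini-gap")

# A single shortcode is recycled, so every column is read as character.
all_chr <- read_sheet(ss, col_types = "c")

# Read columns A and D only, skipping B and C with "_". The range starts at
# row 2 so that the Sheet's own header row is not parsed as data, and
# col_names has one entry per unskipped column.
renamed <- read_sheet(
  ss,
  range = "A2:D6",
  col_names = c("country", "life_exp"),
  col_types = "c__d"
)
```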

na

Character vector of strings to interpret as missing values. By default, blank cells are treated as missing data.

trim_ws

Logical. Should leading and trailing whitespace be trimmed from cell contents?

skip

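Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are skipped automatically, so this is a lower bound. Ignored if range is given.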

n_max

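Maximum number of data rows to parse into the returned tibble. Trailing empty rows are skipped automatically, so this is an upper bound on the number of rows in the result. Ignored if range is given. n_max is imposed locally, after all non-empty cells have been read, so if speed is a concern it is better to use range.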
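
To make the difference between skip / n_max and range concrete, here is a sketch that trims the same data both ways. It assumes network access and that the first worksheet of the public "deaths" example has four rows of notes above the header row, as the A5:F15 range used in the examples suggests.

```r
library(googlesheets4)

# Assumption: the example Sheets are world-readable, so no token is needed.
gs4_deauth()
ss <- gs4_example("deaths")

# Skip at least 4 rows, then keep at most 10 data rows. All non-empty cells
# are still fetched; n_max is applied locally afterwards.
local_trim <- read_sheet(ss, skip = 4, n_max = 10)

# Request only the rectangle of interest from the API. When range is given,
# skip and n_max are ignored.
remote_trim <- read_sheet(ss, range = "A5:F15")
```

For a large Sheet the second form is likely to be faster, since less data is requested.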

guess_max

Maximum number of data rows to use for guessing column types.

.name_repair

Handling of column names. By default, googlesheets4 ensures that column names are non-empty and unique. There is full support for .name_repair as documented in tibble::tibble().

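Value

A tibble.

Column specification

Column types must be specified in a single string of readr-style short codes, e.g. "cci?l" means "character, character, integer, guess, logical". This is not where googlesheets4's column specification will end up, but it gets the ball rolling in a way that is consistent with readr and doesn't reinvent any wheels.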

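Shortcodes for column types:

_ or -: Skip. Data in a skipped column is still requested from the API (the high-level functions in this package are rectangle-oriented), but is not parsed into the data frame output.
?: Guess. A type is guessed for each cell and then a consensus type is selected for the column. If no atomic type is suitable for all cells, a list-column is created, in which each cell is converted to an R object of "best" type. If no column types are specified, i.e. col_types = NULL, all types are guessed.
l: Logical.
i: Integer. This type is never guessed from the data, because Sheets have no formal cell type for integers.
d or n: Numeric, in the sense of "double".
D: Date. This type is never guessed from the data, because date cells are just serial datetimes that bear a "date" format.
t: Time of day. This type is never guessed from the data, because time cells are just serial datetimes that bear a "time" format. Not implemented yet; returns POSIXct.
T: Datetime, specifically POSIXct.
c: Character.
C: Cell. This type is unique to googlesheets4. It returns raw cell data as an R list, which consists of everything sent by the Sheets API for that cell. Has S3 type of "CELL_SOMETHING" and "SHEETS_CELL". Mostly useful internally, but exposed for those who want direct access to, e.g., formulas and formats.
L: List, as in "list-column". Each cell is a length-1 atomic vector of its discovered type.
Still to come: duration (code will be :) and factor (code will be f).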
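
As a reading aid for the shortcodes listed above, the base-R sketch below expands a col_types string into human-readable type names. The lookup table and the describe_col_types() helper are hypothetical and are not part of googlesheets4. The codes marked as still to come (: and f) are deliberately left out.

```r
# Hypothetical lookup table built from the shortcode list above.
shortcodes <- c(
  "_" = "skip", "-" = "skip", "?" = "guess",
  "l" = "logical", "i" = "integer",
  "d" = "double", "n" = "double",
  "D" = "date", "t" = "time of day", "T" = "datetime",
  "c" = "character", "C" = "cell", "L" = "list"
)

# Hypothetical helper: split the string into single characters and look
# each one up, failing loudly on anything that is not a known shortcode.
describe_col_types <- function(col_types) {
  codes <- strsplit(col_types, "")[[1]]
  unknown <- setdiff(codes, names(shortcodes))
  if (length(unknown) > 0) {
    stop("Unrecognized shortcode(s): ", paste(unknown, collapse = ", "))
  }
  unname(shortcodes[codes])
}

describe_col_types("cci?l")
#> [1] "character" "character" "integer"   "guess"     "logical"
```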

Examples

ss <- gs4_example("deaths")
read_sheet(ss, range = "A5:F15")
#> ✔ Reading from deaths.
#> ✔ Range A5:F15.
#> # A tibble: 10 × 6
#>    Name               Profession   Age `Has kids` `Date of birth`    
#>    <chr>              <chr>      <dbl> <lgl>      <dttm>             
#>  1 David Bowie        musician      69 TRUE       1947-01-08 00:00:00
#>  2 Carrie Fisher      actor         60 TRUE       1956-10-21 00:00:00
#>  3 Chuck Berry        musician      90 TRUE       1926-10-18 00:00:00
#>  4 Bill Paxton        actor         61 TRUE       1955-05-17 00:00:00
#>  5 Prince             musician      57 TRUE       1958-06-07 00:00:00
#>  6 Alan Rickman       actor         69 FALSE      1946-02-21 00:00:00
#>  7 Florence Henderson actor         82 TRUE       1934-02-14 00:00:00
#>  8 Harper Lee         author        89 FALSE      1926-04-28 00:00:00
#>  9 Zsa Zsa Gábor      actor         99 TRUE       1917-02-06 00:00:00
#> 10 George Michael     musician      53 FALSE      1963-06-25 00:00:00
#> # ℹ 1 more variable: `Date of death` <dttm>
read_sheet(ss, range = "other!A5:F15", col_types = "ccilDD")
#> ✔ Reading from deaths.
#> ✔ Range ''other'!A5:F15'.
#> # A tibble: 10 × 6
#>    Name        Profession   Age `Has kids` `Date of birth` `Date of death`
#>    <chr>       <chr>      <int> <lgl>      <date>          <date>         
#>  1 Vera Rubin  scientist     88 TRUE       1928-07-23      2016-12-25     
#>  2 Mohamed Ali athlete       74 TRUE       1942-01-17      2016-06-03     
#>  3 Morley Saf… journalist    84 TRUE       1931-11-08      2016-05-19     
#>  4 Fidel Cast… politician    90 TRUE       1926-08-13      2016-11-25     
#>  5 Antonin Sc… lawyer        79 TRUE       1936-03-11      2016-02-13     
#>  6 Jo Cox      politician    41 TRUE       1974-06-22      2016-06-16     
#>  7 Janet Reno  lawyer        78 FALSE      1938-07-21      2016-11-07     
#>  8 Gwen Ifill  journalist    61 FALSE      1955-09-29      2016-11-14     
#>  9 John Glenn  astronaut     95 TRUE       1921-07-28      2016-12-08     
#> 10 Pat Summit  coach         64 TRUE       1952-06-14      2016-06-28     
read_sheet(ss, range = "arts_data", col_types = "ccilDD")
#> ✔ Reading from deaths.
#> ✔ Range arts_data.
#> # A tibble: 10 × 6
#>    Name        Profession   Age `Has kids` `Date of birth` `Date of death`
#>    <chr>       <chr>      <int> <lgl>      <date>          <date>         
#>  1 David Bowie musician      69 TRUE       1947-01-08      2016-01-10     
#>  2 Carrie Fis… actor         60 TRUE       1956-10-21      2016-12-27     
#>  3 Chuck Berry musician      90 TRUE       1926-10-18      2017-03-18     
#>  4 Bill Paxton actor         61 TRUE       1955-05-17      2017-02-25     
#>  5 Prince      musician      57 TRUE       1958-06-07      2016-04-21     
#>  6 Alan Rickm… actor         69 FALSE      1946-02-21      2016-01-14     
#>  7 Florence H… actor         82 TRUE       1934-02-14      2016-11-24     
#>  8 Harper Lee  author        89 FALSE      1926-04-28      2016-02-19     
#>  9 Zsa Zsa Gá… actor         99 TRUE       1917-02-06      2016-12-18     
#> 10 George Mic… musician      53 FALSE      1963-06-25      2016-12-25     

read_sheet(gs4_example("mini-gap"))
#> ✔ Reading from mini-gap.
#> ✔ Range Africa.
#> # A tibble: 5 × 6
#>   country      continent  year lifeExp     pop gdpPercap
#>   <chr>        <chr>     <dbl>   <dbl>   <dbl>     <dbl>
#> 1 Algeria      Africa     1952    43.1 9279525     2449.
#> 2 Angola       Africa     1952    30.0 4232095     3521.
#> 3 Benin        Africa     1952    38.2 1738315     1063.
#> 4 Botswana     Africa     1952    47.6  442308      851.
#> 5 Burkina Faso Africa     1952    32.0 4469979      543.
read_sheet(
  gs4_example("mini-gap"),
  sheet = "Europe",
  range = "A:D",
  col_types = "ccid"
)
#> ✔ Reading from mini-gap.
#> ✔ Range ''Europe'!A:D'.
#> # A tibble: 5 × 4
#>   country                continent  year lifeExp
#>   <chr>                  <chr>     <int>   <dbl>
#> 1 Albania                Europe     1952    55.2
#> 2 Austria                Europe     1952    66.8
#> 3 Belgium                Europe     1952    68  
#> 4 Bosnia and Herzegovina Europe     1952    53.8
#> 5 Bulgaria               Europe     1952    59.6

Source: R/range_read.R

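Note: This article was selected and compiled by 纯净天空 from the original English work "Read a Sheet into a data frame" by Jennifer Bryan et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.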