当前位置: 首页>>代码示例 >>用法及示例精选 >>正文


R readxl read_excel 读取 xls 和 xlsx 文件


读取 xls 和 xlsx 文件

read_excel() 调用 excel_format() 根据文件扩展名和文件本身按顺序确定 path 是 xls 还是 xlsx。如果您更了解并且想防止这种猜测,请直接使用read_xls()read_xlsx()

用法

read_excel(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

read_xls(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

read_xlsx(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

参数

path

xls/xlsx 文件的路径。

sheet

要阅读的表。字符串(工作表的名称)或整数(工作表的位置)。如果通过 range 指定工作表,则忽略。如果两个参数都没有指定工作表,则默认为第一个工作表。

range

要读取的单元格范围,如 cell-specification 中所述。包括典型的 Excel 范围(如 "B3:D87"),可能包括工作表名称(如 "Budget!B2:G14")等。严格解释,即使范围强制包含前导或尾随空行或列。优先于 skipn_maxsheet

col_names

TRUE 使用第一行作为列名称,FALSE 获取默认名称,或为每列提供名称的字符向量。如果用户提供 col_types 作为向量,则 col_names 每列可以有一个条目,即与 col_types 具有相同的长度,或者每个未跳过的列有一个条目。

col_types

NULL 从电子表格中猜测所有内容,或者使用以下选项中每列包含一个条目的字符向量:"skip"、"guess"、"logical"、"numeric"、"date"、"text" 或 "list" 。如果恰好指定了一个col_type,它将被回收。永远不会读取跳过列中单元格的内容,并且该列不会出现在数据帧输出中。列表单元格将列加载为长度为 1 的向量列表,这些向量使用 col_types = NULL 中的类型猜测逻辑进行键入,但基于 cell-by-cell 。

na

要解释为缺失值的字符串的字符向量。默认情况下,readxl 将空白单元格视为缺失数据。

trim_ws

是否应该修剪前导和尾随空格?

skip

在读取任何内容(无论是列名还是数据)之前要跳过的最小行数。前导空行会自动跳过,因此这是一个下限。如果给出range,则忽略。

n_max

要读取的最大数据行数。尾随空行会自动跳过,因此这是返回的 tibble 中行数的上限。如果给出range,则忽略。

guess_max

用于猜测列类型的最大数据行数。

progress

显示进度旋转器?默认情况下,微调器仅出现在交互式会话中、在编写文档的上下文之外,并且当调用可能运行几秒钟或更长时间时。有关更多详细信息,请参阅readxl_progress()

.name_repair

列名的处理。传递给tibble::as_tibble()。 readxl 的默认值是`.name_repair = "unique",这确保列名不为空并且是唯一的。

tibble

也可以看看

cell-specification 了解有关使用 range 参数定位单元格的更多详细信息

例子

datasets <- readxl_example("datasets.xlsx")
read_excel(datasets)
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Specify sheet either by position or by name
read_excel(datasets, 2)
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
read_excel(datasets, "mtcars")
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows

# Skip rows and use default column names
read_excel(datasets, skip = 148, col_names = FALSE)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> • `` -> `...5`
#> # A tibble: 3 × 5
#>    ...1  ...2  ...3  ...4 ...5     
#>   <dbl> <dbl> <dbl> <dbl> <chr>    
#> 1   6.5   3     5.2   2   virginica
#> 2   6.2   3.4   5.4   2.3 virginica
#> 3   5.9   3     5.1   1.8 virginica

# Recycle a single column type
read_excel(datasets, col_types = "text")
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <chr>        <chr>       <chr>        <chr>       <chr>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3           1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5            3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5            3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # ℹ 140 more rows

# Specify some col_types and guess others
read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <chr>              <dbl>        <dbl>       <dbl> <chr>  
#>  1 5.1                  3.5          1.4         0.2 setosa 
#>  2 4.9                  3            1.4         0.2 setosa 
#>  3 4.7                  3.2          1.3         0.2 setosa 
#>  4 4.6                  3.1          1.5         0.2 setosa 
#>  5 5                    3.6          1.4         0.2 setosa 
#>  6 5.4                  3.9          1.7         0.4 setosa 
#>  7 4.6                  3.4          1.4         0.3 setosa 
#>  8 5                    3.4          1.5         0.2 setosa 
#>  9 4.4                  2.9          1.4         0.2 setosa 
#> 10 4.9                  3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Accomodate a column with disparate types via col_type = "list"
df <- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list"))
df
#> # A tibble: 4 × 2
#>   name                 value     
#>   <chr>                <list>    
#> 1 Name                 <chr [1]> 
#> 2 Species              <chr [1]> 
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams      <dbl [1]> 
df$value
#> [[1]]
#> [1] "Clippy"
#> 
#> [[2]]
#> [1] "paperclip"
#> 
#> [[3]]
#> [1] "2007-01-01 UTC"
#> 
#> [[4]]
#> [1] 0.9
#> 
sapply(df$value, class)
#> [[1]]
#> [1] "character"
#> 
#> [[2]]
#> [1] "character"
#> 
#> [[3]]
#> [1] "POSIXct" "POSIXt" 
#> 
#> [[4]]
#> [1] "numeric"
#> 

# Limit the number of data rows read
read_excel(datasets, n_max = 3)
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa 

# Read from an Excel range using A1 or R1C1 notation
read_excel(datasets, range = "C1:E7")
#> # A tibble: 6 × 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>  
#> 1          1.4         0.2 setosa 
#> 2          1.4         0.2 setosa 
#> 3          1.3         0.2 setosa 
#> 4          1.5         0.2 setosa 
#> 5          1.4         0.2 setosa 
#> 6          1.7         0.4 setosa 
read_excel(datasets, range = "R1C2:R2C5")
#> # A tibble: 1 × 4
#>   Sepal.Width Petal.Length Petal.Width Species
#>         <dbl>        <dbl>       <dbl> <chr>  
#> 1         3.5          1.4         0.2 setosa 

# Specify the sheet as part of the range
read_excel(datasets, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> 4     6   258   110

# Read only specific rows or columns
read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> • `` -> `...5`
#> # A tibble: 50 × 5
#>     ...1  ...2  ...3  ...4 ...5     
#>    <dbl> <dbl> <dbl> <dbl> <chr>    
#>  1   6.3   3.3   6     2.5 virginica
#>  2   5.8   2.7   5.1   1.9 virginica
#>  3   7.1   3     5.9   2.1 virginica
#>  4   6.3   2.9   5.6   1.8 virginica
#>  5   6.5   3     5.8   2.2 virginica
#>  6   7.6   3     6.6   2.1 virginica
#>  7   4.9   2.5   4.5   1.7 virginica
#>  8   7.3   2.9   6.3   1.8 virginica
#>  9   6.7   2.5   5.8   1.8 virginica
#> 10   7.2   3.6   6.1   2.5 virginica
#> # ℹ 40 more rows
read_excel(datasets, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#>    Sepal.Width Petal.Length Petal.Width
#>          <dbl>        <dbl>       <dbl>
#>  1         3.5          1.4         0.2
#>  2         3            1.4         0.2
#>  3         3.2          1.3         0.2
#>  4         3.1          1.5         0.2
#>  5         3.6          1.4         0.2
#>  6         3.9          1.7         0.4
#>  7         3.4          1.4         0.3
#>  8         3.4          1.5         0.2
#>  9         2.9          1.4         0.2
#> 10         3.1          1.5         0.1
#> # ℹ 140 more rows

# Get a preview of column names
names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
#> [5] "Species"     

# exploit full .name_repair flexibility from tibble

# "universal" names are unique and syntactic
read_excel(
  readxl_example("deaths.xlsx"),
  range = "arts!A5:F15",
  .name_repair = "universal"
)
#> New names:
#> • `Has kids` -> `Has.kids`
#> • `Date of birth` -> `Date.of.birth`
#> • `Date of death` -> `Date.of.death`
#> # A tibble: 10 × 6
#>    Name  Profession   Age Has.kids Date.of.birth       Date.of.death      
#>    <chr> <chr>      <dbl> <lgl>    <dttm>              <dttm>             
#>  1 Davi… musician      69 TRUE     1947-01-08 00:00:00 2016-01-10 00:00:00
#>  2 Carr… actor         60 TRUE     1956-10-21 00:00:00 2016-12-27 00:00:00
#>  3 Chuc… musician      90 TRUE     1926-10-18 00:00:00 2017-03-18 00:00:00
#>  4 Bill… actor         61 TRUE     1955-05-17 00:00:00 2017-02-25 00:00:00
#>  5 Prin… musician      57 TRUE     1958-06-07 00:00:00 2016-04-21 00:00:00
#>  6 Alan… actor         69 FALSE    1946-02-21 00:00:00 2016-01-14 00:00:00
#>  7 Flor… actor         82 TRUE     1934-02-14 00:00:00 2016-11-24 00:00:00
#>  8 Harp… author        89 FALSE    1926-04-28 00:00:00 2016-02-19 00:00:00
#>  9 Zsa … actor         99 TRUE     1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 Geor… musician      53 FALSE    1963-06-25 00:00:00 2016-12-25 00:00:00

# specify name repair as a built-in function
read_excel(readxl_example("clippy.xlsx"), .name_repair = toupper)
#> # A tibble: 4 × 2
#>   NAME                 VALUE    
#>   <chr>                <chr>    
#> 1 Name                 Clippy   
#> 2 Species              paperclip
#> 3 Approx date of death 39083    
#> 4 Weight in grams      0.9      

# specify name repair as a custom function
my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
read_excel(
  readxl_example("datasets.xlsx"),
  .name_repair = my_custom_name_repair
)
#> # A tibble: 150 × 5
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# specify name repair as an anonymous function
read_excel(
  readxl_example("datasets.xlsx"),
  sheet = "chickwts",
  .name_repair = ~ substr(.x, start = 1, stop = 3)
)
#> # A tibble: 71 × 2
#>      wei fee      
#>    <dbl> <chr>    
#>  1   179 horsebean
#>  2   160 horsebean
#>  3   136 horsebean
#>  4   227 horsebean
#>  5   217 horsebean
#>  6   168 horsebean
#>  7   108 horsebean
#>  8   124 horsebean
#>  9   143 horsebean
#> 10   140 horsebean
#> # ℹ 61 more rows
源代码:R/read_excel.R

相关用法


注:本文由纯净天空筛选整理自Hadley Wickham等大神的英文原创作品 Read xls and xlsx files。非经特殊声明,原始代码版权归原作者所有,本译文未经允许或授权,请勿转载或复制。