R readxl read_excel 讀取 xls 和 xlsx 文件

讀取 xls 和 xlsx 文件

read_excel() 調用 excel_format() 根據文件擴展名和文件本身按順序確定 path 是 xls 還是 xlsx。如果您更了解並且想防止這種猜測，請直接使用read_xls()和read_xlsx()。

用法

read_excel(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

read_xls(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

read_xlsx(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)

參數

path: xls/xlsx 文件的路徑。
sheet: 要閱讀的表。字符串(工作表的名稱)或整數(工作表的位置)。如果通過 range 指定工作表，則忽略。如果兩個參數都沒有指定工作表，則默認為第一個工作表。
range: 要讀取的單元格範圍，如 cell-specification 中所述。包括典型的 Excel 範圍(如 "B3:D87")，可能包括工作表名稱(如 "Budget!B2:G14")等。嚴格解釋，即使範圍強製包含前導或尾隨空行或列。優先於 skip 、 n_max 和 sheet 。
col_names: TRUE 使用第一行作為列名稱，FALSE 獲取默認名稱，或為每列提供名稱的字符向量。如果用戶提供 col_types 作為向量，則 col_names 每列可以有一個條目，即與 col_types 具有相同的長度，或者每個未跳過的列有一個條目。
col_types: NULL 從電子表格中猜測所有內容，或者使用以下選項中每列包含一個條目的字符向量："skip"、"guess"、"logical"、"numeric"、"date"、"text" 或 "list" 。如果恰好指定了一個col_type，它將被回收。永遠不會讀取跳過列中單元格的內容，並且該列不會出現在數據幀輸出中。列表單元格將列加載為長度為 1 的向量列表，這些向量使用 col_types = NULL 中的類型猜測邏輯進行鍵入，但基於 cell-by-cell 。
na: 要解釋為缺失值的字符串的字符向量。默認情況下，readxl 將空白單元格視為缺失數據。
trim_ws: 是否應該修剪前導和尾隨空格？
skip: 在讀取任何內容(無論是列名還是數據)之前要跳過的最小行數。前導空行會自動跳過，因此這是一個下限。如果給出range，則忽略。
n_max: 要讀取的最大數據行數。尾隨空行會自動跳過，因此這是返回的 tibble 中行數的上限。如果給出range，則忽略。
guess_max: 用於猜測列類型的最大數據行數。
progress: 顯示進度旋轉器？默認情況下，微調器僅出現在交互式會話中、在編寫文檔的上下文之外，並且當調用可能運行幾秒鍾或更長時間時。有關更多詳細信息，請參閱readxl_progress()。
.name_repair: 列名的處理。傳遞給tibble::as_tibble()。 readxl 的默認值是`.name_repair = "unique"，這確保列名不為空並且是唯一的。

值

tibble

也可以看看

cell-specification 了解有關使用 range 參數定位單元格的更多詳細信息

例子

datasets <- readxl_example("datasets.xlsx")
read_excel(datasets)
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Specify sheet either by position or by name
read_excel(datasets, 2)
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
read_excel(datasets, "mtcars")
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows

# Skip rows and use default column names
read_excel(datasets, skip = 148, col_names = FALSE)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> • `` -> `...5`
#> # A tibble: 3 × 5
#>    ...1  ...2  ...3  ...4 ...5     
#>   <dbl> <dbl> <dbl> <dbl> <chr>    
#> 1   6.5   3     5.2   2   virginica
#> 2   6.2   3.4   5.4   2.3 virginica
#> 3   5.9   3     5.1   1.8 virginica

# Recycle a single column type
read_excel(datasets, col_types = "text")
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <chr>        <chr>       <chr>        <chr>       <chr>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3           1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5            3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5            3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # ℹ 140 more rows

# Specify some col_types and guess others
read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <chr>              <dbl>        <dbl>       <dbl> <chr>  
#>  1 5.1                  3.5          1.4         0.2 setosa 
#>  2 4.9                  3            1.4         0.2 setosa 
#>  3 4.7                  3.2          1.3         0.2 setosa 
#>  4 4.6                  3.1          1.5         0.2 setosa 
#>  5 5                    3.6          1.4         0.2 setosa 
#>  6 5.4                  3.9          1.7         0.4 setosa 
#>  7 4.6                  3.4          1.4         0.3 setosa 
#>  8 5                    3.4          1.5         0.2 setosa 
#>  9 4.4                  2.9          1.4         0.2 setosa 
#> 10 4.9                  3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Accomodate a column with disparate types via col_type = "list"
df <- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list"))
df
#> # A tibble: 4 × 2
#>   name                 value     
#>   <chr>                <list>    
#> 1 Name                 <chr [1]> 
#> 2 Species              <chr [1]> 
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams      <dbl [1]> 
df$value
#> [[1]]
#> [1] "Clippy"
#> 
#> [[2]]
#> [1] "paperclip"
#> 
#> [[3]]
#> [1] "2007-01-01 UTC"
#> 
#> [[4]]
#> [1] 0.9
#> 
sapply(df$value, class)
#> [[1]]
#> [1] "character"
#> 
#> [[2]]
#> [1] "character"
#> 
#> [[3]]
#> [1] "POSIXct" "POSIXt" 
#> 
#> [[4]]
#> [1] "numeric"
#> 

# Limit the number of data rows read
read_excel(datasets, n_max = 3)
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa 

# Read from an Excel range using A1 or R1C1 notation
read_excel(datasets, range = "C1:E7")
#> # A tibble: 6 × 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>  
#> 1          1.4         0.2 setosa 
#> 2          1.4         0.2 setosa 
#> 3          1.3         0.2 setosa 
#> 4          1.5         0.2 setosa 
#> 5          1.4         0.2 setosa 
#> 6          1.7         0.4 setosa 
read_excel(datasets, range = "R1C2:R2C5")
#> # A tibble: 1 × 4
#>   Sepal.Width Petal.Length Petal.Width Species
#>         <dbl>        <dbl>       <dbl> <chr>  
#> 1         3.5          1.4         0.2 setosa 

# Specify the sheet as part of the range
read_excel(datasets, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> 4     6   258   110

# Read only specific rows or columns
read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> • `` -> `...5`
#> # A tibble: 50 × 5
#>     ...1  ...2  ...3  ...4 ...5     
#>    <dbl> <dbl> <dbl> <dbl> <chr>    
#>  1   6.3   3.3   6     2.5 virginica
#>  2   5.8   2.7   5.1   1.9 virginica
#>  3   7.1   3     5.9   2.1 virginica
#>  4   6.3   2.9   5.6   1.8 virginica
#>  5   6.5   3     5.8   2.2 virginica
#>  6   7.6   3     6.6   2.1 virginica
#>  7   4.9   2.5   4.5   1.7 virginica
#>  8   7.3   2.9   6.3   1.8 virginica
#>  9   6.7   2.5   5.8   1.8 virginica
#> 10   7.2   3.6   6.1   2.5 virginica
#> # ℹ 40 more rows
read_excel(datasets, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#>    Sepal.Width Petal.Length Petal.Width
#>          <dbl>        <dbl>       <dbl>
#>  1         3.5          1.4         0.2
#>  2         3            1.4         0.2
#>  3         3.2          1.3         0.2
#>  4         3.1          1.5         0.2
#>  5         3.6          1.4         0.2
#>  6         3.9          1.7         0.4
#>  7         3.4          1.4         0.3
#>  8         3.4          1.5         0.2
#>  9         2.9          1.4         0.2
#> 10         3.1          1.5         0.1
#> # ℹ 140 more rows

# Get a preview of column names
names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
#> [5] "Species"     

# exploit full .name_repair flexibility from tibble

# "universal" names are unique and syntactic
read_excel(
  readxl_example("deaths.xlsx"),
  range = "arts!A5:F15",
  .name_repair = "universal"
)
#> New names:
#> • `Has kids` -> `Has.kids`
#> • `Date of birth` -> `Date.of.birth`
#> • `Date of death` -> `Date.of.death`
#> # A tibble: 10 × 6
#>    Name  Profession   Age Has.kids Date.of.birth       Date.of.death      
#>    <chr> <chr>      <dbl> <lgl>    <dttm>              <dttm>             
#>  1 Davi… musician      69 TRUE     1947-01-08 00:00:00 2016-01-10 00:00:00
#>  2 Carr… actor         60 TRUE     1956-10-21 00:00:00 2016-12-27 00:00:00
#>  3 Chuc… musician      90 TRUE     1926-10-18 00:00:00 2017-03-18 00:00:00
#>  4 Bill… actor         61 TRUE     1955-05-17 00:00:00 2017-02-25 00:00:00
#>  5 Prin… musician      57 TRUE     1958-06-07 00:00:00 2016-04-21 00:00:00
#>  6 Alan… actor         69 FALSE    1946-02-21 00:00:00 2016-01-14 00:00:00
#>  7 Flor… actor         82 TRUE     1934-02-14 00:00:00 2016-11-24 00:00:00
#>  8 Harp… author        89 FALSE    1926-04-28 00:00:00 2016-02-19 00:00:00
#>  9 Zsa … actor         99 TRUE     1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 Geor… musician      53 FALSE    1963-06-25 00:00:00 2016-12-25 00:00:00

# specify name repair as a built-in function
read_excel(readxl_example("clippy.xlsx"), .name_repair = toupper)
#> # A tibble: 4 × 2
#>   NAME                 VALUE    
#>   <chr>                <chr>    
#> 1 Name                 Clippy   
#> 2 Species              paperclip
#> 3 Approx date of death 39083    
#> 4 Weight in grams      0.9      

# specify name repair as a custom function
my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
read_excel(
  readxl_example("datasets.xlsx"),
  .name_repair = my_custom_name_repair
)
#> # A tibble: 150 × 5
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# specify name repair as an anonymous function
read_excel(
  readxl_example("datasets.xlsx"),
  sheet = "chickwts",
  .name_repair = ~ substr(.x, start = 1, stop = 3)
)
#> # A tibble: 71 × 2
#>      wei fee      
#>    <dbl> <chr>    
#>  1   179 horsebean
#>  2   160 horsebean
#>  3   136 horsebean
#>  4   227 horsebean
#>  5   217 horsebean
#>  6   168 horsebean
#>  7   108 horsebean
#>  8   124 horsebean
#>  9   143 horsebean
#> 10   140 horsebean
#> # ℹ 61 more rows

源代碼：R/read_excel.R

相關用法

注：本文由純淨天空篩選整理自Hadley Wickham等大神的英文原創作品 Read xls and xlsx files。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。