R tidyr separate: Separate a character column into multiple columns with a regular expression or numeric locations

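separate() has been superseded by separate_wider_position() and separate_wider_delim(). The two functions make the two uses more obvious, have a more polished API, and handle problems better. Superseded functions will not go away, but they will only receive critical bug fixes.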

Given either a regular expression or a vector of character positions, separate() turns a single character column into multiple columns.

Usage

separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

Arguments

data

A data frame.

col

<tidy-select> Column to expand.

into

Names of the new variables to create, given as a character vector. Use NA to omit a variable from the output.

sep

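Separator between columns.

If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

If numeric, sep is interpreted as character positions to split at. Positive values start at 1 at the far left of the string; negative values start at -1 at the far right of the string. The length of sep should be one less than the length of into.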
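The two interpretations of sep can be sketched as follows. The data and column names are made up for illustration, and the last call assumes tidyr 1.3.0 or later, where separate_wider_position() is available.

```r
library(tidyr)

# Hypothetical data: a two-letter code followed by a four-digit year.
ids <- tibble(id = c("AB2021", "CD1999"))

# Numeric sep: split after the 2nd character, counting from the left.
by_left <- separate(ids, id, into = c("code", "year"), sep = 2)

# Negative sep counts from the right: split 4 characters before the end.
by_right <- separate(ids, id, into = c("code", "year"), sep = -4)

# Character sep is a regular expression: split on "-" or "_".
dates <- tibble(d = c("2021-01_15", "1999_12-31"))
by_regex <- separate(dates, d, into = c("year", "month", "day"), sep = "[-_]")

# The superseding function takes named widths instead of positions.
by_width <- separate_wider_position(ids, id, widths = c(code = 2, year = 4))
```

For these strings all three position-based calls should give the same code and year columns, since every value is exactly six characters long.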

remove

If TRUE, remove the input column from the output data frame.
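A small sketch of remove = FALSE, using made-up data, shows the input column being kept next to the new ones:

```r
library(tidyr)

df <- tibble(x = c("x.y", "x.z"))

# Default (remove = TRUE): only the new columns A and B remain.
dropped <- separate(df, x, into = c("A", "B"))

# remove = FALSE: the original column x is kept alongside A and B.
kept <- separate(df, x, into = c("A", "B"), remove = FALSE)
names(kept)
```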

convert

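If TRUE, runs type.convert() with as.is = TRUE on the new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause the string "NA" to be converted to NA.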
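The effect of convert on a literal "NA" string can be illustrated with a minimal sketch (the data here is hypothetical):

```r
library(tidyr)

df <- tibble(x = c("a:1", "b:NA"))

# Without conversion the value column stays character,
# and the second element is the literal string "NA".
as_text <- separate(df, x, into = c("key", "value"), sep = ":")

# With convert = TRUE the column becomes integer,
# and the string "NA" turns into a real missing value.
as_int <- separate(df, x, into = c("key", "value"), sep = ":", convert = TRUE)

is.na(as_text$value)
is.na(as_int$value)
```

If the text "NA" is meaningful in your data (for example as a country or gene code), leaving convert = FALSE and converting columns by hand is the safer choice.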

extra

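If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:

"warn" (the default): emit a warning and drop the extra values.
"drop": drop any extra values without a warning.
"merge": split into at most length(into) pieces, leaving any further separators in the last piece.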

fill

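If sep is a character vector, this controls what happens when there are not enough pieces. There are three valid options:

"warn" (the default): emit a warning and fill from the right.
"right": fill with missing values on the right.
"left": fill with missing values on the left.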

...

Additional arguments passed on to methods.
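Because separate() is superseded, it may help to see how extra and fill map onto the newer function. The sketch below assumes tidyr 1.3.0 or later and uses the too_few and too_many arguments of separate_wider_delim(); the correspondence is approximate, and note that delim is treated as a fixed string by default rather than as a regular expression.

```r
library(tidyr)

df <- tibble(x = c("x", "x y", "x y z", NA))

# Roughly extra = "drop", fill = "right":
dropped <- separate_wider_delim(
  df, x, delim = " ", names = c("a", "b"),
  too_few = "align_start", too_many = "drop"
)

# Roughly extra = "merge", fill = "left":
merged <- separate_wider_delim(
  df, x, delim = " ", names = c("a", "b"),
  too_few = "align_end", too_many = "merge"
)
```

Unlike separate(), both too_few and too_many default to "error", so uneven rows have to be handled explicitly.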

See also

unite(), the complement; extract(), which uses regular expression capturing groups.

Examples

library(tidyr)  # provides separate(); also re-exports tibble() and %>%

# If you want to split by any non-alphanumeric value (the default):
df <- tibble(x = c(NA, "x.y", "x.z", "y.z"))
df %>% separate(x, c("A", "B"))
#> # A tibble: 4 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 x     y    
#> 3 x     z    
#> 4 y     z    

# If you just want the second variable:
df %>% separate(x, c(NA, "B"))
#> # A tibble: 4 × 1
#>   B    
#>   <chr>
#> 1 NA   
#> 2 y    
#> 3 z    
#> 4 z    

# We now recommend separate_wider_delim() instead:
df %>% separate_wider_delim(x, ".", names = c("A", "B"))
#> # A tibble: 4 × 2
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 x     y    
#> 3 x     z    
#> 4 y     z    
df %>% separate_wider_delim(x, ".", names = c(NA, "B"))
#> # A tibble: 4 × 1
#>   B    
#>   <chr>
#> 1 NA   
#> 2 y    
#> 3 z    
#> 4 z    

# Controlling uneven splits -------------------------------------------------
# If every row doesn't split into the same number of pieces, use
# the extra and fill arguments to control what happens:
df <- tibble(x = c("x", "x y", "x y z", NA))
df %>% separate(x, c("a", "b"))
#> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [3].
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 4 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 x     NA   
#> 2 x     y    
#> 3 x     y    
#> 4 NA    NA   
# The same behaviour as previous, but drops the z without warnings:
df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")
#> # A tibble: 4 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 x     NA   
#> 2 x     y    
#> 3 x     y    
#> 4 NA    NA   
# Opposite of previous, keeping the z and filling left:
df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
#> # A tibble: 4 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 NA    x    
#> 2 x     y    
#> 3 x     y z  
#> 4 NA    NA   
# Or you can keep all three:
df %>% separate(x, c("a", "b", "c"))
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2].
#> # A tibble: 4 × 3
#>   a     b     c    
#>   <chr> <chr> <chr>
#> 1 x     NA    NA   
#> 2 x     y     NA   
#> 3 x     y     z    
#> 4 NA    NA    NA   

# To only split a specified number of times use extra = "merge":
df <- tibble(x = c("x: 123", "y: error: 7"))
df %>% separate(x, c("key", "value"), ": ", extra = "merge")
#> # A tibble: 2 × 2
#>   key   value   
#>   <chr> <chr>   
#> 1 x     123     
#> 2 y     error: 7

# Controlling column types --------------------------------------------------
# convert = TRUE detects column classes:
df <- tibble(x = c("x:1", "x:2", "y:4", "z", NA))
df %>% separate(x, c("key", "value"), ":") %>% str()
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4].
#> tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ key  : chr [1:5] "x" "x" "y" "z" ...
#>  $ value: chr [1:5] "1" "2" "4" NA ...
df %>% separate(x, c("key", "value"), ":", convert = TRUE) %>% str()
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4].
#> tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ key  : chr [1:5] "x" "x" "y" "z" ...
#>  $ value: int [1:5] 1 2 4 NA NA

Source code: R/separate.R

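Note: This article was selected and compiled by 純淨天空 from the original English work by Hadley Wickham and others, "Separate a character column into multiple columns with a regular expression or numeric locations". Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.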