R dtplyr separate.dtplyr_step 使用正则表达式或数字位置将字符列分成多列

这是tidyr::separate() 泛型的方法。它在 [.data.table 的 j 参数中转换为 data.table::tstrsplit() 。

用法

# S3 method for dtplyr_step
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

参数

data

一个lazy_dt()。

col

列名称或位置。

该参数通过表达式传递并支持准引号(您可以取消引用列名称或列位置)。

into

要创建为字符向量的新变量的名称。使用 NA 省略输出中的变量。

sep

列之间的分隔符。默认值是匹配任何非字母数字值序列的正则表达式。

remove

如果为 TRUE，则从输出 DataFrame 中删除输入列。

convert

如果为 TRUE，将在新列上运行 type.convert()，其中 as.is = TRUE。如果组件列是整数、数字或逻辑，这非常有用。

注意：这将导致字符串 "NA"s 转换为 NA。

...

传递给方法的参数

例子

library(tidyr)
# If you want to split by any non-alphanumeric value (the default):
df <- lazy_dt(data.frame(x = c(NA, "x.y", "x.z", "y.z")), "DT")
df %>% separate(x, c("A", "B"))
#> Source: local data table [4 x 2]
#> Call:   copy(DT)[, `:=`(c("A", "B"), tstrsplit(x, split = "[^[:alnum:]]+"))][, 
#>     `:=`("x", NULL)]
#> 
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 x     y    
#> 3 x     z    
#> 4 y     z    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# If you just want the second variable:
df %>% separate(x, c(NA, "B"))
#> Source: local data table [4 x 1]
#> Call:   copy(DT)[, `:=`("B", tstrsplit(x, split = "[^[:alnum:]]+", keep = 2L))][, 
#>     `:=`("x", NULL)]
#> 
#>   B    
#>   <chr>
#> 1 NA   
#> 2 y    
#> 3 z    
#> 4 z    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# Use regular expressions to separate on multiple characters:
df <- lazy_dt(data.frame(x = c(NA, "x?y", "x.z", "y:z")), "DT")
df %>% separate(x, c("A","B"), sep = "([.?:])")
#> Source: local data table [4 x 2]
#> Call:   copy(DT)[, `:=`(c("A", "B"), tstrsplit(x, split = "([.?:])"))][, 
#>     `:=`("x", NULL)]
#> 
#>   A     B    
#>   <chr> <chr>
#> 1 NA    NA   
#> 2 x     y    
#> 3 x     z    
#> 4 y     z    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

# convert = TRUE detects column classes:
df <- lazy_dt(data.frame(x = c("x:1", "x:2", "y:4", "z", NA)), "DT")
df %>% separate(x, c("key","value"), ":") %>% str
#> List of 12
#>  $ parent         :List of 12
#>   ..$ parent         :List of 8
#>   .. ..$ parent       :Classes ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
#>   .. .. ..$ x: chr [1:5] "x:1" "x:2" "y:4" "z" ...
#>   .. .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   .. ..$ vars         : chr "x"
#>   .. ..$ groups       : chr(0) 
#>   .. ..$ locals       : list()
#>   .. ..$ implicit_copy: logi FALSE
#>   .. ..$ needs_copy   : logi FALSE
#>   .. ..$ env          :<environment: 0x55df0fded700> 
#>   .. ..$ name         : symbol DT
#>   .. ..- attr(*, "class")= chr [1:2] "dtplyr_step_first" "dtplyr_step"
#>   ..$ vars           : chr [1:3] "x" "key" "value"
#>   ..$ groups         : chr(0) 
#>   ..$ locals         : list()
#>   ..$ implicit_copy  : logi TRUE
#>   ..$ needs_copy     : logi TRUE
#>   ..$ env            :<environment: 0x55df0fded700> 
#>   ..$ arrange        : NULL
#>   ..$ i              : NULL
#>   ..$ j              : language `:=`(c("key", "value"), tstrsplit(x, split = ":"))
#>   ..$ on             : chr(0) 
#>   ..$ allow_cartesian: NULL
#>   ..- attr(*, "class")= chr [1:2] "dtplyr_step_subset" "dtplyr_step"
#>  $ vars           : chr [1:2] "key" "value"
#>  $ groups         : chr(0) 
#>  $ locals         : list()
#>  $ implicit_copy  : logi TRUE
#>  $ needs_copy     : logi TRUE
#>  $ env            :<environment: 0x55df0fded700> 
#>  $ arrange        : NULL
#>  $ i              : NULL
#>  $ j              : language `:=`("x", NULL)
#>  $ on             : chr(0) 
#>  $ allow_cartesian: NULL
#>  - attr(*, "class")= chr [1:2] "dtplyr_step_subset" "dtplyr_step"
df %>% separate(x, c("key","value"), ":", convert = TRUE) %>% str
#> List of 12
#>  $ parent         :List of 12
#>   ..$ parent         :List of 8
#>   .. ..$ parent       :Classes ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
#>   .. .. ..$ x: chr [1:5] "x:1" "x:2" "y:4" "z" ...
#>   .. .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   .. ..$ vars         : chr "x"
#>   .. ..$ groups       : chr(0) 
#>   .. ..$ locals       : list()
#>   .. ..$ implicit_copy: logi FALSE
#>   .. ..$ needs_copy   : logi FALSE
#>   .. ..$ env          :<environment: 0x55df0fded700> 
#>   .. ..$ name         : symbol DT
#>   .. ..- attr(*, "class")= chr [1:2] "dtplyr_step_first" "dtplyr_step"
#>   ..$ vars           : chr [1:3] "x" "key" "value"
#>   ..$ groups         : chr(0) 
#>   ..$ locals         : list()
#>   ..$ implicit_copy  : logi TRUE
#>   ..$ needs_copy     : logi TRUE
#>   ..$ env            :<environment: 0x55df0fded700> 
#>   ..$ arrange        : NULL
#>   ..$ i              : NULL
#>   ..$ j              : language `:=`(c("key", "value"), tstrsplit(x, split = ":", type.convert = TRUE))
#>   ..$ on             : chr(0) 
#>   ..$ allow_cartesian: NULL
#>   ..- attr(*, "class")= chr [1:2] "dtplyr_step_subset" "dtplyr_step"
#>  $ vars           : chr [1:2] "key" "value"
#>  $ groups         : chr(0) 
#>  $ locals         : list()
#>  $ implicit_copy  : logi TRUE
#>  $ needs_copy     : logi TRUE
#>  $ env            :<environment: 0x55df0fded700> 
#>  $ arrange        : NULL
#>  $ i              : NULL
#>  $ j              : language `:=`("x", NULL)
#>  $ on             : chr(0) 
#>  $ allow_cartesian: NULL
#>  - attr(*, "class")= chr [1:2] "dtplyr_step_subset" "dtplyr_step"

源代码：R/step-subset-separate.R

相关用法

注：本文由纯净天空筛选整理自Hadley Wickham等大神的英文原创作品 Separate a character column into multiple columns with a regular expression or numeric locations。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。