str_extract() extracts the first complete match from each string; str_extract_all() extracts all matches from each string.
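The sketch below illustrates the difference in return shape. It assumes stringr is installed, and the input vector x is made up for illustration. str_extract() returns one string (or NA) per input element. str_extract_all() returns a list with one character vector per input element, and that vector is empty when nothing matches.

```r
library(stringr)

# Hypothetical input for illustration
x <- c("a1b22", "none")

# First match only: a character vector as long as x, NA where nothing matches
first <- str_extract(x, "\\d+")
print(first)

# All matches: a list with one character vector per element of x
every <- str_extract_all(x, "\\d+")
print(every)

# Number of matches found in each string
print(lengths(every))
```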
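Arguments
- string: Input vector. Either a character vector, or something coercible to one.
- pattern: Pattern to look for.
  The default interpretation is a regular expression, as described in vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.
  Use fixed() to match a fixed string (i.e. by comparing only bytes). This is fast, but approximate. Generally, for matching human text, you'll want coll(), which respects the character matching rules of the specified locale.
  Use boundary() to match character, word, line and sentence boundaries. An empty pattern, "", is equivalent to boundary("character").
- group: (str_extract() only) If supplied, returns the text matched by the specified capturing group instead of the complete match.
- simplify: (str_extract_all() only) A boolean.
  - FALSE (the default): returns a list of character vectors.
  - TRUE: returns a character matrix.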
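The following sketch exercises each argument described above. It assumes stringr 1.5.0 or later, which is the release that added the group argument. All input strings are invented for illustration. Note that group reads from the first match only, even when a string contains several matches.

```r
library(stringr)

# string: non-character input is coerced to character before matching
from_numbers <- str_extract(c(123, 4567), "\\d{2}")           # "12" "45"

x <- c("Version 1.2", "v3x4")

# pattern as a plain string (regex): "." matches any single character
any_char <- str_extract(x, "\\d.\\d")                          # "1.2" "3x4"

# pattern wrapped in fixed(): "." is a literal dot
literal_dot <- str_extract(x, fixed("."))                      # "." NA

# pattern wrapped in regex() with an option: case-insensitive matching
ci <- str_extract(x, regex("version", ignore_case = TRUE))     # "Version" NA

# pattern wrapped in boundary(): the first word, punctuation skipped
first_word <- str_extract("Hello, world!", boundary("word"))   # "Hello"

# group (str_extract() only): a capturing group from the FIRST match
kv <- c("a=1, b=2", "no pairs")
values <- str_extract(kv, "([a-z])=(\\d)", group = 2)          # "1" NA

# simplify (str_extract_all() only): a character matrix padded with ""
m <- str_extract_all(c("a1b22", "none", "7"), "\\d+", simplify = TRUE)
print(m)
```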
See also
str_match() to extract matched groups; stringi::stri_extract() for the underlying implementation.
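The sketch below shows how the two related functions line up with str_extract(). It assumes stringr 1.5.0 or later, with stringi installed as a stringr dependency, and uses invented inputs. str_match() returns a matrix whose first column is the complete match, followed by one column per capturing group. For a plain regular expression, stringi::stri_extract_first_regex() should give the same first matches.

```r
library(stringr)

x <- c("bag of flour", "milk x2")
pat <- "([a-z]+) of ([a-z]+)"

# str_match(): column 1 is the complete match, then one column per group
m <- str_match(x, pat)
print(m)

# The complete match agrees with str_extract() ...
same_full <- identical(as.vector(m[, 1]), as.vector(str_extract(x, pat)))

# ... and column k + 1 agrees with str_extract(group = k)
same_group <- identical(
  as.vector(m[, 3]),
  as.vector(str_extract(x, pat, group = 2))
)

# A plain regex pattern gives the same result as the stringi first-match call
same_stringi <- identical(
  str_extract(x, "\\d"),
  stringi::stri_extract_first_regex(x, "\\d")
)
```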
Examples
library(stringr)
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
#> [1] "4" NA NA "2"
str_extract(shopping_list, "[a-z]+")
#> [1] "apples" "bag" "bag" "milk"
str_extract(shopping_list, "[a-z]{1,4}")
#> [1] "appl" "bag" "bag" "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
#> [1] NA "bag" "bag" "milk"
str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
#> [1] NA "bag of flour" "bag of sugar" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
#> [1] NA "bag" "bag" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)
#> [1] NA "flour" "sugar" NA
# Extract all matches
str_extract_all(shopping_list, "[a-z]+")
#> [[1]]
#> [1] "apples" "x"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk" "x"
#>
str_extract_all(shopping_list, "\\b[a-z]+\\b")
#> [[1]]
#> [1] "apples"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk"
#>
str_extract_all(shopping_list, "\\d")
#> [[1]]
#> [1] "4"
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "2"
#>
# Simplify results into character matrix
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
#>      [,1]     [,2] [,3]
#> [1,] "apples" ""   ""
#> [2,] "bag"    "of" "flour"
#> [3,] "bag"    "of" "sugar"
#> [4,] "milk"   ""   ""
str_extract_all(shopping_list, "\\d", simplify = TRUE)
#>      [,1]
#> [1,] "4"
#> [2,] ""
#> [3,] ""
#> [4,] "2"
# Extract all words
str_extract_all("This is, surprisingly, a sentence.", boundary("word"))
#> [[1]]
#> [1] "This" "is" "surprisingly" "a" "sentence"
#>
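Related functions
- R stringr str_escape Escape regular expression metacharacters
- R stringr str_equal Determine if two strings are equivalent
- R stringr str_which Find matching indices
- R stringr str_subset Find matching elements
- R stringr str_trim Remove whitespace
- R stringr str_sub Get and set substrings using their positions
- R stringr str_replace_na Turn NA into "NA"
- R stringr str_trunc Truncate a string to maximum width
- R stringr str_match Extract components (capturing groups) from a match
- R stringr str_like Detect a pattern in the same way as SQL's LIKE operator
- R stringr str_length Compute the length/width
- R stringr str_detect Detect the presence/absence of a match
- R stringr str_count Count the number of matches
- R stringr str_split Split up a string into pieces
- R stringr str_unique Remove duplicated strings
- R stringr str_remove Remove matched patterns
- R stringr str_pad Pad a string to minimum width
- R stringr str_view View strings and matches
- R stringr str_glue Interpolation with glue
- R stringr str_conv Specify the encoding of a string
- R stringr str_order Order, rank, or sort a character vector
- R stringr str_starts Detect the presence/absence of a match at the start/end
- R stringr str_c Join multiple strings into one string
- R stringr str_wrap Wrap words into nicely formatted paragraphs
- R stringr str_dup Duplicate a string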
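Note: This page was compiled by 纯净天空 from the original English work "Extract the complete match" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors. This translation may not be reproduced or copied without permission or authorization.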