str_extract() extracts the first complete match from each string; str_extract_all() extracts all matches from each string.
Arguments
- string
  Input vector. Either a character vector, or something coercible to one.
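- pattern
  Pattern to look for.
  The default interpretation is a regular expression, as described in vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.
  Use fixed() to match a fixed string (i.e. by comparing only bytes). This is fast, but approximate. Generally, for matching human text, you will want coll(), which respects the character matching rules of the specified locale.
  Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
- group
  If supplied, returns the matched text from the specified capturing group instead of the complete match.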
- simplify
  A boolean.
  - FALSE (the default): returns a list of character vectors.
  - TRUE: returns a character matrix.
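The different ways of interpreting pattern described above can be compared side by side. The following is a minimal sketch, assuming stringr is installed; the vector x and the result names are made up for illustration and are not part of the original page.

```r
library(stringr)

# Hypothetical input, chosen so that the interpretations give different results.
x <- c("a.b", "A.B", "axb")

# Default: the pattern is a regular expression, so "." matches any character.
as_regex <- str_extract(x, "a.b")

# regex() exposes matching options such as ignore_case.
as_regex_ci <- str_extract(x, regex("a.b", ignore_case = TRUE))

# fixed() compares the literal characters, so "." only matches a dot.
as_fixed <- str_extract(x, fixed("a.b"))

# boundary("word") extracts the first word instead of a regex match.
first_word <- str_extract("milk x2", boundary("word"))

print(list(as_regex, as_regex_ci, as_fixed, first_word))
```

Wrapping the pattern in regex(), fixed(), coll() or boundary() changes only how the pattern is interpreted; the shape of the result stays the same.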
See also
str_match() to extract matched groups; stringi::stri_extract() for the underlying implementation.
Examples
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
#> [1] "4" NA NA "2"
str_extract(shopping_list, "[a-z]+")
#> [1] "apples" "bag" "bag" "milk"
str_extract(shopping_list, "[a-z]{1,4}")
#> [1] "appl" "bag" "bag" "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
#> [1] NA "bag" "bag" "milk"
str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
#> [1] NA "bag of flour" "bag of sugar" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
#> [1] NA "bag" "bag" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)
#> [1] NA "flour" "sugar" NA
# Extract all matches
str_extract_all(shopping_list, "[a-z]+")
#> [[1]]
#> [1] "apples" "x"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk" "x"
#>
str_extract_all(shopping_list, "\\b[a-z]+\\b")
#> [[1]]
#> [1] "apples"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk"
#>
str_extract_all(shopping_list, "\\d")
#> [[1]]
#> [1] "4"
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "2"
#>
# Simplify results into character matrix
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
#>      [,1]     [,2] [,3]
#> [1,] "apples" ""   ""
#> [2,] "bag"    "of" "flour"
#> [3,] "bag"    "of" "sugar"
#> [4,] "milk"   ""   ""
str_extract_all(shopping_list, "\\d", simplify = TRUE)
#>      [,1]
#> [1,] "4"
#> [2,] ""
#> [3,] ""
#> [4,] "2"
# Extract all words
str_extract_all("This is, surprisingly, a sentence.", boundary("word"))
#> [[1]]
#> [1] "This" "is" "surprisingly" "a" "sentence"
#>
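The group argument and str_match() can be related with a short check. This is a sketch only, assuming a stringr version that supports group (it is used in the examples above); the names pattern, m and by_group are illustrative.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
pattern <- "([a-z]+) of ([a-z]+)"

# str_match() returns a character matrix: column 1 holds the complete match,
# the following columns hold the capture groups.
m <- str_match(shopping_list, pattern)

# group = 2 asks str_extract() for the second capture group only.
by_group <- str_extract(shopping_list, pattern, group = 2)

# The two approaches are expected to agree element by element.
same <- identical(by_group, m[, 3])
print(same)
```

str_extract() with group is convenient when only one capture group is needed; str_match() is the better fit when several groups are wanted at once.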
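Note: this article was selected and compiled by 純淨天空 from the original English work "Extract the complete match" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.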