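Extract any number of matches defined by unnamed, (pattern), and named, (?<name>pattern), capture groups.
Use a non-capturing group, (?:pattern), if you need to override the default operator precedence but don't want to capture the result.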
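Here is a small sketch of the three group types on made-up input (`pets` is illustrative only). An unnamed group adds a column. A non-capturing group still limits the scope of `|` but adds no column. A named group also labels its column. It assumes a stringr/stringi version recent enough to support named groups, as described under Value.

```r
library(stringr)

pets <- c("cat", "dog", "cow", "emu")  # illustrative input

# Unnamed groups: each (...) adds one column after the full match
unnamed <- str_match(pets, "(c|d)(o|a)")
dim(unnamed)      # 4 rows, 3 columns

# Non-capturing group: (?:c|d) keeps "|" from splitting the whole pattern,
# but does not add a column
noncap <- str_match(pets, "(?:c|d)(o|a)")
dim(noncap)       # 4 rows, 2 columns

# Named group: the capture column is labelled "vowel"
named <- str_match(pets, "(?:c|d)(?<vowel>o|a)")
named[, "vowel"]  # "a" "o" "o" NA
```

Without the grouping, "c|d(o|a)" would mean "c" or "d(o|a)". The group therefore matters for precedence, even though the non-capturing form leaves no trace in the result.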
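Arguments
- string: Input vector. Either a character vector, or something coercible to one.
- pattern: Unlike other stringr functions, str_match() only supports regular expressions, as described in vignette("regular-expressions"). The pattern should contain at least one capturing group.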
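The sketch below illustrates both argument rules with made-up inputs. A numeric `string` is coerced to character before matching, and a non-regex pattern such as fixed() is rejected. The exact error message depends on the installed stringr version, so the sketch only records that an error occurred.

```r
library(stringr)

# `string`: a numeric vector is coerced to character ("2023", "1999")
years <- str_match(c(2023, 1999), "([0-9]{2})([0-9]{2})")
years[, 2]   # "20" "19"

# `pattern`: only regular expressions are accepted, so fixed() errors
result <- tryCatch(
  str_match("a.b", fixed(".")),
  error = function(e) "error"
)
result       # "error"
```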
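Value
- str_match(): a character matrix with the same number of rows as the length of string/pattern. The first column is the complete match, followed by one column for each capture group. The columns will be named if you used "named capture groups", i.e. (?<name>pattern). An input with no match gives a row of NA.
- str_match_all(): a list of the same length as string/pattern, containing character matrices. Each matrix has the columns described above and one row for each match. An input with no match gives a matrix with zero rows.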
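The following sketch uses an illustrative three-element input to make the two shapes concrete. str_match() gives one row per input. str_match_all() gives one matrix per input, with one row per match and zero rows when nothing matches.

```r
library(stringr)

x <- c("a1b2", "c3", "none")  # illustrative input
pattern <- "([a-z])([0-9])"

# One row per input element; the first column is the full (first) match
first <- str_match(x, pattern)
dim(first)                        # 3 rows, 3 columns
first[, 1]                        # "a1" "c3" NA

# One matrix per input element; one row per match
every <- str_match_all(x, pattern)
length(every)                     # 3
vapply(every, nrow, integer(1))   # 2 1 0
```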
See also
str_extract() to extract the complete match, stringi::stri_match() for the underlying implementation.
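The relationship with str_extract() can be illustrated with made-up key=value strings. The complete match returned by str_extract() equals the first column of str_match(), and the remaining columns hold the capture groups.

```r
library(stringr)

x <- c("key=value", "a=b", "no match")  # illustrative input
pattern <- "([a-z]+)=([a-z]+)"

full <- str_extract(x, pattern)   # "key=value" "a=b" NA
m <- str_match(x, pattern)

identical(full, m[, 1])           # TRUE
m[, 2]                            # "key" "a" NA
```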
Examples
strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
"387 287 6718", "apple", "233.398.9187 ", "482 952 3315",
"239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000",
"Home: 543.355.3679")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
str_extract(strings, phone)
#>  [1] "219 733 8965" "329-293-8753" NA             "595 794 7569"
#>  [5] "387 287 6718" NA             "233.398.9187" "482 952 3315"
#>  [9] "239 923 8115" "579-499-7527" NA             "543.355.3679"
str_match(strings, phone)
#>       [,1]           [,2]  [,3]  [,4]
#>  [1,] "219 733 8965" "219" "733" "8965"
#>  [2,] "329-293-8753" "329" "293" "8753"
#>  [3,] NA             NA    NA    NA
#>  [4,] "595 794 7569" "595" "794" "7569"
#>  [5,] "387 287 6718" "387" "287" "6718"
#>  [6,] NA             NA    NA    NA
#>  [7,] "233.398.9187" "233" "398" "9187"
#>  [8,] "482 952 3315" "482" "952" "3315"
#>  [9,] "239 923 8115" "239" "923" "8115"
#> [10,] "579-499-7527" "579" "499" "7527"
#> [11,] NA             NA    NA    NA
#> [12,] "543.355.3679" "543" "355" "3679"
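Because each group lands in its own column, the groups can be reassembled. Here is a minimal sketch, using a few of the `strings` above, that rewrites every matched number in one hyphenated format. str_c() propagates NA, so rows without a match stay NA.

```r
library(stringr)

strings <- c(" 219 733 8965", "banana", "233.398.9187 ")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

m <- str_match(strings, phone)
# Columns 2-4 hold the three groups; join them with "-"
normalised <- str_c(m[, 2], "-", m[, 3], "-", m[, 4])
normalised   # "219-733-8965" NA "233-398-9187"
```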
# Extract/match all
str_extract_all(strings, phone)
#> [[1]]
#> [1] "219 733 8965"
#>
#> [[2]]
#> [1] "329-293-8753"
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "595 794 7569"
#>
#> [[5]]
#> [1] "387 287 6718"
#>
#> [[6]]
#> character(0)
#>
#> [[7]]
#> [1] "233.398.9187"
#>
#> [[8]]
#> [1] "482 952 3315"
#>
#> [[9]]
#> [1] "239 923 8115" "842 566 4692"
#>
#> [[10]]
#> [1] "579-499-7527"
#>
#> [[11]]
#> character(0)
#>
#> [[12]]
#> [1] "543.355.3679"
#>
str_match_all(strings, phone)
#> [[1]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "219 733 8965" "219" "733" "8965"
#>
#> [[2]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "329-293-8753" "329" "293" "8753"
#>
#> [[3]]
#>      [,1] [,2] [,3] [,4]
#>
#> [[4]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "595 794 7569" "595" "794" "7569"
#>
#> [[5]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "387 287 6718" "387" "287" "6718"
#>
#> [[6]]
#>      [,1] [,2] [,3] [,4]
#>
#> [[7]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "233.398.9187" "233" "398" "9187"
#>
#> [[8]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "482 952 3315" "482" "952" "3315"
#>
#> [[9]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "239 923 8115" "239" "923" "8115"
#> [2,] "842 566 4692" "842" "566" "4692"
#>
#> [[10]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "579-499-7527" "579" "499" "7527"
#>
#> [[11]]
#>      [,1] [,2] [,3] [,4]
#>
#> [[12]]
#>      [,1]           [,2]  [,3]  [,4]
#> [1,] "543.355.3679" "543" "355" "3679"
#>
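Because str_match_all() returns one matrix per input, further analysis can be awkward. One possible approach stacks the matrices into a single table and records which input each match came from. It uses base R's do.call() and rbind(), which is a choice of this sketch rather than part of stringr. The input subset and the names `origin` and `tbl` are illustrative.

```r
library(stringr)

strings <- c("239 923 8115 and 842 566 4692", "banana", "Work: 579-499-7527")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

matches <- str_match_all(strings, phone)
# Stack the per-input matrices; zero-row matrices contribute no rows
stacked <- do.call(rbind, matches)
# Index of the input element that each stacked row came from
origin <- rep(seq_along(matches), vapply(matches, nrow, integer(1)))
tbl <- data.frame(origin = origin, area = stacked[, 2],
                  stringsAsFactors = FALSE)
tbl
```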
# You can also name the groups to make further manipulation easier
phone <- "(?<area>[2-9][0-9]{2})[- .](?<phone>[0-9]{3}[- .][0-9]{4})"
str_match(strings, phone)
#>                      area  phone
#>  [1,] "219 733 8965" "219" "733 8965"
#>  [2,] "329-293-8753" "329" "293-8753"
#>  [3,] NA             NA    NA
#>  [4,] "595 794 7569" "595" "794 7569"
#>  [5,] "387 287 6718" "387" "287 6718"
#>  [6,] NA             NA    NA
#>  [7,] "233.398.9187" "233" "398.9187"
#>  [8,] "482 952 3315" "482" "952 3315"
#>  [9,] "239 923 8115" "239" "923 8115"
#> [10,] "579-499-7527" "579" "499-7527"
#> [11,] NA             NA    NA
#> [12,] "543.355.3679" "543" "355.3679"
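You can then select named columns by name rather than position. The short sketch below keeps only the group columns as a data frame, using a few of the example strings. Converting to a data frame is just one possible follow-up step.

```r
library(stringr)

strings <- c("219 733 8965", "banana", "Work: 579-499-7527")
phone <- "(?<area>[2-9][0-9]{2})[- .](?<phone>[0-9]{3}[- .][0-9]{4})"

m <- str_match(strings, phone)
m[, "area"]   # "219" NA "579"

# Drop the unnamed full-match column and keep the named groups
groups <- as.data.frame(m[, c("area", "phone")], stringsAsFactors = FALSE)
groups$phone  # "733 8965" NA "499-7527"
```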
x <- c("<a> <b>", "<a> <>", "<a>", "", NA)
str_match(x, "<(.*?)> <(.*?)>")
#>      [,1]      [,2] [,3]
#> [1,] "<a> <b>" "a"  "b"
#> [2,] "<a> <>"  "a"  ""
#> [3,] NA        NA   NA
#> [4,] NA        NA   NA
#> [5,] NA        NA   NA
str_match_all(x, "<(.*?)>")
#> [[1]]
#>      [,1]  [,2]
#> [1,] "<a>" "a"
#> [2,] "<b>" "b"
#>
#> [[2]]
#>      [,1]  [,2]
#> [1,] "<a>" "a"
#> [2,] "<>"  ""
#>
#> [[3]]
#>      [,1]  [,2]
#> [1,] "<a>" "a"
#>
#> [[4]]
#>      [,1] [,2]
#>
#> [[5]]
#>      [,1] [,2]
#> [1,] NA   NA
#>
str_extract(x, "<.*?>")
#> [1] "<a>" "<a>" "<a>" NA    NA
str_extract_all(x, "<.*?>")
#> [[1]]
#> [1] "<a>" "<b>"
#>
#> [[2]]
#> [1] "<a>" "<>"
#>
#> [[3]]
#> [1] "<a>"
#>
#> [[4]]
#> character(0)
#>
#> [[5]]
#> [1] NA
#>
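The str_match_all() examples above show two different results. An input with no match ("") gives a zero-row matrix, but an NA input gives a one-row matrix of NA. If you need a match count per input, the following sketch counts only rows whose full match is not NA, so the NA input is not miscounted as one match. It uses an illustrative subset of `x`.

```r
library(stringr)

x <- c("<a> <b>", "", NA)
res <- str_match_all(x, "<(.*?)>")

# nrow() also counts the all-NA row produced for the NA input
vapply(res, nrow, integer(1))   # 2 0 1

# Count only rows with a real (non-NA) full match
n_matches <- vapply(res, function(m) sum(!is.na(m[, 1])), integer(1))
n_matches                       # 2 0 0
```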
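Related functions
- R stringr str_which: Find matching indices
- R stringr str_extract: Extract the complete match
- R stringr str_subset: Find matching elements
- R stringr str_escape: Escape regular expression metacharacters
- R stringr str_trim: Remove whitespace
- R stringr str_sub: Get and set substrings using their positions
- R stringr str_replace_na: Turn NA into "NA"
- R stringr str_trunc: Truncate a string to a maximum width
- R stringr str_like: Detect a pattern in the same way as SQL's LIKE operator
- R stringr str_length: Compute string length/width
- R stringr str_detect: Detect the presence/absence of a match
- R stringr str_count: Count the number of matches
- R stringr str_split: Split up a string into pieces
- R stringr str_unique: Remove duplicated strings
- R stringr str_remove: Remove matched patterns
- R stringr str_pad: Pad a string to a minimum width
- R stringr str_equal: Determine if two strings are equivalent
- R stringr str_view: View strings and matches
- R stringr str_glue: Interpolation with glue
- R stringr str_conv: Specify the encoding of a string
- R stringr str_order: Order, rank, or sort a character vector
- R stringr str_starts: Detect the presence/absence of a match at the start/end
- R stringr str_c: Join multiple strings into one string
- R stringr str_wrap: Wrap words into nicely formatted paragraphs
- R stringr str_dup: Duplicate a string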
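Note: This page was compiled by 純淨天空 from the original English work by Hadley Wickham et al., Extract components (capturing groups) from a match. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission.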