

R stringr str_match: Extract components (capturing groups) from a match


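Extract any number of matches defined by unnamed, (pattern), and named, (?<name>pattern), capture groups.

Use a non-capturing group, (?:pattern), if you need to override the default operator precedence but do not want to capture the result.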
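The difference between a capturing and a non-capturing group can be seen in a minimal sketch. The input vector `x` and the pattern are made up for illustration and are not part of the original examples.

```r
library(stringr)

# Illustrative input (hypothetical data)
x <- c("cat-1", "dog-22", "bird")

# (?:cat|dog) groups the alternation without adding a column for it;
# only ([0-9]+) is a capturing group
m <- str_match(x, "(?:cat|dog)-([0-9]+)")

# One column for the complete match plus one for the single capturing group
ncol(m)

# Contents of the capturing group; NA where the pattern did not match
m[, 2]
```

Writing the alternation as (cat|dog) instead would add a further column holding "cat" or "dog".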

Usage

str_match(string, pattern)

str_match_all(string, pattern)

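Arguments

string

Input vector. Either a character vector, or something coercible to one.

pattern

Unlike other stringr functions, str_match() only supports regular expressions, as described in vignette("regular-expressions"). The pattern should contain at least one capturing group.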

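Value

  • str_match(): a character matrix with the same number of rows as the length of string/pattern. The first column is the complete match, followed by one column for each capture group. The columns will be named if you used "named capture groups", i.e. (?<name>pattern).

  • str_match_all(): a list of the same length as string/pattern containing character matrices. Each matrix has columns as described above and one row for each match.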
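The two return shapes can be inspected with a short sketch. The input `x` and the group names `letter` and `digit` are hypothetical, and the code assumes a stringr version that names the columns as described above.

```r
library(stringr)

# Illustrative input (hypothetical data)
x <- c("a1 b2", "c3", "none")
pattern <- "(?<letter>[a-z])(?<digit>[0-9])"

# str_match(): one row per input string, first match only
first <- str_match(x, pattern)
nrow(first) == length(x)

# Named groups can be selected by column name
first[, "digit"]

# str_match_all(): one matrix per input string, one row per match
all_matches <- str_match_all(x, pattern)
vapply(all_matches, nrow, integer(1))
```

An input with no match gives a row of NA in str_match() and a zero-row matrix in str_match_all().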

See also

str_extract() to extract the complete match; stringi::stri_match() for the underlying implementation.

Examples
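The relationship to str_extract() can be checked directly. This is a sketch with made-up input, relying only on the statement that the first column of str_match() holds the complete match.

```r
library(stringr)

# Illustrative input (hypothetical data); the pattern has two capturing groups
x <- c("key=value", "a=1", "no match here")
pattern <- "([a-z]+)=([a-z0-9]+)"

m <- str_match(x, pattern)

# Column 1 is the complete match, so it should agree with str_extract()
identical(m[, 1], str_extract(x, pattern))

# Keep only the rows where the pattern matched;
# drop = FALSE keeps the result a matrix even if one row remains
matched <- m[!is.na(m[, 1]), , drop = FALSE]
matched
```

If only the complete match is needed, str_extract() returns it as a plain character vector and avoids the matrix.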

strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
  "387 287 6718", "apple", "233.398.9187  ", "482 952 3315",
  "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000",
  "Home: 543.355.3679")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

str_extract(strings, phone)
#>  [1] "219 733 8965" "329-293-8753" NA             "595 794 7569"
#>  [5] "387 287 6718" NA             "233.398.9187" "482 952 3315"
#>  [9] "239 923 8115" "579-499-7527" NA             "543.355.3679"
str_match(strings, phone)
#>       [,1]           [,2]  [,3]  [,4]  
#>  [1,] "219 733 8965" "219" "733" "8965"
#>  [2,] "329-293-8753" "329" "293" "8753"
#>  [3,] NA             NA    NA    NA    
#>  [4,] "595 794 7569" "595" "794" "7569"
#>  [5,] "387 287 6718" "387" "287" "6718"
#>  [6,] NA             NA    NA    NA    
#>  [7,] "233.398.9187" "233" "398" "9187"
#>  [8,] "482 952 3315" "482" "952" "3315"
#>  [9,] "239 923 8115" "239" "923" "8115"
#> [10,] "579-499-7527" "579" "499" "7527"
#> [11,] NA             NA    NA    NA    
#> [12,] "543.355.3679" "543" "355" "3679"

# Extract/match all
str_extract_all(strings, phone)
#> [[1]]
#> [1] "219 733 8965"
#> 
#> [[2]]
#> [1] "329-293-8753"
#> 
#> [[3]]
#> character(0)
#> 
#> [[4]]
#> [1] "595 794 7569"
#> 
#> [[5]]
#> [1] "387 287 6718"
#> 
#> [[6]]
#> character(0)
#> 
#> [[7]]
#> [1] "233.398.9187"
#> 
#> [[8]]
#> [1] "482 952 3315"
#> 
#> [[9]]
#> [1] "239 923 8115" "842 566 4692"
#> 
#> [[10]]
#> [1] "579-499-7527"
#> 
#> [[11]]
#> character(0)
#> 
#> [[12]]
#> [1] "543.355.3679"
#> 
str_match_all(strings, phone)
#> [[1]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "219 733 8965" "219" "733" "8965"
#> 
#> [[2]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "329-293-8753" "329" "293" "8753"
#> 
#> [[3]]
#>      [,1] [,2] [,3] [,4]
#> 
#> [[4]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "595 794 7569" "595" "794" "7569"
#> 
#> [[5]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "387 287 6718" "387" "287" "6718"
#> 
#> [[6]]
#>      [,1] [,2] [,3] [,4]
#> 
#> [[7]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "233.398.9187" "233" "398" "9187"
#> 
#> [[8]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "482 952 3315" "482" "952" "3315"
#> 
#> [[9]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "239 923 8115" "239" "923" "8115"
#> [2,] "842 566 4692" "842" "566" "4692"
#> 
#> [[10]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "579-499-7527" "579" "499" "7527"
#> 
#> [[11]]
#>      [,1] [,2] [,3] [,4]
#> 
#> [[12]]
#>      [,1]           [,2]  [,3]  [,4]  
#> [1,] "543.355.3679" "543" "355" "3679"
#> 

# You can also name the groups to make further manipulation easier
phone <- "(?<area>[2-9][0-9]{2})[- .](?<phone>[0-9]{3}[- .][0-9]{4})"
str_match(strings, phone)
#>                      area  phone     
#>  [1,] "219 733 8965" "219" "733 8965"
#>  [2,] "329-293-8753" "329" "293-8753"
#>  [3,] NA             NA    NA        
#>  [4,] "595 794 7569" "595" "794 7569"
#>  [5,] "387 287 6718" "387" "287 6718"
#>  [6,] NA             NA    NA        
#>  [7,] "233.398.9187" "233" "398.9187"
#>  [8,] "482 952 3315" "482" "952 3315"
#>  [9,] "239 923 8115" "239" "923 8115"
#> [10,] "579-499-7527" "579" "499-7527"
#> [11,] NA             NA    NA        
#> [12,] "543.355.3679" "543" "355.3679"

x <- c("<a> <b>", "<a> <>", "<a>", "", NA)
str_match(x, "<(.*?)> <(.*?)>")
#>      [,1]      [,2] [,3]
#> [1,] "<a> <b>" "a"  "b" 
#> [2,] "<a> <>"  "a"  ""  
#> [3,] NA        NA   NA  
#> [4,] NA        NA   NA  
#> [5,] NA        NA   NA  
str_match_all(x, "<(.*?)>")
#> [[1]]
#>      [,1]  [,2]
#> [1,] "<a>" "a" 
#> [2,] "<b>" "b" 
#> 
#> [[2]]
#>      [,1]  [,2]
#> [1,] "<a>" "a" 
#> [2,] "<>"  ""  
#> 
#> [[3]]
#>      [,1]  [,2]
#> [1,] "<a>" "a" 
#> 
#> [[4]]
#>      [,1] [,2]
#> 
#> [[5]]
#>      [,1] [,2]
#> [1,] NA   NA  
#> 

str_extract(x, "<.*?>")
#> [1] "<a>" "<a>" "<a>" NA    NA   
str_extract_all(x, "<.*?>")
#> [[1]]
#> [1] "<a>" "<b>"
#> 
#> [[2]]
#> [1] "<a>" "<>" 
#> 
#> [[3]]
#> [1] "<a>"
#> 
#> [[4]]
#> character(0)
#> 
#> [[5]]
#> [1] NA
#> 
Source code: R/match.R



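Note: This article was selected and compiled by 純淨天空 from the original English work by Hadley Wickham and others, "Extract components (capturing groups) from a match". Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.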