R stringr modifiers: Control matching behaviour with modifier functions

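Modifier functions control how the pattern argument of a stringr function is interpreted:

boundary(): Match boundaries between things.
coll(): Compare strings using standard Unicode collation rules.
fixed(): Compare literal bytes.
regex() (the default): Use ICU regular expressions.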
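
As a quick check of the default, a bare string pattern should behave the same as one wrapped in regex(). This is a minimal sketch, assuming stringr is installed; the input vector x is made up for illustration.

```r
library(stringr)

# Hypothetical input, for illustration only
x <- c("abb", "a.b", "xyz")

# A bare string is interpreted as a regular expression ...
bare <- str_detect(x, "a.b")

# ... so wrapping it in regex() explicitly should give the same result
wrapped <- str_detect(x, regex("a.b"))

identical(bare, wrapped)
```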

Usage

fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(
  pattern,
  ignore_case = FALSE,
  multiline = FALSE,
  comments = FALSE,
  dotall = FALSE,
  ...
)

boundary(
  type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA,
  ...
)

Arguments

pattern

The pattern whose matching behaviour is to be modified.

ignore_case

Should case differences be ignored when matching? For fixed(), this uses a simple algorithm that assumes a one-to-one mapping between upper and lower case letters.

locale

Locale to use for comparisons. See stringi::stri_locale_list() for all possible options. Defaults to "en" (English) so that the default behaviour is consistent across platforms.

...

Other, less frequently used arguments passed on to stringi::stri_opts_collator(), stringi::stri_opts_regex(), or stringi::stri_opts_brkiter().

multiline

If TRUE, ^ and $ match the beginning and end of each line. If FALSE (the default), they match only the start and end of the input.

comments

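If TRUE, white space and comments beginning with # are ignored inside the pattern. Escape a literal space with "\\ " (a backslash followed by a space).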
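
The effect of comments = TRUE is easiest to see on a pattern spread over several lines. The sketch below assumes stringr is installed; the phone-number pattern and the sample inputs are hypothetical and only illustrate the layout, not a robust phone-number validator.

```r
library(stringr)

# Hypothetical pattern: with comments = TRUE, white space and # comments
# inside the pattern are ignored, so it can be laid out one piece per line.
phone <- regex("
  \\(?       # optional opening parenthesis
  (\\d{3})   # area code
  \\)?       # optional closing parenthesis
  (?:-|\\ )? # optional dash or (escaped) literal space
  (\\d{3})   # next three digits
  (?:-|\\ )? # optional dash or (escaped) literal space
  (\\d{4})   # last four digits
  ", comments = TRUE)

# Made-up inputs, for illustration only
hits <- str_detect(c("514-791-8141", "(514) 791 8141", "not a number"), phone)
hits
```

Note that the literal space has to be written as "\\ "; an unescaped space in the pattern would be dropped along with the rest of the layout white space.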

dotall

If TRUE, . also matches line terminators.

type

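Boundary type to detect.

character: Every character is a boundary.
line_break: Boundaries are places where a line break is acceptable in the current locale.
sentence: The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
word: The beginnings and ends of words are boundaries.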
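
The boundary types listed above can be compared on a single input. This is a minimal sketch, assuming stringr is installed; the sample text is made up for illustration.

```r
library(stringr)

# Hypothetical input, for illustration only
x <- "This is a sentence. This is another one."

# Count the pieces delimited by each boundary type
n_chars     <- str_count(x, boundary("character"))
n_words     <- str_count(x, boundary("word"))
n_sentences <- str_count(x, boundary("sentence"))

c(characters = n_chars, words = n_words, sentences = n_sentences)

# Split the input into sentences instead of counting them
sentences <- str_split(x, boundary("sentence"))[[1]]
sentences
```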

skip_word_none

Ignore "words" that don't contain any letters or numbers (i.e. punctuation). The default, NA, skips such "words" only when splitting on word boundaries.
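
To see what skip_word_none changes, the same string can be split with the default and with skip_word_none = FALSE. This is a sketch, assuming stringr is installed. The exact pieces returned in the second case depend on the ICU word-break rules in use, so no output is shown for it.

```r
library(stringr)

words <- "These are   some words."

# Default (NA): when type = "word", pieces made only of spaces or
# punctuation are skipped
word_pieces <- str_split(words, boundary("word"))[[1]]
word_pieces

# skip_word_none = FALSE keeps the white space and punctuation pieces too
all_pieces <- str_split(words, boundary("word", skip_word_none = FALSE))[[1]]
all_pieces
```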

Value

A stringr modifier object, i.e. a character vector with parent S3 class stringr_pattern.
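
The returned object can be inspected like any other S3 object. This is a minimal sketch, assuming a stringr version in which modifiers carry the stringr_pattern class described above; older releases may use different class names.

```r
library(stringr)

p <- fixed("a.b")

# The modifier is still a character vector underneath ...
is.character(p)

# ... with extra S3 classes attached, including the parent class
class(p)
is_pattern <- inherits(p, "stringr_pattern")
is_pattern
```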

Examples

library(stringr)

pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
#> [1] TRUE TRUE
str_detect(strings, fixed(pattern))
#> [1] FALSE  TRUE
str_detect(strings, coll(pattern))
#> [1] FALSE  TRUE

# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
#> [1] "I" "İ" "i"
str_detect(i, fixed("i", TRUE))
#> [1]  TRUE FALSE  TRUE
str_detect(i, coll("i", TRUE))
#> [1]  TRUE FALSE  TRUE
str_detect(i, coll("i", TRUE, locale = "tr"))
#> [1] FALSE  TRUE  TRUE

# Word boundaries
words <- c("These are   some words.")
str_count(words, boundary("word"))
#> [1] 4
str_split(words, " ")[[1]]
#> [1] "These"  "are"    ""       ""       "some"   "words."
str_split(words, boundary("word"))[[1]]
#> [1] "These" "are"   "some"  "words"

# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
#> [[1]]
#> [1] "he"  "at"  "in"  "the" "at" 
#> 
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
#> [[1]]
#> [1] "The" "Cat" "in"  "the" "Hat"
#> 

str_extract_all("a\nb\nc", "^.")
#> [[1]]
#> [1] "a"
#> 
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
#> [[1]]
#> [1] "a" "b" "c"
#> 

str_extract_all("a\nb\nc", "a.")
#> [[1]]
#> character(0)
#> 
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
#> [[1]]
#> [1] "a\n"
#>

Source code: R/modifiers.R

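Note: This article was selected and compiled by 純淨天空 from the original English work "Control matching behaviour with modifier functions" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.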