R regmatches: Extract or Replace Matched Substrings

The R function regmatches is in the base package.

Description

Extract or replace matched substrings from match data obtained by regexpr, gregexpr, regexec or gregexec.

Usage

regmatches(x, m, invert = FALSE)
regmatches(x, m, invert = FALSE) <- value

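Arguments

`x`	a character vector.
`m`	an object with match data.
`invert`	a logical: if `TRUE`, extract or replace the non-matched substrings.
`value`	an object with suitable replacement values for the matched or non-matched substrings (see `Details`).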

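Details

If invert is FALSE (the default), regmatches extracts the matched substrings as specified by the match data. For vector match data (as obtained from regexpr), empty matches are dropped; for list match data, empty matches give empty components (zero-length character vectors).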
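As a minimal sketch of the difference between vector and list match data, the snippet below reads "empty matches" as elements of x where the pattern does not match. The input vector and the variable names are made up for illustration.

```r
x <- c("apple 12", "no digits", "pear 7")

# Vector match data from regexpr(): the element without a match is dropped,
# so the result is shorter than x.
m_vec <- regexpr("[0-9]+", x)
res_vec <- regmatches(x, m_vec)
res_vec

# List match data from gregexpr(): one component per element of x,
# with a zero-length character vector where nothing matched.
m_list <- gregexpr("[0-9]+", x)
res_list <- regmatches(x, m_list)
res_list
```

Because the vector form drops elements, its result no longer lines up with x; the list form keeps the alignment at the cost of a nested structure.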

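If invert is TRUE, regmatches extracts the non-matched substrings, i.e., the strings are split at the matches, similar to strsplit (for vector match data, at most a single split is performed).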
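The following sketch contrasts the single split from regexpr match data with the full split from gregexpr match data. The sample string is an assumption chosen for illustration.

```r
x <- "a1b22c"

# regexpr() records only the first match, so at most one split happens.
m_first <- regexpr("[0-9]+", x)
split_once <- regmatches(x, m_first, invert = TRUE)
split_once[[1]]   # the text before and after the first match

# gregexpr() records every match, so the string is split at each of them.
m_all <- gregexpr("[0-9]+", x)
split_all <- regmatches(x, m_all, invert = TRUE)
split_all[[1]]    # the text between all matches
```

In both cases the result is a list with one component per element of x.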

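If invert is NA, regmatches extracts both the non-matched and the matched substrings, always starting and ending with a non-match (which is empty if a match occurs at the beginning or the end, respectively).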
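A small sketch of the alternating result, assuming an R version recent enough to accept invert = NA. The input is chosen so that matches sit at both ends of the string.

```r
x <- "1a22"
m <- gregexpr("[0-9]+", x)

# Pieces alternate non-match, match, non-match, ...; the first and last
# pieces are non-matches, which are empty here because the string starts
# and ends with a match.
pieces <- regmatches(x, m, invert = NA)[[1]]
pieces

# Pasting the pieces back together should reproduce the original string.
rebuilt <- paste(pieces, collapse = "")
rebuilt
```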

Note that the match data can be obtained from regular expression matching on a modified version of x with the same number of characters.
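One way to picture this is to match against an upper-cased copy while extracting from the original. This is only a sketch: it assumes ASCII input, for which toupper() keeps the number of characters unchanged, and the sample string is invented.

```r
x <- "Total: 10 USD, fee: 5 usd"

# Match on a modified copy that has the same number of characters as x.
x_upper <- toupper(x)
m <- gregexpr("USD", x_upper)

# Extract from the original string using the positions found in the copy,
# so the original capitalization is preserved.
codes <- regmatches(x, m)[[1]]
codes
```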

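The replacement function can be used for replacing the matched or non-matched substrings. For vector match data, if invert is FALSE, value should be a character vector whose length is the number of matched elements in m. Otherwise, it should be a list of character vectors with the same length as m, each as long as the number of replacements needed. Replacement coerces values to character or list and generously recycles values as needed. Missing replacement values are not allowed.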
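The two shapes of value can be sketched as follows; the inputs and the replacement strings are placeholders chosen for illustration.

```r
# Vector match data, invert = FALSE: one replacement per matched element.
y <- c("apple 12", "no digits", "pear 7")
m <- regexpr("[0-9]+", y)
regmatches(y, m) <- c("<A>", "<B>")   # two elements of y have a match
y

# List match data: value is a list with one character vector per element,
# each as long as the number of matches in that element.
z <- c("1 and 2", "3")
m2 <- gregexpr("[0-9]", z)
regmatches(z, m2) <- list(c("one", "two"), "three")
z
```

The element of y without a match is left untouched.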

Value

For regmatches, a character vector with the matched substrings if m is a vector and invert is FALSE. Otherwise, a list with the matched and/or non-matched substrings.
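A quick check of the return types, as a sketch with an invented input:

```r
x <- c("ab12", "cd")
m_vec  <- regexpr("[0-9]+", x)    # vector match data
m_list <- gregexpr("[0-9]+", x)   # list match data

from_vec          <- regmatches(x, m_vec)                 # character vector
from_vec_inverted <- regmatches(x, m_vec, invert = TRUE)  # list
from_list         <- regmatches(x, m_list)                # list

c(is.character(from_vec), is.list(from_vec_inverted), is.list(from_list))
```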

For regmatches<-, the updated character vector.

Examples

x <- c("A and B", "A, B and C", "A, B, C and D", "foobar")
pattern <- "[[:space:]]*(,|and)[[:space:]]"
## Match data from regexpr()
m <- regexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)
## Match data from gregexpr()
m <- gregexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)

## Consider
x <- "John (fishing, hunting), Paul (hiking, biking)"
## Suppose we want to split at the comma (plus spaces) between the
## persons, but not at the commas in the parenthesized hobby lists.
## One idea is to "blank out" the parenthesized parts to match the
## parts to be used for splitting, and extract the persons as the
## non-matched parts.
## First, match the parenthesized hobby lists.
m <- gregexpr("\\([^)]*\\)", x)
## Create blank strings with given numbers of characters.
blanks <- function(n) strrep(" ", n)
## Create a copy of x with the parenthesized parts blanked out.
s <- x
regmatches(s, m) <- Map(blanks, lapply(regmatches(s, m), nchar))
s
## Compute the positions of the split matches (note that we cannot call
## strsplit() on x with match data from s).
m <- gregexpr(", *", s)
## And finally extract the non-matched parts.
regmatches(x, m, invert = TRUE)

## regexec() and gregexec() return overlapping ranges because the
## first match is the full match.  This conflicts with regmatches()<-
## and regmatches(..., invert=TRUE).  We can work-around by dropping
## the first match.
drop_first <- function(x) {
    if(!anyNA(x) && all(x > 0)) {
        ml <- attr(x, 'match.length')
        if(is.matrix(x)) x <- x[-1,] else x <- x[-1]
        attr(x, 'match.length') <- if(is.matrix(ml)) ml[-1,] else ml[-1]
    }
    x
}
m <- gregexec("(\\w+) \\(((?:\\w+(?:, )?)+)\\)", x)
regmatches(x, m)
try(regmatches(x, m, invert=TRUE))
regmatches(x, lapply(m, drop_first))
## invert=TRUE loses matrix structure because we are retrieving what
## is in between every sub-match
regmatches(x, lapply(m, drop_first), invert=TRUE)
y <- z <- x
## Notice **list**(...) on the RHS
regmatches(y, lapply(m, drop_first)) <- list(c("<NAME>", "<HOBBY-LIST>"))
y
regmatches(z, lapply(m, drop_first), invert=TRUE) <-
    list(sprintf("<%d>", 1:5))
z

## With `perl = TRUE` and `invert = FALSE` capture group names
## are preserved.  Collect functions and arguments in calls:
NEWS <- head(readLines(file.path(R.home(), 'doc', 'NEWS.2')), 100)
m <- gregexec("(?<fun>\\w+)\\((?<args>[^)]*)\\)", NEWS, perl = TRUE)
y <- regmatches(NEWS, m)
y[[16]]
## Make tabular, adding original line numbers
mdat <- as.data.frame(t(do.call(cbind, y)))
mdat <- cbind(mdat, line=rep(seq_along(y), lengths(y) / ncol(mdat)))
head(mdat)
NEWS[head(mdat[['line']])]

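Note: this article was selected and compiled by 純淨天空 from the original English work Extract or Replace Matched Substrings by R-devel. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.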