R match: Value Matching

The R function match is in the base package.

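Description

match returns a vector of the positions of the (first) matches of its first argument in its second argument.

%in% is a more intuitive binary-operator interface. It returns a logical vector indicating whether each element of its left operand has a match.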
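
As a minimal sketch of the difference between the two interfaces, the following compares positions from match with the logical result of %in%. The vectors `fruits` and `stock` are illustrative names, not part of the original page.

```r
# Hypothetical example data: values to look up and the table to search
fruits <- c("apple", "kiwi", "banana")
stock  <- c("banana", "apple", "cherry")

# match() gives the position in `stock` of each element of `fruits`
pos <- match(fruits, stock)      # 2 NA 1

# %in% only reports whether a match exists
present <- fruits %in% stock     # TRUE FALSE TRUE

print(pos)
print(present)
```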

Usage

match(x, table, nomatch = NA_integer_, incomparables = NULL)

x %in% table

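Arguments

`x`	vector or `NULL`: the values to be matched. Long vectors are supported.
`table`	vector or `NULL`: the values to be matched against. Long vectors are not supported.
`nomatch`	the value to be returned when no match is found. Note that it is coerced to `integer`.
`incomparables`	a vector of values that cannot be matched. Any value in `x` matching a value in this vector is assigned the `nomatch` value. For historical reasons, `FALSE` is equivalent to `NULL`.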
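
A short sketch of how `nomatch` and `incomparables` change the result may help here. The vectors `x` and `tbl` are made-up inputs chosen so that one value is absent and one is NA.

```r
# Hypothetical inputs: "z" is absent from tbl, and NA appears in both
x   <- c("a", "z", NA)
tbl <- c("a", "b", NA)

default_res <- match(x, tbl)                      # 1 NA 3
zero_res    <- match(x, tbl, nomatch = 0)         # 1 0 3 (0 is coerced to integer)
incomp_res  <- match(x, tbl, incomparables = NA)  # 1 NA NA (NA may no longer match)

print(default_res)
print(zero_res)
print(incomp_res)
```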

Details

%in% is currently defined as
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0

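Factors, raw vectors and lists are converted to character vectors, and internally classed objects are transformed via mtfrm. Then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching. If incomparables has positive length, it is coerced to the common type.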
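
The coercion rule can be illustrated with a few mixed-type calls. This is only a sketch with made-up inputs; each pair is coerced to the later type in the ordering before comparison.

```r
# integer vs character: 1L is coerced to "1"
int_in_chr <- match(1L, c("2", "1"))           # 2

# logical vs numeric: TRUE is coerced to 1
lgl_in_num <- match(TRUE, c(0, 1))             # 2

# factor vs character: the factor is converted to its labels
fac_in_chr <- match(factor("b"), c("a", "b"))  # 2

print(c(int_in_chr, lgl_in_num, fac_in_chr))
```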

Matching for lists is potentially very slow and best avoided except in simple cases.

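Exactly what matches what is to some extent a matter of definition. For all types, NA matches NA and no other value. For real and complex values, NaN values are regarded as matching any other NaN value, but not NA. For complex x, the real and imaginary parts must both match (unless at least one of them contains an NA).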
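
For real values, the NA and NaN rules above can be checked directly and contrasted with ==, which returns NA for such comparisons. The inputs below are illustrative only.

```r
# Hypothetical inputs with NA and NaN in different positions
x   <- c(NA, NaN, 1)
tbl <- c(NaN, NA, 1)

# match() pairs NA with NA and NaN with NaN
pos <- match(x, tbl)   # 2 1 3

# == does not treat NA or NaN as equal to themselves
eq <- x == x           # NA NA TRUE

print(pos)
print(eq)
```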

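Character strings are compared as byte sequences if any input is marked as "bytes". Otherwise, strings in different encodings are regarded as equal if they would agree when translated to UTF-8 (see Encoding).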
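
A small sketch of the encoding rule, assuming the platform's iconv supports conversion to "latin1": the same text stored in two encodings has different bytes but still matches.

```r
# The same word in two encodings (assumes iconv can convert to latin1)
utf8_str   <- "caf\u00e9"                          # marked as UTF-8
latin1_str <- iconv(utf8_str, "UTF-8", "latin1")   # marked as latin1

# The underlying byte sequences differ ...
same_bytes <- identical(charToRaw(utf8_str), charToRaw(latin1_str))

# ... but match() compares them after translation to UTF-8
pos <- match(latin1_str, utf8_str)

print(same_bytes)
print(pos)
```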

Because %in% never returns NA, it is particularly useful in if conditions.
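
To sketch why this matters, compare an == test with %in% when the value is missing. The names `status` and `allowed` are hypothetical.

```r
# Hypothetical missing value tested against a set of allowed values
status  <- NA_character_
allowed <- c("ok", "done")

# any(==) yields NA, which if() cannot use as a condition
eq_result <- any(status == allowed)

# %in% always yields TRUE or FALSE, so it is safe inside if()
in_result <- status %in% allowed
msg <- if (in_result) "accepted" else "rejected"

print(eq_result)
print(msg)
```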

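Value

A vector of the same length as x.

match: an integer vector giving the position in table of the first match if there is one, otherwise nomatch.

If x[i] is found to equal table[j], then the value returned in the i-th position of the result is j, for the smallest possible j. If no match is found, the value is nomatch.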
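
The "smallest possible j" rule is what makes match usable as a lookup index. The sketch below uses made-up `keys` and `prices` vectors, with a duplicated key to show that only the first occurrence is used.

```r
# Hypothetical lookup table with a duplicated key "b"
keys   <- c("b", "a", "b", "c")
prices <- c(10, 20, 30, 40)
wanted <- c("b", "c", "z")

# "b" resolves to position 1 (the first occurrence), "z" has no match
idx <- match(wanted, keys)    # 1 4 NA

# Indexing with the positions performs the lookup; NA propagates
looked_up <- prices[idx]      # 10 40 NA

print(idx)
print(looked_up)
```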

%in%: a logical vector indicating whether a match was found for each element of x. The values are therefore TRUE or FALSE and never NA.

Examples

## The intersection of two sets can be defined via match():
## Simple version:
## intersect <- function(x, y) y[match(x, y, nomatch = 0)]
intersect # the R function in base is slightly more careful
intersect(1:10, 7:20)

1:10 %in% c(1,3,5,9)
sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%")
sstr[sstr %in% c(letters, LETTERS)]

"%w/o%" <- function(x, y) x[!x %in% y] #--  x without y
(1:10) %w/o% c(3,7,12)
## Note that setdiff() is very similar and typically makes more sense:
        c(1:6,7:2) %w/o% c(3,7,12)  # -> keeps duplicates
setdiff(c(1:6,7:2),      c(3,7,12)) # -> unique values

## Illuminating example about NA matching
r <- c(1, NA, NaN)
zN <- c(complex(real = NA , imaginary =  r ), complex(real =  r , imaginary = NA ),
        complex(real =  r , imaginary = NaN), complex(real = NaN, imaginary =  r ))
zM <- cbind(Re=Re(zN), Im=Im(zN), match = match(zN, zN))
rownames(zM) <- format(zN)
zM ##--> many "NA's" (= 1) and the four non-NA's (3 different ones, at 7,9,10)

length(zN) # 12
unique(zN) # the "NA" and the 3 different non-NA NaN's
stopifnot(identical(unique(zN), zN[c(1, 7,9,10)]))

## very strict equality would have 4 duplicates (of 12):
symnum(outer(zN, zN, Vectorize(identical,c("x","y")),
                     FALSE,FALSE,FALSE,FALSE))
## removing "(very strictly) duplicates",
i <- c(5,8,11,12)  # we get 8 pairwise non-identicals :
Ixy <- outer(zN[-i], zN[-i], Vectorize(identical,c("x","y")),
                     FALSE,FALSE,FALSE,FALSE)
stopifnot(identical(Ixy, diag(8) == 1))

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

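See Also

pmatch and charmatch for (partial) string matching, match.arg etc. for function argument matching. findInterval similarly returns a vector of positions, but finds numbers within intervals rather than exact matches.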
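
As a brief, illustrative contrast between these functions and match (the inputs are made up):

```r
# Exact matching fails on an abbreviation ...
exact   <- match("med", c("mean", "median"))    # NA

# ... while pmatch() accepts a unique partial match
partial <- pmatch("med", c("mean", "median"))   # 2

# findInterval() locates the interval containing a number
bucket  <- findInterval(2.5, c(1, 2, 3))        # 2

print(c(exact, partial, bucket))
```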

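is.element for an S-compatible equivalent of %in%.

unique (and duplicated) use the same definitions of "match" or "equality" as match(), and these are less strict than ==, e.g., for NA and NaN in numeric or complex vectors, or for strings with different encodings; see also above.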

Note: This page is adapted from the R-devel documentation "Value Matching". Unless otherwise stated, copyright in the original code belongs to the original authors.