R ks.test: Kolmogorov-Smirnov Tests

The R function ks.test is in the stats package.

Description

Performs a one- or two-sample Kolmogorov-Smirnov test.

Usage

ks.test(x, ...)
## Default S3 method:
ks.test(x, y, ...,
        alternative = c("two.sided", "less", "greater"),
        exact = NULL, simulate.p.value = FALSE, B = 2000)
## S3 method for class 'formula'
ks.test(formula, data, subset, na.action, ...)

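Arguments

`x`	a numeric vector of data values.
`y`	either a numeric vector of data values, or a character string naming a cumulative distribution function, or an actual cumulative distribution function such as `pnorm`. Only continuous CDFs are valid.
`...`	for the default method, parameters of the distribution specified (as a character string) by `y`. Otherwise, further arguments to be passed to or from methods.
`alternative`	indicates the alternative hypothesis and must be one of `"two.sided"` (default), `"less"`, or `"greater"`. You can specify just the initial letter of the value, but the argument name must be given in full. See "Details" for the meanings of the possible values.
`exact`	`NULL` or a logical indicating whether an exact p-value should be computed. See "Details" for the meaning of `NULL`.
`simulate.p.value`	a logical indicating whether to compute p-values by Monte Carlo simulation. (Ignored for the one-sample test.)
`B`	an integer specifying the number of replicates used in the Monte Carlo test.
`formula`	a formula of the form `lhs ~ rhs`, where `lhs` is a numeric variable giving the data values and `rhs` is either `1` for a one-sample test or a factor with two levels giving the corresponding groups for a two-sample test.
`data`	an optional matrix or data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.
`subset`	an optional vector specifying a subset of observations to be used.
`na.action`	a function which indicates what should happen when the data contain `NA`s. Defaults to `getOption("na.action")`.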

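Details

If y is numeric, a two-sample (Smirnov) test of the null hypothesis that x and y were drawn from the same distribution is performed.

Alternatively, y can be a character string naming a continuous (cumulative) distribution function, or such a function. In this case, a one-sample (Kolmogorov) test is carried out of the null hypothesis that the distribution function which generated x is distribution y with parameters specified by .... The presence of ties always generates a warning in the one-sample case, because continuous distributions do not generate them. If the ties arose from rounding, the test may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic.

Missing values are silently omitted from x and (in the two-sample case) y.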
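As an illustration of the one-sample case, the following minimal sketch passes the hypothesized CDF in three equivalent ways and then shows the tie warning that rounding provokes. The sample, the seed, and the helper `my_cdf` are assumptions made for this example only; they are not part of the original help page.

```r
set.seed(42)
x <- rexp(30, rate = 2)  # illustrative sample from Exp(rate = 2)

# The hypothesized CDF can be named by a string or passed as a function;
# its parameters are supplied through `...`.
r_string   <- ks.test(x, "pexp", rate = 2)
r_function <- ks.test(x, pexp, rate = 2)

# A user-written continuous CDF also works (hypothetical helper,
# valid here because every value in x is non-negative).
my_cdf <- function(q) 1 - exp(-2 * q)
r_custom <- ks.test(x, my_cdf)

# Rounding introduces ties, which trigger a warning in the one-sample case.
x_rounded <- round(x, 1)
tie_warning <- tryCatch({
  ks.test(x_rounded, "pexp", rate = 2)
  NULL
}, warning = function(w) conditionMessage(w))
tie_warning
```

The exact wording of the warning message differs between R versions, so the sketch only captures it rather than matching on its text.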
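The silent removal of missing values can be checked directly. This is a small sketch with made-up data: appending `NA`s to a sample should leave the result of the default method unchanged.

```r
set.seed(7)
x <- rnorm(25)            # illustrative complete sample
x_na <- c(x, NA, NA)      # the same sample with two missing values appended

r_full <- ks.test(x, "pnorm")
r_na   <- ks.test(x_na, "pnorm")

# Both calls use the same 25 non-missing observations.
identical(r_full$p.value, r_na$p.value)
```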

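The possible values "two.sided", "less" and "greater" of alternative specify the null hypothesis that the true cumulative distribution function (CDF) of x is equal to, not less than, or not greater than the hypothesized CDF (one-sample case) or the CDF of y (two-sample case), respectively. The test compares the CDFs, taking their maximal difference as the test statistic; the statistic for the "greater" alternative is D^+ = \max_u [ F_x(u) - F_y(u) ]. Thus, in the two-sample case, alternative = "greater" includes distributions for which x is stochastically smaller than y (the CDF of x lies above, and hence to the left of, that of y), in contrast to t.test or wilcox.test.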
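To make the direction of the one-sided alternatives concrete, the sketch below computes D^+ and its mirror image by hand from the empirical CDFs and compares them with the statistics reported by ks.test. The data, the seed, and the names `d_plus` and `d_minus` are assumptions for illustration.

```r
set.seed(123)
x <- rnorm(30, mean = -1)  # x tends to be smaller than y
y <- rnorm(30, mean = 0)

Fx <- ecdf(x)
Fy <- ecdf(y)
u  <- sort(c(x, y))        # step functions only change at the pooled points

d_plus  <- max(Fx(u) - Fy(u))  # statistic for alternative = "greater"
d_minus <- max(Fy(u) - Fx(u))  # statistic for alternative = "less"

r_greater <- ks.test(x, y, alternative = "greater")
r_less    <- ks.test(x, y, alternative = "less")

c(manual = d_plus,  ks.test = unname(r_greater$statistic))
c(manual = d_minus, ks.test = unname(r_less$statistic))
```

Because x is shifted to the left here, its empirical CDF lies mostly above that of y, so it is the "greater" alternative that picks up the difference, whereas t.test and wilcox.test would need alternative = "less" for the same data.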

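Exact p-values are not available for the one-sample case in the presence of ties. If exact = NULL (the default), an exact p-value is computed if the sample size is less than 100 in the one-sample case and there are no ties, and if the product of the sample sizes is less than 10000 in the two-sample case, with or without ties (using the algorithm described in Schröer and Trenkler, 1995). Otherwise, if simulate.p.value is TRUE, the p-value is computed by Monte Carlo simulation in the two-sample case; failing that, asymptotic distributions are used, whose approximations may be inaccurate in small samples. In the one-sample two-sided case, exact p-values are obtained as described in Marsaglia, Tsang & Wang (2003) (but without the optional approximation in the right tail, so this can be slow for small p-values). The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.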
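The effect of the exact argument can be seen on a small two-sample problem. This is a minimal sketch with made-up data; the sample sizes are chosen so that their product (20 * 25 = 500) falls below the 10000 threshold described above, which means the default should coincide with exact = TRUE.

```r
set.seed(2024)
x <- rnorm(20)
y <- rnorm(25, mean = 0.5)

r_default <- ks.test(x, y)                 # exact = NULL
r_exact   <- ks.test(x, y, exact = TRUE)   # force the exact distribution
r_asymp   <- ks.test(x, y, exact = FALSE)  # force the asymptotic approximation

c(default    = r_default$p.value,
  exact      = r_exact$p.value,
  asymptotic = r_asymp$p.value)

# In R versions that provide it, a Monte Carlo p-value can be requested with
# ks.test(x, y, exact = FALSE, simulate.p.value = TRUE, B = 2000)
```

The statistic is the same in all three calls; only the reference distribution used for the p-value changes.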

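If a one-sample test is used, the parameters specified in ... must be pre-specified and not estimated from the data. There is some more refined distribution theory for the KS test with estimated parameters (see Durbin, 1973), but it is not implemented in ks.test.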
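A small simulation suggests why this matters. The sketch below draws normal samples under the null hypothesis and compares p-values obtained with the true parameters against p-values obtained after plugging in estimates from the same data. The sample size, replication count, and parameter values are arbitrary choices for illustration.

```r
set.seed(99)
n_rep <- 500

# Parameters estimated from the data (not what ks.test assumes).
p_est <- replicate(n_rep, {
  x <- rnorm(50, mean = 3, sd = 2)
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
})

# Parameters fixed in advance (the situation ks.test is designed for).
p_known <- replicate(n_rep, {
  x <- rnorm(50, mean = 3, sd = 2)
  ks.test(x, "pnorm", mean = 3, sd = 2)$p.value
})

# Proportion of rejections at the nominal 5% level under the null.
c(estimated = mean(p_est < 0.05), known = mean(p_known < 0.05))
```

With known parameters the rejection rate should be close to the nominal level, while with estimated parameters the fitted CDF hugs the data and the p-values come out too large, so the test rejects far less often than the nominal level implies.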

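Value

A list inheriting from classes "ks.test" and "htest", containing the following components:

`statistic`	the value of the test statistic.
`p.value`	the p-value of the test.
`alternative`	a character string describing the alternative hypothesis.
`method`	a character string indicating what type of test was performed.
`data.name`	a character string giving the name(s) of the data.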
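The components listed above can be extracted from the returned list with `$`. A brief sketch on a made-up sample:

```r
set.seed(5)
res <- ks.test(rnorm(30), "pnorm")

inherits(res, "htest")   # the result is an "htest" object
names(res)               # component names available in this R version

unname(res$statistic)    # numeric value of the test statistic
res$p.value
res$alternative
res$method
res$data.name
```

Depending on the R version, the list may carry additional components beyond the five documented here.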

Examples

require("graphics")

x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x+2, "pgamma", 3, 2) # two-sided, exact
ks.test(x+2, "pgamma", 3, 2, exact = FALSE)
ks.test(x+2, "pgamma", 3, 2, alternative = "gr")

# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim = range(c(x, x2)))
plot(ecdf(x2), add = TRUE, lty = "dashed")
t.test(x, x2, alternative = "g")
wilcox.test(x, x2, alternative = "g")
ks.test(x, x2, alternative = "l")

# with ties, example from Schröer and Trenkler (1995)
# D = 3/7, p = 8/33 = 0.242424..
ks.test(c(1, 2, 2, 3, 3),
        c(1, 2,    3, 3, 4, 5, 6))# -> exact

# formula interface, see ?wilcox.test
ks.test(Ozone ~ Month, data = airquality,
        subset = Month %in% c(5, 8))

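Source

The two-sided one-sample distribution comes via Marsaglia, Tsang and Wang (2003).

Exact distributions for the two-sample (Smirnov) test are computed by the algorithm proposed by Schröer (1991) and Schröer & Trenkler (1995), using numerical improvements along the lines of Viehmann (2021).

References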

Z. W. Birnbaum and Fred H. Tingey (1951). One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592-596. doi:10.1214/aoms/1177729550.

William J. Conover (1971). Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295-301 (one-sample Kolmogorov test), 309-314 (two-sample Smirnov test).

Durbin, J. (1973). Distribution theory for tests based on the sample distribution function. SIAM.

W. Feller (1948). On the Kolmogorov-Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics, 19(2), 177-189. doi:10.1214/aoms/1177730243.

George Marsaglia, Wai Wan Tsang and Jingbo Wang (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18. doi:10.18637/jss.v008.i18.

Gunar Schröer (1991). Computergestützte statistische Inferenz am Beispiel der Kolmogorov-Smirnov Tests. Diplomarbeit Universität Osnabrück.

Gunar Schröer and Dietrich Trenkler (1995). Exact and Randomization Distributions of Kolmogorov-Smirnov Tests for Two or Three Samples. Computational Statistics & Data Analysis, 20(2), 185-202. doi:10.1016/0167-9473(94)00040-P.

Thomas Viehmann (2021). Numerically more stable computation of the p-values for the two-sample Kolmogorov-Smirnov test. https://arxiv.org/abs/2102.08037.

See Also

psmirnov.

shapiro.test, which performs the Shapiro-Wilk test for normality.

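Note: This article was compiled by 纯净天空 from the original English R-devel documentation, Kolmogorov-Smirnov Tests. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.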