Python SciPy contingency.chi2_contingency Usage and Code Examples

This article briefly introduces the usage of scipy.stats.contingency.chi2_contingency in Python.

Usage: scipy.stats.contingency.chi2_contingency(observed, correction=True, lambda_=None)

Chi-square test of independence of variables in a contingency table.

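This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table observed [1]. The expected frequencies are computed from the marginal sums under the assumption of independence; see scipy.stats.contingency.expected_freq. The number of degrees of freedom is (expressed using numpy functions and attributes):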

dof = observed.size - sum(observed.shape) + observed.ndim - 1
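To make the description above concrete, the sketch below recomputes the expected frequencies of a 2 x 3 table from its marginal sums and evaluates the degrees-of-freedom formula. It assumes a SciPy version in which chi2_contingency returns a result object with a dof attribute (as documented on this page), and it uses scipy.stats.contingency.expected_freq only for comparison; the variable names are illustrative.

```python
import numpy as np
from scipy.stats.contingency import chi2_contingency, expected_freq

# The 2 x 3 table used in the examples further down.
obs = np.array([[10, 10, 20], [20, 20, 20]])

# Marginal sums: row totals, column totals and the grand total.
row_totals = obs.sum(axis=1)
col_totals = obs.sum(axis=0)
grand_total = obs.sum()

# Under independence, E[i, j] = row_totals[i] * col_totals[j] / grand_total.
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom from the formula quoted above.
dof_manual = obs.size - sum(obs.shape) + obs.ndim - 1

print(expected_manual)
print(dof_manual, chi2_contingency(obs).dof)
print(np.allclose(expected_manual, expected_freq(obs)))
```

For a two-way table this formula reduces to the familiar (R - 1) * (C - 1), which is 2 here.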

Parameters:

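observed: array_like: The contingency table. The table contains the observed frequencies (i.e. number of occurrences) in each category. In the two-dimensional case, the table is often described as an "R x C table".
correction: bool, optional: If True, and the degrees of freedom is 1, apply Yates' correction for continuity. The effect of the correction is to adjust each observed value by 0.5 towards the corresponding expected value.
lambda_: float or str, optional: By default, the statistic computed in this test is Pearson's chi-squared statistic [2]. lambda_ allows a statistic from the Cressie-Read power divergence family [3] to be used instead. See scipy.stats.power_divergence for details.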
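The effect of correction can be checked by hand. The following is a minimal sketch using the 2 x 2 aspirin table from the Examples section (dof = 1, so the correction applies). It shifts each observed count by exactly 0.5 toward its expected count, which matches the description above for this table because every observed-minus-expected difference here is far larger than 0.5. The variable names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2 x 2 table from the aspirin example; dof = 1, so Yates' correction applies.
table = np.array([[176, 230], [21035, 21018]])

res_corrected = chi2_contingency(table)                # correction=True (default)
res_plain = chi2_contingency(table, correction=False)  # no correction

expected = res_corrected.expected_freq

# Yates: move each observed count 0.5 toward its expected count.
adjusted = table + 0.5 * np.sign(expected - table)
manual_stat = ((adjusted - expected) ** 2 / expected).sum()

# Uncorrected Pearson statistic for comparison.
manual_plain = ((table - expected) ** 2 / expected).sum()

print(res_corrected.statistic, manual_stat)
print(res_plain.statistic, manual_plain)
```

Because the correction pulls every cell toward its expected value, the corrected statistic is always the smaller of the two.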

Returns:

res: Chi2ContingencyResult

An object containing attributes:

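statistic float: The test statistic.
pvalue float: The p-value of the test.
dof int: The degrees of freedom.
expected_freq ndarray, same shape as observed: The expected frequencies, based on the marginal sums of the table.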
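The examples on this page read the result through its attribute names. The sketch below does that and also unpacks the result positionally, in the order statistic, pvalue, dof, expected_freq. The positional form is an assumption: it relies on a SciPy release in which Chi2ContingencyResult remains tuple-like for backward compatibility with the older four-value return.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs)

# Access by attribute name (the style used in the Examples section).
print(res.statistic, res.pvalue, res.dof)
print(res.expected_freq)

# Positional unpacking (assumes the tuple-like result object described above).
stat, p, dof, ex = res
print(stat, p, dof)
```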

Notes:

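An often quoted guideline for the validity of this calculation is that the test should be used only if the observed and expected frequencies in each cell are at least 5.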
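The rule of thumb can be checked before running the test. The helper meets_rule_of_five below is hypothetical (it is not part of SciPy) and simply applies the at-least-5 guideline to both the observed table and the expected frequencies reported by chi2_contingency.

```python
import numpy as np
from scipy.stats import chi2_contingency


def meets_rule_of_five(observed):
    """Hypothetical helper: True if every observed and expected cell is >= 5."""
    observed = np.asarray(observed)
    expected = chi2_contingency(observed).expected_freq
    return bool((observed >= 5).all() and (expected >= 5).all())


print(meets_rule_of_five([[10, 10, 20], [20, 20, 20]]))  # all cells are large
print(meets_rule_of_five([[1, 9], [8, 2]]))              # cells of 1 and 2 fail
```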

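This is a test for the independence of different categories of a population. The test is only meaningful when the dimension of observed is two or more. Applying the test to a one-dimensional table will always result in expected equal to observed and a chi-square statistic equal to 0.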
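The degenerate one-dimensional case can be seen directly. A minimal sketch, assuming the behavior described above: for a 1-D table the degrees-of-freedom formula gives 0, the expected frequencies equal the observed counts, and the statistic is 0 (SciPy reports a p-value of 1 in this case).

```python
import numpy as np
from scipy.stats import chi2_contingency

one_d = np.array([3, 5, 7])
res = chi2_contingency(one_d)

# dof = 3 - 3 + 1 - 1 = 0, so there is nothing to test.
print(res.statistic, res.pvalue, res.dof)
print(res.expected_freq)  # identical to the observed counts
```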

This function does not handle masked arrays, because the calculation does not make sense with missing values.

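Like scipy.stats.chisquare, this function computes a chi-square statistic; the convenience this function provides is to figure out the expected frequencies and degrees of freedom from the given contingency table. If these were already known, and if Yates' correction was not required, one could use scipy.stats.chisquare. That is, if one calls: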

res = chi2_contingency(obs, correction=False)

then the following is true:

(res.statistic, res.pvalue) == stats.chisquare(obs.ravel(),
                                               f_exp=res.expected_freq.ravel(),
                                               ddof=obs.size - 1 - res.dof)
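The equivalence can be checked directly. A minimal sketch on the 2 x 3 table from the Examples section: the expected frequencies and degrees of freedom are taken from the chi2_contingency result, and ddof is chosen so that chisquare's own degrees of freedom, k - 1 - ddof with k = obs.size cells, equal res.dof.

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, correction=False)

# chisquare on the flattened table with the same expected frequencies;
# ddof = 6 - 1 - 2 = 3, so chisquare also uses 2 degrees of freedom.
alt = stats.chisquare(obs.ravel(),
                      f_exp=res.expected_freq.ravel(),
                      ddof=obs.size - 1 - res.dof)

print(res.statistic, alt.statistic)
print(res.pvalue, alt.pvalue)
```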

The lambda_ argument was added in version 0.13.0 of scipy.
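Because lambda_ accepts either a name or a number, a quick consistency check is to compute the statistic both ways. The sketch below assumes the name-to-value mapping documented for scipy.stats.power_divergence (for example, "pearson" is 1 and "log-likelihood" is 0); the dictionary family is a hypothetical helper written for this illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])

# Named members of the Cressie-Read family and their numeric lambda_ values
# (assumed from the scipy.stats.power_divergence documentation).
family = {
    "pearson": 1,
    "log-likelihood": 0,
    "freeman-tukey": -0.5,
    "mod-log-likelihood": -1,
    "neyman": -2,
    "cressie-read": 2 / 3,
}

results = {}
for name, lam in family.items():
    by_name = chi2_contingency(obs, lambda_=name).statistic
    by_value = chi2_contingency(obs, lambda_=lam).statistic
    results[name] = (by_name, by_value)
    print(f"{name:>20}: {by_name:.6f}  lambda_={lam:.3f} -> {by_value:.6f}")
```

With lambda_ left at its default (None), the result is the Pearson statistic.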

References:

[1] "Contingency table", https://en.wikipedia.org/wiki/Contingency_table

[2] "Pearson's chi-squared test", https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

[3] Cressie, N. and Read, T. R. C., "Multinomial Goodness-of-Fit Tests", J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464.

[4] Jeffrey S. Berger et al., "Aspirin for the Primary Prevention of Cardiovascular Events in Women and Men: A Sex-Specific Meta-analysis of Randomized Controlled Trials", JAMA, 295(3):306-313, DOI:10.1001/jama.295.3.306, 2006.

Examples:

In [4], the use of aspirin to prevent cardiovascular events in women and men was investigated. The study notably concluded:

…aspirin therapy reduced the risk of a composite of cardiovascular events due to its effect on reducing the risk of ischemic stroke in women […]

The article lists studies of various cardiovascular events. Let's focus on ischemic stroke in women.

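The following table summarizes the results of an experiment in which participants took aspirin or a placebo on a regular basis for several years. Cases of ischemic stroke were recorded: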

                  Aspirin   Control/Placebo
Ischemic stroke       176               230
No stroke           21035             21018

Is there evidence that aspirin reduces the risk of ischemic stroke? We begin by formulating a null hypothesis \(H_0\):

The effect of aspirin is equivalent to that of placebo.

Let's assess the plausibility of this hypothesis with a chi-square test.

>>> import numpy as np
>>> from scipy.stats import chi2_contingency
>>> table = np.array([[176, 230], [21035, 21018]])
>>> res = chi2_contingency(table)
>>> res.statistic
6.892569132546561
>>> res.pvalue
0.008655478161175739

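Using a significance level of 5%, we would reject the null hypothesis in favor of the alternative hypothesis: "the effect of aspirin is not equivalent to the effect of placebo". Because scipy.stats.contingency.chi2_contingency performs a two-sided test, the alternative hypothesis does not indicate the direction of the effect. We can use stats.contingency.odds_ratio to support the conclusion that aspirin reduces the risk of ischemic stroke.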
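The odds ratio supplies the direction that the chi-square test does not. A minimal sketch, assuming SciPy 1.10 or later (where scipy.stats.contingency.odds_ratio is available) and its default conditional odds ratio. With the table laid out as above, the ratio compares the odds of ischemic stroke under aspirin with the odds under placebo, so a value below 1 favors aspirin.

```python
import numpy as np
from scipy.stats.contingency import odds_ratio

# Rows: ischemic stroke / no stroke; columns: aspirin / placebo.
table = np.array([[176, 230], [21035, 21018]])

# Conditional odds ratio (the default kind) and a two-sided 95% interval.
res = odds_ratio(table)
ci = res.confidence_interval(confidence_level=0.95)

print(res.statistic)  # below 1: lower odds of ischemic stroke with aspirin
print(ci)             # the whole interval lies below 1
```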

Below are further examples showing how larger contingency tables can be tested.

A two-way example (2 x 3):

>>> obs = np.array([[10, 10, 20], [20, 20, 20]])
>>> res = chi2_contingency(obs)
>>> res.statistic
2.7777777777777777
>>> res.pvalue
0.24935220877729619
>>> res.dof
2
>>> res.expected_freq
array([[ 12.,  12.,  16.],
       [ 18.,  18.,  24.]])
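The p-value reported above can be reproduced from the statistic and dof alone. A minimal sketch, assuming the usual chi-square reference distribution with res.dof degrees of freedom; scipy.stats.chi2.sf gives its upper-tail probability.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs)

# Upper-tail probability of chi2(dof) at the observed statistic.
p_manual = chi2.sf(res.statistic, res.dof)
print(p_manual, res.pvalue)
```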

Perform the test using the log-likelihood ratio (i.e. the "G-test") instead of Pearson's chi-squared statistic.

>>> res = chi2_contingency(obs, lambda_="log-likelihood")
>>> res.statistic
2.7688587616781319
>>> res.pvalue
0.25046668010954165
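For lambda_="log-likelihood" the statistic is the G statistic, G = 2 * sum(O * ln(O / E)). The sketch below recomputes it by hand from the expected frequencies returned by the test. This table has no zero cells; with zeros one would need a convention such as 0 * ln 0 = 0.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, lambda_="log-likelihood")

# G = 2 * sum over all cells of O * ln(O / E).
expected = res.expected_freq
g_manual = 2 * np.sum(obs * np.log(obs / expected))
print(g_manual, res.statistic)
```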

A four-way example (2 x 2 x 2 x 2):

>>> obs = np.array(
...     [[[[12, 17],
...        [11, 16]],
...       [[11, 12],
...        [15, 16]]],
...      [[[23, 15],
...        [30, 22]],
...       [[14, 17],
...        [15, 16]]]])
>>> res = chi2_contingency(obs)
>>> res.statistic
8.7584514426741897
>>> res.pvalue
0.64417725029295503
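The expected frequencies for a table with more than two dimensions follow the same idea as the two-way case: under complete independence each expected cell is the product of the one-dimensional marginal totals divided by n ** (ndim - 1), where n is the grand total. A minimal sketch for the four-way table above; margins and expected_manual are illustrative names, not part of the SciPy API.

```python
import numpy as np
from functools import reduce
from scipy.stats import chi2_contingency

obs = np.array(
    [[[[12, 17],
       [11, 16]],
      [[11, 12],
       [15, 16]]],
     [[[23, 15],
       [30, 22]],
      [[14, 17],
       [15, 16]]]])
res = chi2_contingency(obs)

n = obs.sum()
# One 1-D marginal total per axis: sum over every other axis.
margins = [obs.sum(axis=tuple(a for a in range(obs.ndim) if a != k))
           for k in range(obs.ndim)]
# Outer product of all marginals, scaled by n ** (ndim - 1).
expected_manual = reduce(np.multiply.outer, margins) / n ** (obs.ndim - 1)

print(np.allclose(expected_manual, res.expected_freq))
print(res.dof)  # 16 - 8 + 4 - 1
```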


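Note: This article was compiled by 纯净天空 from the original English work scipy.stats.contingency.chi2_contingency by the scipy.org authors. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.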
