Python SciPy stats.chi2_contingency usage and code examples

This article briefly introduces the usage of scipy.stats.chi2_contingency in Python.

Usage: scipy.stats.chi2_contingency(observed, correction=True, lambda_=None)

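Chi-square test of independence of variables in a contingency table.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table [1] observed. The expected frequencies are computed from the marginal sums under the assumption of independence; see scipy.stats.contingency.expected_freq. The number of degrees of freedom is (expressed using numpy functions and attributes):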

dof = observed.size - sum(observed.shape) + observed.ndim - 1
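As a quick check of the formula above, the sketch below evaluates it for a small 2 x 3 table and compares it with the `dof` attribute returned by the function. The table values are illustrative, and the `res.dof` attribute access assumes a SciPy version recent enough to return a result object.

```python
import numpy as np
from scipy.stats import chi2_contingency

# A 2 x 3 table: 2 row categories, 3 column categories (illustrative counts).
observed = np.array([[10, 10, 20], [20, 20, 20]])

# Degrees of freedom from the general formula: 6 - 5 + 2 - 1.
dof = observed.size - sum(observed.shape) + observed.ndim - 1

# For a two-way table the formula reduces to (R - 1) * (C - 1).
rows, cols = observed.shape
dof_two_way = (rows - 1) * (cols - 1)

res = chi2_contingency(observed)
print(dof, dof_two_way, res.dof)
```

All three values agree for a two-way table; the general formula is what extends the count to tables with more than two dimensions.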

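Parameters:

observed: array_like: The contingency table. The table contains the observed frequencies (i.e. number of occurrences) in each category. In the two-dimensional case, the table is often described as an "R x C table".
correction: bool, optional: If True, and the degrees of freedom is 1, apply Yates' correction for continuity. The effect of the correction is to adjust each observed value by 0.5 towards the corresponding expected value.
lambda_: float or str, optional: By default, the statistic computed in this test is Pearson's chi-squared statistic [2]. lambda_ allows a statistic from the Cressie-Read power divergence family [3] to be used instead. See scipy.stats.power_divergence for details.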
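To make the effect of `correction` concrete, the following sketch runs the test on a 2 x 2 table (one degree of freedom) with and without Yates' correction. It then reproduces the corrected statistic by hand. The manual step assumes the adjustment moves each observed count towards its expected count by at most 0.5, never past it; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2 x 2 table, so the degrees of freedom equal 1 and the correction applies.
table = np.array([[176, 230], [21035, 21018]])

with_yates = chi2_contingency(table)                    # correction=True is the default
without_yates = chi2_contingency(table, correction=False)

# Manual reproduction (assumption: shift each count toward the expected
# value by 0.5, capped at the actual distance so it never overshoots).
expected = with_yates.expected_freq
diff = expected - table
adjusted = table + np.sign(diff) * np.minimum(0.5, np.abs(diff))
manual_statistic = ((adjusted - expected) ** 2 / expected).sum()

print(with_yates.statistic, without_yates.statistic, manual_statistic)
```

The corrected statistic is the smaller of the two, which makes the test slightly more conservative for 2 x 2 tables.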

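Returns:

res: Chi2ContingencyResult

An object containing attributes:

statistic: float: The test statistic.
pvalue: float: The p-value of the test.
dof: int: The degrees of freedom.
expected_freq: ndarray, same shape as observed: The expected frequencies, based on the marginal sums of the table.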
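For a two-way table, the expected frequencies can be reproduced from the row and column totals. The sketch below, with illustrative counts, compares that manual calculation against the `expected_freq` attribute.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(observed)

# Under independence, expected[i, j] = row_total[i] * col_total[j] / grand_total.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
manual_expected = np.outer(row_totals, col_totals) / observed.sum()

print(manual_expected)
print(res.expected_freq)
```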

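Notes:

An often quoted guideline for the validity of this calculation is that the test should be used only if the observed and expected frequencies in each cell are at least 5.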
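One way to apply this guideline before trusting a result is a small check like the one below. The helper name `cells_at_least_five` is hypothetical and not part of SciPy; it only inspects the observed table and the expected frequencies reported by the function.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cells_at_least_five(observed):
    """Hypothetical helper: True if every observed and expected count is >= 5."""
    observed = np.asarray(observed)
    expected = chi2_contingency(observed).expected_freq
    return bool((observed >= 5).all() and (expected >= 5).all())

print(cells_at_least_five([[10, 10, 20], [20, 20, 20]]))
print(cells_at_least_five([[1, 9], [8, 12]]))
```

The second table contains an observed count of 1, so it fails the guideline.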

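This is a test for the independence of different categories of a population. The test is only meaningful when the dimension of observed is two or more. Applying the test to a one-dimensional table will always result in expected equal to observed and a chi-square statistic equal to 0.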
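The degenerate one-dimensional case can be seen directly. This minimal sketch passes a 1-D array of illustrative counts and prints the result fields.

```python
import numpy as np
from scipy.stats import chi2_contingency

# A one-dimensional "table": there is nothing to be independent of.
obs_1d = np.array([10, 20, 30])
res_1d = chi2_contingency(obs_1d)

print(res_1d.statistic, res_1d.dof)
print(res_1d.expected_freq)
```

The expected frequencies simply repeat the observed counts, so the statistic carries no information.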

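This function does not handle masked arrays, because the calculation does not make sense with missing values.

Like scipy.stats.chisquare, this function computes a chi-square statistic; the convenience this function provides is to figure out the expected frequencies and degrees of freedom from the given contingency table. If these are already known, and Yates' correction is not required, one can use scipy.stats.chisquare. That is, if one calls: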

res = chi2_contingency(obs, correction=False)

then the following is true:

(res.statistic, res.pvalue) == stats.chisquare(obs.ravel(),
                                               f_exp=res.expected_freq.ravel(),
                                               ddof=obs.size - 1 - res.dof)
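The equivalence can be checked numerically. In the sketch below, the expected frequencies and degrees of freedom are taken from the result object (`res.expected_freq` and `res.dof`), and the comparison uses a floating-point tolerance rather than exact equality. The table values are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, correction=False)

# Same test via chisquare: flatten the table and supply the expected
# frequencies; ddof is chosen so that chisquare uses res.dof degrees of freedom.
direct = stats.chisquare(obs.ravel(),
                         f_exp=res.expected_freq.ravel(),
                         ddof=obs.size - 1 - res.dof)

same = np.allclose((res.statistic, res.pvalue),
                   (direct.statistic, direct.pvalue))
print(same)
```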

The lambda_ argument was added in version 0.13.0 of scipy.

References:

[1]

"Contingency table", https://en.wikipedia.org/wiki/Contingency_table

[2]

"Pearson's chi-squared test", https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

[3]

Cressie, N. and Read, T. R. C., "Multinomial Goodness-of-Fit Tests", J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464.

[4]

Berger, Jeffrey S. et al. "Aspirin for the Primary Prevention of Cardiovascular Events in Women and Men: A Sex-Specific Meta-analysis of Randomized Controlled Trials." JAMA, 295(3):306-313, DOI:10.1001/jama.295.3.306, 2006.

Examples:

In [4], the use of aspirin to prevent cardiovascular events in women and men was investigated. The study notably concluded:

…aspirin therapy reduced the risk of a composite of cardiovascular events due to its effect on reducing the risk of ischemic stroke in women […]

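The article lists studies of various cardiovascular events. Let's focus on ischemic stroke in women.

The following table summarizes the results of the experiment, in which participants took aspirin or a placebo on a regular basis for several years. Cases of ischemic stroke were recorded: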

                  Aspirin   Control/Placebo
Ischemic stroke       176               230
No stroke           21035             21018

Is there evidence that aspirin reduces the risk of ischemic stroke? We begin by formulating a null hypothesis \(H_0\):

The effect of aspirin is equivalent to that of placebo.

Let's assess the plausibility of this hypothesis with a chi-square test.

>>> import numpy as np
>>> from scipy.stats import chi2_contingency
>>> table = np.array([[176, 230], [21035, 21018]])
>>> res = chi2_contingency(table)
>>> res.statistic
6.892569132546561
>>> res.pvalue
0.008655478161175739

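Using a significance level of 5%, we would reject the null hypothesis in favor of the alternative hypothesis: "the effect of aspirin is not equivalent to the effect of placebo". Because scipy.stats.contingency.chi2_contingency performs a two-sided test, the alternative hypothesis does not indicate the direction of the effect. We can use stats.contingency.odds_ratio to support the conclusion that aspirin reduces the risk of ischemic stroke.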
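A minimal sketch of that follow-up step is shown here. It assumes a SciPy version that provides `scipy.stats.contingency.odds_ratio`, and it uses `kind="sample"` (the plain cross-product ratio) for simplicity; the default conditional estimate would be a reasonable alternative.

```python
import numpy as np
from scipy.stats.contingency import odds_ratio

table = np.array([[176, 230], [21035, 21018]])

# Sample odds ratio: (176 * 21018) / (230 * 21035).
res_or = odds_ratio(table, kind="sample")
ci = res_or.confidence_interval(confidence_level=0.95)

print(res_or.statistic)
print(ci.low, ci.high)
```

An odds ratio below 1, with a confidence interval that stays below 1, points in the direction of fewer ischemic strokes in the aspirin group.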

Below are further examples showing how larger contingency tables can be tested.

A two-way example (2 x 3):

>>> obs = np.array([[10, 10, 20], [20, 20, 20]])
>>> res = chi2_contingency(obs)
>>> res.statistic
2.7777777777777777
>>> res.pvalue
0.24935220877729619
>>> res.dof
2
>>> res.expected_freq
array([[ 12.,  12.,  16.],
       [ 18.,  18.,  24.]])
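The p-value in this example is the upper tail of a chi-square distribution with `dof` degrees of freedom, evaluated at the statistic. The sketch below recomputes it with `scipy.stats.chi2.sf` as a cross-check on the same table.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs)

# Survival function = P(X >= statistic) for X ~ chi-square(dof).
p_manual = chi2.sf(res.statistic, res.dof)
print(p_manual, res.pvalue)
```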

Perform the test using the log-likelihood ratio (i.e. the "G-test") instead of Pearson's chi-squared statistic.

>>> res = chi2_contingency(obs, lambda_="log-likelihood")
>>> res.statistic
2.7688587616781319
>>> res.pvalue
0.25046668010954165
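For reference, the G statistic can be written as 2 * sum(O * ln(O / E)) over all cells. The following sketch recomputes it by hand for the same 2 x 3 table, where no continuity correction applies because the degrees of freedom are 2, and compares it with the value returned for `lambda_="log-likelihood"`.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, lambda_="log-likelihood")

# G = 2 * sum(O * ln(O / E)); all counts here are positive, so log is safe.
expected = res.expected_freq
g_manual = 2.0 * (obs * np.log(obs / expected)).sum()

print(g_manual, res.statistic)
```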

A four-way example (2 x 2 x 2 x 2):

>>> obs = np.array(
...     [[[[12, 17],
...        [11, 16]],
...       [[11, 12],
...        [15, 16]]],
...      [[[23, 15],
...        [30, 22]],
...       [[14, 17],
...        [15, 16]]]])
>>> res = chi2_contingency(obs)
>>> res.statistic
8.7584514426741897
>>> res.pvalue
0.64417725029295503
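In the four-way case, the degrees of freedom and expected frequencies follow the same rules as before, generalized to four sets of marginal sums. The sketch below recomputes both, using `scipy.stats.contingency.margins` to obtain the marginals; the variable names are illustrative.

```python
from functools import reduce

import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import margins

obs = np.array(
    [[[[12, 17], [11, 16]], [[11, 12], [15, 16]]],
     [[[23, 15], [30, 22]], [[14, 17], [15, 16]]]])
res = chi2_contingency(obs)

# Degrees of freedom: 16 - 8 + 4 - 1.
dof = obs.size - sum(obs.shape) + obs.ndim - 1

# Expected frequencies: product of the four marginal sums (broadcast to the
# full shape), divided by the grand total raised to the power (ndim - 1).
marginals = margins(obs)
expected = reduce(np.multiply, marginals) / obs.sum() ** (obs.ndim - 1)

print(dof, res.dof)
print(np.allclose(expected, res.expected_freq))
```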


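Note: This article was selected and compiled by 純淨天空 from the original English work scipy.stats.chi2_contingency by the scipy.org authors. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.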
