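R influence.measures: Regression Deletion Diagnostics

The R function influence.measures is part of the stats package.

Description

This suite of functions computes some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982) and elsewhere.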
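To make "leave-one-out deletion" concrete, the sketch below refits the model once per case with that case removed, and compares the coefficient changes with dfbeta(). It assumes the LifeCycleSavings data from base R's datasets package (the same data used in the Examples) and an ordinary unweighted lm() fit. For such a fit the two matrices should agree up to rounding, with dfbeta() avoiding the n refits.

```r
f   <- sr ~ pop15 + pop75 + dpi + ddpi
fit <- lm(f, data = LifeCycleSavings)
n   <- nrow(LifeCycleSavings)

# Row i: change in the coefficients when case i is really deleted and the model refitted
delta_refit <- t(vapply(seq_len(n), function(i)
  coef(fit) - coef(lm(f, data = LifeCycleSavings[-i, ])),
  numeric(length(coef(fit)))))

# dfbeta() obtains the same n x p matrix from the full fit alone
delta_dfbeta <- dfbeta(fit)

all.equal(unname(delta_refit), unname(delta_dfbeta))
```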

Usage

influence.measures(model, infl = influence(model))

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)

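Arguments

`model`	an R object, typically returned by `lm` or `glm`.
`infl`	influence structure as returned by `lm.influence` or `influence` (the latter only for the `glm` methods of `rstudent` and `cooks.distance`).
`res`	(possibly weighted) residuals, with proper defaults.
`sd`	standard deviation to use, see default.
`dispersion`	dispersion (for `glm` objects) to use, see default.
`hat`	hat values `H_{ii}`, see default.
`type`	type of residuals for `rstandard`, with different options and meanings for `lm` and `glm`. Can be abbreviated.
`x`	the `X` or design matrix.
`intercept`	should an intercept column be prepended to `x`?
`...`	further arguments passed to or from other methods.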
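A minimal sketch of what the two `type` choices of the glm method of `rstandard` mean, assuming that both divide a residual by sqrt(dispersion * (1 - h_ii)) and differ only in whether deviance or Pearson residuals are used. It reuses Irwin's binomial data from the Examples and checks the hand computation against rstandard().

```r
# Irwin's dose-response data, as in the Examples
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

h    <- hatvalues(glmI)
disp <- summary(glmI)$dispersion   # fixed at 1 for the binomial family

# type = "deviance" (the default) versus type = "pearson"
by_hand_dev  <- residuals(glmI, type = "deviance") / sqrt(disp * (1 - h))
by_hand_pear <- residuals(glmI, type = "pearson")  / sqrt(disp * (1 - h))

all.equal(unname(by_hand_dev),  unname(rstandard(glmI)))
all.equal(unname(by_hand_pear), unname(rstandard(glmI, type = "pearson")))
```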

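Details

The primary high-level function is influence.measures, which returns an object of class "infl" whose printed form is a table showing, for each case, the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases that are influential with respect to any of these measures are marked with an asterisk.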
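The printed table is stored in the `infmat` component and the asterisk flags in the `is.inf` component (both components appear in the Examples). The sketch below, assuming the Belsley, Kuh and Welsch savings model from the Examples, checks each column of `infmat` against the matching direct-access function.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
im  <- influence.measures(fit)
k   <- length(coef(fit))      # one dfb.* column per coefficient

colnames(im$infmat)           # dfb.* columns, then dffit, cov.r, cook.d, hat

# every column agrees with the corresponding direct-access function
stopifnot(
  isTRUE(all.equal(unname(im$infmat[, seq_len(k)]), unname(dfbetas(fit)))),
  isTRUE(all.equal(unname(im$infmat[, "dffit"]),    unname(dffits(fit)))),
  isTRUE(all.equal(unname(im$infmat[, "cov.r"]),    unname(covratio(fit)))),
  isTRUE(all.equal(unname(im$infmat[, "cook.d"]),   unname(cooks.distance(fit)))),
  isTRUE(all.equal(unname(im$infmat[, "hat"]),      unname(hatvalues(fit))))
)

# cases carrying an asterisk: flagged by at least one measure
which(apply(im$is.inf, 1, any))
```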

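The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals, respectively. (These rescale the residuals to unit variance, using an overall and a leave-one-out estimate of the error variance, respectively.)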
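The difference between the two residual types can be reproduced by hand. This sketch assumes an unweighted lm() fit of the savings model from the Examples, uses the `hat` and `sigma` components of lm.influence() (the hat values and the leave-one-out residual standard deviations), and confirms one leave-one-out sigma by refitting without case 1.

```r
f    <- sr ~ pop15 + pop75 + dpi + ddpi
fit  <- lm(f, data = LifeCycleSavings)
infl <- lm.influence(fit, do.coef = FALSE)

e   <- residuals(fit)
h   <- infl$hat
s   <- sqrt(deviance(fit) / df.residual(fit))  # one overall estimate of sigma
s_i <- infl$sigma                              # sigma estimated with case i left out

r_std  <- e / (s   * sqrt(1 - h))   # standardized residuals
r_stud <- e / (s_i * sqrt(1 - h))   # Studentized residuals

# confirm, for case 1, that s_i is a genuine leave-one-out estimate
s_1 <- sigma(lm(f, data = LifeCycleSavings[-1, ]))

stopifnot(
  isTRUE(all.equal(unname(r_std),  unname(rstandard(fit)))),
  isTRUE(all.equal(unname(r_stud), unname(rstudent(fit)))),
  isTRUE(all.equal(unname(s_i[1]), s_1))
)
```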

Note that for multivariate lm() models (of class "mlm"), these functions return 3d arrays instead of matrices, or matrices instead of vectors.

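Values for generalized linear models are approximations, as described in Williams (1987), except that Cook's distances are scaled as F rather than as chi-square values. The approximations can be poor when some cases have large influence.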
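For a glm, the Usage section shows cooks.distance defaulting to the Pearson residuals, the model's dispersion and the hat values. The sketch below assumes these are combined as (r_P / (1 - h))^2 * h / (dispersion * p), with p the number of estimated coefficients; on our reading, the division by p is what corresponds to the F-type scaling mentioned above. It uses Irwin's data from the Examples.

```r
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

p    <- glmI$rank                           # number of estimated coefficients
h    <- hatvalues(glmI)
rP   <- residuals(glmI, type = "pearson")
disp <- summary(glmI)$dispersion            # fixed at 1 for binomial

# Cook's distance from Pearson residuals, divided by dispersion * p
cook_by_hand <- (rP / (1 - h))^2 * h / (disp * p)

all.equal(unname(cook_by_hand), unname(cooks.distance(glmI)))
```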

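The optional infl, res and sd arguments are there to encourage the use of these direct-access functions in situations where, for example, the underlying basic influence measures (from lm.influence or the generic influence) are already available.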
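A sketch of the intended reuse, assuming the savings model from the Examples: a single lm.influence() call with do.coef = TRUE (needed because dfbetas uses the coefficient changes) feeds several diagnostics, and the results match the versions that recompute the influence structure themselves.

```r
fit  <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
infl <- lm.influence(fit, do.coef = TRUE)  # computed once; dfbetas needs do.coef = TRUE

r_std <- rstandard(fit, infl = infl)
r_stu <- rstudent(fit, infl = infl)
cookd <- cooks.distance(fit, infl = infl)
dfbs  <- dfbetas(fit, infl = infl)

# same values as letting each function recompute its own influence structure
stopifnot(
  isTRUE(all.equal(r_std, rstandard(fit))),
  isTRUE(all.equal(r_stu, rstudent(fit))),
  isTRUE(all.equal(cookd, cooks.distance(fit))),
  isTRUE(all.equal(dfbs,  dfbetas(fit)))
)
```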

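Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded in the fitting.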
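A sketch of both rules on the savings data, assuming a deliberately inserted (hypothetical) missing response in case 3 and a hypothetical zero weight on case 5. On our reading of the paragraph above, with na.exclude the results are padded back to full length and the excluded case shows up as NA in rstandard(); with na.omit it is simply absent, and a zero-weight case is absent as well.

```r
f <- sr ~ pop15 + pop75 + dpi + ddpi

d <- LifeCycleSavings
d$sr[3] <- NA                        # hypothetical missing response in case 3
fit_ex <- lm(f, data = d, na.action = na.exclude)
fit_om <- lm(f, data = d, na.action = na.omit)

length(rstandard(fit_ex))  # should be nrow(d), with NA for case 3
rstandard(fit_ex)[3]
length(rstandard(fit_om))  # should be nrow(d) - 1: case 3 is simply absent

w <- rep(1, nrow(LifeCycleSavings))
w[5] <- 0                            # hypothetical zero weight on case 5
fit_w <- lm(f, data = LifeCycleSavings, weights = w)
length(hatvalues(fit_w))   # should be nrow(LifeCycleSavings) - 1
```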

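For linear models, rstandard(*, type = "predictive") provides leave-one-out cross-validation residuals, and the “PRESS” statistic (PREdictive Sum of Squares, the same as the CV score) of a fitted model named model is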

   PRESS <- sum(rstandard(model, type="pred")^2)
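Because each predictive residual is e_i / (1 - h_ii) with 0 <= h_ii < 1, it is never smaller in absolute value than the ordinary residual e_i, so PRESS is always at least the residual sum of squares. A quick check, assuming the savings model from the Examples:

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

RSS   <- sum(residuals(fit)^2)                          # in-sample residual sum of squares
PRESS <- sum(rstandard(fit, type = "predictive")^2)     # sum of squared leave-one-out errors

stopifnot(PRESS >= RSS)
c(RSS = RSS, PRESS = PRESS)
```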

The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
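A sketch showing that the two agree, assuming the savings model from the Examples: hat() needs the predictor matrix (with intercept = TRUE it prepends the column of 1s itself), while hatvalues() works from the fitted model. The final line uses the fact that the hat values sum to the trace of the hat matrix, i.e. the number of coefficients (5 here).

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# hat() takes the predictor matrix; intercept = TRUE adds the column of 1s
X     <- as.matrix(LifeCycleSavings[, c("pop15", "pop75", "dpi", "ddpi")])
h_old <- hat(X, intercept = TRUE)

# hatvalues() works directly from the fitted model (recommended)
h_new <- hatvalues(fit)

stopifnot(isTRUE(all.equal(unname(h_old), unname(h_new))))

# trace of the hat matrix = number of coefficients
sum(h_new)
```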

Note

For hatvalues, dfbeta and dfbetas, the method for linear models also works for generalized linear models.
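For a glm, the linear-model method applies to the final iteratively reweighted least-squares step. Assuming that, the hat values should be the diagonal of W^{1/2} X (X'WX)^{-1} X' W^{1/2}, with W the working weights at convergence (as returned by weights(glmI, type = "working")). The sketch checks this on Irwin's data from the Examples.

```r
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

X <- model.matrix(glmI)
W <- weights(glmI, type = "working")    # IRLS working weights at convergence

# hat matrix of the weighted least-squares problem: Xw (Xw'Xw)^{-1} Xw'
Xw <- sqrt(W) * X                       # scales row i of X by sqrt(W[i])
H  <- Xw %*% solve(crossprod(Xw), t(Xw))

all.equal(unname(diag(H)), unname(hatvalues(glmI)))
```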

Examples

require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
all.equal(rs, rstandard(lm.SR, infl = iflSR), tolerance = 1e-10)
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals"  ==  L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coefficients,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))


## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
is.inf <- apply(im$is.inf, 1, any)
plot(xh,yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[is.inf], yh[is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
          cooks.distance(glmI)))

Author(s)

Several R core team members and John Fox, originally in his ‘car’ package.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181-191. doi:10.2307/2347550.

Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.

Fox, J. (2002). An R and S-Plus Companion to Applied Regression. Sage Publ.

Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ. https://socialsciences.mcmaster.ca/jfox/Books/Companion/.

See Also

influence (containing lm.influence).

‘plotmath’ for the use of hat in plot annotation.


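Note: This article was selected and compiled by 純淨天空 from the English original Regression Deletion Diagnostics by the R-devel developers. Unless otherwise stated, copyright of the original code belongs to the original authors; this translation may not be reproduced or copied without permission.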