R influence.measures: Regression Deletion Diagnostics

The function influence.measures is in the R stats package.

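Description

This suite of functions computes some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), and elsewhere.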

Usage

influence.measures(model, infl = influence(model))

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)

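Arguments

`model`	an R object, typically returned by `lm` or `glm`.
`infl`	influence structure as returned by `lm.influence` or `influence` (the latter only for the `glm` methods of `rstudent` and `cooks.distance`).
`res`	(possibly weighted) residuals, with a suitable default.
`sd`	standard deviation to use; see the default.
`dispersion`	dispersion (for `glm` objects) to use; see the default.
`hat`	hat values `H_{ii}`; see the default.
`type`	type of residuals for `rstandard`, with different options and meanings for `lm` and `glm`. Can be abbreviated.
`x`	the `X` or design matrix.
`intercept`	should an intercept column be prepended to `x`?
`...`	further arguments passed to or from other methods.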

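Details

The primary high-level function is influence.measures, which produces an object of class "infl" whose printed form is a table showing the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases that are influential with respect to any of these measures are marked with an asterisk.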
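As a minimal sketch of what the returned object holds, the two components used in the Examples section (`infmat` and `is.inf`) can be inspected directly. The built-in `cars` data set and the names `fit`, `im` and `flagged` are illustrative choices made for this sketch only.

```r
# Illustrative model on the built-in 'cars' data (an assumption of this sketch)
fit <- lm(dist ~ speed, data = cars)
im <- influence.measures(fit)

# infmat: one row per case, DFBETAS columns first, then the other measures
colnames(im$infmat)

# is.inf: logical matrix of the same shape, TRUE where a case is flagged
flagged <- which(apply(im$is.inf, 1, any))
im$infmat[flagged, , drop = FALSE]
```

Working with `infmat` and `is.inf` directly is convenient when the flagged cases need to be passed on to further code rather than read from the printed table.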

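The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals, respectively. (These re-normalize the residuals to have unit variance, using an overall and a leave-one-out measure of the error variance, respectively.)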
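The re-normalization described above can be checked by hand for an unweighted linear model. The following is a sketch under that assumption; the `cars` data set and all variable names are illustrative. It rebuilds both kinds of residual, and Cook's distance, from the leverages and the two error-scale estimates.

```r
# Illustrative unweighted linear model (an assumption of this sketch)
fit <- lm(dist ~ speed, data = cars)
infl <- influence(fit)          # provides $hat and the leave-one-out $sigma
e <- residuals(fit)
h <- infl$hat
s <- sqrt(deviance(fit) / df.residual(fit))   # overall residual standard error

r_std <- e / (s * sqrt(1 - h))            # uses the overall error variance
r_stu <- e / (infl$sigma * sqrt(1 - h))   # uses the leave-one-out error variance

stopifnot(all.equal(r_std, rstandard(fit)),
          all.equal(r_stu, rstudent(fit)))

# Cook's distance follows from the standardized residuals and the leverages
p <- fit$rank
cook <- r_std^2 * h / (p * (1 - h))
stopifnot(all.equal(cook, cooks.distance(fit)))
```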

Note that for multivariate lm() models (of class "mlm"), these functions return 3d arrays instead of matrices, or matrices instead of vectors.

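Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as F rather than as chi-square values). The approximations can be poor when some cases have large influence.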

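The optional infl, res and sd arguments are there to encourage the use of these direct access functions in situations where, for example, the underlying basic influence measures (from lm.influence or the generic influence) are already available.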
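A short sketch of that pattern: the influence structure and the weighted residuals are computed once and then passed to several accessors. The `cars` data set and the variable names are illustrative assumptions.

```r
# Illustrative model (an assumption of this sketch)
fit <- lm(dist ~ speed, data = cars)

# Compute the basic quantities once ...
infl <- lm.influence(fit)          # do.coef = TRUE by default, needed for dfbetas
res <- weighted.residuals(fit)

# ... and reuse them in each direct access function
d_fits <- dffits(fit, infl = infl, res = res)
cov_r  <- covratio(fit, infl = infl, res = res)
cook_d <- cooks.distance(fit, infl = infl, res = res)
dfb    <- dfbetas(fit, infl = infl)
```

The results agree with the plain calls such as `dffits(fit)`; the saving only matters when the model is large or many diagnostics are requested.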

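Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.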
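The effect on the length of the results can be seen in a small sketch. The data, the choice of which case gets zero weight and which response is set to NA, and all variable names are assumptions made for illustration.

```r
# Illustrative data: a copy of the built-in 'cars' data set
d <- cars

# One case with zero weight: it is dropped from the diagnostics
w <- rep(1, nrow(d))
w[1] <- 0
fit_w <- lm(dist ~ speed, data = d, weights = w)
n_hat_w <- length(hatvalues(fit_w))        # nrow(d) - 1

# One missing response, fitted with na.omit and with na.exclude
d$dist[2] <- NA
fit_omit <- lm(dist ~ speed, data = d, na.action = na.omit)
fit_excl <- lm(dist ~ speed, data = d, na.action = na.exclude)

n_omit <- length(rstandard(fit_omit))      # excluded case is absent
n_excl <- length(rstandard(fit_excl))      # excluded case is kept as a placeholder
rstandard(fit_excl)[1:3]                   # the second entry is NA
```

Using na.exclude keeps the diagnostics aligned with the rows of the original data frame, which simplifies binding them back onto the data.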

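For linear models, rstandard(*, type = "predictive") provides leave-one-out cross-validation residuals, and the "PRESS" statistic (PREdictive Sum of Squares, the same as the CV score) of model model is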

   PRESS <- sum(rstandard(model, type="pred")^2)

The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
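To illustrate how the two relate, the sketch below compares hat(), applied to the predictor or to the model matrix, with hatvalues() applied to the fitted model. The `cars` data set and the variable names are illustrative assumptions.

```r
# Illustrative model (an assumption of this sketch)
fit <- lm(dist ~ speed, data = cars)

h_new <- hatvalues(fit)                               # recommended interface
h_old <- hat(cars$speed)                              # intercept = TRUE prepends a column of ones
h_mm  <- hat(model.matrix(fit), intercept = FALSE)    # matrix already has the intercept

stopifnot(all.equal(unname(h_new), unname(h_old)),
          all.equal(unname(h_new), unname(h_mm)))
```

hatvalues() works from the fitted model, so it accounts for the model's own design matrix and weights without the caller having to rebuild them.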

Note

For hatvalues, dfbeta, and dfbetas, the method for linear models also works for generalized linear models.
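A brief sketch of that behaviour, reusing the dose-response numbers from the Irwin example in the Examples section under different, illustrative variable names:

```r
# Binomial GLM on the Irwin dose-response numbers (variable names are illustrative)
dose <- 1:5
resp <- c(0, 2, 14, 19, 30)     # number responding
expo <- rep(40, 5)              # number exposed
g <- glm(cbind(resp, expo - resp) ~ dose, family = binomial)

# The "lm" methods are dispatched because a glm object also inherits from "lm"
h    <- hatvalues(g)
dfb  <- dfbeta(g)
dfbs <- dfbetas(g)

# The leverages match those stored in the glm influence structure
stopifnot(all.equal(h, influence(g, do.coef = FALSE)$hat))
dim(dfb)    # one row per case, one column per coefficient
```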

Examples

require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
all.equal(rs, rstandard(lm.SR, infl = iflSR), tolerance = 1e-10)
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals"  ==  L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coefficients,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))


## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
is.inf <- apply(im$is.inf, 1, any)
plot(xh,yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[is.inf], yh[is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
          cooks.distance(glmI)))

Author(s)

Several R core team members and John Fox, originally in his ‘car’ package.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181-191. doi:10.2307/2347550.

Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.

Fox, J. (2002). An R and S-Plus Companion to Applied Regression. Sage Publ.

Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ; https://socialsciences.mcmaster.ca/jfox/Books/Companion/.

See Also

influence (containing lm.influence).

‘plotmath’ for the use of hat in plot annotation.
