R influence.measures: Regression Deletion Diagnostics


The R function influence.measures is located in the stats package.

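Description

This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), and elsewhere.

Usage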

influence.measures(model, infl = influence(model))

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)

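Arguments

model

an R object, typically returned by lm or glm.

infl

influence structure as returned by lm.influence or influence (the latter only for the glm method of rstudent and cooks.distance).

res

(possibly weighted) residuals, with a proper default.

sd

standard deviation to use, see default.

dispersion

dispersion (for glm objects) to use, see default.

hat

hat values (the diagonal elements H[i,i] of the hat matrix), see default.

type

type of residuals for rstandard, with different options and meanings for lm and glm. Can be abbreviated.

x

the X or design matrix.

intercept

should an intercept column be prepended to x?

...

further arguments passed to or from other methods.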

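Details

The primary high-level function is influence.measures, which produces an object of class "infl": a tabular display showing the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.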
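As a minimal sketch of what the returned object contains, the following fits a simple regression and inspects the "infl" object. The built-in cars data set and the object names are arbitrary choices made for illustration; the components infmat and is.inf are the ones used in the Examples section of this page.

```r
## Illustrative sketch: 'cars', 'fit' and 'im' are arbitrary choices.
fit <- lm(dist ~ speed, data = cars)
im  <- influence.measures(fit)

class(im)               # "infl"
colnames(im$infmat)     # DFBETAS columns, then dffit, cov.r, cook.d, hat
dim(im$is.inf)          # logical flags, same shape as im$infmat

## Rows flagged on at least one measure (the starred cases in print(im)):
flagged <- which(apply(im$is.inf, 1, any))
im$infmat[flagged, , drop = FALSE]
```

Indexing im$infmat by column name (for example "cook.d" or "hat") is usually more robust than relying on column positions, since the number of DFBETAS columns depends on the model.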

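The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and a leave-one-out measure of the error variance respectively.)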
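The difference between the two kinds of residual can be made concrete by computing them by hand for an unweighted linear model. This is a sketch under the assumption of an ordinary lm fit without weights or missing values; the cars data set and the variable names are illustrative only.

```r
fit <- lm(dist ~ speed, data = cars)

e <- residuals(fit)
h <- hatvalues(fit)

## Standardized: scale by the overall residual standard error s
s <- sqrt(deviance(fit) / df.residual(fit))
rs_manual <- e / (s * sqrt(1 - h))

## Studentized: scale by the leave-one-out estimate s_(i),
## taken from the 'sigma' component of the influence structure
s_loo <- influence(fit, do.coef = FALSE)$sigma
rt_manual <- e / (s_loo * sqrt(1 - h))

all.equal(unname(rs_manual), unname(rstandard(fit)))
all.equal(unname(rt_manual), unname(rstudent(fit)))
```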

Note that for multivariate lm() models (of class "mlm"), these functions return 3d arrays instead of matrices, or matrices instead of vectors.
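A short sketch of the shape change for a two-response fit follows. It assumes a reasonably recent R version in which these functions support "mlm" objects; mtcars and the chosen responses are used only as convenient data.

```r
## Two responses, one predictor; the choice of variables is arbitrary.
fit_mlm <- lm(cbind(mpg, disp) ~ wt, data = mtcars)
class(fit_mlm)              # includes "mlm"

rs <- rstandard(fit_mlm)
dim(rs)                     # cases x responses (a matrix, not a vector)

db <- dfbeta(fit_mlm)
dim(db)                     # cases x coefficients x responses (a 3d array)
```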

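Values for generalized linear models are approximations, as described in Williams (1987), except that Cook's distances are scaled as F rather than as chi-square values. The approximations can be poor when some cases have large influence.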
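To illustrate how the default arguments of the glm method combine, the sketch below recomputes Cook's distances from Pearson residuals, hat values and the dispersion. The formula used, (r / (1 - h))^2 * h / (dispersion * p) with p the model rank, is an assumption written out for illustration rather than quoted from this page; the final line checks it against cooks.distance. The data are Irwin's data from the Examples section.

```r
## Irwin's data, as in the examples on this page
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

r_pear <- residuals(glmI, type = "pearson")
h      <- hatvalues(glmI)
phi    <- summary(glmI)$dispersion   # 1 for the binomial family
p      <- glmI$rank                  # number of estimated coefficients

## Assumed form of the approximation; dividing by p gives the F-like scale
cook_manual <- (r_pear / (1 - h))^2 * h / (phi * p)
all.equal(unname(cook_manual), unname(cooks.distance(glmI)))
```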

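The optional infl, res and sd arguments are there to encourage the use of these direct access functions in situations where, for example, the underlying basic influence measures (from lm.influence or the generic influence) are already available.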
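A minimal sketch of that pattern: compute the influence structure once and pass it to several accessors. The data set and variable names are illustrative assumptions.

```r
fit  <- lm(dist ~ speed, data = cars)
infl <- lm.influence(fit)        # computed once (do.coef = TRUE by default)

cd  <- cooks.distance(fit, infl = infl)
dff <- dffits(fit, infl = infl)
cvr <- covratio(fit, infl = infl)
dfb <- dfbetas(fit, infl = infl)

## Same values as letting each function recompute the influence structure:
all.equal(cd, cooks.distance(fit))
all.equal(dff, dffits(fit))
```

The saving is negligible for a model this small; it may matter more when the same fit is examined with many diagnostics or the design matrix is large.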

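Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.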
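The effect on the length of the results can be seen in a small sketch. The toy data frame below is made up for illustration and carries no meaning beyond having one missing response.

```r
## Hypothetical toy data with one missing response (row 3)
d <- data.frame(x = 1:10,
                y = c(1.2, 1.9, NA, 4.1, 5.3, 5.8, 7.2, 7.9, 9.4, 9.8))

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

length(rstandard(fit_omit))   # one fewer than nrow(d): the NA case is gone
rs_excl <- rstandard(fit_excl)
length(rs_excl)               # nrow(d): padded back to full length
is.na(rs_excl[3])             # the excluded case is filled with NA

## A zero weight removes the case from the diagnostics entirely
d2 <- d[!is.na(d$y), ]
w  <- c(0, rep(1, nrow(d2) - 1))
fit_w <- lm(y ~ x, data = d2, weights = w)
length(cooks.distance(fit_w)) # nrow(d2) - 1
```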

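For linear models, rstandard(*, type = "predictive") provides leave-one-out cross-validation residuals, and the "PRESS" statistic (PREdictive Sum of Squares, the same as the CV score) of model model is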

   PRESS <- sum(rstandard(model, type="pred")^2)
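As a cross-check of that identity, the sketch below compares the shortcut with a brute-force computation that refits the model once per case. It assumes an unweighted lm on the built-in cars data; the variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

## Shortcut: predictive residuals e_i / (1 - h_ii)
press_fast <- sum(rstandard(fit, type = "predictive")^2)

## Brute force: refit without case i, then predict case i
loo_err <- vapply(seq_len(nrow(cars)), function(i) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])
  cars$dist[i] - predict(fit_i, newdata = cars[i, , drop = FALSE])
}, numeric(1))
press_slow <- sum(loo_err^2)

all.equal(press_fast, press_slow)
```

The brute-force version needs one refit per case, whereas the shortcut needs only the single original fit.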

The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
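The two interfaces can be compared directly. In this sketch (cars and the variable names are illustrative), hat() is given either the raw predictor or the full design matrix, while hatvalues() is given the fitted model.

```r
fit <- lm(dist ~ speed, data = cars)

## hat() works on a design matrix; intercept = TRUE prepends a column of 1s
h_old <- hat(cars$speed)                            # x without intercept column
h_mm  <- hat(model.matrix(fit), intercept = FALSE)  # matrix already has one

## hatvalues() works on the fitted model and keeps case names
h_new <- hatvalues(fit)

all.equal(unname(h_old), unname(h_new))
all.equal(unname(h_mm),  unname(h_new))
```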

Note

For hatvalues, dfbeta and dfbetas, the method for linear models also works for generalized linear models.

Examples

require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
all.equal(rs, rstandard(lm.SR, infl = iflSR), tolerance = 1e-10)
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals"  ==  L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coefficients,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))


## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
is.inf <- apply(im$is.inf, 1, any)
plot(xh,yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[is.inf], yh[is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
          cooks.distance(glmI)))

Author(s)

Several R core team members and John Fox, originally in his ‘car’ package.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181-191. doi:10.2307/2347550.

Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.

Fox, J. (2002) An R and S-Plus Companion to Applied Regression. Sage Publ.

Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ; https://socialsciences.mcmaster.ca/jfox/Books/Companion/.

See Also

influence (containing lm.influence).

'plotmath' for the use of hat in plot annotation.

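Note: this article was selected and compiled by 纯净天空 from the original English documentation "Regression Deletion Diagnostics" by the R-devel authors. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.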