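R influence.measures: Regression Deletion Diagnostics


The R function influence.measures is part of the stats package.

Description

This suite of functions computes some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), and elsewhere.

Usage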

influence.measures(model, infl = influence(model))

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)

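Arguments

model

an R object, typically returned by lm or glm.

infl

influence structure as returned by lm.influence or influence (the latter only for the glm methods of rstudent and cooks.distance).

res

(possibly weighted) residuals, with a proper default.

sd

standard deviation to use, see the default.

dispersion

dispersion (for glm objects) to use, see the default.

hat

hat values (the diagonal elements of the hat matrix), see the default.

type

type of residuals for rstandard, with different options and meanings for lm and glm. Can be abbreviated.

x

the X or design matrix.

intercept

should an intercept column be prepended to x?

...

further arguments passed to or from other methods.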

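Details

The primary high-level function is influence.measures, which produces an object of class "infl". Its tabular display shows the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases that are influential with respect to any of these measures are marked with an asterisk.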
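As a rough illustration of the structure of the "infl" object, the sketch below fits a small model on the built-in cars data (an arbitrary choice for this example, as are the variable names) and checks that the columns of the measure matrix agree with the direct-access functions. The components infmat and is.inf are the ones used in the Examples section.

```r
# Illustrative model; the data set and object names are assumptions of this sketch.
fit <- lm(dist ~ speed, data = cars)
im  <- influence.measures(fit)

# infmat holds the numeric measures; is.inf is a logical matrix of the
# same shape that flags the cases shown with an asterisk.
colnames(im$infmat)
dim(im$is.inf)

# Each named column should match the corresponding direct-access function.
same <- c(
  dffit  = isTRUE(all.equal(unname(im$infmat[, "dffit"]),  unname(dffits(fit)))),
  cov.r  = isTRUE(all.equal(unname(im$infmat[, "cov.r"]),  unname(covratio(fit)))),
  cook.d = isTRUE(all.equal(unname(im$infmat[, "cook.d"]), unname(cooks.distance(fit)))),
  hat    = isTRUE(all.equal(unname(im$infmat[, "hat"]),    unname(hatvalues(fit))))
)
same
```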

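The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and a leave-one-out measure of the error variance respectively.)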
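The difference between the two kinds of residual can be sketched by recomputing them by hand for an unweighted linear model. The example assumes the cars data and uses the sigma component of the influence structure as the leave-one-out estimate of the residual standard deviation; the object names are made up for this illustration.

```r
fit <- lm(dist ~ speed, data = cars)     # illustrative model

e   <- residuals(fit)                    # raw residuals (no weights here)
h   <- hatvalues(fit)                    # leverages
s   <- sqrt(deviance(fit) / df.residual(fit))  # overall error s.d.
s_i <- influence(fit)$sigma              # error s.d. with case i left out

r_std <- e / (s   * sqrt(1 - h))         # standardized residuals
r_stu <- e / (s_i * sqrt(1 - h))         # Studentized residuals

std_ok <- isTRUE(all.equal(unname(r_std), unname(rstandard(fit))))
stu_ok <- isTRUE(all.equal(unname(r_stu), unname(rstudent(fit))))
c(rstandard = std_ok, rstudent = stu_ok)
```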

Note that for multivariate lm() models (of class "mlm"), these functions return 3d arrays instead of matrices, or matrices instead of vectors.

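Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as F rather than as chi-square values). The approximations can be poor when some cases have large influence.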
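To make the GLM case more concrete, the following sketch rebuilds Cook's distances for Irwin's data (the binomial example from the Examples section) out of the default ingredients listed under Usage: Pearson residuals, hat values and the dispersion. The combining formula is my reading of how the glm method uses those defaults, so treat it as an assumption that the final comparison is checking rather than as a specification.

```r
# Irwin's data, as in the Examples section
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)    # number of mice responding to dose xi
mi <- rep(40, 5)             # number of mice exposed
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

infl <- influence(glmI, do.coef = FALSE)
h    <- infl$hat                     # hat values
p    <- glmI$rank                    # number of estimated coefficients
phi  <- summary(glmI)$dispersion     # fixed at 1 for the binomial family

# Assumed form: squared "deleted" Pearson residual times leverage,
# scaled by dispersion * p.
cook_manual <- (infl$pear.res / (1 - h))^2 * h / (phi * p)

glm_cook_ok <- isTRUE(all.equal(unname(cook_manual),
                                unname(cooks.distance(glmI))))
glm_cook_ok
```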

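The optional infl, res and sd arguments are there to encourage the use of these direct-access functions in situations where, for example, the underlying basic influence measures (from lm.influence or the generic influence) are already available.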
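A minimal sketch of that pattern: compute the influence structure once and hand it to several diagnostics. The cars model and the object names are assumptions made for the illustration.

```r
fit  <- lm(dist ~ speed, data = cars)   # illustrative model
infl <- lm.influence(fit)               # basic influence measures, computed once

# Reuse the same structure instead of letting each call recompute it.
cd <- cooks.distance(fit, infl = infl)
df <- dffits(fit, infl = infl)
cr <- covratio(fit, infl = infl)

# The results should be the same as with the default 'infl'.
reuse_ok <- c(
  isTRUE(all.equal(cd, cooks.distance(fit))),
  isTRUE(all.equal(df, dffits(fit))),
  isTRUE(all.equal(cr, covratio(fit)))
)
reuse_ok
```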

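Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.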
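The effect on the length of the returned vectors can be seen in a small experiment. The sketch below uses a copy of the cars data with one response set to NA, and a separate fit with one zero weight; both modifications are invented purely for illustration.

```r
d <- cars
d$dist[3] <- NA                          # artificial missing response

fit_omit <- lm(dist ~ speed, data = d, na.action = na.omit)
fit_excl <- lm(dist ~ speed, data = d, na.action = na.exclude)

n_omit <- length(rstandard(fit_omit))    # only the cases actually fitted
n_excl <- length(rstandard(fit_excl))    # padded back to nrow(d)
na_pad <- is.na(rstandard(fit_excl)[3])  # excluded case has no residual

w <- rep(1, nrow(cars))
w[1] <- 0                                # artificial zero weight
fit_w <- lm(dist ~ speed, data = cars, weights = w)
n_w <- length(rstandard(fit_w))          # zero-weight case is dropped

c(n_omit = n_omit, n_excl = n_excl, n_w = n_w)
```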

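For linear models, rstandard(*, type = "predictive") provides leave-one-out cross-validation residuals, and the "PRESS" statistic (PREdictive Sum of Squares, the same as the CV score) of a model model is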

   PRESS <- sum(rstandard(model, type="pred")^2)

The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
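For a plain linear model the two routes should give the same leverages. The sketch below (cars data and object names chosen for illustration) calls hat() once on the raw predictor, letting intercept = TRUE prepend the column of ones, and once on the full model matrix with intercept = FALSE.

```r
fit <- lm(dist ~ speed, data = cars)             # illustrative model

h_new <- hatvalues(fit)                          # recommended interface
h_old <- hat(cars$speed)                         # intercept column added by hat()
h_mm  <- hat(model.matrix(fit), intercept = FALSE)  # matrix already has it

hat_ok <- c(
  isTRUE(all.equal(unname(h_new), h_old)),
  isTRUE(all.equal(unname(h_new), h_mm))
)
hat_ok
```

Unlike hat(), hatvalues() works from the fitted model, so it keeps the case names.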

Note

For hatvalues, dfbeta, and dfbetas, the method for linear models also works for generalized linear models.
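A short sketch of that reuse, assuming a Poisson fit to the built-in warpbreaks data (the model and the names are illustrative only): the lm methods are applied directly to the glm object.

```r
g <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

h_g   <- hatvalues(g)     # lm method applied to a glm fit
dfb_g <- dfbetas(g)       # one row per case, one column per coefficient

dim(dfb_g)
colnames(dfb_g)

# The hat values should match those in the glm influence structure.
glm_hat_ok <- isTRUE(all.equal(unname(h_g), unname(influence(g)$hat)))
glm_hat_ok
```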

Examples

require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
all.equal(rs, rstandard(lm.SR, infl = iflSR), tolerance = 1e-10)
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals"  ==  L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coefficients,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))


## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
is.inf <- apply(im$is.inf, 1, any)
plot(xh,yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[is.inf], yh[is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
          cooks.distance(glmI)))

Author(s)

Several R core team members and John Fox, originally in his ‘car’ package.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181-191. doi:10.2307/2347550.

Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.

Fox, J. (2002) An R and S-Plus Companion to Applied Regression. Sage Publ.

Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ; https://socialsciences.mcmaster.ca/jfox/Books/Companion/.

See Also

influence (containing lm.influence).

‘plotmath’ for the use of hat in plot annotation.

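Note: this article was compiled by 純淨天空 from the original English R-devel documentation "Regression Deletion Diagnostics". Unless otherwise stated, copyright in the original code belongs to its original authors; please do not reproduce or copy this translation without permission.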