R plot.lm: Plot Diagnostics for an lm Object

The R function plot.lm is part of the stats package.

Description

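Six plots (selectable by which) are currently available: (1) a plot of residuals against fitted values, (2) a Q-Q plot of residuals, (3) a Scale-Location plot of \sqrt{| residuals |} against fitted values, (4) a plot of Cook's distances versus row labels, (5) a plot of residuals against leverages, and (6) a plot of Cook's distances against leverage/(1-leverage). By default, plots 1, 2, 3 and 5 are provided.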

Usage

## S3 method for class 'lm'
plot(x, which = c(1,2,3,5), 
     caption = list("Residuals vs Fitted", "Q-Q Residuals",
       "Scale-Location", "Cook's distance",
       "Residuals vs Leverage",
       expression("Cook's dist vs Leverage* " * h[ii] / (1 - h[ii]))),
     panel = if(add.smooth) function(x, y, ...)
              panel.smooth(x, y, iter=iter.smooth, ...) else points,
     sub.caption = NULL, main = "",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...,
     id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
     qqline = TRUE, cook.levels = c(0.5, 1.0),
     cook.col = 8, cook.lty = 2, cook.legendChanges = list(),
     add.smooth = getOption("add.smooth"),
     iter.smooth = if(isGlm) 0 else 3,
     label.pos = c(4,2),
     cex.caption = 1, cex.oma.main = 1.25
   , extend.ylim.f = 0.08
     )

Arguments

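`x`	an `lm` object, typically the result of `lm` or `glm`.
`which`	a subset of the numbers `1:6`, by default `1:3, 5`, referring to: (1) "Residuals vs Fitted", also known as the "Tukey-Anscombe" plot; (2) the "Residual Q-Q" plot; (3) "Scale-Location"; (4) "Cook's distance"; (5) "Residuals vs Leverage"; (6) "Cook's dist vs Lev./(1-Lev.)". See also "Details" below.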
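`caption`	captions to appear above the plots; a `character` vector or a `list` of valid graphics annotations (see `as.graphicsAnnot`) of length 6, the j-th entry corresponding to `which[j]`; see also the default vector in "Usage". Can be set to `""` or `NA` to suppress all captions.
`panel`	panel function. `panel.smooth`, a useful alternative to `points`, can be selected with `add.smooth = TRUE`.
`sub.caption`	common title: placed above the figures if there is more than one, otherwise used as `sub` (see `title`). If `NULL`, as by default, a possibly abbreviated version of `deparse(x$call)` is used.
`main`	title for each plot, in addition to `caption`.
`ask`	logical; if `TRUE`, the user is asked before each plot, see `par(ask=.)`.
`...`	other parameters to be passed through to plotting functions.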
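`id.n`	number of points to be labelled in each plot, starting with the most extreme.
`labels.id`	vector of labels from which the labels for extreme points are chosen. `NULL` uses observation numbers.
`cex.id`	magnification of point labels.
`qqline`	logical indicating whether a `qqline()` should be added to the normal Q-Q plot.
`cook.levels`	levels of Cook's distance at which to draw contours.
`cook.col` , `cook.lty`	color and line type to use for these contour lines.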
`cook.legendChanges`	a `list` (or `NULL` to suppress the call) of arguments to `legend` that should be modified from (or added to) the `plot.lm()` default `list(x = "bottomleft", legend = "Cook's distance", lty = cook.lty, col = cook.col, text.col = cook.col, bty = "n", x.intersp = 1/4, y.intersp = 1/8)`.
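`add.smooth`	logical indicating whether a smoother should be added to most plots; see also `panel` above.
`iter.smooth`	the number of robustness iterations, the argument `iter` in `panel.smooth()`; by default no such iterations are used for `glm` fits, which is particularly desirable for the (predominant) case of binary observations, but also for other models where the response distribution can be highly skewed.
`label.pos`	positioning of labels, for the left half and right half of the graph respectively, for plots 1-3, 5 and 6.
`cex.caption`	controls the size of `caption`.
`cex.oma.main`	controls the size of `sub.caption`, but only when it is placed above the figures because there is more than one.
`extend.ylim.f`	a numeric vector of length 1 or 2, used in `ylim <- extendrange(r=ylim, f = *)` for plots `1` and `5` when `id.n` is non-empty.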
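
The effect of `extend.ylim.f` can be previewed by calling `extendrange()` directly. This is a minimal sketch, assuming a made-up range `ylim` that is not taken from any fitted model; it only illustrates how a length-1 and a length-2 `f` widen a range.

```r
# Hypothetical y-range, standing in for the range of some residuals.
ylim <- c(-2, 3)

# Length-1 f: both ends are extended by 8% of the range width (5 * 0.08 = 0.4).
sym <- extendrange(r = ylim, f = 0.08)

# Length-2 f: the lower end is extended by 20%, the upper end by 8%.
asym <- extendrange(r = ylim, f = c(0.2, 0.08))

print(sym)
print(asym)
```

A length-2 value is handy when labels or a legend need extra room on one side only.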

Details

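sub.caption (by default, the function call) is shown as a subtitle (under the x-axis title) on each plot when the plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.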

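The "Scale-Location" plot (which=3), also called the "Spread-Location" or "S-L" plot, takes the square root of the absolute residuals in order to diminish skewness (\sqrt{| E |} is much less skewed than | E | for Gaussian zero-mean E).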
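
The skewness claim can be checked by simulation. The sketch below assumes standard normal draws for E and uses a hypothetical moment-based `skewness()` helper, since base R does not provide one.

```r
set.seed(1)
e <- rnorm(1e5)   # simulated Gaussian zero-mean "residuals"

# Hypothetical helper: moment-based sample skewness.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_abs  <- skewness(abs(e))        # skewness of |E|
skew_sqrt <- skewness(sqrt(abs(e)))  # skewness of sqrt(|E|)

print(c(abs = skew_abs, sqrt_abs = skew_sqrt))
```

The second value should come out much closer to zero than the first, which is why the square-root scale makes trends in spread easier to see.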

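The "S-L", Q-Q and Residual-Leverage (which=5) plots use standardized residuals, which have identical variance (under the hypothesis). They are given as R_i / (s \times \sqrt{1 - h_{ii}}), where the "leverages" h_{ii} are the diagonal entries of the hat matrix, influence()$hat (see also hat), and where the Residual-Leverage plot uses the standardized Pearson residuals (residuals.glm(type = "pearson")) for R_i.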
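
For an ordinary `lm` fit, the standardization formula can be reproduced by hand and compared with `rstandard()`. This is a sketch on the `LifeCycleSavings` model, taking s to be the residual standard error reported by `summary()`.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

r <- residuals(fit)        # raw residuals R_i
s <- summary(fit)$sigma    # residual standard error s
h <- hatvalues(fit)        # leverages h_ii, same values as influence(fit)$hat

by_hand <- r / (s * sqrt(1 - h))

# Compare with the built-in standardized residuals.
print(all.equal(unname(by_hand), unname(rstandard(fit))))
```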

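The Residual-Leverage plot (which=5) shows contours of equal Cook's distance for the values of cook.levels (by default 0.5 and 1), and omits cases with leverage one, with a warning. If the leverages are constant (as is typically the case in a balanced aov situation), the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)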
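
The contours follow from the usual relation between Cook's distance, the standardized residual r and the leverage h for a model with p coefficients: D = r^2 h / (p (1 - h)). The sketch below checks that relation against `cooks.distance()` and solves it for r at a given level. `cook_contour()` is a hypothetical helper written for this illustration; it is not a function exported by stats.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

rs <- rstandard(fit)   # standardized residuals
h  <- hatvalues(fit)   # leverages
p  <- fit$rank         # number of estimated coefficients (5 here)

# Cook's distance written in terms of standardized residuals and leverages.
cook_by_hand <- rs^2 * h / (p * (1 - h))
print(all.equal(unname(cook_by_hand), unname(cooks.distance(fit))))

# Hypothetical helper: |standardized residual| at which Cook's distance
# equals `level` for a given leverage.
cook_contour <- function(h, level, p) sqrt(level * p * (1 - h) / h)

h_grid <- seq(0.1, 0.6, by = 0.1)
upper  <- cook_contour(h_grid, level = 0.5, p = p)
print(cbind(leverage = h_grid, upper = upper, lower = -upper))
```

Points lying outside the band between `lower` and `upper` at their leverage have a Cook's distance above 0.5.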

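In the plot of Cook's distance against leverage/(1-leverage) (= "leverage*") (which=6), contours of standardized residuals (rstandard(.)) that are equal in magnitude are lines through the origin. These lines are labelled with the magnitudes. The x-axis is labelled with the (non-equidistant) leverages h_{ii}.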
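
Why these contours are straight lines can be seen from the Cook's distance formula: dividing D by leverage* = h/(1-h) leaves r^2/p, which depends only on the magnitude of the standardized residual. The sketch below checks this algebraic relationship on the `LifeCycleSavings` model; it does not reproduce the drawing code of `plot.lm` itself.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

h <- hatvalues(fit)
p <- fit$rank              # number of estimated coefficients
lev_star <- h / (1 - h)    # the "leverage*" x-coordinate

# Slope of the line from the origin to each point in (leverage*, Cook's D).
slope <- cooks.distance(fit) / lev_star

# The slope equals r^2 / p, so equal |r| means the same line through the origin.
print(all.equal(unname(slope), unname(rstandard(fit)^2 / p)))
```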

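For the glm case, the Q-Q plot is based on the absolute value of the standardized deviance residuals. When the saddlepoint approximation applies, these have an approximate half-normal distribution. The saddlepoint approximation is exact for the normal and inverse Gaussian families, and holds approximately for the Gamma family with small dispersion (large shape) and for the Poisson and binomial families with large counts (Dunn and Smyth 2018).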

Examples

require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plot(lm.SR)

## 4 plots on 1 page;
## allow room for printing model formula in outer margin:
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 2, 0))  # keep old settings for the reset at the end
plot(lm.SR)
plot(lm.SR, id.n = NULL)                 # no id's
plot(lm.SR, id.n = 5, labels.id = NULL)  # 5 id numbers

## Was default in R <= 2.1.x:
## Cook's distances instead of Residual-Leverage plot
plot(lm.SR, which = 1:4)

## All the above fit a smooth curve where applicable
## by default unless "add.smooth" is changed.
## Give a smoother curve by increasing the lowess span :
plot(lm.SR, panel = function(x, y) panel.smooth(x, y, span = 1))

par(mfrow = c(2,1)) # same oma as above
plot(lm.SR, which = 1:2, sub.caption = "Saving Rates, n=50, p=5")

## Cook's distance tweaking
par(mfrow = c(2,3)) # same oma ...
plot(lm.SR, which = 1:6, cook.col = "royalblue")

## A case where over plotting of the "legend" is to be avoided:
if(dev.interactive(TRUE)) getOption("device")(height = 6, width = 4)
par(mfrow = c(3,1), mar = c(5,5,4,2)/2 +.1, mgp = c(1.4, .5, 0))
plot(lm.SR, which = 5, extend.ylim.f = c(0.2, 0.08))
plot(lm.SR, which = 5, cook.lty = "dotdash",
     cook.legendChanges = list(x = "bottomright", legend = "Cook"))
plot(lm.SR, which = 5, cook.legendChanges = NULL)  # no "legend"


par(opar) # reset par()s

Author(s)

John Maindonald and Martin Maechler.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Firth, D. (1991). Generalized Linear Models. Pp. 55-82 in Hinkley, D. V., Reid, N. and Snell, E. J., eds: Statistical Theory and Modelling. In Honour of Sir David Cox, FRS. London: Chapman and Hall.

Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika, 62, 101-111. doi:10.2307/2334491.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. London: Chapman and Hall.

Dunn, P. K. and Smyth, G. K. (2018). Generalized Linear Models with Examples in R. New York: Springer-Verlag.

See Also

termplot, lm.influence, cooks.distance, hatvalues.


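Note: this page was compiled by 纯净天空 from the original English R-devel documentation, "Plot Diagnostics for an lm Object". Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.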