R plot.lm: Plot Diagnostics for an lm Object

In R, `plot.lm` is part of the `stats` package.

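Description

Six plots (selectable by `which`) are currently available: a plot of residuals against fitted values, a Scale-Location plot of $\sqrt{|\text{residuals}|}$ against fitted values, a Q-Q plot of residuals, a plot of Cook's distances versus row labels, a plot of residuals against leverages, and a plot of Cook's distances against leverage/(1-leverage). By default, plots 1, 2, 3 and 5 are provided.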

Usage

```r
## S3 method for class 'lm'
plot(x, which = c(1,2,3,5),
     caption = list("Residuals vs Fitted", "Q-Q Residuals",
       "Scale-Location", "Cook's distance",
       "Residuals vs Leverage",
       expression("Cook's dist vs Leverage* " * h[ii] / (1 - h[ii]))),
     panel = if(add.smooth) function(x, y, ...)
              panel.smooth(x, y, iter=iter.smooth, ...) else points,
     sub.caption = NULL, main = "",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...,
     id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
     qqline = TRUE, cook.levels = c(0.5, 1.0),
     cook.col = 8, cook.lty = 2, cook.legendChanges = list(),
     add.smooth = getOption("add.smooth"),
     iter.smooth = if(isGlm) 0 else 3,
     label.pos = c(4,2),
     cex.caption = 1, cex.oma.main = 1.25,
     extend.ylim.f = 0.08
     )
```

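Arguments

`x`	`lm` object, typically the result of `lm` or `glm`.
`which`	a subset of the numbers `1:6`, by default `c(1, 2, 3, 5)`, referring to (1) "Residuals vs Fitted", aka the "Tukey-Anscombe" plot; (2) "Residuals Q-Q" plot; (3) "Scale-Location"; (4) "Cook's distance"; (5) "Residuals vs Leverage"; (6) "Cook's dist vs Lev./(1-Lev.)". See also 'Details' below.
`caption`	captions to appear above the plots; a `character` vector or `list` of valid graphics annotations (see `as.graphicsAnnot`) of length 6, the j-th entry corresponding to `which[j]`; see also the default vector in 'Usage'. Can be set to `""` or `NA` to suppress all captions.
`panel`	panel function. The useful alternative to `points`, `panel.smooth`, can be chosen by `add.smooth = TRUE`.
`sub.caption`	common title: above the figures if there is more than one; otherwise used as `sub` (see `title`). If `NULL`, as by default, a possibly abbreviated version of `deparse(x$call)` is used.
`main`	title to each plot, in addition to `caption`.
`ask`	logical; if `TRUE`, the user is asked before each plot, see `par(ask=.)`.
`...`	other parameters to be passed through to plotting functions.
`id.n`	number of points to be labelled in each plot, starting with the most extreme.
`labels.id`	vector of labels, from which the labels for extreme points will be chosen. `NULL` uses observation numbers.
`cex.id`	magnification of point labels.
`qqline`	logical indicating if a `qqline()` should be added to the normal Q-Q plot.
`cook.levels`	levels of Cook's distance at which to draw contours.
`cook.col`, `cook.lty`	color and line type to use for these contour lines.
`cook.legendChanges`	a `list` (or `NULL` to suppress the call) of arguments to `legend` which should be modified (or added to) from the `plot.lm()` default `list(x = "bottomleft", legend = "Cook's distance", lty = cook.lty, col = cook.col, text.col = cook.col, bty = "n", x.intersp = 1/4, y.intersp = 1/8)`.
`add.smooth`	logical indicating if a smoother should be added to most plots; see also `panel` above.
`iter.smooth`	the number of robustness iterations, the argument `iter` in `panel.smooth()`; the default uses no such iterations for `glm` fits, which is particularly desirable in the (predominant) case of binary observations, but also for other models where the response distribution can be highly skewed.
`label.pos`	positioning of labels, for the left half and right half of the graph respectively, for plots 1-3, 5 and 6.
`cex.caption`	controls the size of `caption`.
`cex.oma.main`	controls the size of the `sub.caption` only if that is above the figures, i.e., when there is more than one.
`extend.ylim.f`	a numeric vector of length 1 or 2, to be used in `ylim <- extendrange(r = ylim, f = *)` for plots `1` and `5` when `id.n` is not empty.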
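A sketch of how several of these arguments combine, using the `LifeCycleSavings` model from the Examples (the name `fit` is illustrative). The plots go to a null PDF device so the snippet also runs non-interactively. The last lines approximate which observations `id.n = 3` labels in the Residuals vs Fitted panel, assuming "most extreme" means largest absolute residual; that selection rule is an assumption of this sketch, not quoted from the `plot.lm` source.

```r
library(graphics)
library(grDevices)

fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

pdf(NULL)  # null device: nothing is written to disk
par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))
plot(fit,
     which = c(1, 5),                        # Residuals vs Fitted, Residuals vs Leverage
     id.n = 3,                               # label the 3 most extreme points
     labels.id = rownames(LifeCycleSavings), # use country names as labels
     cook.levels = c(0.5, 1),                # Cook's distance contours in plot 5
     cook.legendChanges = list(x = "topright"))  # move the contour legend
invisible(dev.off())

# Assumed selection rule for plot 1: largest |residual| first
r <- residuals(fit)
extreme3 <- names(r)[order(abs(r), decreasing = TRUE)[1:3]]
print(extreme3)
```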

Details

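`sub.caption` (by default the function call) is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.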

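The "Scale-Location" plot (`which = 3`), also called the "Spread-Location" or "S-L" plot, takes the square root of the absolute residuals in order to diminish skewness: $\sqrt{|E|}$ is much less skewed than $|E|$ for Gaussian zero-mean $E$.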
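A minimal sketch of the Scale-Location coordinates and of the skewness argument, assuming the panel plots $\sqrt{|\text{standardized residual}|}$ against fitted values. The helper `skewness()` is a hypothetical, simple moment estimator written for this illustration; it is not a function from `stats`.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Coordinates of the Scale-Location panel (which = 3), under the stated assumption
sl_x <- fitted(fit)
sl_y <- sqrt(abs(rstandard(fit)))

# Hypothetical helper: sample skewness via third standardized moment
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Compare the skewness of |E| and sqrt(|E|) for zero-mean Gaussian E
set.seed(1)
E <- rnorm(1e5)
c(abs = skewness(abs(E)), sqrt_abs = skewness(sqrt(abs(E))))
```

The first value comes out close to 1 (the half-normal skewness), the second much closer to 0, which is the reason for plotting the square root.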

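The "S-L", Q-Q and Residual-Leverage (`which = 5`) plots use standardized residuals, which have identical variance (under the hypothesis). They are given as $R_i / (s \times \sqrt{1 - h_{ii}})$, where the "leverages" $h_{ii}$ are the diagonal entries of the hat matrix, `influence()$hat` (see also `hat`), and where the Residual-Leverage plot uses standardized Pearson residuals (`residuals.glm(type = "pearson")`) for $R_i$.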
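As a numerical check of this formula, a minimal sketch for an unweighted `lm` fit, where the weighted residuals used internally coincide with the raw residuals (restricting to the unweighted case is an assumption of the sketch). The name `std_manual` is illustrative.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

R   <- residuals(fit)       # raw residuals R_i
s   <- summary(fit)$sigma   # residual standard error s
hii <- hatvalues(fit)       # leverages h_ii

# Standardized residuals R_i / (s * sqrt(1 - h_ii))
std_manual <- R / (s * sqrt(1 - hii))

all.equal(unname(std_manual), unname(rstandard(fit)))      # TRUE
all.equal(unname(hii), unname(influence(fit)$hat))          # TRUE
```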

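The Residual-Leverage plot (`which = 5`) shows contours of equal Cook's distance for the values of `cook.levels` (by default 0.5 and 1), and omits cases with leverage one, issuing a warning. If the leverages are constant (as is typically the case in a balanced `aov` situation), the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)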
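The contours follow from the identity $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$, with $r_i$ the standardized residual and $p$ the model rank: for a fixed level $c$, solving for $r$ gives $r = \pm\sqrt{c\,p\,(1 - h)/h}$. The sketch below checks the identity against `cooks.distance()` and evaluates one contour; `cook_contour()` is a hypothetical helper, and whether `plot.lm` draws its curves with exactly this expression is an assumption.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

p   <- fit$rank             # number of estimated coefficients
hii <- hatvalues(fit)
rs  <- rstandard(fit)

# Cook's distance from standardized residuals and leverages
D_manual <- rs^2 / p * hii / (1 - hii)
all.equal(unname(D_manual), unname(cooks.distance(fit)))   # TRUE

# Hypothetical helper: upper contour of constant Cook's distance `level`
cook_contour <- function(h, level, p) sqrt(level * p * (1 - h) / h)

h_grid   <- seq(0.01, max(hii), length.out = 50)
upper_05 <- cook_contour(h_grid, level = 0.5, p = p)  # lower curve is -upper_05
```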

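In the Cook's distance vs leverage/(1-leverage) (= "leverage*") plot (`which = 6`), contours of standardized residuals (`rstandard(.)`) that are equal in magnitude are lines through the origin. These lines are labelled with the magnitudes. The x-axis is labelled with the (non-equally spaced) leverages $h_{ii}$.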
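Why these contours are straight lines: with $x = h_{ii}/(1 - h_{ii})$, the identity $D_i = (r_i^2 / p)\,x$ means that all points with the same $|r_i|$ lie on a line through the origin with slope $r_i^2/p$. A minimal sketch checking this on the Examples model (the variable names are illustrative):

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

p        <- fit$rank                      # 5 here: intercept + 4 predictors
hii      <- hatvalues(fit)
lev_star <- hii / (1 - hii)               # x-axis quantity of plot 6
D        <- cooks.distance(fit)

# Each point lies on the line through the origin with slope rstandard^2 / p
slope <- rstandard(fit)^2 / p
all.equal(unname(D), unname(slope * lev_star))   # TRUE

# Slopes of the lines for |standardized residual| = 1, 2, 3
sapply(1:3, function(r) r^2 / p)                 # 0.2 0.8 1.8
```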

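For the `glm` case, the Q-Q plot is based on the absolute values of the standardized deviance residuals. These are approximately half-normally distributed when the saddlepoint approximation applies. The saddlepoint approximation is exact for the normal and inverse Gaussian families, and holds approximately for the Gamma family with small dispersion (large shape) and for the Poisson and binomial families with large counts (Dunn and Smyth 2018).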
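A sketch of the half-normal idea for a Poisson `glm`, using the built-in `warpbreaks` data (counts mostly well above 10). The plotting positions `qnorm(0.5 + 0.5 * ppoints(n))` are a common half-normal choice and an assumption here; `plot.lm` may use different positions or scaling, so this illustrates the principle rather than reproducing panel 2.

```r
library(graphics)
library(grDevices)

fit_glm <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Standardized deviance residuals
rd <- rstandard(fit_glm, type = "deviance")

# Half-normal reference: sorted |residuals| against quantiles of |Z|
n             <- length(rd)
abs_sorted    <- sort(abs(rd))
half_normal_q <- qnorm(0.5 + 0.5 * ppoints(n))   # assumed plotting positions

pdf(NULL)  # null device
plot(half_normal_q, abs_sorted,
     xlab = "Half-normal quantiles",
     ylab = "|standardized deviance residual|")
abline(0, 1, lty = 2)  # where points would fall under an exact half-normal fit
invisible(dev.off())
```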

Examples

```r
require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plot(lm.SR)

## 4 plots on 1 page;
## allow room for printing model formula in outer margin:
par(mfrow = c(2, 2), oma = c(0, 0, 2, 0)) -> opar
plot(lm.SR)
plot(lm.SR, id.n = NULL)                 # no id's
plot(lm.SR, id.n = 5, labels.id = NULL)  # 5 id numbers

## Was default in R <= 2.1.x:
## Cook's distances instead of Residual-Leverage plot
plot(lm.SR, which = 1:4)

## All the above fit a smooth curve where applicable
## by default unless "add.smooth" is changed.
## Give a smoother curve by increasing the lowess span :
plot(lm.SR, panel = function(x, y) panel.smooth(x, y, span = 1))

par(mfrow = c(2,1)) # same oma as above
plot(lm.SR, which = 1:2, sub.caption = "Saving Rates, n=50, p=5")

## Cook's distance tweaking
par(mfrow = c(2,3)) # same oma ...
plot(lm.SR, which = 1:6, cook.col = "royalblue")

## A case where over plotting of the "legend" is to be avoided:
if(dev.interactive(TRUE)) getOption("device")(height = 6, width = 4)
par(mfrow = c(3,1), mar = c(5,5,4,2)/2 +.1, mgp = c(1.4, .5, 0))
plot(lm.SR, which = 5, extend.ylim.f = c(0.2, 0.08))
plot(lm.SR, which = 5, cook.lty = "dotdash",
     cook.legendChanges = list(x = "bottomright", legend = "Cook"))
plot(lm.SR, which = 5, cook.legendChanges = NULL)  # no "legend"

par(opar) # reset par()s
```

Author(s)

John Maindonald and Martin Maechler.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Firth, D. (1991). Generalized Linear Models. In Hinkley, D. V., Reid, N. and Snell, E. J. (eds), Statistical Theory and Modelling: In Honour of Sir David Cox, FRS, pp. 55-82. London: Chapman and Hall.

Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika, 62, 101-111. doi:10.2307/2334491.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. London: Chapman and Hall.

Dunn, P. K. and Smyth, G. K. (2018). Generalized Linear Models with Examples in R. New York: Springer-Verlag.

See Also

`termplot`, `lm.influence`, `cooks.distance`, `hatvalues`.


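Note: This page was selected and translated by 純淨天空 from the English original "Plot Diagnostics for an lm Object" by the R-devel developers. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.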