R qq.gam: QQ plots for gam model residuals

The R function qq.gam is part of the mgcv package.

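Description

Takes a fitted gam object produced by gam() and produces QQ plots of its residuals (conditional on the fitted model coefficients and scale parameter). If the model's distributional assumptions are met, these plots should usually be close to a straight line (although discrete data can yield marked random departures from this line).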

Usage

qq.gam(object, rep=0, level=.9,s.rep=10,
       type=c("deviance","pearson","response"),
       pch=".", rl.col=2, rep.col="gray80", ...)

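Arguments

`object`	A fitted `gam` object as produced by `gam()` (or a `glm` object).
`rep`	How many replicate datasets to generate to simulate quantiles of the residual distribution. `0` results in an efficient simulation-free method of direct calculation, if this is possible for the object's family.
`level`	If simulation is used for the quantiles, reference intervals can be provided for the QQ plot; this specifies their level. 0 or less gives no intervals; 1 or more simply plots the QQ plot for each replicate generated.
`s.rep`	How many times to randomize the assignment of uniform quantiles to data under direct computation.
`type`	What sort of residuals should be plotted? See `residuals.gam`.
`pch`	Plot character to use. 19 is a good choice.
`rl.col`	Color of the reference line on the plot.
`rep.col`	Color of the reference bands or replicate reference plots.
`...`	Extra graphics parameters to pass to the plotting functions.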

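Details

QQ plots of the model residuals can be produced in one of two ways. The cheapest method generates reference quantiles by associating a quantile of the uniform distribution with each datum, and feeding these uniform quantiles into the quantile function associated with each datum. The resulting quantiles are then used in place of each datum to generate approximate quantiles of the residuals. The residual quantiles are averaged over s.rep randomizations of the assignment of uniform quantiles to data.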
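The direct method described above can be sketched in base R for a Poisson GLM. This is a minimal illustration under stated assumptions, not the mgcv implementation: the helper `dev_res` and the names `s_rep`, `u` and `ref` are hypothetical, and `qpois` stands in for the family's quantile function.

```r
## Sketch: simulation-free reference quantiles for deviance residuals.
## All names below are illustrative, not mgcv internals.
set.seed(1)
n   <- 200
x   <- runif(n)
y   <- rpois(n, exp(1 + x))
fit <- glm(y ~ x, family = poisson)
mu  <- fitted(fit)

## Deviance residuals of Poisson responses yy with means mu
## (conditional on the fitted means, so no refitting is done)
dev_res <- function(yy, mu) {
  d <- poisson()$dev.resids(yy, mu, rep(1, length(yy)))
  sign(yy - mu) * sqrt(pmax(d, 0))
}

s_rep <- 10                      # plays the role of s.rep
u     <- (seq_len(n) - 0.5) / n  # evenly spaced uniform quantiles
ref   <- numeric(n)
for (i in seq_len(s_rep)) {
  ui  <- sample(u)               # randomly assign uniform quantiles to data
  yq  <- qpois(ui, mu)           # feed them through each datum's quantile function
  ref <- ref + sort(dev_res(yq, mu))
}
ref <- ref / s_rep               # average over the randomizations

obs <- sort(residuals(fit, type = "deviance"))
plot(ref, obs, xlab = "theoretical quantiles", ylab = "deviance residuals")
abline(0, 1, col = 2)            # reference line
```

Averaging over several random assignments reduces the noise that any single assignment of uniform quantiles to data would introduce.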

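The second method is to use direct simulation. For each replicate, data are simulated from the fitted model and the corresponding residuals are computed. This is repeated rep times. Quantiles are readily obtained from the empirical distribution of the residuals so obtained. Reference bands can also be computed with this method.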
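The simulation approach can be sketched in the same way. The code below is an assumption-laden illustration in base R rather than the mgcv code itself: `dev_res`, `n_rep`, `sim`, `pooled` and `band` are hypothetical names, and pooling the simulated residuals to get reference quantiles is one reasonable way to use their empirical distribution.

```r
## Sketch: simulation-based reference quantiles and bands.
## All names below are illustrative, not mgcv internals.
set.seed(2)
n   <- 200
x   <- runif(n)
y   <- rpois(n, exp(1 + x))
fit <- glm(y ~ x, family = poisson)
mu  <- fitted(fit)

## Deviance residuals of Poisson responses yy with means mu
dev_res <- function(yy, mu) {
  d <- poisson()$dev.resids(yy, mu, rep(1, length(yy)))
  sign(yy - mu) * sqrt(pmax(d, 0))
}

n_rep <- 100                     # plays the role of rep
## Each column holds the sorted residuals of one simulated dataset
sim <- replicate(n_rep, sort(dev_res(rpois(n, mu), mu)))

## Reference quantiles from the pooled empirical distribution
pooled <- quantile(as.numeric(sim), (seq_len(n) - 0.5) / n)

## Pointwise reference band for each order statistic
level <- 0.9
alpha <- (1 - level) / 2
band  <- apply(sim, 1, quantile, probs = c(alpha, 1 - alpha))

obs <- sort(residuals(fit, type = "deviance"))
plot(pooled, obs, xlab = "theoretical quantiles", ylab = "deviance residuals")
lines(pooled, band[1, ], col = "gray60")
lines(pooled, band[2, ], col = "gray60")
abline(0, 1, col = 2)            # reference line
```

The data are simulated from the fitted means without refitting, which mirrors the conditioning on the fitted coefficients that the text describes.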

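Even if rep is set to zero, the routine will attempt to simulate quantiles if no quantile function is available for the family. If no random deviate generating function is available for the family (e.g. for the quasi families), then a normal QQ plot is produced. The routine conditions on the fitted model coefficients and the scale parameter estimate.

The plots are very similar to those proposed in Ben and Yohai (2004), but are substantially cheaper to produce (the interpretation of residuals for binary data in Ben and Yohai is not recommended).

Note that plots of raw residuals from fits to binary data contain almost no useful information about model fit. Whether a residual is negative or positive is decided by whether the response is zero or one. The magnitude of the residual, given its sign, is determined entirely by the fitted values. In consequence, only the most gross violations of the model are detectable from QQ plots of residuals for binary data. To really check distributional assumptions from residuals for binary data, you have to be able to group the data somehow. Binomial models other than binary are fine.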
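The point about binary data can be checked numerically. The following base R sketch (all variable names are illustrative) fits a logistic regression and confirms that each deviance residual takes its sign from the response and its magnitude from the fitted probability alone.

```r
## Sketch: for binary responses, deviance residuals carry no information
## beyond the response value and the fitted probability.
set.seed(3)
n   <- 100
x   <- runif(n)
y   <- rbinom(n, 1, plogis(-1 + 2 * x))
fit <- glm(y ~ x, family = binomial)
mu  <- fitted(fit)
r   <- residuals(fit, type = "deviance")

## The sign is fixed by whether the response is one or zero ...
sign_ok <- all(sign(r) == ifelse(y == 1, 1, -1))

## ... and the magnitude is a function of the fitted probability only
mag    <- ifelse(y == 1, sqrt(-2 * log(mu)), sqrt(-2 * log(1 - mu)))
mag_ok <- isTRUE(all.equal(unname(abs(r)), unname(mag)))

print(c(sign_ok = sign_ok, mag_ok = mag_ok))
```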

Examples


library(mgcv)
## simulate binomial data...
set.seed(0)
n.samp <- 400
dat <- gamSim(1,n=n.samp,dist="binary",scale=.33)
p <- binomial()$linkinv(dat$f) ## binomial p
n <- sample(c(1,3),n.samp,replace=TRUE) ## binomial n
dat$y <- rbinom(n,n,p)
dat$n <- n

lr.fit <- gam(y/n~s(x0)+s(x1)+s(x2)+s(x3)
             ,family=binomial,data=dat,weights=n,method="REML")

par(mfrow=c(2,2))
## normal QQ-plot of deviance residuals
qqnorm(residuals(lr.fit),pch=19,cex=.3)
## Quick QQ-plot of deviance residuals
qq.gam(lr.fit,pch=19,cex=.3)
## Simulation based QQ-plot with reference bands 
qq.gam(lr.fit,rep=100,level=.9)
## Simulation based QQ-plot, Pearson resids, all
## simulated reference plots shown...  
qq.gam(lr.fit,rep=100,level=1,type="pearson",pch=19,cex=.2)

## Now fit the wrong model and check....

pif <- gam(y~s(x0)+s(x1)+s(x2)+s(x3)
             ,family=poisson,data=dat,method="REML")
par(mfrow=c(2,2))
qqnorm(residuals(pif),pch=19,cex=.3)
qq.gam(pif,pch=19,cex=.3)
qq.gam(pif,rep=100,level=.9)
qq.gam(pif,rep=100,level=1,type="pearson",pch=19,cex=.2)

## Example of binary data model violation so gross that you see a problem 
## on the QQ plot...

y <- c(rep(1,10),rep(0,20),rep(1,40),rep(0,10),rep(1,40),rep(0,40))
x <- 1:160
b <- glm(y~x,family=binomial)
par(mfrow=c(2,2))
## Note that the next two are not necessarily similar under gross 
## model violation...
qq.gam(b)
qq.gam(b,rep=50,level=1)
## and a much better plot for detecting the problem
plot(x,residuals(b),pch=19,cex=.3)
plot(x,y);lines(x,fitted(b))

## alternative model
b <- gam(y~s(x,k=5),family=binomial,method="ML")
qq.gam(b)
qq.gam(b,rep=50,level=1)
plot(x,residuals(b),pch=19,cex=.3)
plot(b,residuals=TRUE,pch=19,cex=.3)

Author(s)

Simon N. Wood simon.wood@r-project.org

References

N.H. Augustin, E-A Sauleau, S.N. Wood (2012) On quantile quantile plots for generalized linear models. Computational Statistics & Data Analysis 56(8), 2404-2409.

M.G. Ben and V.J. Yohai (2004) JCGS 13(1), 36-47.

https://www.maths.ed.ac.uk/~swood34/

See Also

choose.k, gam
