R gamm 广义加性混合模型

R语言 gamm 位于 mgcv 包(package)。

说明

通过在正常错误恒等链接情况下调用 lme 或调用 gammPQL(对 MASS 库中的 glmmPQL 进行修改)，将指定的广义加性混合模型 (GAMM) 拟合到数据否则。在后一种情况下，估计值仅约为 MLE。该例程通常比 gam 慢，并且在数值上不太稳健。

要使用 lme4 代替 nlme 作为底层拟合引擎，请参阅包 gamm4 中的 gamm4 。

平滑在对 gam 的调用中指定为固定效应模型公式的一部分，但平滑的摆动分量被视为随机效应。 lme 可用的随机效应结构和相关性结构用于指定其他随机效应和相关性。

假设随机效应和相关结构主要用于对数据中的残差相关性进行建模，并且主要兴趣在于对包括平滑在内的固定效应模型公式中的项进行推断。因此，例程计算固定效应公式中所有项(包括平滑项)系数的后验协方差矩阵。

为了有效地使用此函数，熟悉 gam 和 lme 的使用会有所帮助。

用法

gamm(formula,random=NULL,correlation=NULL,family=gaussian(),
data=list(),weights=NULL,subset=NULL,na.action,knots=NULL,
control=list(niterEM=0,optimMethod="L-BFGS-B",returnObject=TRUE),
niterPQL=20,verbosePQL=TRUE,method="ML",drop.unused.levels=TRUE,
mustart=NULL, etastart=NULL,...)

参数

`formula`	GAM 公式(另请参见 `formula.gam` 和 `gam.models` )。这类似于 `glm` 的公式，只不过可以将平滑项( `s` 、 `te` 等)添加到公式的右侧。请注意，不支持用于平滑和固定平滑参数的`id`。任何偏移量都应在公式中指定。
`random`	在调用 `lme` 时指定的(可选)随机效应结构：仅允许使用 `list` 形式，以便于在 `gamm` 内操作随机效应结构，以处理平滑项。请参阅下面的示例。
`correlation`	可选的 `corStruct` 对象(请参阅 `corClasses` )，用于定义 `lme` 中的相关结构。假定该对象的公式中的任何分组因子都嵌套在任何随机效应分组因子中，无需在公式中明确说明(这与 `lme` 的行为略有不同)。这是一般情况下的 GEE 关联方法。请参阅下面的示例。
`family`	`family` 用于调用 `glm` 或 `gam` 。如果没有偏移项，则带有恒等链接的默认 `gaussian` 会导致 `gamm` 通过直接调用 `lme` 进行拟合，否则使用 `gammPQL`。
`data`	包含模型响应变量和公式所需的协变量的 DataFrame 或列表。默认情况下，变量取自 `environment(formula)` ，通常是调用 `gamm` 的环境。
`weights`	在一般情况下，权重与`glm`权重含义相同。 `lme` 类型权重参数只能在恒等链接高斯情况下使用，没有偏移量(有关如何使用此类参数的详细信息，请参阅 `lme` 的文档)。
`subset`	一个可选向量，指定要在拟合过程中使用的观测子集。
`na.action`	一个函数，指示当数据包含“NA”时应该发生什么。默认值由 ‘options’ 的“na.action”设置设置，如果未设置，则为“na.fail”。 “factory-fresh” 默认为“na.omit”。
`knots`	这是一个可选列表，包含用户指定的用于基础构造的结值。不同的项可以使用不同数量的结，除非它们共享协变量。
`control`	`lme` 的拟合控制参数列表，用于替换 `lmeControl` 返回的默认值。请注意 `lme` 使用的 EM 迭代次数的设置：平滑是使用自定义 `pdMat` 类设置的，目前 EM 迭代代码不支持这些类。如果您提供控制值列表，建议也包含 `niterEM=0` ，并且如果您想扰乱模型拟合中使用的起始值(通常是更差的值！)，则仅从 0 开始增加。仅当您的 R 版本没有 `nlminb` 优化器函数时，才使用 `optimMethod` 选项。
`niterPQL`	PQL 迭代的最大次数(如果有)。
`verbosePQL`	PQL 是否应该报告其进展情况？
`method`	当直接调用`lme` 时，在高斯加性混合模型情况下使用`"ML"` 或`"REML"` 中的哪一个。在一般情况下被忽略(或者如果模型有偏移)，在这种情况下使用`gammPQL`。
`drop.unused.levels`	默认情况下，未使用的级别会在拟合之前从因子中删除。对于某些涉及因子变量的平滑，您可能需要将其关闭。仅当您知道自己在做什么时才这样做。
`mustart`	如果使用 PQL，则为平均值的起始值。
`etastart`	如果使用 PQL，则线性预测器的起始值(如果提供，则覆盖 `mustart`)。
`...`	传递的进一步论点，例如至`lme`

细节

Wahba (1983) 和 Silverman (1985) 引入的样条平滑贝叶斯模型开启了估计广义加性模型中项的平滑程度的可能性，作为被视为随机效应的平滑项的摆动分量的方差。一些作者已经认识到这一点(参见 Wang 1998；Ruppert、Wand 和 Carroll，2003)，并且在正常误差中，可以使用通用线性混合效应建模软件(例如 lme)执行恒等链接情况估计。在广义情况下，迄今为止仅可进行近似推断，例如使用 Breslow 和 Clayton (1993) 的惩罚 Quasi-Likelihood 方法，如 Venables 和 Ripley (2002) 在 glmmPQL 中实现的方法。这种方法的一个优点是，它允许通过随机效应或 nlme 库中可用的相关结构来处理相关错误(使用超出严格相加情况的相关结构相当于使用 GEE 方法进行拟合)。

有关 GAM 如何表示为混合模型以及如何使用 gamm 中的 lme 或 gammPQL 进行估计的一些细节可以在 Wood (2004 ,2006a,b) 中找到。此外gamm 获得所有固定效应和平滑项参数的后验协方差矩阵。该方法类似于 Lin & Zhang (1999) 中说明的方法 - 基于参数的估计，获得由权重、相关性和随机效应结构隐含的数据(或广义情况下的伪数据)的协方差矩阵这些项用于获得固定和平滑效果的后验协方差矩阵。

用于表示平滑项的基数与 gam 中使用的基数相同，但自适应平滑基数不可用。使用 predict.gam 从返回的 gam 对象进行预测很简单，但这会将随机效应设置为零。如果您想使用设置为预测值的随机效应进行预测，那么您可以调整下面示例中给出的预测代码。

如果 lme 收敛失败，请考虑修改 options(mgcv.vc.logrange) ：减少它有助于消除可能性的不确定性(如果这是问题所在)，但减少太大可能会强制过度平滑或欠平滑。有关此选项的更多信息，请参阅notExp2。如果失败，您可以尝试增加 control 中的 niterEM 选项：这会扰乱拟合中使用的起始值，但通常是可能性较低的值！请注意，此版本的 gamm 最适合与 R 2.2.0 或更高版本以及 nlme 、3.1-62 及更高版本一起使用，因为它们使用改进的优化器。

值

返回包含两项的列表：

`gam`	类 `gam` 的对象，与 GCV/UBRE 模型选择相关的信息较少。目前，这包含足够的信息来使用 `predict` 、 `summary` 和 `print` 方法以及 `vis.gam` ，但不能使用例如`anova` 方法函数用于比较模型。这是基于使用 `gammPQL` 时的工作模型。
`lme`	`lme` 或 `gammPQL` 返回的拟合模型对象。请注意，由于 GAMM 的拆分方式以及对 `lme` 和 `gammPQL` 的调用的构造方式，模型公式和分组结构可能看起来相当奇怪。

警告

gamm 的参数列表与 gam 略有不同，gam 参数(例如提供给 gamm 的 gamma 将被忽略)。

gamm 在处理二进制数据时表现不佳，因为它使用 PQL。最好将 gam 与 s(...,bs="re") 术语或 gamm4 一起使用。

gamm 假设您知道自己在做什么！例如，与 MASS 中的 glmmPQL 不同，它将在 PQL 迭代收敛时从工作模型返回完整的 lme 对象，包括“对数似然”，即使这不是拟合 GAMM 的似然。

如果相关结构用于非常大的数据组，则该例程将非常慢并且占用大量内存。例如绝对不建议尝试使用数千个数据运行示例部分中的空间示例：通常相关性应该仅适用于可以由分组因子定义的集群内，并且如果这些集群不会变得太大，则拟合通常是合适的可能的。

模型必须包含至少一种随机效应：具有非零平滑参数的平滑，或参数 random 中指定的随机效应。

gamm 在数值上不如 gam 稳定：lme 调用偶尔会失败。请参阅详细信息部分以获取建议，或尝试 ‘gamm4’ 包。

gamm 通常比 gam 慢得多，并且在某些平台上，您可能需要增加 R 可用的内存才能将其用于大型数据集(请参阅 memory.limit )。

请注意，拟合的 GAM 对象中返回的权重是虚拟的，而不是 PQL 迭代使用的权重：这使得部分残差图看起来很奇怪。

请注意，返回对象的 gam 对象部分在 gamObject 中定义的所有元素层面并不完整，并且不继承自 glm ：因此例如multi-model anova 调用将不起作用。它也是基于使用 PQL 时的工作模型。

gamm 中用于平滑参数的参数化通过有效无穷大和有效零将它们限制在上方和下方。有关如何更改此设置的详细信息，请参阅notExp2。

不支持链接平滑参数和自适应平滑。

例子

library(mgcv)
## simple examples using gamm as alternative to gam
set.seed(0) 
dat <- gamSim(1,n=200,scale=2)
b <- gamm(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b$gam,pages=1)
summary(b$lme) # details of underlying lme fit
summary(b$gam) # gam style summary of fitted model
anova(b$gam) 
gam.check(b$gam) # simple checking plots

b <- gamm(y~te(x0,x1)+s(x2)+s(x3),data=dat) 
op <- par(mfrow=c(2,2))
plot(b$gam)
par(op)
rm(dat)


## Add a factor to the linear predictor, to be modelled as random
dat <- gamSim(6,n=200,scale=.2,dist="poisson")
b2 <- gamm(y~s(x0)+s(x1)+s(x2),family=poisson,
           data=dat,random=list(fac=~1))
plot(b2$gam,pages=1)
fac <- dat$fac
rm(dat)
vis.gam(b2$gam)

## In the generalized case the 'gam' object is based on the working
## model used in the PQL fitting. Residuals for this are not
## that useful on their own as the following illustrates...

gam.check(b2$gam) 

## But more useful residuals are easy to produce on a model
## by model basis. For example...

fv <- exp(fitted(b2$lme)) ## predicted values (including re)
rsd <- (b2$gam$y - fv)/sqrt(fv) ## Pearson residuals (Poisson case)
op <- par(mfrow=c(1,2))
qqnorm(rsd);plot(fv^.5,rsd)
par(op)

## now an example with autocorrelated errors....
n <- 200;sig <- 2
x <- 0:(n-1)/(n-1)
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1] + e[i]
y <- f + e
op <- par(mfrow=c(2,2))
## Fit model with AR1 residuals
b <- gamm(y~s(x,k=20),correlation=corAR1())
plot(b$gam);lines(x,f-mean(f),col=2)
## Raw residuals still show correlation, of course...
acf(residuals(b$gam),main="raw residual ACF")
## But standardized are now fine...
acf(residuals(b$lme,type="normalized"),main="standardized residual ACF")
## compare with model without AR component...
b <- gam(y~s(x,k=20))
plot(b);lines(x,f-mean(f),col=2)

## more complicated autocorrelation example - AR errors
## only within groups defined by `fac'
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1]*(fac[i-1]==fac[i]) + e[i]
y <- f + e
b <- gamm(y~s(x,k=20),correlation=corAR1(form=~1|fac))
plot(b$gam);lines(x,f-mean(f),col=2)
par(op) 

## more complex situation with nested random effects and within
## group correlation 

set.seed(0)
n.g <- 10
n<-n.g*10*4
## simulate smooth part...
dat <- gamSim(1,n=n,scale=2)
f <- dat$f
## simulate nested random effects....
fa <- as.factor(rep(1:10,rep(4*n.g,10)))
ra <- rep(rnorm(10),rep(4*n.g,10))
fb <- as.factor(rep(rep(1:4,rep(n.g,4)),10))
rb <- rep(rnorm(4),rep(n.g,4))
for (i in 1:9) rb <- c(rb,rep(rnorm(4),rep(n.g,4)))
## simulate auto-correlated errors within groups
e<-array(0,0)
for (i in 1:40) {
  eg <- rnorm(n.g, 0, sig)
  for (j in 2:n.g) eg[j] <- eg[j-1]*0.6+ eg[j]
  e<-c(e,eg)
}
dat$y <- f + ra + rb + e
dat$fa <- fa;dat$fb <- fb
## fit model .... 
b <- gamm(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+
  s(x3,bs="cr"),data=dat,random=list(fa=~1,fb=~1),
  correlation=corAR1())
plot(b$gam,pages=1)
summary(b$gam)
vis.gam(b$gam)

## Prediction from gam object, optionally adding 
## in random effects. 

## Extract random effects and make names more convenient...
refa <- ranef(b$lme,level=5)
rownames(refa) <- substr(rownames(refa),start=9,stop=20)
refb <- ranef(b$lme,level=6)
rownames(refb) <- substr(rownames(refb),start=9,stop=20)

## make a prediction, with random effects zero...
p0 <- predict(b$gam,data.frame(x0=.3,x1=.6,x2=.98,x3=.77))

## add in effect for fa = "2" and fb="2/4"...
p <- p0 + refa["2",1] + refb["2/4",1] 

## and a "spatial" example...
library(nlme);set.seed(1);n <- 100
dat <- gamSim(2,n=n,scale=0) ## standard example
attach(dat)
old.par<-par(mfrow=c(2,2))
contour(truth$x,truth$z,truth$f)  ## true function
f <- data$f                       ## true expected response
## Now simulate correlated errors...
cstr <- corGaus(.1,form = ~x+z)  
cstr <- Initialize(cstr,data.frame(x=data$x,z=data$z))
V <- corMatrix(cstr) ## correlation matrix for data
Cv <- chol(V)
e <- t(Cv) %*% rnorm(n)*0.05 # correlated errors
## next add correlated simulated errors to expected values
data$y <- f + e ## ... to produce response
b<- gamm(y~s(x,z,k=50),correlation=corGaus(.1,form=~x+z),
         data=data)
plot(b$gam) # gamm fit accounting for correlation
# overfits when correlation ignored.....  
b1 <- gamm(y~s(x,z,k=50),data=data);plot(b1$gam) 
b2 <- gam(y~s(x,z,k=50),data=data);plot(b2)
par(old.par)

作者

Simon N. Wood simon.wood@r-project.org

参考

Breslow, N. E. and Clayton, D. G. (1993) Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9-25.

Lin, X and Zhang, D. (1999) Inference in generalized additive mixed models by using smoothing splines. JRSSB. 55(2):381-400

Pinheiro J.C. and Bates, D.M. (2000) Mixed effects Models in S and S-PLUS. Springer

Ruppert, D., Wand, M.P. and Carroll, R.J. (2003) Semiparametric Regression. Cambridge

Silverman, B.W. (1985) Some aspects of the spline smoothing approach to nonparametric regression. JRSSB 47:1-52

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Wahba, G. (1983) Bayesian confidence intervals for the cross validated smoothing spline. JRSSB 45:133-150

Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association. 99:673-686

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

Wood, S.N. (2006a) Low rank scale invariant tensor product smooths for generalized additive mixed models. Biometrics 62(4):1025-1036

Wood S.N. (2006b) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.

Wang, Y. (1998) Mixed effects smoothing spline analysis of variance. J.R. Statist. Soc. B 60, 159-174

https://www.maths.ed.ac.uk/~swood34/

也可以看看

magic 用于相关数据的替代方案，te , s , predict.gam , plot.gam , summary.gam , negbin , vis.gam , pdTens , gamm4 ( https://cran.r-project.org/package=gamm4 )

相关用法

注：本文由纯净天空筛选整理自R-devel大神的英文原创作品 Generalized Additive Mixed Models。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。