R gam.mh 具有 gam 拟合的简单后验模拟

R语言 gam.mh 位于 mgcv 包(package)。

说明

GAM 系数可以直接从高斯近似到系数的后验进行模拟，或者使用简单的 Metropolis Hastings 采样器。另请参阅ginla。

用法

gam.mh(b,ns=10000,burn=1000,t.df=40,rw.scale=.25,thin=1)

参数

`b`	来自 `gam` 的拟合模型对象。不支持`bam` 拟合。
`ns`	要生成的样本数。
`burn`	要丢弃的任何初始烧录期的长度(除了代码之外)。
`t.df`	静态多元 t 建议的自由度。对于较重的尾部提案，较低。
`rw.scale`	生成随机游走提案时缩放后验协方差矩阵的因子。负或非有限跳过随机游走步骤。
`thin`	仅保留每个 `thin` 样本。

细节

后验模拟对于推断模型系数的非线性函数特别有用。模拟后验随机抽取，计算每次抽取的函数，然后从该函数的后验中进行抽取。在许多情况下，模型系数后验的高斯近似是准确的，并且从中生成的样本可以被视为来自系数后验的样本。请参阅下面的示例代码。这种方法在计算上非常高效。

在其他情况下，高斯近似可能会变得很差。一个典型的例子是在具有 log 或 logit 链接的空间模型中，当存在大面积的观测值仅包含零时。在这种情况下，线性预测器很难识别，高斯近似可能变得毫无用处(下面提供了一个示例)。在这种情况下，使用 Metropolis Hastings 采样器从后验模拟有时会很有用。一种简单的方法将基于后验高斯近似的固定提案与基于近似后验协方差矩阵的缩小版本的随机游走提案交替使用。 gam.mh 实现了这一点。固定提案通常促进快速混合，而随机游走组件确保链不会陷入固定高斯提案密度远低于后验密度的区域。

该函数报告两种类型步骤的接受率。如果随机游走接受概率高于四分之一，那么 rw.step 可能应该增加。同样，如果接受率太低，则应降低接受率。随机游走步骤可以完全关闭(见上文)，但如果这样做，检查链是否有卡住的部分很重要。

值

包含矩阵 bs 中保留的模拟系数和接受概率的两个条目的列表。

例子

library(mgcv)
set.seed(3);n <- 400

############################################
## First example: simulated Tweedie model...
############################################

dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5) ## Tweedie response
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(),
          data=dat,method="REML")

## simulate directly from Gaussian approximate posterior...
br <- rmvn(1000,coef(b),vcov(b))

## Alternatively use MH sampling...
br <- gam.mh(b,thin=2,ns=2000,rw.scale=.15)$bs
## If 'coda' installed, can check effective sample size
## require(coda);effectiveSize(as.mcmc(br))

## Now compare simulation results and Gaussian approximation for
## smooth term confidence intervals...
x <- seq(0,1,length=100)
pd <- data.frame(x0=x,x1=x,x2=x,x3=x)
X <- predict(b,newdata=pd,type="lpmatrix")
par(mfrow=c(2,2))
for(i in 1:4) {
  plot(b,select=i,scale=0,scheme=1)
  ii <- b$smooth[[i]]$first.para:b$smooth[[i]]$last.para
  ff <- X[,ii]%*%t(br[,ii]) ## posterior curve sample
  fq <- apply(ff,1,quantile,probs=c(.025,.16,.84,.975))
  lines(x,fq[1,],col=2,lty=2);lines(x,fq[4,],col=2,lty=2)
  lines(x,fq[2,],col=2);lines(x,fq[3,],col=2)
}

###############################################################
## Second example, where Gaussian approximation is a failure...
###############################################################

y <- c(rep(0, 89), 1, 0, 1, 0, 0, 1, rep(0, 13), 1, 0, 0, 1, 
       rep(0, 10), 1, 0, 0, 1, 1, 0, 1, rep(0,4), 1, rep(0,3),  
       1, rep(0, 3), 1, rep(0, 10), 1, rep(0, 4), 1, 0, 1, 0, 0, 
       rep(1, 4), 0, rep(1, 5), rep(0, 4), 1, 1, rep(0, 46))
set.seed(3);x <- sort(c(0:10*5,rnorm(length(y)-11)*20+100))
b <- gam(y ~ s(x, k = 15),method = 'REML', family = binomial)
br <- gam.mh(b,thin=2,ns=2000,rw.scale=.4)$bs
X <- model.matrix(b)
par(mfrow=c(1,1))
plot(x, y, col = rgb(0,0,0,0.25), ylim = c(0,1))
ff <- X%*%t(br) ## posterior curve sample
linv <- b$family$linkinv
## Get intervals for the curve on the response scale...
fq <- linv(apply(ff,1,quantile,probs=c(.025,.16,.5,.84,.975)))
lines(x,fq[1,],col=2,lty=2);lines(x,fq[5,],col=2,lty=2)
lines(x,fq[2,],col=2);lines(x,fq[4,],col=2)
lines(x,fq[3,],col=4)
## Compare to the Gaussian posterior approximation
fv <- predict(b,se=TRUE)
lines(x,linv(fv$fit))
lines(x,linv(fv$fit-2*fv$se.fit),lty=3)
lines(x,linv(fv$fit+2*fv$se.fit),lty=3)
## ... Notice the useless 95% CI (black dotted) based on the
## Gaussian approximation!

作者

Simon N. Wood simon.wood@r-project.org

参考

Wood, S.N. (2015) Core Statistics, Cambridge

相关用法

注：本文由纯净天空筛选整理自R-devel大神的英文原创作品 Simple posterior simulation with gam fits。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。