R ziP GAM 零膨胀(跨栏)泊松回归族

R语言 ziP 位于 mgcv 包(package)。

说明

系列与 gam 或 bam 一起使用，当零概率的互补对数对数线性依赖于泊松参数的对数时，实现零膨胀泊松数据的回归。请务必小心使用，请注意，仅仅具有许多零响应观测值并不表示零膨胀：问题是在给定指定模型的情况下是否有太多零。

这种模型实际上仅适用于没有任何协变量有助于解释数据中的零的情况。如果您的协变量预测哪些观测值可能具有零均值，那么在此基础上添加零膨胀模型可能会导致可识别性问题。可识别性问题可能会导致拟合失败，或者导致线性预测变量或预测值出现荒谬的值。

用法

ziP(theta = NULL, link = "identity",b=0)

参数

`theta`	控制零通胀率的均值线性变换的斜率和截距的 2 个参数。如果提供则视为固定参数( `\theta_1` 和 `\theta_2` )，否则进行估计。
`link`	链接函数：目前仅支持`"identity"`。
`b`	非负常数，指定零通胀率对线性预测变量的最小依赖性。

细节

零计数的概率由 1-p 给出，而计数的概率 y>0 由截断的泊松概率函数 p\mu^y/((\exp(\mu)-1)y!) 给出。线性预测器给出 \log \mu ，而 \eta = \log(-\log(1-p)) 和 \eta = \theta_1 + \{b+\exp(\theta_2)\} \log \mu 。 theta 参数与平滑参数一起进行估计。从零开始增加 b 参数可以大大减少可识别性问题，特别是当非零数据很少时。

该模型的拟合值为泊松参数的对数。将 predict 函数与 type=="response" 结合使用以获得预测的预期响应。请注意，模型摘要中报告的 theta 参数为 \theta_1 和 b + \exp(\theta_2) 。

这些模型应该经过非常仔细的检查，特别是在拟合尚未收敛的情况下。建立具有可识别性问题的模型非常容易，特别是如果数据不是真正的零膨胀，而只是有许多零，因为协变量空间的某些部分的均值非常低。请参阅示例以了解一些明显的检查。认真对待收敛警告。

值

类 extended.family 的对象。

警告

零膨胀模型通常是over-used。数据中存在大量零本身并不意味着通货膨胀为零。 *考虑到模型平均值*，有太多零可能意味着零通胀。

例子


rzip <- function(gamma,theta= c(-2,.3)) {
## generate zero inflated Poisson random variables, where 
## lambda = exp(gamma), eta = theta[1] + exp(theta[2])*gamma
## and 1-p = exp(-exp(eta)).
   y <- gamma; n <- length(y)
   lambda <- exp(gamma)
   eta <- theta[1] + exp(theta[2])*gamma
   p <- 1- exp(-exp(eta))
   ind <- p > runif(n)
   y[!ind] <- 0
   np <- sum(ind)
   ## generate from zero truncated Poisson, given presence...
   y[ind] <- qpois(runif(np,dpois(0,lambda[ind]),1),lambda[ind])
   y
} 

library(mgcv)
## Simulate some ziP data...
set.seed(1);n<-400
dat <- gamSim(1,n=n)
dat$y <- rzip(dat$f/4-1)

b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(),data=dat)

b$outer.info ## check convergence!!
b
plot(b,pages=1)
plot(b,pages=1,unconditional=TRUE) ## add s.p. uncertainty 
gam.check(b)
## more checking...
## 1. If the zero inflation rate becomes decoupled from the linear predictor, 
## it is possible for the linear predictor to be almost unbounded in regions
## containing many zeroes. So examine if the range of predicted values 
## is sane for the zero cases? 
range(predict(b,type="response")[b$y==0])

## 2. Further plots...
par(mfrow=c(2,2))
plot(predict(b,type="response"),residuals(b))
plot(predict(b,type="response"),b$y);abline(0,1,col=2)
plot(b$linear.predictors,b$y)
qq.gam(b,rep=20,level=1)

## 3. Refit fixing the theta parameters at their estimated values, to check we 
## get essentially the same fit...
thb <- b$family$getTheta()
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(theta=thb),data=dat)
b;b0

## Example fit forcing minimum linkage of prob present and
## linear predictor. Can fix some identifiability problems.
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(b=.3),data=dat)

作者

Simon N. Wood simon.wood@r-project.org

参考

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986

也可以看看

ziplss

相关用法

注：本文由纯净天空筛选整理自R-devel大神的英文原创作品 GAM zero-inflated (hurdle) Poisson regression family。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。