R ziP GAM zero-inflated (hurdle) Poisson regression family

The R function ziP is in the mgcv package.

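Description

Family for use with gam or bam, implementing regression for zero-inflated Poisson data when the complementary log-log of the zero probability is linearly dependent on the log of the Poisson parameter. Use with great care: simply having many zero response observations is not an indication of zero inflation. The question is whether there are too many zeroes given the specified model.

This sort of model is really only appropriate when none of your covariates help to explain the zeroes in your data. If your covariates predict which observations are likely to have zero mean, then adding a zero-inflated model on top of this is likely to lead to identifiability problems. Identifiability problems may lead to fit failures, or to absurd values for the linear predictor or predicted values.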
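The "too many zeroes given the model" question can be illustrated with a rough check. The sketch below is not part of the mgcv documentation: it uses base R `glm` rather than `gam`, and the simulated data and the names `obs_zero` and `exp_zero` are assumptions made for illustration. The data are plain Poisson with a low mean in part of the covariate space, so they contain many zeroes without any zero inflation.

```r
## Illustrative sketch only: compare observed zeroes with the number a
## plain Poisson fit expects. All data and names here are hypothetical.
set.seed(42)
n  <- 500
x  <- runif(n)
mu <- exp(-1 + 2 * x)        # mean is low where x is small
y  <- rpois(n, mu)           # plain Poisson, no zero inflation

fit <- glm(y ~ x, family = poisson)

obs_zero <- sum(y == 0)                   # zeroes actually observed
exp_zero <- sum(dpois(0, fitted(fit)))    # zeroes expected under the fit
c(observed = obs_zero, expected = exp_zero)
```

If the observed count is close to the expected count, as it should be here by construction, the zeroes are already explained by the mean model and a zero-inflated family is probably not needed.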

Usage

ziP(theta = NULL, link = "identity",b=0)

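Arguments

`theta`	The 2 parameters controlling the slope and intercept of the linear transform of the mean that controls the zero inflation rate. If supplied, they are treated as fixed parameters (`\theta_1` and `\theta_2`); otherwise they are estimated.
`link`	The link function: only `"identity"` is currently supported.
`b`	A non-negative constant, specifying the minimum dependence of the zero inflation rate on the linear predictor.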

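Details

The probability of a zero count is given by 1-p, whereas the probability of a count y>0 is given by the truncated Poisson probability function p\mu^y/((\exp(\mu)-1)y!). The linear predictor gives \log \mu, while \eta = \log(-\log(1-p)) and \eta = \theta_1 + \{b+\exp(\theta_2)\} \log \mu. The theta parameters are estimated alongside the smoothing parameters. Increasing the b parameter from zero can greatly reduce identifiability problems, particularly when there are very few non-zero data.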
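The probability function above can be written out directly as a small check that the zero and non-zero probabilities sum to one. This is a minimal base R sketch, not mgcv code: the helper name `dzip_sketch` and the default parameter values are assumptions chosen for illustration.

```r
## Hypothetical helper (not part of mgcv): probability function of the
## model described above, written straight from the formulas.
dzip_sketch <- function(y, log_mu, theta = c(-2, 0.3), b = 0) {
  mu  <- exp(log_mu)
  eta <- theta[1] + (b + exp(theta[2])) * log_mu
  p   <- 1 - exp(-exp(eta))              # inverts eta = log(-log(1 - p))
  ifelse(y == 0,
         1 - p,                                        # zero count
         p * mu^y / ((exp(mu) - 1) * factorial(y)))    # truncated Poisson
}

probs <- dzip_sketch(0:100, log_mu = log(3))
sum(probs)   # should be 1 up to a negligible truncated tail
```

Raising `b` above zero increases the slope `b + exp(theta[2])`, which ties the presence probability p more tightly to the linear predictor.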

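The fitted values for this model are the log of the Poisson parameter. Use the predict function with type="response" to get the predicted expected response. Note that the theta parameters reported in model summaries are \theta_1 and b + \exp(\theta_2).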
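Under the probability function given in Details, the expected response is the presence probability p multiplied by the mean of a zero-truncated Poisson, that is p\mu/(1-\exp(-\mu)). The sketch below computes this from a value of the linear predictor and checks it against a direct sum. It is an illustration in base R, not mgcv's implementation: the helper name `zip_mean_sketch` and the parameter values are assumptions.

```r
## Hypothetical helper (not part of mgcv): expected response implied by
## the model, given the linear predictor log(mu).
zip_mean_sketch <- function(log_mu, theta, b = 0) {
  mu  <- exp(log_mu)
  eta <- theta[1] + (b + exp(theta[2])) * log_mu
  p   <- 1 - exp(-exp(eta))
  p * mu / (1 - exp(-mu))    # P(y > 0) times the zero-truncated Poisson mean
}

## Check against a direct sum of y * P(y) over y = 1, ..., 100
log_mu <- log(2)
theta  <- c(-2, 0.3)         # illustrative values only
mu     <- exp(log_mu)
p      <- 1 - exp(-exp(theta[1] + exp(theta[2]) * log_mu))
y      <- 1:100
direct <- sum(y * p * mu^y / ((exp(mu) - 1) * factorial(y)))
all.equal(direct, zip_mean_sketch(log_mu, theta))
```

Because p can be well below one, the expected response may be much smaller than \mu itself, which is one reason to inspect predictions on the response scale rather than the fitted linear predictor.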

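These models should be subject to very careful checking, especially if fitting has not converged. It is quite easy to set up models with identifiability problems, particularly if the data are not really zero inflated but simply have many zeroes because the mean is very low in some parts of the covariate space. See the examples for some obvious checks. Take convergence warnings seriously.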

Value

An object of class extended.family.

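Warnings

Zero-inflated models are often over-used. Having lots of zeroes in the data does not in itself imply zero inflation. Having too many zeroes *given the model mean* may imply zero inflation.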

Examples


rzip <- function(gamma,theta= c(-2,.3)) {
## generate zero inflated Poisson random variables, where 
## lambda = exp(gamma), eta = theta[1] + exp(theta[2])*gamma
## and 1-p = exp(-exp(eta)).
   y <- gamma; n <- length(y)
   lambda <- exp(gamma)
   eta <- theta[1] + exp(theta[2])*gamma
   p <- 1- exp(-exp(eta))
   ind <- p > runif(n)
   y[!ind] <- 0
   np <- sum(ind)
   ## generate from zero truncated Poisson, given presence...
   y[ind] <- qpois(runif(np,dpois(0,lambda[ind]),1),lambda[ind])
   y
} 

library(mgcv)
## Simulate some ziP data...
set.seed(1);n<-400
dat <- gamSim(1,n=n)
dat$y <- rzip(dat$f/4-1)

b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(),data=dat)

b$outer.info ## check convergence!!
b
plot(b,pages=1)
plot(b,pages=1,unconditional=TRUE) ## add s.p. uncertainty 
gam.check(b)
## more checking...
## 1. If the zero inflation rate becomes decoupled from the linear predictor, 
## it is possible for the linear predictor to be almost unbounded in regions
## containing many zeroes. So examine if the range of predicted values 
## is sane for the zero cases? 
range(predict(b,type="response")[b$y==0])

## 2. Further plots...
par(mfrow=c(2,2))
plot(predict(b,type="response"),residuals(b))
plot(predict(b,type="response"),b$y);abline(0,1,col=2)
plot(b$linear.predictors,b$y)
qq.gam(b,rep=20,level=1)

## 3. Refit fixing the theta parameters at their estimated values, to check we 
## get essentially the same fit...
thb <- b$family$getTheta()
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(theta=thb),data=dat)
b;b0

## Example fit forcing minimum linkage of prob present and
## linear predictor. Can fix some identifiability problems.
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ziP(b=.3),data=dat)

Author(s)

Simon N. Wood simon.wood@r-project.org

References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986

See Also

ziplss

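Note: this article was selected and compiled by 純淨天空 from the original English work "GAM zero-inflated (hurdle) Poisson regression family" by the R-devel authors. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.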