R gfam: Grouped families

The R function gfam is part of the mgcv package.

Description

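Family for use with gam or bam, allowing a univariate response vector to be made up of variables from several different distributions. The response variable is supplied as a 2 column matrix, where the first column contains the response observations and the second column indexes the distribution (family) from which each observation comes. gfam takes a list of families as its single argument.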
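As a minimal sketch of the response layout described above, the following builds the 2 column matrix with base R only. The three distributions, the variable names (`fam`, `resp`, `dat`) and the sample size are illustrative assumptions, not part of the mgcv documentation.

```r
set.seed(1)
n <- 6
## family index: 1 = binomial, 2 = Poisson, 3 = Gaussian (illustrative choice)
fam <- c(1, 1, 2, 2, 3, 3)
y <- numeric(n)
y[fam == 1] <- rbinom(2, 1, 0.5)  ## binary observations
y[fam == 2] <- rpois(2, 3)        ## count observations
y[fam == 3] <- rnorm(2)           ## continuous observations
## column 1: response observations, column 2: index of the family they come from
resp <- cbind(y, fam)
dat <- data.frame(x = runif(n))
dat$y <- resp                     ## stored as a single matrix-valued column
```

With this layout the index values are assumed to follow the order of the family list, so a matching call would use a list such as `list(binomial, poisson, gaussian)`.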

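Useful for modelling data from different sources that are linked by a model sharing some components. Smooth model components that are not shared are usually handled with by variables (see gam.models).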

Usage

gfam(fl)

Arguments

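`fl`	A list of families. These can be any families inheriting from `family` or `extended.family` that are usable with `gam`, provided that they do not usually require a matrix response variable.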

Details

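Each component function of gfam uses the families supplied in the list fl to obtain the required quantities for that family's subset of the data, and combines the results appropriately. For example, it provides the total deviance (twice the negative log-likelihood) of the model, along with its derivatives, by computing the family-specific deviance and derivatives for each family applied to its subset of the data, and summing them. Other quantities are computed in the same way.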
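The subset-and-sum pattern described above can be sketched with the standard `dev.resids` functions of base R family objects. This is only an illustration under stated assumptions: the fitted means `mu`, the index vector `fam` and the two families are made up, and gfam itself works with quantities in extended.family form (including scale parameters), so its internal computation differs in detail.

```r
set.seed(2)
fam <- rep(1:2, each = 5)          ## 1 = Poisson, 2 = Gaussian (illustrative)
mu <- c(rep(3, 5), rep(0, 5))      ## assumed fitted means, not from a real fit
y <- c(rpois(5, 3), rnorm(5))
fl <- list(poisson(), gaussian())  ## family list, in index order
## family-specific deviance, each computed on that family's subset only
dev_by_family <- sapply(seq_along(fl), function(i) {
  ii <- fam == i
  sum(fl[[i]]$dev.resids(y[ii], mu[ii], rep(1, sum(ii))))
})
total_dev <- sum(dev_by_family)    ## combined by summing over families
```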

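Regular exponential families do not compute the same quantities as extended families, so gfam internally converts what these families produce to extended.family form.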

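Scale parameters obviously have to be handled separately for each family, and are treated as parameters to be estimated, just like other extended.family non-location distribution parameters. Again, this is handled internally. This requirement is part of the reason that an extended.family is always produced, even if all elements of fl are standard exponential families. In consequence, smoothing parameter estimation is always by REML or NCV.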

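Note that the null deviance is currently computed by assuming a single parameter model for each family, rather than just one parameter overall, which may slightly lower the explained deviance. Note also that residual checking should probably be done by disaggregating residuals by family. For this reason, functions are not provided to facilitate residual checking with qq.gam.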

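Prediction on the response scale requires that a family index vector is supplied, under the name of the response, as part of the new prediction data. However, families such as ocat, which usually produce matrix predictions for prediction type "response", will not be able to do so when part of gfam.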
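A minimal sketch of how such prediction data might be assembled in base R, repeating one covariate grid for each family so that the column named after the response (`y` here) holds the family index. The covariate names and values, `n_fam`, and the fitted model `b` mentioned in the final comment are illustrative assumptions.

```r
## covariate values at which predictions are wanted (illustrative values)
grid <- data.frame(x0 = 0.5, x1 = 0.3, x2 = c(0.2, 0.5, 0.8), x3 = 0.5)
n_fam <- 3                           ## assumed number of families in the model
## repeat the whole grid once per family
nd <- grid[rep(seq_len(nrow(grid)), times = n_fam), ]
## the column carrying the response name holds the family index
nd$y <- rep(seq_len(n_fam), each = nrow(grid))
rownames(nd) <- NULL
## with a fitted gfam model b (hypothetical here) one would then call:
## predict(b, nd, type = "response")
```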

gfam relies on the methods in Wood, Pya and Saefken (2016).

Value

An object of class extended.family.

Examples

library(mgcv)
## a mixed family simulator function to play with...
sim.gfam <- function(dist,n=400) {
## dist can be norm, pois, gamma, binom, nbinom, tw, ocat (R assumed 4)
## links used are identity, log or logit.
  dat <- gamSim(1,n=n,verbose=FALSE)
  nf <- length(dist) ## how many families
  fin <- c(1:nf,sample(1:nf,n-nf,replace=TRUE)) ## family index
  dat[,6:10] <- dat[,6:10]/5 ## a scale that works for all links used
  y <- dat$y;
  for (i in 1:nf) {
    ii <- which(fin==i) ## index of current family
    ni <- length(ii);fi <- dat$f[ii]
    if (dist[i]=="norm") {
      y[ii] <- fi + rnorm(ni)*.5
    } else if (dist[i]=="pois") {
      y[ii] <- rpois(ni,exp(fi))
    } else if (dist[i]=="gamma") {
      scale <- .5
      y[ii] <- rgamma(ni,shape=1/scale,scale=exp(fi)*scale)
    } else if (dist[i]=="binom") {
      y[ii] <- rbinom(ni,1,binomial()$linkinv(fi))
    } else if (dist[i]=="nbinom") {
      y[ii] <- rnbinom(ni,size=3,mu=exp(fi))
    } else if (dist[i]=="tw") {
      y[ii] <- rTweedie(exp(fi),p=1.5,phi=1.5)
    } else if (dist[i]=="ocat") {
      alpha <- c(-Inf,1,2,2.5,Inf)
      R <- length(alpha)-1
      yi <- fi
      u <- runif(ni)
      u <- yi + log(u/(1-u)) 
      for (j in 1:R) {
        yi[u > alpha[j]&u <= alpha[j+1]] <- j
      }
      y[ii] <- yi
    }
  }
  dat$y <- cbind(y,fin)
  dat
} ## sim.gfam

## some examples

dat <- sim.gfam(c("binom","tw","norm"))
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),
         family=gfam(list(binomial,tw,gaussian)),data=dat)
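## in the new data the response name (y) carries the family index:
## 1 = binomial, 2 = tw, 3 = gaussian
predict(b,data.frame(y=1:3,x0=c(.5,.5,.5),x1=c(.3,.2,.3),
        x2=c(.2,.5,.8),x3=c(.1,.5,.9)),type="response",se=TRUE)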
summary(b)
plot(b,pages=1)

## set up model so that only the binomial observations depend
## on x0...

dat$id1 <- as.numeric(dat$y[,2]==1)
b1 <- gam(y~s(x0,by=id1)+s(x1)+s(x2)+s(x3),
         family=gfam(list(binomial,tw,gaussian)),data=dat)
plot(b1,pages=1) ## note the CI width increase

Author

Simon N. Wood simon.wood@r-project.org

References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575. doi:10.1080/01621459.2016.1180986

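Note: this article was selected and compiled by 純淨天空 from the original English work Grouped families by the R-devel authors. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.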