R family: Family Objects for Models

The R function family is in the stats package.

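Description

Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for details on how such model fitting takes place.

Usage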

family(object, ...)

binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")

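Arguments

`link`	a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector, or an object of class `"link-glm"` (such as generated by `make.link`), provided it is not specified via one of the standard names given next. The `gaussian` family accepts the links (as names) `identity`, `log` and `inverse`; the `binomial` family the links `logit`, `probit`, `cauchit` (corresponding to the logistic, normal and Cauchy CDFs respectively), `log` and `cloglog` (complementary log-log); the `Gamma` family the links `inverse`, `identity` and `log`; the `poisson` family the links `log`, `identity` and `sqrt`; and the `inverse.gaussian` family the links `1/mu^2`, `inverse`, `identity` and `log`. The `quasi` family accepts the links `logit`, `probit`, `cloglog`, `identity`, `inverse`, `log`, `1/mu^2` and `sqrt`, and the function `power` can be used to create a power link function.
`variance`	for all families other than `quasi`, the variance function is determined by the family. The `quasi` family will accept the literal character string (or unquoted as a name/expression) specifications `"constant"`, `"mu(1-mu)"`, `"mu"`, `"mu^2"` and `"mu^3"`, a length-one character vector taking one of those values, or a list containing components `varfun`, `validmu`, `dev.resids`, `initialize` and `name`.
`object`	the function `family` accesses the `family` objects which are stored within objects created by modelling functions (e.g., `glm`).
`...`	further arguments passed to methods.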
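
As a minimal sketch of the accepted forms of `link`, the following compares a quoted string, an unquoted name and a `"link-glm"` object, and builds a power link for `quasi`. The object names `b1`, `b2`, `b3` and `q` are illustrative only.

```r
# Three ways to request the probit link for the binomial family
b1 <- binomial(link = "probit")             # quoted character string
b2 <- binomial(link = probit)               # unquoted name
b3 <- binomial(link = make.link("probit"))  # object of class "link-glm"
c(b1$link, b2$link, b3$link)                # all three report the same link name

# power() creates a power link; lambda = 0.5 gives the square-root power link
q <- quasi(link = power(0.5), variance = "mu")
q$link         # name of the power link
q$linkfun(9)   # 9^0.5
```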

Details

family is a generic function with methods for classes "glm" and "lm" (the latter returning gaussian()).
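
A short sketch of the generic in use, assuming the built-in `cars` data set purely for illustration (the names `fit_lm` and `fit_glm` are not part of the original page):

```r
# family() on an "lm" fit returns gaussian(); on a "glm" fit it returns
# the family object stored in the fit.
fit_lm  <- lm(dist ~ speed, data = cars)
fit_glm <- glm(dist ~ speed, data = cars, family = Gamma(link = "log"))

family(fit_lm)$family    # family name for the lm fit
family(fit_glm)$family   # family name stored in the glm fit
family(fit_glm)$link     # link name stored in the glm fit
```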

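For the binomial and quasibinomial families the response can be specified in one of three ways:

As a factor: 'success' is interpreted as the factor not having the first level (and hence usually as having the second level).
As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.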
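
The three response forms can be sketched on a small made-up data set. The counts below are invented for illustration, and all variable names (`xg`, `size`, `succ`, `fit_factor`, ...) are hypothetical. The three fits should agree on the coefficients up to convergence tolerance.

```r
# Made-up grouped data: successes out of 20 trials at three covariate values
xg   <- c(0, 1, 2)
size <- c(20, 20, 20)
succ <- c(6, 11, 16)

# Expand to one row per trial for the factor form
x  <- rep(xg, times = size)
y  <- unlist(Map(function(s, m) rep(c(1, 0), times = c(s, m - s)), succ, size))
yf <- factor(y, levels = c(0, 1), labels = c("failure", "success"))

# 1. Factor response: every level except the first counts as success
fit_factor <- glm(yf ~ x, family = binomial())
# 2. Proportion of successes, with the number of cases given as weights
fit_prop   <- glm(succ / size ~ xg, family = binomial(), weights = size)
# 3. Two-column matrix: successes, then failures
fit_matrix <- glm(cbind(succ, size - succ) ~ xg, family = binomial())

rbind(factor     = coef(fit_factor),
      proportion = coef(fit_prop),
      matrix     = coef(fit_matrix))
```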

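The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124-8). Although they show that there is (under some restrictions) a model with variance proportional to the mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.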
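
A small sketch of what the free dispersion parameter changes in practice, assuming the built-in `warpbreaks` data for illustration (the names `fit_p`, `fit_q` and `phi` are not from the original page). The coefficients coincide; only the dispersion estimate, and hence the standard errors, differ.

```r
fit_p <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson())
fit_q <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson())

# Same point estimates under both families
all.equal(coef(fit_p), coef(fit_q))

# The quasi fit estimates the dispersion from the Pearson statistic
phi <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
all.equal(summary(fit_q)$dispersion, phi)

# Standard errors are inflated by sqrt(phi) relative to the Poisson fit
se_p <- coef(summary(fit_p))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
all.equal(se_q, se_p * sqrt(phi))
```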

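Value

An object of class "family" (which has a concise print method). This is a list with elements

`family`	character: the family name.
`link`	character: the link name.
`linkfun`	function: the link.
`linkinv`	function: the inverse of the link function.
`variance`	function: the variance as a function of the mean.
`dev.resids`	function giving the deviance for each observation as a function of `(y, mu, wt)`, used by the `residuals` method when computing deviance residuals.
`aic`	function giving the AIC value if appropriate (but `NA` for the quasi- families). More precisely, this function returns `-2\ell + 2 s`, where `\ell` is the log-likelihood and `s` is the number of estimated scale parameters. Note that the penalty term for the location parameters (typically the "regression coefficients") is added elsewhere, e.g., in `glm.fit()` or `AIC()`; see the AIC example in `glm`. See `logLik` for the assumptions made about the dispersion parameter.
`mu.eta`	function: derivative of the inverse-link function with respect to the linear predictor. If the inverse-link function is `\mu = g^{-1}(\eta)`, where `\eta` is the value of the linear predictor, then this function returns `d(g^{-1})/d\eta = d\mu/d\eta`.
`initialize`	expression. This needs to set up whatever data objects are needed for the family, as well as `n` (needed for AIC in the binomial family) and `mustart` (see `glm`).
`validmu`	logical function. Returns `TRUE` if a mean vector `mu` is within the domain of `variance`.
`valideta`	logical function. Returns `TRUE` if a linear predictor `eta` is within the domain of `linkinv`.
`simulate`	(optional) function `simulate(object, nsim)` to be called by the `"lm"` method of `simulate`. It will normally return a matrix with `nsim` columns and one row for each fitted value, but it can also return a list of length `nsim`. Clearly this will be missing for 'quasi' families.
`dispersion`	(optional, since R version 4.3.0) numeric: the value of the dispersion parameter, if fixed, or `NA_real_` if free.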
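
To illustrate how the `aic` element relates to the AIC reported for a fit, here is a hedged sketch for a Poisson model, where no scale parameter is estimated (`s = 0`). The `warpbreaks` data and the names `fit`, `fam` and `aic_part` are assumptions made for this example.

```r
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson())
fam <- family(fit)

y  <- fit$y               # response used in the fit
mu <- fitted(fit)         # fitted means
wt <- fit$prior.weights   # prior weights (all 1 here)

# The family's aic() gives -2 * log-likelihood (+ 2s, with s = 0 for Poisson)
aic_part <- fam$aic(y, n = rep(1, length(y)), mu = mu, wt = wt,
                    dev = deviance(fit))
all.equal(aic_part, -2 * as.numeric(logLik(fit)))

# The penalty for the regression coefficients is added outside the family
all.equal(aic_part + 2 * fit$rank, AIC(fit))

# Quasi- families have no likelihood, so their aic() returns NA
quasipoisson()$aic(y, n = rep(1, length(y)), mu = mu, wt = wt,
                   dev = deviance(fit))

# validmu() checks the domain of the variance function
fam$validmu(c(1, 2))    # positive means are valid for Poisson
fam$validmu(c(-1, 2))   # a negative mean is not
```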

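Note

The link and variance arguments have rather awkward semantics for back-compatibility. The recommended way is to supply them as quoted character strings, but they can also be supplied unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for link, of class "link-glm"). The restrictions apply only to links given as names: when given as a character string, all the links known to make.link are accepted.

This is potentially ambiguous: supplying link = logit could mean the unquoted name of a link or the value of the object logit. It is interpreted if possible as the name of an allowed link, then as an object. (You can force the interpretation to always be the value of an object via logit[1].)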
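
The ambiguity can be sketched as follows. The variable `logit` is created here only to provoke the clash, and `res` is a hypothetical name for the captured result.

```r
# A user object that happens to share its name with an allowed link
logit <- "probit"

binomial(link = logit)$link      # read as the link name: the logit link
binomial(link = logit[1])$link   # read as the object's value: the probit link

# The restriction on names: 'identity' is not an allowed name for binomial,
# and the object found under that name is a function, so this fails ...
res <- tryCatch(binomial(link = identity), error = function(e) "error")
res

# ... whereas the character string is passed on to make.link() and accepted
binomial(link = "identity")$link
```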

Examples

require(utils) # for str

nf <- gaussian()  # Normal family
nf
str(nf)

gf <- Gamma()
gf
str(gf)
gf$linkinv
gf$variance(-3:4) #- == (.)^2

## Binomial with default 'logit' link:  Check some properties visually:
bi <- binomial()
et <- seq(-10,10, by=1/8)
plot(et, bi$mu.eta(et), type="l")
## show that mu.eta() is derivative of linkinv() :
lines((et[-1]+et[-length(et)])/2, col=adjustcolor("red", 1/4),
      diff(bi$linkinv(et))/diff(et), type="l", lwd=4)
## which here is the logistic density:
lines(et, dlogis(et), lwd=3, col=adjustcolor("blue", 1/4))
stopifnot(exprs = {
  all.equal(bi$ mu.eta(et), dlogis(et))
  all.equal(bi$linkinv(et), plogis(et) -> m)
  all.equal(bi$linkfun(m ), qlogis(m))    #  logit(.) == qlogis(.) !
})

## Data from example(glm) :
d.AD <- data.frame(treatment = gl(3,3),
                   outcome   = gl(3,1,9),
                   counts    = c(18,17,15, 20,10,20, 25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, d.AD, family = poisson())
## Quasipoisson: compare with above / example(glm) :
glm.qD93 <- glm(counts ~ outcome + treatment, d.AD, family = quasipoisson())

glm.qD93
anova  (glm.qD93, test = "F")
summary(glm.qD93)
## for Poisson results (same as from 'glm.D93' !) use
anova  (glm.qD93, dispersion = 1, test = "Chisq")
summary(glm.qD93, dispersion = 1)



## Example of user-specified link, a logit model for p^days
## See Shaffer, T.  2004. Auk 121(2): 526-540.
logexp <- function(days = 1)
{
    linkfun <- function(mu) qlogis(mu^(1/days))
    linkinv <- function(eta) plogis(eta)^days
    mu.eta  <- function(eta) days * plogis(eta)^(days-1) *
                  binomial()$mu.eta(eta)
    valideta <- function(eta) TRUE
    link <- paste0("logexp(", days, ")")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
(bil3 <- binomial(logexp(3)))

## in practice this would be used with a vector of 'days', in
## which case use an offset of 0 in the corresponding formula
## to get the null deviance right.

## Binomial with identity link: often not a good idea, as both
## computationally and conceptually difficult:
binomial(link = "identity")  ## is exactly the same as
binomial(link = make.link("identity"))



## tests of quasi
x <- rnorm(100)
y <- rpois(100, exp(1+x))
glm(y ~ x, family = quasi(variance = "mu", link = "log"))
# which is the same as
glm(y ~ x, family = poisson)
glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
## Not run: glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails
y <- rbinom(100, 1, plogis(x))
# need to set a starting value for the next fit
glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0,1))

Author(s)

The design was inspired by S functions of the same names described in Hastie & Pregibon (1992) (except quasibinomial and quasipoisson).

References

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.

Cox, D. R. and Snell, E. J. (1981) Applied Statistics; Principles and Examples. London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

glm, power, make.link.

For binomial coefficients, choose; for the binomial and negative binomial distributions, Binomial and NegBinomial.

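Note: This page was compiled by 纯净天空 from the original English R-devel documentation "Family Objects for Models". Unless otherwise stated, copyright in the original code belongs to its original authors; do not reproduce or copy this translation without permission or authorization.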