R parsnip glm_grouped: Fit a grouped binomial outcome from a data set with case weights

stats::glm() assumes that a tabular data set with case weights corresponds to "different observations have different dispersions" (see ?glm).

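In some situations, the case weights reflect that the same covariate pattern was observed multiple times (i.e., they are frequency weights). In this case, stats::glm() expects the data to be formatted as the number of events for each factor level, so that the outcome can be given to the formula as cbind(events_1, events_2).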

glm_grouped() converts data with integer case weights to the "number of events" format that is expected for binomial data.
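
The conversion described above can be sketched in base R. This is an illustration only, not parsnip's internal implementation; the object names (ucb, totals, events) and the "_first"/"_second" column suffixes are made up for the example, and it assumes every covariate pattern appears with both outcome levels.

```r
# Illustrative base-R sketch; NOT parsnip's internal implementation.
ucb <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq
lv <- levels(ucb$Admit)               # the two outcome levels

# Sum the frequency weights within each covariate pattern and outcome level.
totals <- aggregate(Freq ~ Gender + Dept + Admit, data = ucb, FUN = sum)

# Put the two outcome levels side by side: one row per covariate pattern.
# Assumption: each pattern occurs with both levels (merge() is an inner join).
events <- merge(
  totals[totals$Admit == lv[1], c("Gender", "Dept", "Freq")],
  totals[totals$Admit == lv[2], c("Gender", "Dept", "Freq")],
  by = c("Gender", "Dept"),
  suffixes = c("_first", "_second")
)

# For a factor outcome, glm() treats the first level as "failure", so the
# count for the second level goes first in cbind(successes, failures).
fit_events <- glm(
  cbind(Freq_second, Freq_first) ~ Gender + Dept,
  data = events,
  family = binomial
)

# Reference fit that uses the frequencies as ordinary glm() weights.
fit_weighted <- glm(
  Admit ~ Gender + Dept,
  data = ucb,
  family = binomial,
  weights = Freq
)

# Compare coefficients and residual degrees of freedom side by side.
cbind(events = coef(fit_events), weighted = coef(fit_weighted))
c(events = df.residual(fit_events), weighted = df.residual(fit_weighted))
```

The two fits should agree on the coefficients but differ in degrees of freedom and deviance, because the events format has one row per covariate pattern rather than one row per pattern and outcome level.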

Usage

glm_grouped(formula, data, weights, ...)

Arguments

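formula: A formula object with one outcome that is a two-level factor.
data: A data frame containing the outcome and the predictors (but not the case weights).
weights: An integer vector of weights whose length is the same as the number of rows in data. If it is a non-integer numeric vector, it will be converted to integer (with a warning).
...: Options to pass to stats::glm(). If family is not set, the basic binomial family is assigned automatically.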
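
As a hedged illustration of the ... argument, the sketch below forwards a family to stats::glm() through glm_grouped(). The probit link is an arbitrary choice for the example, and the object names fit_logit and fit_probit are hypothetical; the sketch assumes the parsnip package is installed.

```r
library(parsnip)

ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)

# No family supplied: the basic binomial family is assigned automatically.
fit_logit <- glm_grouped(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  weights = ucb_weighted$Freq
)

# Extra arguments are forwarded to stats::glm(); a probit link is used here
# purely as an example of overriding the default.
fit_probit <- glm_grouped(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  weights = ucb_weighted$Freq,
  family = binomial(link = "probit")
)

# Inspect which link function each fit ended up with.
fit_logit$family$link
fit_probit$family$link
```

Both results are ordinary glm objects, so the usual extractors such as coef() and summary() apply.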

Value

An object produced by stats::glm().

Examples

#----------------------------------------------------------------------------
# The same data set formatted three ways

# First with basic case weights that, from ?glm, are used inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
#>      Admit Gender Dept Freq
#> 1 Admitted   Male    A  512
#> 2 Rejected   Male    A  313
#> 3 Admitted Female    A   89
#> 4 Rejected Female    A   19
#> 5 Admitted   Male    B  353
#> 6 Rejected   Male    B  207
nrow(ucb_weighted)
#> [1] 24

# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
#>      Admit Gender Dept
#> 1 Admitted   Male    A
#> 2 Admitted   Male    A
#> 3 Admitted   Male    A
#> 4 Admitted   Male    A
#> 5 Admitted   Male    A
#> 6 Admitted   Male    A
nrow(ucb_long)
#> [1] 4526

# Format where the outcome is formatted as number of events
ucb_events <-
  ucb_weighted %>%
  tidyr::pivot_wider(
    id_cols = c(Gender, Dept),
    names_from = Admit,
    values_from = Freq,
    values_fill = 0L
  )
head(ucb_events)
#> # A tibble: 6 × 4
#>   Gender Dept  Admitted Rejected
#>   <fct>  <fct>    <int>    <int>
#> 1 Male   A          512      313
#> 2 Female A           89       19
#> 3 Male   B          353      207
#> 4 Female B           17        8
#> 5 Male   C          120      205
#> 6 Female C          202      391
nrow(ucb_events)
#> [1] 12

#----------------------------------------------------------------------------
# Different model fits

# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
#> 
#> Call:  glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_long)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 4525 Total (i.e. Null);  4519 Residual
#> Null Deviance:	    6044 
#> Residual Deviance: 5187 	AIC: 5201

# Weights produce the same statistics
glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = ucb_weighted$Freq
)
#> 
#> Call:  glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_weighted, 
#>     weights = ucb_weighted$Freq)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 23 Total (i.e. Null);  17 Residual
#> Null Deviance:	    6044 
#> Residual Deviance: 5187 	AIC: 5201

# Data as binomial "x events out of n trials" format. Note that, to get the same
# coefficients, the order of the levels must be reversed.
glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
#> 
#> Call:  glm(formula = cbind(Rejected, Admitted) ~ Gender + Dept, family = binomial, 
#>     data = ucb_events)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 11 Total (i.e. Null);  5 Residual
#> Null Deviance:	    877.1 
#> Residual Deviance: 20.2 	AIC: 103.1

# The new function that starts with frequency weights and gets the correct result:
library(parsnip)
glm_grouped(Admit ~ Gender + Dept, data = ucb_weighted, weights = ucb_weighted$Freq)
#> 
#> Call:  glm(formula = formula, family = "binomial", data = data)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 11 Total (i.e. Null);  5 Residual
#> Null Deviance:	    877.1 
#> Residual Deviance: 20.2 	AIC: 103.1

Source code: R/grouped_binomial.R

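Note: This text is adapted from the original English documentation "Fit a grouped binomial outcome from a data set with case weights" by Max Kuhn and others. Unless otherwise stated, copyright in the original code belongs to the original authors.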