R parsnip glm_grouped 將數據集中的分組二項式結果與個案權重擬合

stats::glm() 假設具有個案權重的表格數據集對應於“不同的觀測值具有不同的離散度”(參見 ?glm )。

在某些情況下，案例權重反映了多次觀察到相同的協變量模式(即頻率權重)。在本例中， stats::glm() 期望數據被格式化為每個因子級別的事件數，以便結果可以作為 cbind(events_1, events_2) 給出給公式。

glm_grouped() 將具有整數大小寫權重的數據轉換為二項式數據的預期“事件數”格式。

用法

glm_grouped(formula, data, weights, ...)

參數

formula: 具有一個結果的公式對象，該結果是兩級因子。
data: 包含結果和預測變量(但不包含案例權重)的 DataFrame 。
weights: 權重的整數向量，其長度與 data 中的行數相同。如果它是非整數，它將被轉換為整數(帶有警告)。
...: 要傳遞給 stats::glm() 的選項。如果family未設置，它將自動分配基本二項式族。

值

由 stats::glm() 生成的對象。

例子

#----------------------------------------------------------------------------
# The same data set formatted three ways

# First with basic case weights that, from ?glm, are used inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
#>      Admit Gender Dept Freq
#> 1 Admitted   Male    A  512
#> 2 Rejected   Male    A  313
#> 3 Admitted Female    A   89
#> 4 Rejected Female    A   19
#> 5 Admitted   Male    B  353
#> 6 Rejected   Male    B  207
nrow(ucb_weighted)
#> [1] 24

# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
#>      Admit Gender Dept
#> 1 Admitted   Male    A
#> 2 Admitted   Male    A
#> 3 Admitted   Male    A
#> 4 Admitted   Male    A
#> 5 Admitted   Male    A
#> 6 Admitted   Male    A
nrow(ucb_long)
#> [1] 4526

# Format where the outcome is formatted as number of events
ucb_events <-
  ucb_weighted %>%
  tidyr::pivot_wider(
    id_cols = c(Gender, Dept),
    names_from = Admit,
    values_from = Freq,
    values_fill = 0L
  )
head(ucb_events)
#> # A tibble: 6 × 4
#>   Gender Dept  Admitted Rejected
#>   <fct>  <fct>    <int>    <int>
#> 1 Male   A          512      313
#> 2 Female A           89       19
#> 3 Male   B          353      207
#> 4 Female B           17        8
#> 5 Male   C          120      205
#> 6 Female C          202      391
nrow(ucb_events)
#> [1] 12

#----------------------------------------------------------------------------
# Different model fits

# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
#> 
#> Call:  glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_long)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 4525 Total (i.e. Null);  4519 Residual
#> Null Deviance:	    6044 
#> Residual Deviance: 5187 	AIC: 5201

# Weights produce the same statistics
glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = ucb_weighted$Freq
)
#> 
#> Call:  glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_weighted, 
#>     weights = ucb_weighted$Freq)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 23 Total (i.e. Null);  17 Residual
#> Null Deviance:	    6044 
#> Residual Deviance: 5187 	AIC: 5201

# Data as binomial "x events out of n trials" format. Note that, to get the same
# coefficients, the order of the levels must be reversed.
glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
#> 
#> Call:  glm(formula = cbind(Rejected, Admitted) ~ Gender + Dept, family = binomial, 
#>     data = ucb_events)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 11 Total (i.e. Null);  5 Residual
#> Null Deviance:	    877.1 
#> Residual Deviance: 20.2 	AIC: 103.1

# The new function that starts with frequency weights and gets the correct place:
glm_grouped(Admit ~ Gender + Dept, data = ucb_weighted, weights = ucb_weighted$Freq)
#> 
#> Call:  glm(formula = formula, family = "binomial", data = data)
#> 
#> Coefficients:
#>  (Intercept)  GenderFemale         DeptB         DeptC         DeptD  
#>     -0.58205      -0.09987       0.04340       1.26260       1.29461  
#>        DeptE         DeptF  
#>      1.73931       3.30648  
#> 
#> Degrees of Freedom: 11 Total (i.e. Null);  5 Residual
#> Null Deviance:	    877.1 
#> Residual Deviance: 20.2 	AIC: 103.1

源代碼：R/grouped_binomial.R

相關用法

注：本文由純淨天空篩選整理自Max Kuhn等大神的英文原創作品 Fit a grouped binomial outcome from a data set with case weights。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。