stats::glm()
假設具有個案權重的表格數據集對應於“不同的觀測值具有不同的離散度”(參見 ?glm
)。
在某些情況下,案例權重反映了多次觀察到相同的協變量模式(即頻率權重)。在本例中, stats::glm()
期望數據被格式化為每個因子級別的事件數,以便結果可以作為 cbind(events_1, events_2)
給出給公式。
glm_grouped()
將具有整數大小寫權重的數據轉換為二項式數據的預期“事件數”格式。
參數
- formula
-
具有一個結果的公式對象,該結果是兩級因子。
- data
-
包含結果和預測變量(但不包含案例權重)的 DataFrame 。
- weights
-
權重的整數向量,其長度與
data
中的行數相同。如果它是非整數,它將被轉換為整數(帶有警告)。 - ...
-
要傳遞給
stats::glm()
的選項。如果family
未設置,它將自動分配基本二項式族。
值
由 stats::glm()
生成的對象。
例子
#----------------------------------------------------------------------------
# The same data set formatted three ways
# First with basic case weights that, from ?glm, are used inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
#> Admit Gender Dept Freq
#> 1 Admitted Male A 512
#> 2 Rejected Male A 313
#> 3 Admitted Female A 89
#> 4 Rejected Female A 19
#> 5 Admitted Male B 353
#> 6 Rejected Male B 207
nrow(ucb_weighted)
#> [1] 24
# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
#> Admit Gender Dept
#> 1 Admitted Male A
#> 2 Admitted Male A
#> 3 Admitted Male A
#> 4 Admitted Male A
#> 5 Admitted Male A
#> 6 Admitted Male A
nrow(ucb_long)
#> [1] 4526
# Format where the outcome is formatted as number of events
ucb_events <-
ucb_weighted %>%
tidyr::pivot_wider(
id_cols = c(Gender, Dept),
names_from = Admit,
values_from = Freq,
values_fill = 0L
)
head(ucb_events)
#> # A tibble: 6 × 4
#> Gender Dept Admitted Rejected
#> <fct> <fct> <int> <int>
#> 1 Male A 512 313
#> 2 Female A 89 19
#> 3 Male B 353 207
#> 4 Female B 17 8
#> 5 Male C 120 205
#> 6 Female C 202 391
nrow(ucb_events)
#> [1] 12
#----------------------------------------------------------------------------
# Different model fits
# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
#>
#> Call: glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_long)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 4525 Total (i.e. Null); 4519 Residual
#> Null Deviance: 6044
#> Residual Deviance: 5187 AIC: 5201
# Weights produce the same statistics
glm(
Admit ~ Gender + Dept,
data = ucb_weighted,
family = binomial,
weights = ucb_weighted$Freq
)
#>
#> Call: glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_weighted,
#> weights = ucb_weighted$Freq)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 23 Total (i.e. Null); 17 Residual
#> Null Deviance: 6044
#> Residual Deviance: 5187 AIC: 5201
# Data as binomial "x events out of n trials" format. Note that, to get the same
# coefficients, the order of the levels must be reversed.
glm(
cbind(Rejected, Admitted) ~ Gender + Dept,
data = ucb_events,
family = binomial
)
#>
#> Call: glm(formula = cbind(Rejected, Admitted) ~ Gender + Dept, family = binomial,
#> data = ucb_events)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 11 Total (i.e. Null); 5 Residual
#> Null Deviance: 877.1
#> Residual Deviance: 20.2 AIC: 103.1
# The new function that starts with frequency weights and gets the correct place:
glm_grouped(Admit ~ Gender + Dept, data = ucb_weighted, weights = ucb_weighted$Freq)
#>
#> Call: glm(formula = formula, family = "binomial", data = data)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 11 Total (i.e. Null); 5 Residual
#> Null Deviance: 877.1
#> Residual Deviance: 20.2 AIC: 103.1
相關用法
- R parsnip gen_additive_mod 廣義加性模型 (GAM)
- R parsnip logistic_reg 邏輯回歸
- R parsnip predict.model_fit 模型預測
- R parsnip linear_reg 線性回歸
- R parsnip C5_rules C5.0 基於規則的分類模型
- R parsnip set_engine 聲明計算引擎和特定參數
- R parsnip condense_control 將控製對象壓縮為更小的控製對象
- R parsnip control_parsnip 控製擬合函數
- R parsnip augment 通過預測增強數據
- R parsnip repair_call 修複模型調用對象
- R parsnip dot-model_param_name_key 翻譯模型調整參數的名稱
- R parsnip rule_fit 規則擬合模型
- R parsnip svm_rbf 徑向基函數支持向量機
- R parsnip set_args 更改模型規範的元素
- R parsnip translate 解決計算引擎的模型規範
- R parsnip max_mtry_formula 根據公式確定 mtry 的最大值。此函數可能會根據公式和數據集限製 mtry 的值。對於生存和/或多變量模型來說,這是一種安全的方法。
- R parsnip svm_linear 線性支持向量機
- R parsnip set_new_model 注冊模型的工具
- R parsnip rand_forest 隨機森林
- R parsnip mlp 單層神經網絡
- R parsnip nearest_neighbor K-最近鄰
- R parsnip parsnip_update 更新型號規格
- R parsnip fit 將模型規範擬合到數據集
- R parsnip boost_tree 增強樹
- R parsnip bart 貝葉斯加性回歸樹 (BART)
注:本文由純淨天空篩選整理自Max Kuhn等大神的英文原創作品 Fit a grouped binomial outcome from a data set with case weights。非經特殊聲明,原始代碼版權歸原作者所有,本譯文未經允許或授權,請勿轉載或複製。