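stats::glm() assumes that a tabular data set with case weights corresponds to "different observations have different dispersions" (see ?glm).
In some cases, the case weights instead reflect that the same covariate pattern was observed multiple times (i.e., frequency weights). In that situation, stats::glm() expects the data to be formatted as the number of events for each level of the outcome factor, so that the outcome can be given to the formula as cbind(events_1, events_2).
glm_grouped() converts data with integer case weights to the "number of events" format that is expected for binomial data.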
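To make the conversion concrete, here is a minimal base R sketch of the idea: collapse frequency-weighted rows into one row per covariate pattern with a count column per outcome level, then fit with cbind(). It is an illustration under stated assumptions, not the code glm_grouped() actually runs; the object names (admitted, rejected, ucb_events, fit_events) are made up for this example.

```r
# Frequency-weighted data: one row per Admit x Gender x Dept combination,
# with the number of applicants in Freq.
ucb_weighted <- as.data.frame(UCBAdmissions)

# Split the rows by outcome level, keeping only the predictors and the weight.
admitted <- ucb_weighted[ucb_weighted$Admit == "Admitted", c("Gender", "Dept", "Freq")]
rejected <- ucb_weighted[ucb_weighted$Admit == "Rejected", c("Gender", "Dept", "Freq")]
names(admitted)[3] <- "Admitted"
names(rejected)[3] <- "Rejected"

# One row per covariate pattern, one count column per outcome level.
# Assumption: every pattern appears for both outcome levels; otherwise
# merge(..., all = TRUE) plus filling missing counts with 0 would be needed.
ucb_events <- merge(admitted, rejected, by = c("Gender", "Dept"))

# For a factor outcome, glm() treats the first level as the "failure", so the
# second level (Rejected) goes first in cbind() to match the factor-based fit.
fit_events <- glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
coef(fit_events)
```

The coefficients should agree with those printed in the examples for the cbind(Rejected, Admitted) fit.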
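Arguments
- formula: A formula object with one outcome that is a two-level factor.
- data: A data frame containing the outcome and predictors (but not the case weights).
- weights: An integer vector of weights whose length is the same as the number of rows in data. If it is a non-integer numeric vector, it is converted to integer (with a warning).
- ...: Options to pass to stats::glm(). If family is not set, the basic binomial family is assigned automatically.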
Value
An object produced by stats::glm().
Examples
#----------------------------------------------------------------------------
# The same data set formatted three ways
# First with basic case weights that, from ?glm, are used inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
#> Admit Gender Dept Freq
#> 1 Admitted Male A 512
#> 2 Rejected Male A 313
#> 3 Admitted Female A 89
#> 4 Rejected Female A 19
#> 5 Admitted Male B 353
#> 6 Rejected Male B 207
nrow(ucb_weighted)
#> [1] 24
# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
#> Admit Gender Dept
#> 1 Admitted Male A
#> 2 Admitted Male A
#> 3 Admitted Male A
#> 4 Admitted Male A
#> 5 Admitted Male A
#> 6 Admitted Male A
nrow(ucb_long)
#> [1] 4526
# Format where the outcome is formatted as number of events
ucb_events <-
  ucb_weighted %>%
  tidyr::pivot_wider(
    id_cols = c(Gender, Dept),
    names_from = Admit,
    values_from = Freq,
    values_fill = 0L
  )
head(ucb_events)
#> # A tibble: 6 × 4
#> Gender Dept Admitted Rejected
#> <fct> <fct> <int> <int>
#> 1 Male A 512 313
#> 2 Female A 89 19
#> 3 Male B 353 207
#> 4 Female B 17 8
#> 5 Male C 120 205
#> 6 Female C 202 391
nrow(ucb_events)
#> [1] 12
#----------------------------------------------------------------------------
# Different model fits
# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
#>
#> Call: glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_long)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 4525 Total (i.e. Null); 4519 Residual
#> Null Deviance: 6044
#> Residual Deviance: 5187 AIC: 5201
# Weights produce the same statistics
glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = ucb_weighted$Freq
)
#>
#> Call: glm(formula = Admit ~ Gender + Dept, family = binomial, data = ucb_weighted,
#> weights = ucb_weighted$Freq)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 23 Total (i.e. Null); 17 Residual
#> Null Deviance: 6044
#> Residual Deviance: 5187 AIC: 5201
# Data as binomial "x events out of n trials" format. Note that, to get the same
# coefficients, the order of the levels must be reversed.
glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
#>
#> Call: glm(formula = cbind(Rejected, Admitted) ~ Gender + Dept, family = binomial,
#> data = ucb_events)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 11 Total (i.e. Null); 5 Residual
#> Null Deviance: 877.1
#> Residual Deviance: 20.2 AIC: 103.1
# The new function starts from frequency weights and arrives at the grouped fit:
glm_grouped(Admit ~ Gender + Dept, data = ucb_weighted, weights = ucb_weighted$Freq)
#>
#> Call: glm(formula = formula, family = "binomial", data = data)
#>
#> Coefficients:
#> (Intercept) GenderFemale DeptB DeptC DeptD
#> -0.58205 -0.09987 0.04340 1.26260 1.29461
#> DeptE DeptF
#> 1.73931 3.30648
#>
#> Degrees of Freedom: 11 Total (i.e. Null); 5 Residual
#> Null Deviance: 877.1
#> Residual Deviance: 20.2 AIC: 103.1
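The printed fits above share the same coefficients but report different degrees of freedom and deviances. The following base R sketch puts the weighted fit and the grouped fit side by side to make that difference easy to inspect. It rebuilds the grouped data with merge() rather than glm_grouped(), and the names wide, fit_weighted, and fit_events are chosen only for this illustration.

```r
ucb_weighted <- as.data.frame(UCBAdmissions)

# Fit 1: frequency counts passed through the weights argument.
fit_weighted <- glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = Freq
)

# Fit 2: the same counts as "events out of trials", one row per Gender x Dept.
wide <- merge(
  subset(ucb_weighted, Admit == "Admitted", select = c(Gender, Dept, Freq)),
  subset(ucb_weighted, Admit == "Rejected", select = c(Gender, Dept, Freq)),
  by = c("Gender", "Dept"),
  suffixes = c("_admitted", "_rejected")
)
fit_events <- glm(
  cbind(Freq_rejected, Freq_admitted) ~ Gender + Dept,
  data = wide,
  family = binomial
)

# Coefficients side by side, then the quantities that differ between the fits.
cbind(weighted = coef(fit_weighted), events = coef(fit_events))
data.frame(
  fit = c("weighted", "events"),
  residual_df = c(df.residual(fit_weighted), df.residual(fit_events)),
  residual_deviance = c(deviance(fit_weighted), deviance(fit_events))
)
```

Because the residual deviance and its degrees of freedom differ, any goodness-of-fit summary built from them will also differ between the two formats, even though the point estimates agree.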
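Note: This article was curated by 纯净天空 from the original English work "Fit a grouped binomial outcome from a data set with case weights" by Max Kuhn et al. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.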