check_missing
check_missing() creates a specification of a recipe operation that will check whether variables contain missing values.
Usage
check_missing(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("missing")
)
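Arguments
- recipe: A recipe object. The check will be added to the sequence of operations for this recipe.
- ...: One or more selector functions to choose the variables for this check. See selections() for more details.
- role: Not used by this check, since no new variables are created.
- trained: A logical indicating whether the selectors in ... have been resolved by prep().
- columns: A character vector of the selected variable names. This field is a placeholder and is populated once prep() is used.
- skip: A logical. Should the check be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable). Care should be taken when using skip = TRUE, as it may affect the computations for subsequent operations.
- id: A character string that is unique to this check, used to identify it.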
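The effect of skip and id described above can be sketched with a small example. The data frames `train` and `new` and the id value "missing_x" are made up for illustration; this is a minimal sketch rather than part of the original documentation.

```r
library(recipes)

# Hypothetical toy data: the training set is complete,
# the new data has a missing value in `x`.
train <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
new <- data.frame(x = c(1, NA, 3), y = c(10, 20, 30))

# With skip = TRUE the check still runs on the training set during prep(),
# but it is skipped when bake() is applied to new data.
rec <- recipe(y ~ x, data = train) %>%
  check_missing(x, skip = TRUE, id = "missing_x") %>%
  prep()

# No error is raised here, even though `new$x` contains an NA.
baked <- bake(rec, new_data = new)
```

Because the check is skipped at bake() time, the missing value passes through untouched, which is why skip = TRUE should be used with care.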
tidy() results
When you tidy() this check, a tibble with a terms column (the selectors or variables selected) is returned.
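As a rough illustration of the tidy() behaviour, the sketch below tidies the check before and after prep(). The toy data frame `train` is an assumption made for the example, and depending on the installed version of recipes the returned tibble may contain columns in addition to terms.

```r
library(recipes)

# Hypothetical toy data with two predictors and an outcome.
train <- data.frame(x = c(1, 2, 3), z = c(4, 5, 6), y = c(10, 20, 30))

rec <- recipe(y ~ ., data = train) %>%
  check_missing(all_predictors())

# Before prep(): `terms` holds the unresolved selector.
untrained <- tidy(rec, number = 1)

# After prep(): `terms` holds the resolved variable names.
prepped <- prep(rec)
trained <- tidy(prepped, number = 1)

print(untrained)
print(trained)
```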
See also
Other checks: check_class(), check_cols(), check_new_values(), check_range()
Examples
library(recipes)
data(credit_data, package = "modeldata")
is.na(credit_data) %>% colSums()
#> Status Seniority Home Time Age Marital Records
#> 0 0 6 0 0 1 0
#> Job Expenses Income Assets Debt Amount Price
#> 2 0 381 47 18 0 0
# If the test passes, `new_data` is returned unaltered
recipe(credit_data) %>%
  check_missing(Age, Expenses) %>%
  prep() %>%
  bake(credit_data)
#> # A tibble: 4,454 × 14
#> Status Seniority Home Time Age Marital Records Job Expenses
#> <fct> <int> <fct> <int> <int> <fct> <fct> <fct> <int>
#> 1 good 9 rent 60 30 married no freelance 73
#> 2 good 17 rent 60 58 widow no fixed 48
#> 3 bad 10 owner 36 46 married yes freelance 90
#> 4 good 0 rent 60 24 single no fixed 63
#> 5 good 0 rent 36 26 single no fixed 46
#> 6 good 1 owner 60 36 married no fixed 75
#> 7 good 29 owner 60 44 married no fixed 75
#> 8 good 9 parents 12 27 single no fixed 35
#> 9 good 0 owner 60 32 married no freelance 90
#> 10 bad 0 parents 48 41 married no partime 90
#> # ℹ 4,444 more rows
#> # ℹ 5 more variables: Income <int>, Assets <int>, Debt <int>,
#> # Amount <int>, Price <int>
# If your training set doesn't pass, prep() will stop with an error
if (FALSE) {
  recipe(credit_data) %>%
    check_missing(Income) %>%
    prep()
}
# If `new_data` contains missing values, the check will stop `bake()`
train_data <- credit_data %>% dplyr::filter(Income > 150)
test_data <- credit_data %>% dplyr::filter(Income <= 150 | is.na(Income))
rp <- recipe(train_data) %>%
  check_missing(Income) %>%
  prep()
bake(rp, train_data)
#> # A tibble: 1,338 × 14
#> Status Seniority Home Time Age Marital Records Job Expenses
#> <fct> <int> <fct> <int> <int> <fct> <fct> <fct> <int>
#> 1 bad 10 owner 36 46 married yes freelance 90
#> 2 good 0 rent 60 24 single no fixed 63
#> 3 good 1 owner 60 36 married no fixed 75
#> 4 good 8 owner 60 30 married no fixed 75
#> 5 good 19 priv 36 37 married no fixed 75
#> 6 good 15 priv 24 52 single no freelance 35
#> 7 good 33 rent 24 68 married no freelance 65
#> 8 good 5 owner 60 22 single no fixed 45
#> 9 good 19 owner 60 43 single no fixed 75
#> 10 good 15 owner 36 43 married no fixed 75
#> # ℹ 1,328 more rows
#> # ℹ 5 more variables: Income <int>, Assets <int>, Debt <int>,
#> # Amount <int>, Price <int>
if (FALSE) {
  bake(rp, test_data)
}
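The failing cases above are wrapped in `if (FALSE)` so that they do not stop the script. One way to observe the failure without halting is to catch the error, as in the sketch below. The toy data frames `train` and `new` are assumptions made for this illustration, and the exact wording of the error message may differ between versions of recipes.

```r
library(recipes)

# Hypothetical toy data: `income` is complete in training
# but has a missing value in the new data.
train <- data.frame(income = c(100, 200, 300), age = c(30, 40, 50))
new <- data.frame(income = c(120, NA), age = c(25, 35))

rec <- recipe(~ ., data = train) %>%
  check_missing(income) %>%
  prep()

# bake() stops with an error because `income` contains an NA;
# tryCatch() reports the message and returns NULL instead.
result <- tryCatch(
  bake(rec, new_data = new),
  error = function(e) {
    message("Check failed: ", conditionMessage(e))
    NULL
  }
)
```

Here `result` is NULL when the check fails, so downstream code can branch on `is.null(result)` rather than letting the error propagate.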
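Note: This article was selected and compiled by 纯净天空 from the original English work "Check for Missing Values" by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors; this translation may not be reproduced or copied without permission or authorization.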