check_missing
Creates a specification of a recipe operation that will check whether variables contain missing values.
Usage
check_missing(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("missing")
)
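Arguments
- recipe
  A recipe object. The check will be added to the sequence of operations for this recipe.
- ...
  One or more selector functions to choose variables for this check. See selections() for more details.
- role
  Not used by this check, since no new variables are created.
- trained
  A logical indicating whether the selectors in ... have been resolved by prep().
- columns
  A character string of the selected variable names. This field is a placeholder and is populated once prep() is used.
- skip
  A logical. Should the check be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable). Care should be taken when using skip = TRUE, as it may affect the computations for subsequent operations.
- id
  A character string that is unique to this check and identifies it.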
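As a minimal sketch of what the skip argument changes, the snippet below uses two small made-up data frames (train and new_obs are illustrative names, not part of the package). It assumes, per the description above, that a check added with skip = TRUE still runs during prep() but is bypassed by bake() on new data.

```r
library(recipes)

# Toy data (hypothetical): the training set is complete,
# the new data has a missing value in `x`.
train   <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
new_obs <- data.frame(x = c(1, NA), y = c(2, 3))

# The check still runs on the training data inside prep(),
# but skip = TRUE tells bake() to bypass it for new data.
rec_skip <- recipe(y ~ x, data = train) %>%
  check_missing(x, skip = TRUE, id = "missing_x") %>%
  prep()

# No error is raised here, and the NA is passed through untouched.
baked <- bake(rec_skip, new_data = new_obs)
anyNA(baked$x)
```

With the default skip = FALSE, the same bake() call would stop with an error instead of returning the data.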
tidy() results
When you tidy() this check, a tibble with a terms column (the selectors or variables selected) is returned.
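A short sketch of tidying this check, using the built-in mtcars data rather than anything from this page. The id value "missing_chk" is an arbitrary label chosen for illustration, and number = 1 assumes the check is the first operation in the recipe.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  check_missing(disp, hp, id = "missing_chk")

# Before prep(): `terms` holds the selectors as they were written.
tidy(rec, number = 1)

# After prep(): `terms` holds the resolved column names.
prepped <- prep(rec, training = mtcars)
tidied  <- tidy(prepped, number = 1)
tidied
```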
See also
Other checks: check_class(), check_cols(), check_new_values(), check_range()
Examples
library(recipes)
data(credit_data, package = "modeldata")
is.na(credit_data) %>% colSums()
#>    Status Seniority      Home      Time       Age   Marital   Records
#>         0         0         6         0         0         1         0
#>       Job  Expenses    Income    Assets      Debt    Amount     Price
#>         2         0       381        47        18         0         0
# If the test passes, `new_data` is returned unaltered
recipe(credit_data) %>%
  check_missing(Age, Expenses) %>%
  prep() %>%
  bake(credit_data)
#> # A tibble: 4,454 × 14
#>    Status Seniority Home     Time   Age Marital Records Job       Expenses
#>    <fct>      <int> <fct>   <int> <int> <fct>   <fct>   <fct>        <int>
#>  1 good           9 rent       60    30 married no      freelance       73
#>  2 good          17 rent       60    58 widow   no      fixed           48
#>  3 bad           10 owner      36    46 married yes     freelance       90
#>  4 good           0 rent       60    24 single  no      fixed           63
#>  5 good           0 rent       36    26 single  no      fixed           46
#>  6 good           1 owner      60    36 married no      fixed           75
#>  7 good          29 owner      60    44 married no      fixed           75
#>  8 good           9 parents    12    27 single  no      fixed           35
#>  9 good           0 owner      60    32 married no      freelance       90
#> 10 bad            0 parents    48    41 married no      partime         90
#> # ℹ 4,444 more rows
#> # ℹ 5 more variables: Income <int>, Assets <int>, Debt <int>,
#> #   Amount <int>, Price <int>
# If your training set doesn't pass, prep() will stop with an error
if (FALSE) {
  recipe(credit_data) %>%
    check_missing(Income) %>%
    prep()
}
# If `new_data` contains missing values, the check will stop `bake()`
train_data <- credit_data %>% dplyr::filter(Income > 150)
test_data <- credit_data %>% dplyr::filter(Income <= 150 | is.na(Income))
rp <- recipe(train_data) %>%
  check_missing(Income) %>%
  prep()
bake(rp, train_data)
#> # A tibble: 1,338 × 14
#>    Status Seniority Home   Time   Age Marital Records Job       Expenses
#>    <fct>      <int> <fct> <int> <int> <fct>   <fct>   <fct>        <int>
#>  1 bad           10 owner    36    46 married yes     freelance       90
#>  2 good           0 rent     60    24 single  no      fixed           63
#>  3 good           1 owner    60    36 married no      fixed           75
#>  4 good           8 owner    60    30 married no      fixed           75
#>  5 good          19 priv     36    37 married no      fixed           75
#>  6 good          15 priv     24    52 single  no      freelance       35
#>  7 good          33 rent     24    68 married no      freelance       65
#>  8 good           5 owner    60    22 single  no      fixed           45
#>  9 good          19 owner    60    43 single  no      fixed           75
#> 10 good          15 owner    36    43 married no      fixed           75
#> # ℹ 1,328 more rows
#> # ℹ 5 more variables: Income <int>, Assets <int>, Debt <int>,
#> #   Amount <int>, Price <int>
if (FALSE) {
  bake(rp, test_data)
}
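The failing calls in the examples are wrapped in if (FALSE) so they never run. One way to see the failure without halting a script is to catch the error with base R's tryCatch(), as sketched below. The helper capture_error() is a hypothetical name introduced here, not part of recipes, and the exact wording of the error message depends on the installed recipes version, so none is shown.

```r
library(recipes)
data(credit_data, package = "modeldata")

# Hypothetical helper: returns the error message of `expr`,
# or NA if `expr` finishes without an error.
capture_error <- function(expr) {
  tryCatch(
    {
      expr
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
}

# Training-time failure: Income has missing values in credit_data.
prep_msg <- capture_error(
  recipe(credit_data) %>%
    check_missing(Income) %>%
    prep()
)

# Bake-time failure: the training rows are complete, the test rows are not.
train_data <- credit_data %>% dplyr::filter(Income > 150)
test_data  <- credit_data %>% dplyr::filter(Income <= 150 | is.na(Income))
rp <- recipe(train_data) %>%
  check_missing(Income) %>%
  prep()

bake_msg <- capture_error(bake(rp, test_data))
ok_msg   <- capture_error(bake(rp, train_data))  # NA: the check passes
```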
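Note: This article was selected and compiled by 純淨天空 from the original English work Check for Missing Values by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.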