shrink()
Subsets data to only contain the required columns specified by the prototype, ptype.
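The subsetting described above can be sketched in base R. This is a minimal sketch under stated assumptions, not hardhat's actual source. The helper shrink_sketch() is hypothetical, and it assumes data is a data frame or a matrix with column names. It treats the prototype's column names as the required columns. If any of them is missing, it errors with a message modelled on the one shown in the examples. Otherwise it returns just those columns, in the prototype's order. Unlike the output in the examples, the result stays a plain data frame rather than a tibble.

```r
# Hypothetical base-R sketch of shrink(); NOT hardhat's implementation.
# Assumes `data` is a data frame or a matrix with column names.
shrink_sketch <- function(data, ptype) {
  # The prototype's column names are the "required" columns
  cols <- colnames(ptype)

  # Fail early, naming every required column that `data` lacks
  missing <- setdiff(cols, colnames(data))
  if (length(missing) > 0) {
    stop(
      "The following required columns are missing: ",
      paste0("'", missing, "'", collapse = ", "),
      ".",
      call. = FALSE
    )
  }

  # Keep only the required columns, in the prototype's order
  data[, cols, drop = FALSE]
}

test <- iris[101:150, ]
# A zero-row slice of iris stands in for a prototype here
ptype_pred <- iris[0, "Species", drop = FALSE]
out <- shrink_sketch(test, ptype_pred)
```

With this sketch, columns are pulled by name. A prototype such as iris[0, c("Species", "Sepal.Length")] would therefore return those two columns in that order, and every other column in data is dropped.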
Examples
# ---------------------------------------------------------------------------
# Setup
library(hardhat)
train <- iris[1:100, ]
test <- iris[101:150, ]
# ---------------------------------------------------------------------------
# shrink()
# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)
# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes
# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)
#> # A tibble: 50 × 1
#> Species
#> <fct>
#> 1 virginica
#> 2 virginica
#> 3 virginica
#> 4 virginica
#> 5 virginica
#> 6 virginica
#> 7 virginica
#> 8 virginica
#> 9 virginica
#> 10 virginica
#> # ℹ 40 more rows
# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)
#> # A tibble: 50 × 1
#> Sepal.Width
#> <dbl>
#> 1 3.3
#> 2 2.7
#> 3 3
#> 4 2.9
#> 5 3
#> 6 3
#> 7 2.5
#> 8 2.9
#> 9 2.5
#> 10 3.6
#> # ℹ 40 more rows
# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and errors nicely when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))
#> Error in validate_column_names(data, cols) :
#> The following required columns are missing: 'Species'.
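In the examples, shrink(test, ptype_out) returns the raw Sepal.Width values, not log(Sepal.Width). The outcome prototype records the original column the formula refers to. In hardhat, the preprocessing itself is applied later, by forge(). The snippet below illustrates that idea in base R. It is only an illustration and does not reproduce how hardhat's formula blueprint builds its prototypes. all.vars() recovers the raw variable names on each side of the formula. A zero-row slice of the training data then keeps each column's name, class, and factor levels.

```r
# Illustration only: hardhat's formula blueprint builds its prototypes internally.
f <- log(Sepal.Width) ~ Species
train <- iris[1:100, ]

# f[[2]] is the left-hand side, f[[3]] the right-hand side;
# all.vars() drops the log() call and keeps the raw variable names
outcome_cols <- all.vars(f[[2]])    # "Sepal.Width"
predictor_cols <- all.vars(f[[3]])  # "Species"

# Zero-row slices keep column names, classes and factor levels
ptype_out <- train[0, outcome_cols, drop = FALSE]
ptype_pred <- train[0, predictor_cols, drop = FALSE]
```

Although rows 1 to 100 of iris contain no virginica flowers, ptype_pred$Species still carries all three factor levels. Subsetting a factor never drops its levels.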
Related functions
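- R hardhat standardize Standardize the outcome
- R hardhat scream Scream.
- R hardhat spruce-multiple Spruce up multi-outcome predictions
- R hardhat validate_prediction_size Ensure that predictions have the correct number of rows
- R hardhat default_recipe_blueprint Default recipe blueprint
- R hardhat is_blueprint Is x a preprocessing blueprint?
- R hardhat validate_column_names Ensure that data contains required column names
- R hardhat default_formula_blueprint Default formula blueprint
- R hardhat update_blueprint Update a preprocessing blueprint
- R hardhat weighted_table Weighted table
- R hardhat validate_outcomes_are_univariate Ensure that the outcome is univariate
- R hardhat get_levels Extract factor levels from a data frame
- R hardhat add_intercept_column Add an intercept column to data
- R hardhat is_frequency_weights Is x a frequency weights vector?
- R hardhat model_offset Extract a model offset
- R hardhat model_matrix Construct a design matrix
- R hardhat is_importance_weights Is x an importance weights vector?
- R hardhat run-mold mold() according to a blueprint
- R hardhat get_data_classes Extract data classes from a data frame or matrix
- R hardhat fct_encode_one_hot Encode a factor as a one-hot indicator matrix
- R hardhat new_frequency_weights Construct a frequency weights vector
- R hardhat validate_no_formula_duplication Ensure no duplicate terms appear in a formula
- R hardhat default_xy_blueprint Default XY blueprint
- R hardhat validate_outcomes_are_numeric Ensure that all outcomes are numeric
- R hardhat frequency_weights Frequency weights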
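Note: This article was compiled by 純淨天空 from the original English work Subset only required columns by Davis Vaughan and others. Unless otherwise stated, copyright of the original code belongs to the original authors. This translation may not be reproduced or copied without permission or authorization.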