fit_best()
Takes the results from tuning many models and fits the workflow configuration associated with the best performance to the training set.
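Arguments
- x: A workflow_set object that has been evaluated with workflow_map(). Note that the workflow set must have been fitted with the control option save_workflow = TRUE.
- metric: A character string giving the metric to rank results by. When it is left as NULL, the first metric in the tuning results' metric set is used.
- ...: Additional options to pass to tune::fit_best().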
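The arguments can be seen together in a minimal sketch. It assumes the workflowsets, tune, parsnip and rsample packages are installed. The mtcars formulas, the five-fold split and the use of fit_resamples() are illustrative choices and are not part of this page. save_workflow = TRUE is supplied through the `...` of workflow_map(). verbose = TRUE travels through the `...` of fit_best() to tune::fit_best().

```r
library(workflowsets)
library(tune)
library(parsnip)
library(rsample)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# Illustrative toy set: two formula preprocessors crossed with one linear model
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, large = mpg ~ wt + hp + disp),
  models = list(lm = linear_reg())
)

# save_workflow = TRUE is what later allows fit_best() to refit the winner
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,
  control = control_resamples(save_workflow = TRUE),
  seed = 1
)

# `metric` picks the ranking metric; `verbose = TRUE` is forwarded to tune::fit_best()
final_fit <- fit_best(res, metric = "rmse", verbose = TRUE)
final_fit
```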
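Details
This function is a shortcut for the steps needed to fit the numerically optimal configuration in a fitted workflow set. It ranks the results, extracts the tuning result that contains the best configuration, and then calls tune's fit_best() (itself a wrapper) on that tuning result.
In pseudocode: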
rankings <- rank_results(wf_set, rank_metric = metric, select_best = TRUE)
tune_res <- extract_workflow_set_result(wf_set, rankings$wflow_id[1])
fit_best(tune_res, metric = metric)
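The pseudocode can be expanded into a runnable sketch. It performs the three steps by hand and then compares the result with the one-call shortcut. The toy workflow set is an assumption made for illustration: two mtcars formulas crossed with linear_reg(), evaluated with fit_resamples(). Packages workflowsets, tune, parsnip and rsample are assumed to be installed.

```r
library(workflowsets)
library(tune)
library(parsnip)
library(rsample)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, large = mpg ~ wt + hp + disp),
  models = list(lm = linear_reg())
)
wf_res <- workflow_map(
  wf_set, fn = "fit_resamples", resamples = folds,
  control = control_resamples(save_workflow = TRUE), seed = 1
)

metric <- "rmse"
# 1. Rank every workflow by the chosen metric, keeping each workflow's best configuration
rankings <- rank_results(wf_res, rank_metric = metric, select_best = TRUE)
# 2. Pull out the resampling/tuning result of the top-ranked workflow
tune_res <- extract_workflow_set_result(wf_res, id = rankings$wflow_id[1])
# 3. Let tune::fit_best() refit that configuration on the full training set
manual_fit <- fit_best(tune_res, metric = metric)

# The one-call shortcut described on this page
shortcut_fit <- fit_best(wf_res, metric = metric)
```

Both paths should select the same workflow, so the underlying lm fits are expected to have identical coefficients.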
Note
The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and associated sets of model fits, two_class_res and chi_features_res.
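The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as the preprocessor, combined with a decision tree, logistic regression, or MARS model specification. See ?two_class_set for the source code.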
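The pre-generated classification objects can be inspected directly. This is a minimal sketch that assumes workflowsets and tune are installed. It prints the six workflows and ranks the stored results, keeping the best configuration of each workflow and using the default ranking metric.

```r
library(workflowsets)
library(tune)

# Six workflows: two preprocessors crossed with three model specifications
two_class_set
nrow(two_class_set)

# Pre-computed results for the same workflows, best configuration per workflow
two_class_ranks <- rank_results(two_class_res, select_best = TRUE)
two_class_ranks
```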
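The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. All three models use a linear regression model specification, paired with three different recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for the source code.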
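A quick way to see the three recipes of increasing complexity is to pull an individual workflow out of the set. This sketch assumes workflowsets and hardhat are installed. It uses the generic hardhat::extract_workflow(), which workflowsets implements for workflow sets.

```r
library(workflowsets)

# The three linear-regression workflows, from simplest to most complex recipe
chi_features_set$wflow_id

# Inspect the most complex workflow (its recipe adds holiday indicators and PCA)
pca_wf <- hardhat::extract_workflow(chi_features_set, id = "plus_pca_lm")
pca_wf
```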
Examples
library(workflowsets)
library(tune)
library(modeldata)
library(rsample)
data(Chicago)
Chicago <- Chicago[1:1195,]
time_val_split <-
sliding_period(
Chicago,
date,
"month",
lookback = 38,
assess_stop = 1
)
chi_features_set
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 date_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 plus_pca_lm <tibble [1 × 4]> <opts[0]> <list [0]>
chi_features_res_new <-
chi_features_set %>%
# note: must set `save_workflow = TRUE` to use `fit_best()`
option_add(control = control_grid(save_workflow = TRUE)) %>%
# evaluate with resamples
workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 1 of 3 resampling: date_lm (662ms)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 2 of 3 resampling: plus_holidays_lm (693ms)
#> i 3 of 3 tuning: plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x4
#> There were issues with some computations A: x4
#>
#> ✔ 3 of 3 tuning: plus_pca_lm (2.3s)
chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 date_lm <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 3 plus_pca_lm <tibble [1 × 4]> <opts[3]> <tune[+]>
# sort models by performance metrics
rank_results(chi_features_res_new)
#> # A tibble: 12 × 9
#> wflow_id .config .metric mean std_err n preprocessor model rank
#> <chr> <chr> <chr> <dbl> <dbl> <int> <chr> <chr> <int>
#> 1 plus_pca_… Prepro… rmse 0.586 NA 1 recipe line… 1
#> 2 plus_pca_… Prepro… rsq 0.989 NA 1 recipe line… 1
#> 3 plus_pca_… Prepro… rmse 0.590 NA 1 recipe line… 2
#> 4 plus_pca_… Prepro… rsq 0.988 NA 1 recipe line… 2
#> 5 plus_pca_… Prepro… rmse 0.591 NA 1 recipe line… 3
#> 6 plus_pca_… Prepro… rsq 0.988 NA 1 recipe line… 3
#> 7 plus_pca_… Prepro… rmse 0.594 NA 1 recipe line… 4
#> 8 plus_pca_… Prepro… rsq 0.989 NA 1 recipe line… 4
#> 9 plus_holi… Prepro… rmse 0.646 NA 1 recipe line… 5
#> 10 plus_holi… Prepro… rsq 0.986 NA 1 recipe line… 5
#> 11 date_lm Prepro… rmse 0.733 NA 1 recipe line… 6
#> 12 date_lm Prepro… rsq 0.982 NA 1 recipe line… 6
# fit the numerically optimal configuration to the training set
chi_features_wf <- fit_best(chi_features_res_new)
chi_features_wf
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#>
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#>
#> ── Model ─────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) temp_min temp
#> 5.067e+02 -4.811e-04 6.885e-02
#> temp_max temp_change dew
#> 9.511e-04 NA -5.110e-02
#> humidity pressure pressure_change
#> 2.516e-02 6.921e-01 2.230e-02
#> wind wind_max gust
#> -1.642e-02 1.409e-04 3.146e-03
#> gust_max percip percip_max
#> 7.870e-03 -7.111e+00 2.199e-01
#> weather_rain weather_snow weather_cloud
#> -6.168e-01 -2.689e-01 -9.951e-02
#> weather_storm Blackhawks_Away Blackhawks_Home
#> 2.603e-01 -1.245e-01 -1.114e-01
#> Bulls_Away Bulls_Home Bears_Away
#> 9.407e-02 1.833e-01 3.306e-01
#> Bears_Home WhiteSox_Away WhiteSox_Home
#> 3.531e-01 -5.198e-01 NA
#> Cubs_Away Cubs_Home date_year
#> NA NA -2.638e-01
#> date_LaborDay date_NewYearsDay date_ChristmasDay
#> 5.166e-01 -1.275e+01 -1.308e+01
#> date_dow_Mon date_dow_Tue date_dow_Wed
#> 1.232e+01 1.345e+01 1.348e+01
#> date_dow_Thu date_dow_Fri date_dow_Sat
#> 1.325e+01 1.281e+01 9.855e-01
#> date_month_Feb date_month_Mar date_month_Apr
#> 4.218e-02 3.897e-01 5.472e-01
#> date_month_May date_month_Jun date_month_Jul
#> 2.842e-01 9.032e-01 3.897e-01
#> date_month_Aug date_month_Sep date_month_Oct
#> 4.855e-01 1.588e-01 6.197e-01
#> date_month_Nov date_month_Dec PC1
#> -4.350e-01 -8.359e-01 2.979e-02
#> PC2 PC3
#> 1.225e-01 -1.722e-01
#>
# to select optimal value based on a specific metric:
fit_best(chi_features_res_new, metric = "rmse")
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#>
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#>
#> ── Model ─────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) temp_min temp
#> 5.067e+02 -4.811e-04 6.885e-02
#> temp_max temp_change dew
#> 9.511e-04 NA -5.110e-02
#> humidity pressure pressure_change
#> 2.516e-02 6.921e-01 2.230e-02
#> wind wind_max gust
#> -1.642e-02 1.409e-04 3.146e-03
#> gust_max percip percip_max
#> 7.870e-03 -7.111e+00 2.199e-01
#> weather_rain weather_snow weather_cloud
#> -6.168e-01 -2.689e-01 -9.951e-02
#> weather_storm Blackhawks_Away Blackhawks_Home
#> 2.603e-01 -1.245e-01 -1.114e-01
#> Bulls_Away Bulls_Home Bears_Away
#> 9.407e-02 1.833e-01 3.306e-01
#> Bears_Home WhiteSox_Away WhiteSox_Home
#> 3.531e-01 -5.198e-01 NA
#> Cubs_Away Cubs_Home date_year
#> NA NA -2.638e-01
#> date_LaborDay date_NewYearsDay date_ChristmasDay
#> 5.166e-01 -1.275e+01 -1.308e+01
#> date_dow_Mon date_dow_Tue date_dow_Wed
#> 1.232e+01 1.345e+01 1.348e+01
#> date_dow_Thu date_dow_Fri date_dow_Sat
#> 1.325e+01 1.281e+01 9.855e-01
#> date_month_Feb date_month_Mar date_month_Apr
#> 4.218e-02 3.897e-01 5.472e-01
#> date_month_May date_month_Jun date_month_Jul
#> 2.842e-01 9.032e-01 3.897e-01
#> date_month_Aug date_month_Sep date_month_Oct
#> 4.855e-01 1.588e-01 6.197e-01
#> date_month_Nov date_month_Dec PC1
#> -4.350e-01 -8.359e-01 2.979e-02
#> PC2 PC3
#> 1.225e-01 -1.722e-01
#>
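The example code notes that save_workflow = TRUE must be set to use fit_best(). The following is a minimal sketch of what happens otherwise. It reuses the illustrative mtcars toy set from the earlier sketches, which is an assumption and not from this page. The evaluation itself succeeds, but fit_best() signals an error because no workflow was stored that could be refit. The exact error text comes from tune and is not reproduced here.

```r
library(workflowsets)
library(tune)
library(parsnip)
library(rsample)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, large = mpg ~ wt + hp + disp),
  models = list(lm = linear_reg())
)

# Same evaluation, but WITHOUT control_resamples(save_workflow = TRUE)
res_unsaved <- workflow_map(wf_set, fn = "fit_resamples", resamples = folds, seed = 1)

# Capture whether fit_best() errors instead of letting the error stop the script
failed <- tryCatch({
  fit_best(res_unsaved, metric = "rmse")
  FALSE
}, error = function(e) {
  message("fit_best() failed: ", conditionMessage(e))
  TRUE
})
failed
```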
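Related usage
- R workflowsets extract_workflow_set_result Extract elements of workflow sets
- R workflowsets comment_add Add annotations and comments for workflows
- R workflowsets option_add Add and edit options saved in a workflow set
- R workflowsets leave_var_out_formulas Create formulas without each predictor
- R workflowsets collect_metrics.workflow_set Obtain and format results produced by tuning functions for workflow sets
- R workflowsets workflow_map Process a series of workflows
- R workflowsets as_workflow_set Convert existing objects to a workflow set
- R workflowsets option_list Make a classed list of options
- R workflowsets rank_results Rank the results by a metric
- R workflowsets workflow_set Generate a set of workflow objects from preprocessing and model objects
- R workflowsets pull_workflow_set_result Extract elements from a workflow set
- R workflowsets autoplot.workflow_set Plot the results of a workflow set
- R workflowsets update_workflow_model Update components of a workflow within a workflow set
- R workflows add_model Add a model to a workflow
- R workflows workflow Create a workflow
- R workflows extract-workflow Extract elements of a workflow
- R workflows add_variables Add variables to a workflow
- R workflows add_formula Add formula terms to a workflow
- R workflows predict-workflow Predict from a workflow
- R workflows augment.workflow Augment data with predictions
- R workflows add_recipe Add a recipe to a workflow
- R workflows glance.workflow Glance at a workflow model
- R workflows is_trained_workflow Determine if a workflow has been trained
- R workflows fit-workflow Fit a workflow object
- R workflows add_case_weights Add case weights to a workflow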
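Note: This article was selected and compiled by 纯净天空 from the original English work by Max Kuhn et al., Fit a model to the numerically optimal configuration. Unless otherwise stated, the copyright of the original code belongs to the original authors. This translation may not be reproduced or copied without permission or authorization.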