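R workflowsets fit_best.workflow_set: Fit a model to the numerically optimal configuration

fit_best() takes the results from tuning many models and fits the workflow configuration associated with the best performance to the training set.

Usage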

# S3 method for workflow_set
fit_best(x, metric = NULL, ...)

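Arguments

x: A workflow_set object that has been evaluated with workflow_map(). Note that the workflow set must have been fitted with the control option save_workflow = TRUE.
metric: A character string giving the metric to rank results by.
...: Additional options to pass to tune::fit_best.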

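Details

This function is a shortcut for the steps needed to fit the numerically optimal configuration in a fitted workflow set. It ranks the results, extracts the tuning result that contains the best result, and then calls fit_best() again on that tuning result. That second call dispatches to tune::fit_best(), which is itself a wrapper.

In pseudocode: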

rankings <- rank_results(wf_set, metric, select_best = TRUE)
tune_res <- extract_workflow_set_result(wf_set, rankings$wflow_id[1])
fit_best(tune_res, metric)
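
The pseudocode above can be spelled out as a runnable sketch. This is only an illustration, not the package's actual implementation: it reuses the data and resampling setup from the Examples section, and the names wf_set, best_id, and manual_fit are placeholders chosen here. The metric is fixed to "rmse" as an assumption so that each step ranks and refits on the same criterion.

```r
library(workflowsets)
library(tune)
library(modeldata)
library(rsample)

# Same data and resampling setup as the package example
data(Chicago)
Chicago <- Chicago[1:1195, ]
time_val_split <- sliding_period(
  Chicago, date, "month",
  lookback = 38, assess_stop = 1
)

# Evaluate the bundled workflow set, saving the workflows so they can be refit
wf_set <- chi_features_set %>%
  option_add(control = control_grid(save_workflow = TRUE)) %>%
  workflow_map(resamples = time_val_split, grid = 21, seed = 1)

# Step 1: rank the workflows, keeping only the best configuration of each
rankings <- rank_results(wf_set, rank_metric = "rmse", select_best = TRUE)

# Step 2: extract the tuning result for the top-ranked workflow
best_id <- rankings$wflow_id[1]
tune_res <- extract_workflow_set_result(wf_set, id = best_id)

# Step 3: let tune::fit_best() refit that workflow on the training set
manual_fit <- fit_best(tune_res, metric = "rmse")
```

With this setup, manual_fit should be a trained workflow comparable to what fit_best(wf_set, metric = "rmse") returns. Doing the steps by hand is mainly useful when you want to inspect rankings or tune_res before committing to a final fit.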

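Note

The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and the associated sets of model fits, two_class_res and chi_features_res.

The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as the preprocessor, combined with a decision tree, logistic regression, or MARS model specification. See ?two_class_set for the source code.

The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. All three models use a linear regression model specification, with three different recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for the source code.

Examples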

library(workflowsets)
library(tune)
library(modeldata)
library(rsample)

data(Chicago)
Chicago <- Chicago[1:1195,]

time_val_split <-
   sliding_period(
      Chicago,
      date,
      "month",
      lookback = 38,
      assess_stop = 1
   )

chi_features_set
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result    
#>   <chr>            <list>           <list>    <list>    
#> 1 date_lm          <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[0]> <list [0]>

chi_features_res_new <-
   chi_features_set %>%
   # note: must set `save_workflow = TRUE` to use `fit_best()`
   option_add(control = control_grid(save_workflow = TRUE)) %>%
   # evaluate with resamples
   workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 1 of 3 resampling: date_lm (662ms)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 2 of 3 resampling: plus_holidays_lm (693ms)
#> i 3 of 3 tuning:     plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x4
#> There were issues with some computations   A: x4
#> 
#> ✔ 3 of 3 tuning:     plus_pca_lm (2.3s)

chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

# sort models by performance metrics
rank_results(chi_features_res_new)
#> # A tibble: 12 × 9
#>    wflow_id   .config .metric  mean std_err     n preprocessor model  rank
#>    <chr>      <chr>   <chr>   <dbl>   <dbl> <int> <chr>        <chr> <int>
#>  1 plus_pca_… Prepro… rmse    0.586      NA     1 recipe       line…     1
#>  2 plus_pca_… Prepro… rsq     0.989      NA     1 recipe       line…     1
#>  3 plus_pca_… Prepro… rmse    0.590      NA     1 recipe       line…     2
#>  4 plus_pca_… Prepro… rsq     0.988      NA     1 recipe       line…     2
#>  5 plus_pca_… Prepro… rmse    0.591      NA     1 recipe       line…     3
#>  6 plus_pca_… Prepro… rsq     0.988      NA     1 recipe       line…     3
#>  7 plus_pca_… Prepro… rmse    0.594      NA     1 recipe       line…     4
#>  8 plus_pca_… Prepro… rsq     0.989      NA     1 recipe       line…     4
#>  9 plus_holi… Prepro… rmse    0.646      NA     1 recipe       line…     5
#> 10 plus_holi… Prepro… rsq     0.986      NA     1 recipe       line…     5
#> 11 date_lm    Prepro… rmse    0.733      NA     1 recipe       line…     6
#> 12 date_lm    Prepro… rsq     0.982      NA     1 recipe       line…     6

# fit the numerically optimal configuration to the training set
chi_features_wf <- fit_best(chi_features_res_new)

chi_features_wf
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#> 
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#> 
#> ── Model ─────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)           temp_min               temp  
#>         5.067e+02         -4.811e-04          6.885e-02  
#>          temp_max        temp_change                dew  
#>         9.511e-04                 NA         -5.110e-02  
#>          humidity           pressure    pressure_change  
#>         2.516e-02          6.921e-01          2.230e-02  
#>              wind           wind_max               gust  
#>        -1.642e-02          1.409e-04          3.146e-03  
#>          gust_max             percip         percip_max  
#>         7.870e-03         -7.111e+00          2.199e-01  
#>      weather_rain       weather_snow      weather_cloud  
#>        -6.168e-01         -2.689e-01         -9.951e-02  
#>     weather_storm    Blackhawks_Away    Blackhawks_Home  
#>         2.603e-01         -1.245e-01         -1.114e-01  
#>        Bulls_Away         Bulls_Home         Bears_Away  
#>         9.407e-02          1.833e-01          3.306e-01  
#>        Bears_Home      WhiteSox_Away      WhiteSox_Home  
#>         3.531e-01         -5.198e-01                 NA  
#>         Cubs_Away          Cubs_Home          date_year  
#>                NA                 NA         -2.638e-01  
#>     date_LaborDay   date_NewYearsDay  date_ChristmasDay  
#>         5.166e-01         -1.275e+01         -1.308e+01  
#>      date_dow_Mon       date_dow_Tue       date_dow_Wed  
#>         1.232e+01          1.345e+01          1.348e+01  
#>      date_dow_Thu       date_dow_Fri       date_dow_Sat  
#>         1.325e+01          1.281e+01          9.855e-01  
#>    date_month_Feb     date_month_Mar     date_month_Apr  
#>         4.218e-02          3.897e-01          5.472e-01  
#>    date_month_May     date_month_Jun     date_month_Jul  
#>         2.842e-01          9.032e-01          3.897e-01  
#>    date_month_Aug     date_month_Sep     date_month_Oct  
#>         4.855e-01          1.588e-01          6.197e-01  
#>    date_month_Nov     date_month_Dec                PC1  
#>        -4.350e-01         -8.359e-01          2.979e-02  
#>               PC2                PC3  
#>         1.225e-01         -1.722e-01  
#> 

# to select optimal value based on a specific metric:
fit_best(chi_features_res_new, metric = "rmse")
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#> 
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#> 
#> ── Model ─────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)           temp_min               temp  
#>         5.067e+02         -4.811e-04          6.885e-02  
#>          temp_max        temp_change                dew  
#>         9.511e-04                 NA         -5.110e-02  
#>          humidity           pressure    pressure_change  
#>         2.516e-02          6.921e-01          2.230e-02  
#>              wind           wind_max               gust  
#>        -1.642e-02          1.409e-04          3.146e-03  
#>          gust_max             percip         percip_max  
#>         7.870e-03         -7.111e+00          2.199e-01  
#>      weather_rain       weather_snow      weather_cloud  
#>        -6.168e-01         -2.689e-01         -9.951e-02  
#>     weather_storm    Blackhawks_Away    Blackhawks_Home  
#>         2.603e-01         -1.245e-01         -1.114e-01  
#>        Bulls_Away         Bulls_Home         Bears_Away  
#>         9.407e-02          1.833e-01          3.306e-01  
#>        Bears_Home      WhiteSox_Away      WhiteSox_Home  
#>         3.531e-01         -5.198e-01                 NA  
#>         Cubs_Away          Cubs_Home          date_year  
#>                NA                 NA         -2.638e-01  
#>     date_LaborDay   date_NewYearsDay  date_ChristmasDay  
#>         5.166e-01         -1.275e+01         -1.308e+01  
#>      date_dow_Mon       date_dow_Tue       date_dow_Wed  
#>         1.232e+01          1.345e+01          1.348e+01  
#>      date_dow_Thu       date_dow_Fri       date_dow_Sat  
#>         1.325e+01          1.281e+01          9.855e-01  
#>    date_month_Feb     date_month_Mar     date_month_Apr  
#>         4.218e-02          3.897e-01          5.472e-01  
#>    date_month_May     date_month_Jun     date_month_Jul  
#>         2.842e-01          9.032e-01          3.897e-01  
#>    date_month_Aug     date_month_Sep     date_month_Oct  
#>         4.855e-01          1.588e-01          6.197e-01  
#>    date_month_Nov     date_month_Dec                PC1  
#>        -4.350e-01         -8.359e-01          2.979e-02  
#>               PC2                PC3  
#>         1.225e-01         -1.722e-01  
#>

Source code: R/fit_best.R

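Note: This article was compiled by 纯净天空 from the original English work "Fit a model to the numerically optimal configuration" by Max Kuhn and others. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.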