R workflowsets fit_best.workflow_set 將模型擬合到數值最優配置

fit_best() 從調整許多模型中獲取結果，並將與最佳性能相關的工作流配置與訓練集相匹配。

用法

# S3 method for workflow_set
fit_best(x, metric = NULL, ...)

參數

x: 已使用 workflow_map() 求值的 workflow_set 對象。請注意，工作流程集必須已安裝 control option save_workflow = TRUE 。
metric: 給出對結果進行排名的指標的字符串。
...: 要傳遞給 tune::fit_best 的其他選項。

細節

此函數是在擬合的工作流程集中擬合數值最佳配置所需步驟的快捷方式。該函數對結果進行排名，提取與最佳結果相關的調整結果，然後再次對包含最佳結果的調整結果調用fit_best()(本身是一個包裝器)。

在偽代碼中：

rankings <- rank_results(wf_set, metric, select_best = TRUE)
tune_res <- extract_workflow_set_result(wf_set, rankings$wflow_id[1])
fit_best(tune_res, metric)

注意

該軟件包提供兩個預生成的工作流程集 two_class_set 和 chi_features_set ，以及適合 two_class_res 和 chi_features_res 的相關模型集。

two_class_* 對象基於使用 modeldata 包中的 two_class_dat 數據的二元分類問題。這六個模型利用裸公式或基本配方，利用 recipes::step_YeoJohnson() 作為預處理器，以及決策樹、邏輯回歸或 MARS 模型規範。有關源代碼，請參閱?two_class_set。

chi_features_* 對象基於使用 modeldata 包中的 Chicago 數據的回歸問題。這三個模型均采用線性回歸模型規範，具有不同複雜性的三種不同配方。這些對象旨在近似 Kuhn 和 Johnson (2019) 第 1.3 節中構建的模型序列。有關源代碼，請參閱?chi_features_set。

例子

library(tune)
library(modeldata)
library(rsample)

data(Chicago)
Chicago <- Chicago[1:1195,]

time_val_split <-
   sliding_period(
      Chicago,
      date,
      "month",
      lookback = 38,
      assess_stop = 1
   )

chi_features_set
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result    
#>   <chr>            <list>           <list>    <list>    
#> 1 date_lm          <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[0]> <list [0]>

chi_features_res_new <-
   chi_features_set %>%
   # note: must set `save_workflow = TRUE` to use `fit_best()`
   option_add(control = control_grid(save_workflow = TRUE)) %>%
   # evaluate with resamples
   workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 1 of 3 resampling: date_lm (662ms)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 2 of 3 resampling: plus_holidays_lm (693ms)
#> i 3 of 3 tuning:     plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x4
#> There were issues with some computations   A: x4
#> 
#> ✔ 3 of 3 tuning:     plus_pca_lm (2.3s)

chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

# sort models by performance metrics
rank_results(chi_features_res_new)
#> # A tibble: 12 × 9
#>    wflow_id   .config .metric  mean std_err     n preprocessor model  rank
#>    <chr>      <chr>   <chr>   <dbl>   <dbl> <int> <chr>        <chr> <int>
#>  1 plus_pca_… Prepro… rmse    0.586      NA     1 recipe       line…     1
#>  2 plus_pca_… Prepro… rsq     0.989      NA     1 recipe       line…     1
#>  3 plus_pca_… Prepro… rmse    0.590      NA     1 recipe       line…     2
#>  4 plus_pca_… Prepro… rsq     0.988      NA     1 recipe       line…     2
#>  5 plus_pca_… Prepro… rmse    0.591      NA     1 recipe       line…     3
#>  6 plus_pca_… Prepro… rsq     0.988      NA     1 recipe       line…     3
#>  7 plus_pca_… Prepro… rmse    0.594      NA     1 recipe       line…     4
#>  8 plus_pca_… Prepro… rsq     0.989      NA     1 recipe       line…     4
#>  9 plus_holi… Prepro… rmse    0.646      NA     1 recipe       line…     5
#> 10 plus_holi… Prepro… rsq     0.986      NA     1 recipe       line…     5
#> 11 date_lm    Prepro… rmse    0.733      NA     1 recipe       line…     6
#> 12 date_lm    Prepro… rsq     0.982      NA     1 recipe       line…     6

# fit the numerically optimal configuration to the training set
chi_features_wf <- fit_best(chi_features_res_new)

chi_features_wf
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#> 
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#> 
#> ── Model ─────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)           temp_min               temp  
#>         5.067e+02         -4.811e-04          6.885e-02  
#>          temp_max        temp_change                dew  
#>         9.511e-04                 NA         -5.110e-02  
#>          humidity           pressure    pressure_change  
#>         2.516e-02          6.921e-01          2.230e-02  
#>              wind           wind_max               gust  
#>        -1.642e-02          1.409e-04          3.146e-03  
#>          gust_max             percip         percip_max  
#>         7.870e-03         -7.111e+00          2.199e-01  
#>      weather_rain       weather_snow      weather_cloud  
#>        -6.168e-01         -2.689e-01         -9.951e-02  
#>     weather_storm    Blackhawks_Away    Blackhawks_Home  
#>         2.603e-01         -1.245e-01         -1.114e-01  
#>        Bulls_Away         Bulls_Home         Bears_Away  
#>         9.407e-02          1.833e-01          3.306e-01  
#>        Bears_Home      WhiteSox_Away      WhiteSox_Home  
#>         3.531e-01         -5.198e-01                 NA  
#>         Cubs_Away          Cubs_Home          date_year  
#>                NA                 NA         -2.638e-01  
#>     date_LaborDay   date_NewYearsDay  date_ChristmasDay  
#>         5.166e-01         -1.275e+01         -1.308e+01  
#>      date_dow_Mon       date_dow_Tue       date_dow_Wed  
#>         1.232e+01          1.345e+01          1.348e+01  
#>      date_dow_Thu       date_dow_Fri       date_dow_Sat  
#>         1.325e+01          1.281e+01          9.855e-01  
#>    date_month_Feb     date_month_Mar     date_month_Apr  
#>         4.218e-02          3.897e-01          5.472e-01  
#>    date_month_May     date_month_Jun     date_month_Jul  
#>         2.842e-01          9.032e-01          3.897e-01  
#>    date_month_Aug     date_month_Sep     date_month_Oct  
#>         4.855e-01          1.588e-01          6.197e-01  
#>    date_month_Nov     date_month_Dec                PC1  
#>        -4.350e-01         -8.359e-01          2.979e-02  
#>               PC2                PC3  
#>         1.225e-01         -1.722e-01  
#> 

# to select optimal value based on a specific metric:
fit_best(chi_features_res_new, metric = "rmse")
#> ══ Workflow [trained] ════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#> 
#> • step_date()
#> • step_holiday()
#> • step_dummy()
#> • step_zv()
#> • step_pca()
#> 
#> ── Model ─────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)           temp_min               temp  
#>         5.067e+02         -4.811e-04          6.885e-02  
#>          temp_max        temp_change                dew  
#>         9.511e-04                 NA         -5.110e-02  
#>          humidity           pressure    pressure_change  
#>         2.516e-02          6.921e-01          2.230e-02  
#>              wind           wind_max               gust  
#>        -1.642e-02          1.409e-04          3.146e-03  
#>          gust_max             percip         percip_max  
#>         7.870e-03         -7.111e+00          2.199e-01  
#>      weather_rain       weather_snow      weather_cloud  
#>        -6.168e-01         -2.689e-01         -9.951e-02  
#>     weather_storm    Blackhawks_Away    Blackhawks_Home  
#>         2.603e-01         -1.245e-01         -1.114e-01  
#>        Bulls_Away         Bulls_Home         Bears_Away  
#>         9.407e-02          1.833e-01          3.306e-01  
#>        Bears_Home      WhiteSox_Away      WhiteSox_Home  
#>         3.531e-01         -5.198e-01                 NA  
#>         Cubs_Away          Cubs_Home          date_year  
#>                NA                 NA         -2.638e-01  
#>     date_LaborDay   date_NewYearsDay  date_ChristmasDay  
#>         5.166e-01         -1.275e+01         -1.308e+01  
#>      date_dow_Mon       date_dow_Tue       date_dow_Wed  
#>         1.232e+01          1.345e+01          1.348e+01  
#>      date_dow_Thu       date_dow_Fri       date_dow_Sat  
#>         1.325e+01          1.281e+01          9.855e-01  
#>    date_month_Feb     date_month_Mar     date_month_Apr  
#>         4.218e-02          3.897e-01          5.472e-01  
#>    date_month_May     date_month_Jun     date_month_Jul  
#>         2.842e-01          9.032e-01          3.897e-01  
#>    date_month_Aug     date_month_Sep     date_month_Oct  
#>         4.855e-01          1.588e-01          6.197e-01  
#>    date_month_Nov     date_month_Dec                PC1  
#>        -4.350e-01         -8.359e-01          2.979e-02  
#>               PC2                PC3  
#>         1.225e-01         -1.722e-01  
#>

源代碼：R/fit_best.R

相關用法

注：本文由純淨天空篩選整理自Max Kuhn等大神的英文原創作品 Fit a model to the numerically optimal configuration。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。