workflow_map() executes the same function across all the workflows in a workflow set. Any of the tune_*() functions can be used, as well as tune::fit_resamples().
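Arguments
- object: A workflow set.
- fn: The name of the function to run, as a character string. Acceptable values are "tune_grid", "tune_bayes", "fit_resamples", "tune_race_anova", "tune_race_win_loss", or "tune_sim_anneal". Do not include a namespace or parentheses in this argument: provide "tune_grid" rather than "tune::tune_grid" or "tune_grid()".
- verbose: A logical for logging progress.
- seed: A single integer that is set before each function execution.
- ...: Options to pass to the modeling function. See Details below.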
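
As a minimal sketch of how these arguments fit together, the following builds a small workflow set on the built-in mtcars data and passes the function name as a plain string. The formulas, the 3-fold resampling, and the object names wf_set, folds, and res are assumptions made for illustration; they are not part of the original page.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# Assumption: a tiny workflow set with two formulas and one linear model,
# used here only to illustrate the call signature.
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, larger = mpg ~ wt + hp),
  models  = list(lm = linear_reg())
)

set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# `fn` is a bare string: no namespace prefix and no parentheses.
# `resamples` travels through `...` to the underlying tune function.
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,
  seed = 1
)

res$wflow_id
```

The same call shape should apply to the other accepted values of fn, with any function-specific options supplied through the dots.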
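Value
An updated workflow set. The option column is updated with any options for the tune package functions that were given to workflow_map(). The results are added to the result column. If the computations for a workflow fail, a try-error object is saved in place of the result (without stopping execution).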
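Details
When passing options, anything passed in ... is combined with any values already in the option column. Values in ... override the existing values in that column, and any new options are added to it.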
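
The merging behavior can be sketched as follows, assuming a one-workflow set on mtcars (an illustrative choice). An option is stored first with option_add(), and then a different value for the same option is passed through the dots. The object names and the particular control settings are assumptions for this example.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt),
  models  = list(lm = linear_reg())
)

set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Store an option in the `option` column ahead of time.
wf_set <- option_add(wf_set, control = control_resamples(save_pred = FALSE))

# `resamples` is a new option; `control` replaces the stored value.
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,
  control = control_resamples(save_pred = TRUE),
  seed = 1
)

# Inspect what ended up in the `option` column for the first workflow.
opts <- res$option[[1]]
names(opts)
opts$control$save_pred
```

Inspecting the option column after the call is a simple way to confirm which values were actually in effect for each workflow.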
Any failure during execution results in the corresponding row of the result column containing a try-error object.
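
To illustrate what checking for such failures might look like, the sketch below uses a toy list in base R as a stand-in for a result column. The list contents and the names results and failed are hypothetical; the same inherits() check could be applied to the result column of a real workflow set.

```r
# Toy stand-in for a `result` column: one success and one captured failure.
# try(..., silent = TRUE) returns an object of class "try-error" on failure.
results <- list(
  ok  = "fitted results would go here",
  bad = try(stop("model failed"), silent = TRUE)
)

# Flag the elements that hold a captured error instead of real results.
failed <- vapply(results, inherits, logical(1), what = "try-error")
failed

# Names of the failed entries.
names(results)[failed]

# The original error message is kept in the "condition" attribute.
conditionMessage(attr(results$bad, "condition"))
```

Because the failure is stored rather than raised, the remaining workflows still run, and the failed ones can be filtered out or investigated afterwards.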
If a workflow has no tuning parameters but is mapped to one of the tuning functions, tune::fit_resamples() is used instead, and a warning is issued if verbose = TRUE.
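
A small sketch of this fallback, assuming a linear model on mtcars with nothing marked for tuning: fn is set to "tune_grid", yet the stored result is expected to come from tune::fit_resamples(). The data set and object names are assumptions for illustration.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# No argument is marked with tune(), so there is nothing to tune.
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt),
  models  = list(lm = linear_reg())
)

set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Ask for grid tuning anyway; verbose = TRUE reports the substitution.
res <- workflow_map(
  wf_set,
  fn = "tune_grid",
  resamples = folds,
  verbose = TRUE,
  seed = 1
)

# Objects produced by fit_resamples() carry the "resample_results" class.
inherits(res$result[[1]], "resample_results")
```

This means a mixed set of tunable and non-tunable workflows can be processed with a single call, as in the example at the end of this page.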
If a workflow requires packages that are not installed, a message is printed and workflow_map() continues with the next workflow (if any).
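Note
The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, along with the associated sets of model fits, two_class_res and chi_features_res.
The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as the preprocessor, combined with a decision tree, logistic regression, or MARS model specification. See ?two_class_set for source code.
The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. Each of the three models uses a linear regression model specification, with three different recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for source code.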
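
As a brief sketch, the bundled objects can be inspected directly once the package is loaded. The choice of "rmse" as the ranking metric is an assumption made for this example.

```r
library(workflowsets)
library(tune)

# Workflow identifiers in the two pre-generated sets.
two_class_set$wflow_id
chi_features_set$wflow_id

# The *_res objects already contain results, so they can be summarized
# without rerunning any resampling.
rank_results(chi_features_res, rank_metric = "rmse")
```

Working from the stored results is a quick way to explore the package's summary functions before running workflow_map() on new data.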
Examples
library(workflowsets)
library(workflows)
library(modeldata)
library(recipes)
library(parsnip)
library(dplyr)
library(rsample)
library(tune)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(dials)
#> Loading required package: scales
#>
#> Attaching package: ‘scales’
#> The following object is masked from ‘package:purrr’:
#>
#> discard
# An example of processed results
chi_features_res
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result
#>   <chr>            <list>           <list>    <list>
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>
# Recreating them:
# ---------------------------------------------------------------------------
data(Chicago)
Chicago <- Chicago[1:1195,]
time_val_split <-
  sliding_period(
    Chicago,
    date,
    "month",
    lookback = 38,
    assess_stop = 1
  )
# ---------------------------------------------------------------------------
base_recipe <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())
date_only <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors())
date_and_holidays <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors())
date_and_holidays_and_pca <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors()) %>%
  step_pca(!!stations, num_comp = tune())
# ---------------------------------------------------------------------------
lm_spec <- linear_reg() %>% set_engine("lm")
# ---------------------------------------------------------------------------
pca_param <-
  parameters(num_comp()) %>%
  update(num_comp = num_comp(c(0, 20)))
# ---------------------------------------------------------------------------
chi_features_set <-
  workflow_set(
    preproc = list(date = date_only,
                   plus_holidays = date_and_holidays,
                   plus_pca = date_and_holidays_and_pca),
    models = list(lm = lm_spec),
    cross = TRUE
  )
# ---------------------------------------------------------------------------
chi_features_res_new <-
  chi_features_set %>%
  option_add(param_info = pca_param, id = "plus_pca_lm") %>%
  workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 1 of 3 resampling: date_lm (547ms)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 2 of 3 resampling: plus_holidays_lm (601ms)
#> i 3 of 3 tuning: plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x4
#> There were issues with some computations A: x10
#> There were issues with some computations A: x16
#> There were issues with some computations A: x18
#>
#> ✔ 3 of 3 tuning: plus_pca_lm (8.9s)
chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result
#>   <chr>            <list>           <list>    <list>
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>
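Note: This article was compiled by 純淨天空 from the original English work "Process a series of workflows" by Max Kuhn et al. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.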