workflow_map() executes the same function across all of the workflows in a workflow set. The various tune_*() functions can be used, as well as tune::fit_resamples().
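Arguments
- object
  A workflow set.
- fn
  The name of the function to run, as a character string. Acceptable values are: "tune_grid", "tune_bayes", "fit_resamples", "tune_race_anova", "tune_race_win_loss", or "tune_sim_anneal". Note that users need not provide the namespace or parentheses in this argument, e.g. provide "tune_grid" rather than "tune::tune_grid" or "tune_grid()".
- verbose
  A logical for logging progress.
- seed
  A single integer that is set prior to each function execution.
- ...
  Options to pass to the modeling function. See the details below.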
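As a minimal sketch of how these arguments fit together, the snippet below maps "fit_resamples" over a two-workflow set. The data (mtcars), the formulas, and the names folds, small_set, and small_res are illustrative assumptions and do not come from the package documentation.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# Illustrative resamples (assumption: 3-fold CV on mtcars)
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Two formula preprocessors crossed with one linear regression model
small_set <- workflow_set(
  preproc = list(wt = mpg ~ wt, wt_hp = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

small_res <- workflow_map(
  small_set,
  fn = "fit_resamples", # bare function name: no namespace, no parentheses
  verbose = FALSE,
  seed = 123,           # set before each workflow is processed
  resamples = folds     # passed through `...` to fit_resamples()
)

small_res
```

The resamples object is not a named argument of workflow_map(); it travels through `...` to the function named in fn.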
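Value
An updated workflow set. The option column is updated with any options for the tune package functions given to workflow_map(). Additionally, the results are added to the result column. If the computations for a workflow fail, a try-error object is saved in place of the result (without stopping execution).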
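To make the two columns concrete, here is a small sketch that inspects the chi_features_res object shipped with the package. The variable names pca_row, pca_opts, and pca_res are hypothetical; the workflow id "plus_pca_lm" is taken from the printed results in the examples.

```r
library(workflowsets)
library(tune) # provides the classes stored in the result column

# Locate the tuned workflow by its id rather than by position
pca_row <- which(chi_features_res$wflow_id == "plus_pca_lm")

# The option column holds one list of options per workflow
pca_opts <- chi_features_res$option[[pca_row]]
names(pca_opts)

# The result column holds the tuning or resampling results
pca_res <- extract_workflow_set_result(chi_features_res, id = "plus_pca_lm")
class(pca_res)
```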
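Details
When passing options, anything passed in ... is combined with any values in the option column. The values in ... override that column's values, and the new options are added to the option column.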
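The override behavior can be sketched as follows, assuming a one-workflow set built on mtcars (the data, the formula, and the names opt_set and opt_res are illustrative). A control object is stored with option_add() first, and a different one is then passed through `...`.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

opt_set <- workflow_set(
  preproc = list(wt = mpg ~ wt),
  models = list(lm = linear_reg())
)

# Store a control object in the option column ahead of time
opt_set <- option_add(opt_set, control = control_resamples(save_pred = FALSE))

# `control` passed through `...` replaces the stored value;
# `resamples` is new and is added alongside it
opt_res <- workflow_map(
  opt_set,
  fn = "fit_resamples",
  seed = 1,
  resamples = folds,
  control = control_resamples(save_pred = TRUE)
)

names(opt_res$option[[1]])
opt_res$option[[1]]$control$save_pred
```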
Any failure in execution results in the corresponding row of the result column containing a try-error object.
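Because failures are stored rather than raised, it can be useful to scan the result column after mapping. The sketch below does this for the shipped chi_features_res object; the names is_failed and failed_ids are hypothetical, and for these pre-computed results no workflow is expected to be flagged.

```r
library(workflowsets)

# TRUE for every row whose result is a stored failure
is_failed <- vapply(
  chi_features_res$result,
  function(x) inherits(x, "try-error"),
  logical(1)
)

# Ids of the workflows that failed (if any)
failed_ids <- chi_features_res$wflow_id[is_failed]
failed_ids
```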
If a workflow that has no tuning parameters is mapped to one of the tuning functions, tune::fit_resamples() is used instead, and a warning is issued if verbose = TRUE.
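A short sketch of this fallback, under the assumption of a single untuned linear model on mtcars (the data and the names plain_set and plain_res are illustrative): the set is mapped with "tune_grid", yet the stored result should be a resampling result.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# No tune() placeholders anywhere in this workflow
plain_set <- workflow_set(
  preproc = list(wt = mpg ~ wt),
  models = list(lm = linear_reg())
)

# Ask for tune_grid(); with verbose = TRUE a note about
# fit_resamples() being attempted is printed instead
plain_res <- workflow_map(
  plain_set,
  fn = "tune_grid",
  verbose = TRUE,
  seed = 1,
  resamples = folds
)

class(plain_res$result[[1]])
```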
If a workflow requires packages that are not installed, a message is printed and workflow_map() continues with the next workflow (if any).
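Note
The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and the associated sets of model fits, two_class_res and chi_features_res.
The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as a preprocessor, and a decision tree, logistic regression, or MARS model specification. See ?two_class_set for the source code.
The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. The three models all use a linear regression model specification, with three recipes of differing complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for the source code.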
Examples
library(workflowsets)
library(workflows)
library(modeldata)
library(recipes)
library(parsnip)
library(dplyr)
library(rsample)
library(tune)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(dials)
#> Loading required package: scales
#>
#> Attaching package: ‘scales’
#> The following object is masked from ‘package:purrr’:
#>
#> discard
# An example of processed results
chi_features_res
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result
#>   <chr>            <list>           <list>    <list>
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>
# Recreating them:
# ---------------------------------------------------------------------------
data(Chicago)
Chicago <- Chicago[1:1195,]
time_val_split <-
  sliding_period(
    Chicago,
    date,
    "month",
    lookback = 38,
    assess_stop = 1
  )
# ---------------------------------------------------------------------------
base_recipe <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

date_only <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors())

date_and_holidays <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors())

date_and_holidays_and_pca <-
  recipe(ridership ~ ., data = Chicago) %>%
  # create date features
  step_date(date) %>%
  step_holiday(date) %>%
  # remove date from the list of predictors
  update_role(date, new_role = "id") %>%
  # create dummy variables from factor columns
  step_dummy(all_nominal()) %>%
  # remove any columns with a single unique value
  step_zv(all_predictors()) %>%
  # `stations` (a vector of station column names) is loaded by data(Chicago)
  step_pca(!!stations, num_comp = tune())
# ---------------------------------------------------------------------------
lm_spec <- linear_reg() %>% set_engine("lm")
# ---------------------------------------------------------------------------
pca_param <-
  parameters(num_comp()) %>%
  update(num_comp = num_comp(c(0, 20)))
# ---------------------------------------------------------------------------
chi_features_set <-
  workflow_set(
    preproc = list(
      date = date_only,
      plus_holidays = date_and_holidays,
      plus_pca = date_and_holidays_and_pca
    ),
    models = list(lm = lm_spec),
    cross = TRUE
  )
# ---------------------------------------------------------------------------
chi_features_res_new <-
  chi_features_set %>%
  option_add(param_info = pca_param, id = "plus_pca_lm") %>%
  workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 1 of 3 resampling: date_lm (547ms)
#> i No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x1
#> There were issues with some computations A: x1
#>
#> ✔ 2 of 3 resampling: plus_holidays_lm (601ms)
#> i 3 of 3 tuning: plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations A: x4
#> There were issues with some computations A: x10
#> There were issues with some computations A: x16
#> There were issues with some computations A: x18
#>
#> ✔ 3 of 3 tuning: plus_pca_lm (8.9s)
chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result
#>   <chr>            <list>           <list>    <list>
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>
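Note: this article was compiled by 纯净天空 from the original English work "Process a series of workflows" by Max Kuhn et al. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.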