R workflowsets workflow_map: Process a series of workflows


workflow_map() executes the same function across all workflows in the set. The various tune_*() functions can be used, as well as tune::fit_resamples().

Usage

workflow_map(
  object,
  fn = "tune_grid",
  verbose = FALSE,
  seed = sample.int(10^4, 1),
  ...
)

Arguments

object

A workflow set.

fn

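The name of the function to run, as a character string. Acceptable values are: "tune_grid", "tune_bayes", "fit_resamples", "tune_race_anova", "tune_race_win_loss", or "tune_sim_anneal". Note that users need not provide the namespace or parentheses in this argument, e.g. provide "tune_grid" rather than "tune::tune_grid" or "tune_grid()".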

verbose

A logical for logging progress.

seed

A single integer that is set as the random seed before each function execution.

...

Options to pass to the modeling function. See Details below.
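As an illustration of the fn argument, here is a minimal sketch that maps "fit_resamples" over a small workflow set. The mtcars data, the two formulas, and the names wf_set, folds, small, and larger are assumptions chosen for this example only; they do not come from the package documentation.

```r
library(workflowsets)
library(parsnip)
library(rsample)

# Hypothetical toy setup: 3-fold cross-validation on mtcars
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Two formula preprocessors crossed with one linear regression model
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, larger = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

# fn is a bare function name: no namespace prefix and no parentheses
res <- workflow_map(wf_set, fn = "fit_resamples", resamples = folds, seed = 1)

res$wflow_id
```

The resamples argument is not an argument of workflow_map() itself; it travels through ... to the underlying tune function.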

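Value

An updated workflow set. The option column will be updated with any options for the tune package functions given to workflow_map(). Also, the results will be added to the result column. If the computations for a workflow fail, a try-error object will be saved in place of the results (without stopping execution).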

Details

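When passing options, anything passed in ... will be combined with any values in the option column. The values in ... will override that column's values, and the new options are added to the option column.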
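The override behavior can be sketched as follows. This is a toy example under stated assumptions: the mtcars data and the names wf_set and folds are hypothetical, and the control option is first stored with option_add() and then replaced through ... in workflow_map().

```r
library(workflowsets)
library(parsnip)
library(rsample)

# Hypothetical toy setup: 3-fold cross-validation on mtcars
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, larger = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

# Store options in the option column ahead of time
wf_set <- option_add(
  wf_set,
  resamples = folds,
  control = tune::control_resamples(save_pred = FALSE)
)

# The control value passed through ... replaces the stored one;
# the stored resamples option is kept and used as-is
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  seed = 1,
  control = tune::control_resamples(save_pred = TRUE)
)

# Inspect the option that was actually used for the first workflow
res$option[[1]]$control$save_pred
```

Storing shared options once with option_add() and overriding only what differs per run keeps repeated workflow_map() calls short.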

Any failures in execution result in the corresponding row of the result column containing a try-error object.
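One way to screen for failed workflows after mapping is to test each element of the result column for the try-error class. The sketch below assumes a toy workflow set on mtcars (the names wf_set, folds, and failed are hypothetical); it only shows the check and does not provoke a failure.

```r
library(workflowsets)
library(parsnip)
library(rsample)

# Hypothetical toy setup: 3-fold cross-validation on mtcars
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, larger = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

res <- workflow_map(wf_set, fn = "fit_resamples", resamples = folds, seed = 1)

# TRUE for every workflow whose computation failed
failed <- vapply(res$result, inherits, logical(1), what = "try-error")

# IDs of failed workflows (an empty character vector when nothing failed)
res$wflow_id[failed]

# Keep only the rows that completed before summarizing results
ok <- res[!failed, ]
```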

If a workflow with no tuning parameters is mapped to one of the tuning functions, tune::fit_resamples() will be used instead, and a warning is issued if verbose = TRUE.
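The fallback can be seen with a workflow set that has nothing to tune. In this sketch (toy data and names such as wf_set and folds are assumptions for illustration), fn is left at its default of "tune_grid", yet the stored results are resampling results.

```r
library(workflowsets)
library(parsnip)
library(rsample)

# Hypothetical toy setup: 3-fold cross-validation on mtcars
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Neither workflow contains a tune() placeholder
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, larger = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

# Default fn = "tune_grid"; with verbose = TRUE a note about the
# switch to fit_resamples() is logged for each workflow
res <- workflow_map(wf_set, resamples = folds, seed = 1, verbose = TRUE)

# Check which kind of object was stored for each workflow
vapply(res$result, inherits, logical(1), what = "resample_results")
```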

If a workflow requires packages that are not installed, a message is printed and workflow_map() continues with the next workflow (if any).

Note

The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and the associated sets of model fits, two_class_res and chi_features_res.

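The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as a preprocessor, combined with a decision tree, logistic regression, or MARS model specification. See ?two_class_set for source code.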

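The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. Each of the three models uses a linear regression model specification, paired with one of three recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for source code.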

Examples

library(workflowsets)
library(workflows)
library(modeldata)
library(recipes)
library(parsnip)
library(dplyr)
library(rsample)
library(tune)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(dials)
#> Loading required package: scales
#> 
#> Attaching package: ‘scales’
#> The following object is masked from ‘package:purrr’:
#> 
#>     discard

# An example of processed results
chi_features_res
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

# Recreating them:

# ---------------------------------------------------------------------------
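# data(Chicago) loads the `Chicago` data and the `stations` vector of
# station column names, which is used by step_pca() further down
data(Chicago)
Chicago <- Chicago[1:1195,]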

time_val_split <-
   sliding_period(
      Chicago,
      date,
      "month",
      lookback = 38,
      assess_stop = 1
   )

# ---------------------------------------------------------------------------

base_recipe <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors()) %>%
   step_normalize(all_predictors())

date_only <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors())

date_and_holidays <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors())

date_and_holidays_and_pca <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors()) %>%
   step_pca(!!stations, num_comp = tune())

# ---------------------------------------------------------------------------

lm_spec <- linear_reg() %>% set_engine("lm")

# ---------------------------------------------------------------------------

pca_param <-
   parameters(num_comp()) %>%
   update(num_comp = num_comp(c(0, 20)))

# ---------------------------------------------------------------------------

chi_features_set <-
   workflow_set(
      preproc = list(date = date_only,
                     plus_holidays = date_and_holidays,
                     plus_pca = date_and_holidays_and_pca),
      models = list(lm = lm_spec),
      cross = TRUE
   )

# ---------------------------------------------------------------------------

chi_features_res_new <-
   chi_features_set %>%
   option_add(param_info = pca_param, id = "plus_pca_lm") %>%
   workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 1 of 3 resampling: date_lm (547ms)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 2 of 3 resampling: plus_holidays_lm (601ms)
#> i 3 of 3 tuning:     plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x4
#> There were issues with some computations   A: x10
#> There were issues with some computations   A: x16
#> There were issues with some computations   A: x18
#> 
#> ✔ 3 of 3 tuning:     plus_pca_lm (8.9s)

chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

Source code: R/workflow_map.R

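Note: This article was selected and compiled by 纯净天空 from the original English work "Process a series of workflows" by Max Kuhn and others. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reprint or copy this translation without permission or authorization.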