R workflowsets workflow_map: Process a series of workflows


workflow_map() executes the same function across all of the workflows in a workflow set. The various tune_*() functions can be used, as well as tune::fit_resamples().

Usage

workflow_map(
  object,
  fn = "tune_grid",
  verbose = FALSE,
  seed = sample.int(10^4, 1),
  ...
)

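Arguments

object

A workflow set.

fn

The name of the function to run, as a character string. Acceptable values are "tune_grid", "tune_bayes", "fit_resamples", "tune_race_anova", "tune_race_win_loss", and "tune_sim_anneal". Do not include the namespace or parentheses in this argument: provide "tune_grid" rather than "tune::tune_grid" or "tune_grid()".

verbose

A logical for logging progress.

seed

A single integer that is set before each function execution.

...

Options to pass to the modeling function. See Details below.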
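
The arguments above can be exercised with a small sketch. The mtcars data, the 3-fold resampling, and the names folds, wf_set and res are assumptions made for illustration only; they do not come from the package documentation.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# Assumption: mtcars and 3-fold cross-validation are used purely for illustration
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

# Two formula preprocessors crossed with one linear regression model
wf_set <- workflow_set(
  preproc = list(simple = mpg ~ wt, full = mpg ~ wt + hp),
  models = list(lm = linear_reg())
)

# `fn` is a bare function name given as a string: no namespace, no parentheses
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,  # passed through `...` to the tune function
  seed = 1,
  verbose = FALSE
)

# One row per workflow; the `result` column holds the resampling results
res$wflow_id
```

Here resamples is not an argument of workflow_map() itself; it travels through ... to the function named in fn.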

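Value

An updated workflow set. The option column is updated with any options for the tune package functions that were given to workflow_map(), and the results are added to the result column. If the computations for a workflow fail, a try-error object is saved in place of the result (without stopping execution).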
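
As a rough illustration of the returned structure, the sketch below inspects chi_features_res, the pre-computed workflow set that ships with the package. Which option names are stored, and the exact classes of the result objects, may differ between package versions, so no output is shown.

```r
library(workflowsets)

# Columns of a mapped workflow set
names(chi_features_res)

# Names of the options recorded for the third workflow (plus_pca_lm)
names(chi_features_res$option[[3]])

# Leading class of each stored result object
vapply(chi_features_res$result, function(x) class(x)[1], character(1))
```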

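Details

When passing options, anything passed in ... is combined with any values already in the option column. Values in ... override existing values of the same name in that column, and new options are added to the option column.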
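
The override rule can be checked with a small sketch. It assumes the mtcars data and a control object as the colliding option; both are illustrative choices, not taken from the package documentation.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# Assumption: mtcars and 3-fold cross-validation are used purely for illustration
set.seed(1)
folds <- vfold_cv(mtcars, v = 3)

wf_set <- workflow_set(
  preproc = list(simple = mpg ~ wt),
  models = list(lm = linear_reg())
)

# Store an option in the `option` column ahead of time
wf_set <- option_add(wf_set, control = control_resamples(save_pred = FALSE))

# Pass the same option name through `...`; per the rule above it should win
res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,
  control = control_resamples(save_pred = TRUE),
  seed = 1
)

# The option column now holds the merged options for this workflow
opts <- res$option[[1]]
names(opts)
opts$control$save_pred
```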

Any failure during execution results in the corresponding row of the result column containing a try-error object.
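
One way to spot such failures is to test each element for the try-error class. The sketch below uses base R only, with a simulated list standing in for the result column; the names results and failed are hypothetical.

```r
# Simulated `result` column: one success and one failure captured by try()
results <- list(
  ok  = try(log(10), silent = TRUE),
  bad = try(stop("model failed"), silent = TRUE)
)

# TRUE for every element that holds a captured error
failed <- vapply(results, inherits, logical(1), what = "try-error")
failed

# Names of the failed entries
names(results)[failed]
```

For a mapped workflow set, the same check would presumably be applied to its result column, with wflow_id identifying the failed workflows.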

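If a workflow with no tuning parameters is mapped to one of the tuning functions, tune::fit_resamples() is used instead, and a warning is issued if verbose = TRUE.

If a workflow requires packages that are not installed, a message is printed and workflow_map() continues with the next workflow (if any).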

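Note

The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, along with the associated sets of model fits, two_class_res and chi_features_res.

The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models use either a bare formula or a basic recipe with recipes::step_YeoJohnson() as the preprocessor, combined with a decision tree, logistic regression, or MARS model specification. See ?two_class_set for the source code.

The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. Each of the three models uses a linear regression model specification, paired with one of three recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for the source code.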
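
The pre-generated objects can be explored without refitting anything. The sketch below assumes that rank_results() from workflowsets is available and that "rsq" is among the metrics stored in chi_features_res; the name ranked is hypothetical.

```r
library(workflowsets)

# Sizes of the two pre-generated workflow sets
nrow(two_class_set)
nrow(chi_features_set)

# Rank the pre-computed regression results by R-squared,
# keeping only the best configuration of each workflow
ranked <- rank_results(chi_features_res, rank_metric = "rsq", select_best = TRUE)
ranked
```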

Examples

library(workflowsets)
library(workflows)
library(modeldata)
library(recipes)
library(parsnip)
library(dplyr)
library(rsample)
library(tune)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(dials)
#> Loading required package: scales
#> 
#> Attaching package: ‘scales’
#> The following object is masked from ‘package:purrr’:
#> 
#>     discard

# An example of processed results
chi_features_res
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

# Recreating them:

# ---------------------------------------------------------------------------
data(Chicago)
Chicago <- Chicago[1:1195,]

time_val_split <-
   sliding_period(
      Chicago,
      date,
      "month",
      lookback = 38,
      assess_stop = 1
   )

# ---------------------------------------------------------------------------

base_recipe <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors()) %>%
   step_normalize(all_predictors())

date_only <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors())

date_and_holidays <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors())

date_and_holidays_and_pca <-
   recipe(ridership ~ ., data = Chicago) %>%
   # create date features
   step_date(date) %>%
   step_holiday(date) %>%
   # remove date from the list of predictors
   update_role(date, new_role = "id") %>%
   # create dummy variables from factor columns
   step_dummy(all_nominal()) %>%
   # remove any columns with a single unique value
   step_zv(all_predictors()) %>%
   # `stations` (names of the station columns) is loaded by data(Chicago)
   step_pca(!!stations, num_comp = tune())

# ---------------------------------------------------------------------------

lm_spec <- linear_reg() %>% set_engine("lm")

# ---------------------------------------------------------------------------

pca_param <-
   parameters(num_comp()) %>%
   update(num_comp = num_comp(c(0, 20)))

# ---------------------------------------------------------------------------

chi_features_set <-
   workflow_set(
      preproc = list(date = date_only,
                     plus_holidays = date_and_holidays,
                     plus_pca = date_and_holidays_and_pca),
      models = list(lm = lm_spec),
      cross = TRUE
   )

# ---------------------------------------------------------------------------

chi_features_res_new <-
   chi_features_set %>%
   option_add(param_info = pca_param, id = "plus_pca_lm") %>%
   workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 1 of 3 resampling: date_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 1 of 3 resampling: date_lm (547ms)
#> i	No tuning parameters. `fit_resamples()` will be attempted
#> i 2 of 3 resampling: plus_holidays_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x1
#> 
#> ✔ 2 of 3 resampling: plus_holidays_lm (601ms)
#> i 3 of 3 tuning:     plus_pca_lm
#> → A | warning: prediction from a rank-deficient fit may be misleading
#> There were issues with some computations   A: x4
#> There were issues with some computations   A: x10
#> There were issues with some computations   A: x16
#> There were issues with some computations   A: x18
#> 
#> ✔ 3 of 3 tuning:     plus_pca_lm (8.9s)

chi_features_res_new
#> # A workflow set/tibble: 3 × 4
#>   wflow_id         info             option    result   
#>   <chr>            <list>           <list>    <list>   
#> 1 date_lm          <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm      <tibble [1 × 4]> <opts[3]> <tune[+]>

Source code: R/workflow_map.R

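Note: This article was selected and compiled by 純淨天空 from the original English work "Process a series of workflows" by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.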