R recipes step_mutate: Add new variables using dplyr

step_mutate() creates a specification of a recipe step that will add variables using dplyr::mutate().

Usage

step_mutate(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  inputs = NULL,
  skip = FALSE,
  id = rand_id("mutate")
)

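Arguments

recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: Name-value pairs of expressions. See dplyr::mutate().
role: For model terms created by this step, what analysis role should they be assigned? By default, the new columns that this step creates from the original variables are used as predictors in a model.
trained: A logical indicating whether the quantities for preprocessing have been estimated.
inputs: Quosure(s) of the expressions passed through ... .
skip: A logical. Should the step be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be applicable to new data (e.g. processing the outcome variable(s)). Take care when using skip = TRUE, as it may affect the computations of subsequent operations.
id: A character string that is unique to this step and identifies it.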
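
The effect of skip can be sketched as follows. This is a minimal illustration, not part of the original page; the single-column recipe and the object names train_out and new_out are made up for the example. It assumes the default retain = TRUE in prep(), so that bake(new_data = NULL) returns the processed training set.

```r
library(recipes)

# A step marked skip = TRUE is applied while prep() processes the
# training set, but is skipped when bake() is given new data.
rec <- recipe(~ Sepal.Width, data = iris) %>%
  step_mutate(dbl_width = Sepal.Width * 2, skip = TRUE) %>%
  prep()

# Processed training set retained by prep(): the new column is present.
train_out <- bake(rec, new_data = NULL)

# New data: the step is skipped, so the new column is absent.
new_out <- bake(rec, new_data = head(iris))

names(train_out)
names(new_out)
```

Any later step that relied on dbl_width would therefore see it at prep() time but not when baking new data, which is why skip = TRUE needs care.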

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

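Details

When using this flexible step, take extra care to avoid data leakage in your preprocessing. Consider, for example, the transformation x = w > mean(w). When applied to new data or testing data, this transformation uses the mean of w from the new data, not the mean of w from the training data.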
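
The leakage described above can be reproduced with a small sketch. The data frames train and new and the variable train_mean are invented for illustration. Fixing the threshold with !! is only one possible remedy, and it assumes that train_mean itself is computed from training data alone.

```r
library(recipes)

train <- data.frame(w = c(1, 2, 3, 4))      # mean(w) is 2.5
new   <- data.frame(w = c(10, 20, 30, 40))  # mean(w) is 25

# mean(w) is re-evaluated on whatever data is being baked.
leaky <- recipe(~ w, data = train) %>%
  step_mutate(x = w > mean(w)) %>%
  prep(training = train)

# Compared against the new-data mean (25), not the training mean (2.5).
leaky_x <- bake(leaky, new_data = new)$x

# Embedding the training mean as a constant freezes the threshold.
train_mean <- mean(train$w)
fixed <- recipe(~ w, data = train) %>%
  step_mutate(x = w > !!train_mean) %>%
  prep(training = train)

# Every new value exceeds 2.5, so all entries are TRUE.
fixed_x <- bake(fixed, new_data = new)$x
```

Here leaky_x splits the new data around its own mean, while fixed_x applies the threshold learned from the training set.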

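When an object in the user's global environment is referenced in the expression defining the new variable(s), it is a good idea to use quasiquotation (e.g. !!) to embed the value of the object in the expression, so that the recipe is portable between sessions. See the examples.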
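
Why portability matters can be sketched by simulating a fresh session. The constant scale_by and the recipe names below are hypothetical, and removing the object with rm() is only a stand-in for loading a saved recipe in a session where the object was never defined.

```r
library(recipes)

scale_by <- 2  # hypothetical constant living in the calling environment

# Refers to scale_by by name: the value is looked up at bake time.
by_name <- recipe(~ Sepal.Width, data = iris) %>%
  step_mutate(scaled = Sepal.Width * scale_by) %>%
  prep()

# Embeds the value 2 in the expression itself.
by_value <- recipe(~ Sepal.Width, data = iris) %>%
  step_mutate(scaled = Sepal.Width * !!scale_by) %>%
  prep()

rm(scale_by)  # simulate a session in which the object no longer exists

# Baking new data re-evaluates the expression; the name lookup now fails.
by_name_result <- tryCatch(
  bake(by_name, new_data = head(iris)),
  error = function(e) "error"
)

# The embedded constant still works.
by_value_result <- bake(by_value, new_data = head(iris))
```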

If a preceding step removes a column that is selected by name in step_mutate(), the recipe will error when it is estimated with prep().
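
A minimal sketch of this failure mode, using step_rm() as the preceding step. The object names are illustrative. Defining the recipe succeeds because the expressions are only captured at that point; the error surfaces when prep() evaluates them.

```r
library(recipes)

# Defining the recipe does not evaluate the mutate() expression yet.
rec <- recipe(~ ., data = iris) %>%
  step_rm(Sepal.Width) %>%
  step_mutate(dbl_width = Sepal.Width * 2)

# prep() runs the steps in order; Sepal.Width is gone by the time
# step_mutate() is evaluated, so an error is raised here.
prep_result <- tryCatch(
  prep(rec, training = iris),
  error = function(e) e
)

inherits(prep_result, "error")
```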

Tidying

When you tidy() this step, a tibble is returned with a value column containing the mutate() expressions as character strings (these strings are not reparsable).

Case weights

The underlying operation does not allow for case weights.

See also

Other individual transformation steps: step_BoxCox(), step_YeoJohnson(), step_bs(), step_harmonic(), step_hyperbolic(), step_inverse(), step_invlogit(), step_logit(), step_log(), step_ns(), step_percentile(), step_poly(), step_relu(), step_sqrt()

Other dplyr steps: step_arrange(), step_filter(), step_mutate_at(), step_rename_at(), step_rename(), step_sample(), step_select(), step_slice()

Examples

library(recipes)
library(dplyr)

rec <-
  recipe(~., data = iris) %>%
  step_mutate(
    dbl_width = Sepal.Width * 2,
    half_length = Sepal.Length / 2
  )

# Estimate the recipe on the first 75 rows.
prepped <- prep(rec, training = iris %>% slice(1:75))

dplyr_train <-
  iris %>%
  as_tibble() %>%
  slice(1:75) %>%
  mutate(
    dbl_width = Sepal.Width * 2,
    half_length = Sepal.Length / 2
  )

rec_train <- bake(prepped, new_data = NULL)
all.equal(dplyr_train, rec_train)
#> [1] TRUE

dplyr_test <-
  iris %>%
  as_tibble() %>%
  slice(76:150) %>%
  mutate(
    dbl_width = Sepal.Width * 2,
    half_length = Sepal.Length / 2
  )
rec_test <- bake(prepped, new_data = iris %>% slice(76:150))
all.equal(dplyr_test, rec_test)
#> [1] TRUE

# Embedding objects:
const <- 1.414

qq_rec <-
  recipe(~., data = iris) %>%
  step_mutate(
    bad_approach = Sepal.Width * const,
    best_approach = Sepal.Width * !!const
  ) %>%
  prep(training = iris)

bake(qq_rec, new_data = NULL, contains("appro")) %>% slice(1:4)
#> # A tibble: 4 × 2
#>   bad_approach best_approach
#>          <dbl>         <dbl>
#> 1         4.95          4.95
#> 2         4.24          4.24
#> 3         4.52          4.52
#> 4         4.38          4.38

# The difference:
tidy(qq_rec, number = 1)
#> # A tibble: 2 × 3
#>   terms         value               id          
#>   <chr>         <chr>               <chr>       
#> 1 bad_approach  Sepal.Width * const mutate_p75TX
#> 2 best_approach Sepal.Width * 1.414 mutate_p75TX

Source code: R/mutate.R

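Note: This article was selected and compiled by 純淨天空 from the original English work "Add new variables using dplyr" by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.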