R recipes roles: Manually Alter Roles


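update_role() alters an existing role in the recipe, or assigns an initial role to variables that do not yet have a declared role.

add_role() adds an additional role to variables that already have a role in the recipe. It does not overwrite old roles, because a single variable can have multiple roles.

remove_role() eliminates a single existing role in the recipe.

Usage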

add_role(recipe, ..., new_role = "predictor", new_type = NULL)

update_role(recipe, ..., new_role = "predictor", old_role = NULL)

remove_role(recipe, ..., old_role)

Arguments

recipe

An existing recipe().

...

One or more selector functions to choose which variables are being assigned a role. See selections() for more details.

new_role

A character string for a single role.

new_type

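A character string for the specific type that the variable should be identified as. If left as NULL, the type is automatically identified as the first type you see for that variable in summary(recipe).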

old_role

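A character string for the specific role to update for the variables selected by .... update_role() accepts NULL as long as each selected variable has only a single role.

Value

An updated recipe object.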

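Details

Variables can have any arbitrary role (see the examples), but there are two special standard roles: "predictor" and "outcome". These two roles are typically required when fitting a model.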

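update_role() should be used when a variable does not currently have a role in the recipe, or to replace an old_role with a new_role. add_role() only adds additional roles to variables that already have a role, and it throws an error when the current role is missing (i.e. NA).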
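The difference between the two functions can be sketched on a recipe created without a formula, where every role starts out as NA. This is only an illustrative sketch: the role name "extra" is made up for the example, and the error is captured with tryCatch() rather than shown, since its exact wording may vary between recipes versions.

```r
library(recipes)
data(biomass, package = "modeldata")

# Without a formula, every column starts with a missing (NA) role.
rec <- recipe(biomass)

# update_role() can assign the initial roles.
rec_roles <- rec %>%
  update_role(HHV, new_role = "outcome") %>%
  update_role(carbon, hydrogen, new_role = "predictor")

role_info <- summary(rec_roles)
role_info[role_info$variable %in% c("HHV", "carbon", "hydrogen"),
          c("variable", "role")]

# add_role() on a column whose role is still NA is expected to fail;
# "extra" is a hypothetical role name used only for this illustration.
add_result <- tryCatch(
  add_role(rec, sulfur, new_role = "extra"),
  error = function(e) "error"
)
add_result
```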

When using add_role(), if a selected variable already has the new_role, a warning is emitted and that variable is skipped, so no duplicate roles are added.
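A small sketch of that behavior, assuming the warning is signaled as a regular R warning condition: carbon is already a "predictor", so adding that role again should leave the role table unchanged. The warning is captured into msg instead of being printed, because its wording may differ across versions.

```r
library(recipes)
data(biomass, package = "modeldata")

rec <- recipe(HHV ~ ., data = biomass)

# `carbon` is already a "predictor"; capture the warning instead of printing it.
msg <- NULL
rec_dup <- withCallingHandlers(
  add_role(rec, carbon, new_role = "predictor"),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# The number of variable/role rows should be the same before and after.
nrow(summary(rec))
nrow(summary(rec_dup))
```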

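Adding or updating roles is a useful way to group certain variables that don't fall in the standard "predictor" bucket. You can perform a step on all of the variables that have a custom role with the selector has_role().

Effects of non-standard roles

Recipes can label and retain columns of your data set that should not be treated as outcomes or predictors. A unique identifier column or some other ancillary data could be used to troubleshoot issues during model development, but may be neither an outcome nor a predictor.

For example, the modeldata::biomass dataset has a column named sample with information about the specific sample type. We can change that role: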
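As a rough illustration of selecting by a custom role, the sketch below tags two columns with a made-up role, "element" (a name chosen only for this example), and centers just those columns via has_role().

```r
library(recipes)
data(biomass, package = "modeldata")

# "element" is a hypothetical custom role used only for this illustration.
rec <- recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, hydrogen, new_role = "element") %>%
  step_center(has_role("element"))

prepped <- prep(rec, training = biomass)
baked <- bake(prepped, new_data = NULL)

# Only the columns tagged "element" are centered (means close to zero);
# the other numeric columns are left as they were.
colMeans(baked[, c("carbon", "hydrogen", "oxygen")])
```

Because add_role() keeps the original "predictor" role, carbon and hydrogen remain available to any step that selects predictors.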

library(recipes)

data(biomass, package = "modeldata")
biomass_train <- biomass[1:100,]
biomass_test <- biomass[101:200,]

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  update_role(sample, new_role = "id variable") %>%
  step_center(carbon)

rec <- prep(rec, biomass_train)

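This means that sample is no longer treated as a "predictor" (the default role for columns on the right-hand side of the formula supplied to recipe()) and won't be used in model fitting or analysis, but it will still be retained in the data set.

If you really aren't using sample in your recipe, we recommend that you instead remove sample from your dataset before passing it to recipe(). The reason is that recipes assumes all non-standard roles are required at bake() time (or predict() time, if you are using a workflow). Since you didn't use sample in any steps of the recipe, you might think that you don't need to pass it to bake(), but this isn't true, because recipes doesn't know that you didn't use it: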

biomass_test$sample <- NULL

bake(rec, biomass_test)
#> Error in `bake()`:
#> ! The following required columns are missing from `new_data`: "sample".
#> i These columns have one of the following roles, which are required at `bake()` time: "id variable".
#> i If these roles are not required at `bake()` time, use `update_role_requirements(role = "your_role", bake = FALSE)`.

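As we mentioned before, the best way to avoid this issue is to not use a role at all and simply remove the sample column from biomass before calling recipe(). In general, predictors and non-standard roles that are supplied to recipe() should be present at both prep() and bake() time.

If you can't remove sample for some reason, then the second best way to get around this issue is to tell recipes that the "id variable" role isn't required at bake() time. You can do that by using update_role_requirements():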
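A minimal sketch of the "drop the column first" approach, reusing the same train/test split as above. The helper objects train_no_id and test_no_id are names introduced here for illustration only.

```r
library(recipes)
data(biomass, package = "modeldata")

biomass_train <- biomass[1:100, ]
biomass_test <- biomass[101:200, ]

# Drop `sample` before the recipe ever sees it, so no role is attached to it.
train_no_id <- biomass_train[, setdiff(names(biomass_train), "sample")]
test_no_id <- biomass_test[, setdiff(names(biomass_test), "sample")]

rec <- recipe(HHV ~ ., data = train_no_id) %>%
  step_center(carbon)

rec <- prep(rec, training = train_no_id)

# `sample` is neither required nor returned at bake() time.
baked <- bake(rec, new_data = test_no_id)
names(baked)
```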

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role_requirements("id variable", bake = FALSE) %>%
  step_center(carbon)

rec <- prep(rec, biomass_train)

# No errors!
biomass_test_baked <- bake(rec, biomass_test)

It should be very rare that you need this feature.

Examples

library(recipes)
data(biomass, package = "modeldata")

# Using the formula method, roles are created for any outcomes and predictors:
recipe(HHV ~ ., data = biomass) %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> predictor original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# However `sample` and `dataset` aren't predictors. Since they already have
# roles, `update_role()` can be used to make changes, to any arbitrary role:
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting variable") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role               source  
#>   <chr>    <list>    <chr>              <chr>   
#> 1 sample   <chr [3]> id variable        original
#> 2 dataset  <chr [3]> splitting variable original
#> 3 carbon   <chr [2]> predictor          original
#> 4 hydrogen <chr [2]> predictor          original
#> 5 oxygen   <chr [2]> predictor          original
#> 6 nitrogen <chr [2]> predictor          original
#> 7 sulfur   <chr [2]> predictor          original
#> 8 HHV      <chr [2]> outcome            original

# `update_role()` cannot set a role to NA, use `remove_role()` for that
if (FALSE) {
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = NA_character_)
}

# ------------------------------------------------------------------------------

# Variables can have more than one role. `add_role()` can be used
# if the column already has at least one role:
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  summary()
#> # A tibble: 10 × 4
#>    variable type      role      source  
#>    <chr>    <list>    <chr>     <chr>   
#>  1 sample   <chr [3]> predictor original
#>  2 dataset  <chr [3]> predictor original
#>  3 carbon   <chr [2]> predictor original
#>  4 carbon   <chr [2]> something original
#>  5 hydrogen <chr [2]> predictor original
#>  6 oxygen   <chr [2]> predictor original
#>  7 nitrogen <chr [2]> predictor original
#>  8 sulfur   <chr [2]> predictor original
#>  9 sulfur   <chr [2]> something original
#> 10 HHV      <chr [2]> outcome   original

# `update_role()` has an argument called `old_role` that is required to
# unambiguously update a role when the column currently has multiple roles.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  update_role(carbon, new_role = "something else", old_role = "something") %>%
  summary()
#> # A tibble: 9 × 4
#>   variable type      role           source  
#>   <chr>    <list>    <chr>          <chr>   
#> 1 sample   <chr [3]> predictor      original
#> 2 dataset  <chr [3]> predictor      original
#> 3 carbon   <chr [2]> predictor      original
#> 4 carbon   <chr [2]> something else original
#> 5 hydrogen <chr [2]> predictor      original
#> 6 oxygen   <chr [2]> predictor      original
#> 7 nitrogen <chr [2]> predictor      original
#> 8 sulfur   <chr [2]> predictor      original
#> 9 HHV      <chr [2]> outcome        original

# `carbon` has two roles at the end, so the last `update_role()` fails since
# `old_role` was not given.
if (FALSE) {
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  update_role(carbon, new_role = "something else")
}

# ------------------------------------------------------------------------------

# To remove a role, `remove_role()` can be used to remove a single role.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> predictor original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# To remove all roles, call `remove_role()` multiple times to reset to `NA`
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  remove_role(carbon, old_role = "predictor") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> NA        original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# ------------------------------------------------------------------------------

# If the formula method is not used, all columns have a missing role:
recipe(biomass) %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role  source  
#>   <chr>    <list>    <chr> <chr>   
#> 1 sample   <chr [3]> NA    original
#> 2 dataset  <chr [3]> NA    original
#> 3 carbon   <chr [2]> NA    original
#> 4 hydrogen <chr [2]> NA    original
#> 5 oxygen   <chr [2]> NA    original
#> 6 nitrogen <chr [2]> NA    original
#> 7 sulfur   <chr [2]> NA    original
#> 8 HHV      <chr [2]> NA    original

Source code: R/roles.R

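Note: This article was compiled by 純淨天空 from the original English work Manually Alter Roles by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission.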