

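R recipes roles: Manually Alter Roles


update_role() alters an existing role in the recipe or assigns an initial role to variables that do not yet have a declared role.

add_role() adds an additional role to variables that already have a role in the recipe. It does not overwrite old roles, as a single variable can have multiple roles.

remove_role() eliminates a single existing role in the recipe.

Usage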

add_role(recipe, ..., new_role = "predictor", new_type = NULL)

update_role(recipe, ..., new_role = "predictor", old_role = NULL)

remove_role(recipe, ..., old_role)

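Arguments

recipe

An existing recipe().

...

One or more selector functions to choose which variables are being assigned a role. See selections() for more details.

new_role

A character string for a single role.

new_type

A character string for the specific type that the variable should be identified as. If left as NULL, the type is automatically identified as the first type you see for that variable in summary(recipe).

old_role

A character string for the specific role to update for the variables selected by `...`. update_role() accepts NULL as long as the variables have only a single role.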

Value

An updated recipe object.

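Details

Variables can have any arbitrary role (see the examples), but there are two special standard roles, "predictor" and "outcome". These two roles are typically required when fitting a model.

update_role() should be used when a variable doesn't currently have a role in the recipe, or to replace an old_role with a new_role. add_role() only adds additional roles to variables that already have a role and will throw an error when the current role is missing (i.e. NA).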
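The distinction between the two functions can be sketched as follows. This is a minimal illustration that assumes the recipes and modeldata packages are installed; the error is captured with tryCatch() so the script keeps running, and the exact error message is not shown because it may differ between recipes versions.

```r
library(recipes)
data(biomass, package = "modeldata")

# Without a formula, every column starts with a missing (NA) role
rec <- recipe(biomass)

# add_role() refuses to act on a column whose role is NA; capture the error
add_attempt <- tryCatch(
  add_role(rec, carbon, new_role = "predictor"),
  error = function(e) e
)
inherits(add_attempt, "error")

# update_role() is the tool for assigning a first role
rec <- rec %>%
  update_role(HHV, new_role = "outcome") %>%
  update_role(carbon, hydrogen, new_role = "predictor")

# Show only the columns that now have a role
roles <- summary(rec)
roles[!is.na(roles$role), c("variable", "role")]
```

Columns that were never touched keep their NA role.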

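When using add_role(), if a variable is selected that already has the new_role, a warning is emitted and that variable is skipped so no duplicate roles are added.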
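The skip-with-warning behaviour can be observed with a small sketch like the one below. It assumes recipes and modeldata are installed; `warnings_seen` and `rec_same` are names chosen here for illustration, and the warnings are collected with base R's withCallingHandlers() rather than printed.

```r
library(recipes)
data(biomass, package = "modeldata")

rec <- recipe(HHV ~ ., data = biomass)

# Collect any warning messages instead of printing them
warnings_seen <- character()
rec_same <- withCallingHandlers(
  # `carbon` is already a "predictor", so this request is a duplicate
  add_role(rec, carbon, new_role = "predictor"),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

length(warnings_seen)    # number of warnings raised
nrow(summary(rec_same))  # compare with nrow(summary(rec))
```

Because the duplicate is skipped, the role table should have the same number of rows before and after the call.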

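Adding or updating roles is a useful way to group certain variables that don't fall in the standard "predictor" bucket. You can perform a step on all of the variables that have a custom role with the selector has_role().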
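As a rough sketch of using has_role() with a custom role, the example below centers only the columns tagged with a role. The role name "element" is invented for this illustration and has no special meaning to recipes; the packages recipes and modeldata are assumed to be installed.

```r
library(recipes)
data(biomass, package = "modeldata")

rec <- recipe(HHV ~ ., data = biomass) %>%
  # "element" is a made-up role name used only for this illustration
  update_role(carbon, hydrogen, new_role = "element") %>%
  # Center only the columns that carry the custom role
  step_center(has_role("element")) %>%
  prep()

# new_data = NULL returns the processed training data
baked <- bake(rec, new_data = NULL)
colMeans(baked[, c("carbon", "hydrogen", "oxygen")])
```

Only `carbon` and `hydrogen` should end up with means of approximately zero, since `oxygen` kept the "predictor" role and was not selected by the step.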

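Effects of non-standard roles

Recipes can label and retain column(s) of your data set that should not be treated as outcomes or predictors. A unique identifier column or some other ancillary data could be used to troubleshoot issues during model development but may not be either an outcome or a predictor.

For example, the modeldata::biomass dataset has a column named sample with information about the specific sample type. We can change that role: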

library(recipes)

data(biomass, package = "modeldata")
biomass_train <- biomass[1:100,]
biomass_test <- biomass[101:200,]

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  update_role(sample, new_role = "id variable") %>%
  step_center(carbon)

rec <- prep(rec, biomass_train)

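This means that sample is no longer treated as a "predictor" (the default role for columns on the right-hand side of the formula supplied to recipe()) and won't be used in model fitting or analysis, but will still be retained in the data set.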
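One way to check this, sketched under the assumption that recipes and modeldata are installed, is to compare the columns returned by bake() with and without the all_predictors() selector. The variable names `all_cols` and `predictor_cols` are chosen here for illustration.

```r
library(recipes)
data(biomass, package = "modeldata")
biomass_train <- biomass[1:100, ]

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  update_role(sample, new_role = "id variable") %>%
  step_center(carbon) %>%
  prep()

# All retained columns, including the id column
all_cols <- names(bake(rec, new_data = NULL))

# Only the columns that still have the "predictor" role
predictor_cols <- names(bake(rec, new_data = NULL, all_predictors()))

"sample" %in% all_cols
"sample" %in% predictor_cols
```

The id column travels with the data but is excluded whenever columns are selected by the "predictor" role.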

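If you really aren't using sample in your recipe, we recommend that you instead remove sample from your dataset before passing it to recipe(). The reason is that recipes assumes all non-standard roles are required at bake() time (or predict() time, if you are using a workflow). Since you didn't use sample in any steps of the recipe, you might think that you don't need to pass it to bake(), but this isn't true because recipes doesn't know that you didn't use it: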

biomass_test$sample <- NULL

bake(rec, biomass_test)
#> Error in `bake()`:
#> ! The following required columns are missing from `new_data`: "sample".
#> i These columns have one of the following roles, which are required at `bake()` time: "id variable".
#> i If these roles are not required at `bake()` time, use `update_role_requirements(role = "your_role", bake = FALSE)`.

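As we mentioned before, the best way to avoid this issue is to not use a role at all, and instead just remove the sample column from biomass before calling recipe(). In general, predictors and non-standard roles that are supplied to recipe() should be present at both prep() and bake() time.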
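A minimal sketch of that recommended approach, assuming recipes and modeldata are installed, drops the column from both data splits before the recipe is created. The use of setdiff() is just one of several ways to drop a column.

```r
library(recipes)
data(biomass, package = "modeldata")

# Drop the unused identifier column before the recipe ever sees it
keep <- setdiff(names(biomass), "sample")
biomass_train <- biomass[1:100, keep]
biomass_test  <- biomass[101:200, keep]

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  step_center(carbon) %>%
  prep()

# bake() no longer asks for `sample`, because the recipe never knew about it
baked <- bake(rec, new_data = biomass_test)
names(baked)
```

With this approach no role bookkeeping is needed, since the recipe only tracks columns it was given.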

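If you can't remove sample for some reason, then the second-best way to get around this issue is to tell recipes that the "id variable" role isn't required at bake() time. You can do that by using update_role_requirements():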

rec <- recipe(HHV ~ ., data = biomass_train) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role_requirements("id variable", bake = FALSE) %>%
  step_center(carbon)

rec <- prep(rec, biomass_train)

# No errors!
biomass_test_baked <- bake(rec, biomass_test)

It should be very rare that you need this feature.

Examples

library(recipes)
data(biomass, package = "modeldata")

# Using the formula method, roles are created for any outcomes and predictors:
recipe(HHV ~ ., data = biomass) %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> predictor original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# However `sample` and `dataset` aren't predictors. Since they already have
# roles, `update_role()` can be used to make changes, to any arbitrary role:
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting variable") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role               source  
#>   <chr>    <list>    <chr>              <chr>   
#> 1 sample   <chr [3]> id variable        original
#> 2 dataset  <chr [3]> splitting variable original
#> 3 carbon   <chr [2]> predictor          original
#> 4 hydrogen <chr [2]> predictor          original
#> 5 oxygen   <chr [2]> predictor          original
#> 6 nitrogen <chr [2]> predictor          original
#> 7 sulfur   <chr [2]> predictor          original
#> 8 HHV      <chr [2]> outcome            original

# `update_role()` cannot set a role to NA, use `remove_role()` for that
if (FALSE) {
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = NA_character_)
}

# ------------------------------------------------------------------------------

# Variables can have more than one role. `add_role()` can be used
# if the column already has at least one role:
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  summary()
#> # A tibble: 10 × 4
#>    variable type      role      source  
#>    <chr>    <list>    <chr>     <chr>   
#>  1 sample   <chr [3]> predictor original
#>  2 dataset  <chr [3]> predictor original
#>  3 carbon   <chr [2]> predictor original
#>  4 carbon   <chr [2]> something original
#>  5 hydrogen <chr [2]> predictor original
#>  6 oxygen   <chr [2]> predictor original
#>  7 nitrogen <chr [2]> predictor original
#>  8 sulfur   <chr [2]> predictor original
#>  9 sulfur   <chr [2]> something original
#> 10 HHV      <chr [2]> outcome   original

# `update_role()` has an argument called `old_role` that is required to
# unambiguously update a role when the column currently has multiple roles.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  update_role(carbon, new_role = "something else", old_role = "something") %>%
  summary()
#> # A tibble: 9 × 4
#>   variable type      role           source  
#>   <chr>    <list>    <chr>          <chr>   
#> 1 sample   <chr [3]> predictor      original
#> 2 dataset  <chr [3]> predictor      original
#> 3 carbon   <chr [2]> predictor      original
#> 4 carbon   <chr [2]> something else original
#> 5 hydrogen <chr [2]> predictor      original
#> 6 oxygen   <chr [2]> predictor      original
#> 7 nitrogen <chr [2]> predictor      original
#> 8 sulfur   <chr [2]> predictor      original
#> 9 HHV      <chr [2]> outcome        original

# `carbon` has two roles at the end, so the last `update_role()` fails since
# `old_role` was not given.
if (FALSE) {
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  update_role(carbon, new_role = "something else")
}

# ------------------------------------------------------------------------------

# To remove a role, `remove_role()` can be used to remove a single role.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> predictor original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# To remove all roles, call `remove_role()` multiple times to reset to `NA`
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  remove_role(carbon, old_role = "predictor") %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role      source  
#>   <chr>    <list>    <chr>     <chr>   
#> 1 sample   <chr [3]> predictor original
#> 2 dataset  <chr [3]> predictor original
#> 3 carbon   <chr [2]> NA        original
#> 4 hydrogen <chr [2]> predictor original
#> 5 oxygen   <chr [2]> predictor original
#> 6 nitrogen <chr [2]> predictor original
#> 7 sulfur   <chr [2]> predictor original
#> 8 HHV      <chr [2]> outcome   original

# ------------------------------------------------------------------------------

# If the formula method is not used, all columns have a missing role:
recipe(biomass) %>%
  summary()
#> # A tibble: 8 × 4
#>   variable type      role  source  
#>   <chr>    <list>    <chr> <chr>   
#> 1 sample   <chr [3]> NA    original
#> 2 dataset  <chr [3]> NA    original
#> 3 carbon   <chr [2]> NA    original
#> 4 hydrogen <chr [2]> NA    original
#> 5 oxygen   <chr [2]> NA    original
#> 6 nitrogen <chr [2]> NA    original
#> 7 sulfur   <chr [2]> NA    original
#> 8 HHV      <chr [2]> NA    original

Source: R/roles.R

Note: This article is adapted from the original English work "Manually Alter Roles" by Max Kuhn et al. Unless otherwise stated, copyright of the original code belongs to the original authors.