R hardhat scream(): Scream


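scream() ensures that the structure of data is the same as that of the prototype, ptype. Under the hood, vctrs::vec_cast() is used, which casts each column of data to the same type as the corresponding column in ptype.

This casting enforces a number of important structural checks, including but not limited to:

  • Data classes - Checks that the class of each column in data is the same as the class of the corresponding column in ptype.

  • Novel levels - Checks that the factor columns in data do not have any new levels compared with the ptype columns. If there are new levels, a warning is issued and they are coerced to NA. This check is optional and can be turned off with allow_novel_levels = TRUE.

  • Level recovery - Checks that the factor columns in data are not missing any factor levels compared with the ptype columns. If any levels are missing, they are restored.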
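
As a rough illustration of the data class check, the sketch below casts a small made-up data frame against a 0-row prototype. The column names `x` and `g` and the toy values are assumptions chosen for this example only. It relies on the usual vctrs behaviour that integer can be cast to double while character cannot.

```r
library(hardhat)

# Hypothetical training data; the prototype is its 0-row slice
train <- data.frame(x = c(1.5, 2.5), g = factor(c("a", "b")))
ptype <- train[0, ]

# An integer column can be cast to the double column in the prototype
ok <- scream(data.frame(x = 1:2, g = factor(c("a", "b"))), ptype)
is.double(ok$x)

# A character column cannot be cast to double, so the cast is rejected.
# tryCatch() captures the error condition instead of stopping the script.
bad <- data.frame(x = c("one", "two"), g = factor(c("a", "b")))
result <- tryCatch(
  scream(bad, ptype),
  error = function(cnd) cnd
)
inherits(result, "error")
```

Capturing the condition with tryCatch() keeps the script running, which makes it easier to inspect the message with conditionMessage(result).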

Usage

scream(data, ptype, allow_novel_levels = FALSE)

Arguments

data

A data frame containing the new data whose structure is to be checked.

ptype

A data frame prototype to cast data to. This is commonly a 0-row slice of the training set.

allow_novel_levels

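Should novel factor levels in data be allowed? The safest approach is the default, which throws a warning when novel levels are found and coerces them to NA values. Setting this argument to TRUE will ignore all novel levels. This argument does not apply to ordered factors. Novel levels are not allowed in ordered factors because the level ordering is a critical part of the type.

Value

A tibble containing the required columns after any required structural modifications have been made.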

Details

scream() is called by forge() after shrink() but before the actual processing is done. Generally, you do not need to call scream() directly, as forge() will do it for you.
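
A minimal sketch of that usual path, assuming the same iris split and formula used in the examples on this page: mold() records a blueprint at fit time, and forge() applies it to new data, running the structural checks along the way.

```r
library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# mold() is run at model fit time and records a preprocessing blueprint
x <- mold(log(Sepal.Width) ~ Species, train)

# forge() applies the blueprint to new data. According to the details
# above, it calls shrink() and scream() before the actual processing.
processed <- forge(test, x$blueprint)

# The result is a list; the processed predictors keep one row per test row
names(processed)
nrow(processed$predictors)
```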

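If scream() is used as a standalone function, it is good practice to call shrink() right before it, because scream() contains no checks that all of the required column names actually exist in data. Those checks live in shrink().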
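
The sketch below shows why the order matters, under the assumption that the prototype is a single `Species` column taken as a 0-row slice of iris (a simplification for illustration, not the blueprint-derived prototype used in the examples). When a required column is absent, shrink() is where the problem surfaces.

```r
library(hardhat)

# Hypothetical prototype: a 0-row slice holding only the required column
ptype_pred <- iris[0, "Species", drop = FALSE]

# New data that lacks the required `Species` column
test_missing <- iris[101:150, c("Sepal.Length", "Sepal.Width")]

# shrink() validates the column names, so the missing column is reported
# here. tryCatch() captures the error instead of stopping the script.
shrink_result <- tryCatch(
  shrink(test_missing, ptype_pred),
  error = function(cnd) cnd
)
inherits(shrink_result, "error")

# With every required column present, shrink() followed by scream() works
checked <- scream(shrink(iris[101:150, ], ptype_pred), ptype_pred)
names(checked)
```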

Factor Levels

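scream() tries to be helpful by recovering missing factor levels and warning about novel levels. The following graphic outlines how scream() handles factor levels when coercing from a column in data to a column in ptype.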

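Note that ordered factor handling is much stricter than factor handling. Ordered factors in data must have exactly the same levels as the ordered factor in ptype.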
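
To make the stricter ordered-factor rule concrete, here is a small sketch with a made-up `size` column (the column name and levels are assumptions for illustration). Based on the description above, a mismatched level set is expected to be rejected even when allow_novel_levels = TRUE, so the call is wrapped in tryCatch() to capture whatever condition is signalled.

```r
library(hardhat)

# Hypothetical prototype with one ordered factor column and zero rows
ptype <- data.frame(
  size = ordered(character(), levels = c("low", "mid", "high"))
)

# Identical levels in the identical order: the cast goes through
same <- data.frame(
  size = ordered(c("low", "high"), levels = c("low", "mid", "high"))
)
ok <- scream(same, ptype)
levels(ok$size)

# A different level set: expected to be rejected, because
# allow_novel_levels does not apply to ordered factors
different <- data.frame(
  size = ordered(c("low", "huge"), levels = c("low", "huge"))
)
res <- tryCatch(
  scream(different, ptype, allow_novel_levels = TRUE),
  error = function(cnd) cnd,
  warning = function(cnd) cnd
)
inherits(res, "condition")
```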

Examples

# ---------------------------------------------------------------------------
# Setup

library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# ---------------------------------------------------------------------------
# shrink() / scream()

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
test_shrunk <- shrink(test, ptype_pred)

# Now pass that to scream() to perform validation checks
# If no warnings / errors are thrown, the checks were
# successful!
scream(test_shrunk, ptype_pred)
#> # A tibble: 50 × 1
#>    Species  
#>    <fct>    
#>  1 virginica
#>  2 virginica
#>  3 virginica
#>  4 virginica
#>  5 virginica
#>  6 virginica
#>  7 virginica
#>  8 virginica
#>  9 virginica
#> 10 virginica
#> # ℹ 40 more rows

# ---------------------------------------------------------------------------
# Outcomes

# To also extract the outcomes, use the outcome prototype
test_outcome <- shrink(test, ptype_out)
scream(test_outcome, ptype_out)
#> # A tibble: 50 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.3
#>  2         2.7
#>  3         3  
#>  4         2.9
#>  5         3  
#>  6         3  
#>  7         2.5
#>  8         2.9
#>  9         2.5
#> 10         3.6
#> # ℹ 40 more rows

# ---------------------------------------------------------------------------
# Casting

# scream() uses vctrs::vec_cast() to intelligently convert
# new data to the prototype automatically. This means
# it can automatically perform certain conversions, like
# coercing character columns to factors.
test2 <- test
test2$Species <- as.character(test2$Species)

test2_shrunk <- shrink(test2, ptype_pred)
scream(test2_shrunk, ptype_pred)
#> # A tibble: 50 × 1
#>    Species  
#>    <fct>    
#>  1 virginica
#>  2 virginica
#>  3 virginica
#>  4 virginica
#>  5 virginica
#>  6 virginica
#>  7 virginica
#>  8 virginica
#>  9 virginica
#> 10 virginica
#> # ℹ 40 more rows

# It can also recover missing factor levels.
# For example, it is plausible that the test data only had the
# "virginica" level
test3 <- test
test3$Species <- factor(test3$Species, levels = "virginica")

test3_shrunk <- shrink(test3, ptype_pred)
test3_fixed <- scream(test3_shrunk, ptype_pred)

# scream() recovered the missing levels
levels(test3_fixed$Species)
#> [1] "setosa"     "versicolor" "virginica" 

# ---------------------------------------------------------------------------
# Novel levels

# When novel levels with any data are present in `data`, the default
# is to coerce them to `NA` values with a warning.
test4 <- test
test4$Species <- as.character(test4$Species)
test4$Species[1] <- "new_level"

test4$Species <- factor(
  test4$Species,
  levels = c(levels(test$Species), "new_level")
)

test4 <- shrink(test4, ptype_pred)

# Warning is thrown
test4_removed <- scream(test4, ptype_pred)
#> Warning: Novel levels found in column 'Species': 'new_level'. The levels have been removed, and values have been coerced to 'NA'.

# Novel level is removed
levels(test4_removed$Species)
#> [1] "setosa"     "versicolor" "virginica" 

# No warning is thrown
test4_kept <- scream(test4, ptype_pred, allow_novel_levels = TRUE)

# Novel level is kept
levels(test4_kept$Species)
#> [1] "setosa"     "versicolor" "virginica"  "new_level" 

Source code: R/scream.R

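Note: This article was selected and compiled by 纯净天空 from the original English work "Scream" by Davis Vaughan et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.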