R hardhat mold: Mold data for modeling

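mold() applies the processing steps required to get training data ready to be fed into a model. It does this through blueprints, each of which understands how to preprocess data supplied in a particular form, such as a formula or a recipe.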

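All blueprints return values in the same structure, but each is distinct enough to have its own help page. See the pages listed below to learn how to use each one with mold().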

XY Method - default_xy_blueprint()
Formula Method - default_formula_blueprint()
Recipes Method - default_recipe_blueprint()
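As a rough illustration of how a blueprint changes what mold() returns, the sketch below molds the same formula twice: once with the default blueprint and once with a blueprint that requests an intercept column. It assumes the hardhat package is installed and uses the built-in iris data; the variable names are chosen here purely for illustration.

```r
library(hardhat)

# Default formula blueprint: no intercept column is requested.
default_fit <- mold(Species ~ Sepal.Width, iris)
names(default_fit$predictors)

# Same formula, but with a blueprint that asks for an intercept column.
bp <- default_formula_blueprint(intercept = TRUE)
intercept_fit <- mold(Species ~ Sepal.Width, iris, blueprint = bp)
names(intercept_fit$predictors)
```

The return value has the same four-element structure in both calls; only the contents of the predictors tibble and the stored blueprint differ.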

Usage

mold(x, ...)

Arguments

x: An object. See the method-specific implementations linked in the Description for more information.
...: Not used.

Value

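A named list containing 4 elements:

predictors: A tibble containing the molded predictors to be used in the model.
outcomes: A tibble containing the molded outcomes to be used in the model.
blueprint: A method-specific "hardhat_blueprint" object for use when making predictions.
extras: Either NULL if the blueprint returns no extra information, or a named list containing the extra information.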
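The following minimal sketch shows how the returned list might be used in practice: inspecting its element names, then reusing the stored blueprint at prediction time. It assumes hardhat's forge() function as the prediction-time counterpart of mold(); the train/test split of iris is arbitrary and only for illustration.

```r
library(hardhat)

# Arbitrary split of iris, purely for illustration.
train <- iris[1:100, ]
test <- iris[101:150, ]

processed <- mold(Species ~ Sepal.Width, train)

# The elements of the returned list.
names(processed)

# Reuse the stored blueprint to apply the same preprocessing to new data.
new_processed <- forge(test, processed$blueprint)
new_processed$predictors
```

Keeping the blueprint alongside a fitted model means new data can be preprocessed the same way as the training data without repeating the original formula or recipe.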

Examples

# See the method specific documentation linked in Description
# for the details of each blueprint, and more examples.

# XY
mold(iris["Sepal.Width"], iris$Species)
#> $predictors
#> # A tibble: 150 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # ℹ 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 × 1
#>    .outcome
#>    <fct>   
#>  1 setosa  
#>  2 setosa  
#>  3 setosa  
#>  4 setosa  
#>  5 setosa  
#>  6 setosa  
#>  7 setosa  
#>  8 setosa  
#>  9 setosa  
#> 10 setosa  
#> # ℹ 140 more rows
#> 
#> $blueprint
#> XY blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 1 
#>    Intercept: FALSE 
#> Novel Levels: FALSE 
#>  Composition: tibble 
#> 
#> $extras
#> NULL
#> 

# Formula
mold(Species ~ Sepal.Width, iris)
#> $predictors
#> # A tibble: 150 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # ℹ 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 × 1
#>    Species
#>    <fct>  
#>  1 setosa 
#>  2 setosa 
#>  3 setosa 
#>  4 setosa 
#>  5 setosa 
#>  6 setosa 
#>  7 setosa 
#>  8 setosa 
#>  9 setosa 
#> 10 setosa 
#> # ℹ 140 more rows
#> 
#> $blueprint
#> Formula blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 1 
#>    Intercept: FALSE 
#> Novel Levels: FALSE 
#>  Composition: tibble 
#>   Indicators: traditional 
#> 
#> $extras
#> $extras$offset
#> NULL
#> 
#> 

# Recipe
library(recipes)
mold(recipe(Species ~ Sepal.Width, iris), iris)
#> $predictors
#> # A tibble: 150 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # ℹ 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 × 1
#>    Species
#>    <fct>  
#>  1 setosa 
#>  2 setosa 
#>  3 setosa 
#>  4 setosa 
#>  5 setosa 
#>  6 setosa 
#>  7 setosa 
#>  8 setosa 
#>  9 setosa 
#> 10 setosa 
#> # ℹ 140 more rows
#> 
#> $blueprint
#> Recipe blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 1 
#>    Intercept: FALSE 
#> Novel Levels: FALSE 
#>  Composition: tibble 
#> 
#> $extras
#> $extras$roles
#> NULL
#> 
#>

Source: R/mold.R

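Note: This article was compiled by 纯净天空 from the original English work "Mold data for modeling" by Davis Vaughan and others. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.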