R hardhat shrink: Subset only required columns

shrink() subsets data so that it contains only the required columns specified by the prototype, ptype.
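As a rough mental model, the behavior described above amounts to a column subset plus a check that every prototype column is present. The base-R sketch below assumes exactly that. `shrink_sketch()` is a hypothetical helper, not hardhat's implementation. It returns a plain data frame rather than a tibble, and its error text simply mirrors the message shown in the examples.

```r
# Hypothetical base-R sketch of shrink()'s core behavior (not hardhat code):
# keep only the columns named in `ptype`, erroring if any are absent.
shrink_sketch <- function(data, ptype) {
  cols <- colnames(ptype)
  missing <- setdiff(cols, colnames(data))
  if (length(missing) > 0) {
    stop(
      "The following required columns are missing: ",
      paste0("'", missing, "'", collapse = ", "), ".",
      call. = FALSE
    )
  }
  # Subset to the required columns, in the prototype's column order
  data[, cols, drop = FALSE]
}

ptype <- iris[0, c("Species", "Sepal.Width"), drop = FALSE]
out <- shrink_sketch(iris[101:150, ], ptype)
names(out)  # "Species" then "Sepal.Width"; the other three columns are dropped
```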

Usage

shrink(data, ptype)

Arguments

data: A data frame containing the data to subset.
ptype: A data frame prototype containing the required columns.
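A prototype is usually a zero-row data frame: it records column names and types but holds no observations. In the examples below the prototypes come from mold(). The snippet here builds a comparable one by hand with plain base R, purely for illustration.

```r
# Illustrative zero-row prototype built with base R (hardhat's own
# prototypes are produced by mold(), as in the examples below)
ptype <- iris[0, c("Sepal.Width", "Species"), drop = FALSE]

nrow(ptype)            # 0: no data, only structure
colnames(ptype)        # the column names the prototype records
levels(ptype$Species)  # factor levels survive even with zero rows
```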

Value

A tibble containing the required columns.

Details

shrink() is called by forge() before scream() and before the actual processing is done.
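The ordering can be pictured with a toy pipeline: first drop everything the prototype does not name, then coerce what remains to the prototype's types. In this sketch, `scream_sketch()` and `forge_sketch()` are hypothetical stand-ins that only handle factor levels; hardhat's real scream() and forge() do considerably more. Subsetting first means the type-coercion step only ever sees the columns the prototype asks for.

```r
# Hypothetical stand-ins illustrating the order shrink -> scream inside forge().
# Neither function is part of hardhat.
scream_sketch <- function(data, ptype) {
  for (col in colnames(ptype)) {
    if (is.factor(ptype[[col]])) {
      # Re-encode the factor with the levels recorded at training time
      data[[col]] <- factor(as.character(data[[col]]), levels = levels(ptype[[col]]))
    }
  }
  data
}

forge_sketch <- function(new_data, ptype) {
  # Step 1 (shrink-like): keep only the prototype's columns
  new_data <- new_data[, colnames(ptype), drop = FALSE]
  # Step 2 (scream-like): coerce the remaining columns to the prototype
  scream_sketch(new_data, ptype)
}

ptype <- iris[1:100, ][0, "Species", drop = FALSE]  # keeps all 3 levels
test <- iris[101:150, ]
test$Species <- droplevels(test$Species)            # only "virginica" remains
out <- forge_sketch(test, ptype)
levels(out$Species)  # the three training levels are restored
```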

Examples

# ---------------------------------------------------------------------------
# Setup

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# shrink()

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)
#> # A tibble: 50 × 1
#>    Species  
#>    <fct>    
#>  1 virginica
#>  2 virginica
#>  3 virginica
#>  4 virginica
#>  5 virginica
#>  6 virginica
#>  7 virginica
#>  8 virginica
#>  9 virginica
#> 10 virginica
#> # ℹ 40 more rows

# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)
#> # A tibble: 50 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.3
#>  2         2.7
#>  3         3  
#>  4         2.9
#>  5         3  
#>  6         3  
#>  7         2.5
#>  8         2.9
#>  9         2.5
#> 10         3.6
#> # ℹ 40 more rows

# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and errors nicely when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))
#> Error in validate_column_names(data, cols) : 
#>   The following required columns are missing: 'Species'.
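When it is unclear which columns the new data lacks, the comparison can be made directly before calling shrink(). This is plain base R, not a hardhat helper. The required column name is hard-coded so the snippet runs on its own; in the example above it would come from colnames(ptype_pred).

```r
# Plain base-R pre-check (not a hardhat function): which required
# columns are absent from the new data?
required_cols <- "Species"  # stands in for colnames(ptype_pred) above
test2 <- subset(iris[101:150, ], select = -Species)
missing_cols <- setdiff(required_cols, colnames(test2))
missing_cols
#> [1] "Species"
```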

Source code: R/shrink.R

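Note: This page was selected and compiled by 纯净天空 from the original English work "Subset only required columns" by Davis Vaughan et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.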