

R hardhat shrink: Subset only required columns


shrink() subsets data so that it contains only the required columns specified by the prototype, ptype.

Usage

shrink(data, ptype)

Arguments

data

A data frame containing the data to subset.

ptype

A data frame prototype containing the required columns (typically a tibble).
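The description and arguments above amount to a column selection guarded by a presence check. The following is a minimal base-R sketch of that idea, not hardhat's actual source (which lives in R/shrink.R); `shrink_sketch()` is a hypothetical name, and the sketch only covers checking for and selecting the prototype's columns.

```r
# Hypothetical base-R sketch of shrink()'s core behaviour (not hardhat's code).
shrink_sketch <- function(data, ptype) {
  cols <- colnames(ptype)

  # Fail with an informative error if any required column is absent
  missing_cols <- setdiff(cols, colnames(data))
  if (length(missing_cols) > 0) {
    stop(
      "The following required columns are missing: ",
      paste0("'", missing_cols, "'", collapse = ", "),
      ".",
      call. = FALSE
    )
  }

  # Keep only the required columns, in the order given by the prototype
  data[, cols, drop = FALSE]
}

# A zero-row prototype that only records the required column (and its type)
ptype <- iris[0, "Species", drop = FALSE]
out <- shrink_sketch(iris[101:150, ], ptype)
dim(out)
```

Unlike hardhat, this sketch returns a plain data.frame (keeping the original row names) rather than a tibble.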

Details

shrink() is called by forge() before scream() and before the actual processing is done.
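The order of operations in the Details sentence can be illustrated with a simplified base-R sketch. `forge_order_sketch()` is a hypothetical function, not part of hardhat. Because the shrink step runs first, a column absent from the prototype (here `extra`) is dropped before any type handling. The scream step is reduced to realigning factor levels, whereas the real scream() casts the data to the prototype's column types more generally.

```r
# Hypothetical sketch of the order in which forge() applies shrink() and scream().
forge_order_sketch <- function(new_data, ptype) {
  cols <- colnames(ptype)

  # Step 1 (shrink): error on missing columns, drop all other columns
  missing_cols <- setdiff(cols, colnames(new_data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  new_data <- new_data[, cols, drop = FALSE]

  # Step 2 (scream, simplified): give factor columns the prototype's levels
  for (col in cols) {
    if (is.factor(ptype[[col]])) {
      new_data[[col]] <- factor(as.character(new_data[[col]]),
                                levels = levels(ptype[[col]]))
    }
  }

  # Step 3: the actual preprocessing (e.g. building a model matrix) would run here
  new_data
}

train <- iris[1:100, ]
ptype <- train[0, "Species", drop = FALSE]  # keeps all three Species levels
new_data <- data.frame(Species = "setosa", extra = "ignored")
out <- forge_order_sketch(new_data, ptype)
str(out)
```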

Examples

# ---------------------------------------------------------------------------
# Setup
library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# shrink()

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)
#> # A tibble: 50 × 1
#>    Species  
#>    <fct>    
#>  1 virginica
#>  2 virginica
#>  3 virginica
#>  4 virginica
#>  5 virginica
#>  6 virginica
#>  7 virginica
#>  8 virginica
#>  9 virginica
#> 10 virginica
#> # ℹ 40 more rows

# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)
#> # A tibble: 50 × 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.3
#>  2         2.7
#>  3         3  
#>  4         2.9
#>  5         3  
#>  6         3  
#>  7         2.5
#>  8         2.9
#>  9         2.5
#> 10         3.6
#> # ℹ 40 more rows

# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and errors nicely when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))
#> Error in validate_column_names(data, cols) : 
#>   The following required columns are missing: 'Species'.
Source code: R/shrink.R



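Note: This article was selected and compiled by 纯净天空 from the original English work "Subset only required columns" by Davis Vaughan et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission.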