R rsample tidy.rsplit Tidy Resampling Object

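The tidy() function from the broom package can be used on rset and rsplit objects to generate tibbles that show which rows are in the analysis and assessment sets.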
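As a minimal sketch of the simplest case, the snippet below tidies a single rsplit object. The choice of `initial_split()` and `prop = 0.75` is an assumption made for illustration; any function that returns an rsplit should behave similarly.

```r
library(rsample)

set.seed(4121)
# Hold out roughly 25% of mtcars; the result is a single rsplit object
split <- initial_split(mtcars, prop = 0.75)

# One row per original row index, labelled with the set it belongs to
split_rows <- tidy(split)
head(split_rows)

# Count how many rows landed in each set
table(split_rows$Data)
```

Every row of the original data appears once here, because a simple split assigns each row to exactly one of the two sets.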

Usage

# S3 method for rsplit
tidy(x, unique_ind = TRUE, ...)

# S3 method for rset
tidy(x, unique_ind = TRUE, ...)

# S3 method for vfold_cv
tidy(x, ...)

# S3 method for nested_cv
tidy(x, unique_ind = TRUE, ...)

Arguments

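x: An rset or rsplit object.
unique_ind: Should unique row identifiers be returned? For example, if FALSE, bootstrap results will include multiple entries for an original row that was sampled more than once.
...: These dots are for future extensions and must be empty.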
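The effect of unique_ind is easiest to see on a bootstrap split, where rows are drawn with replacement. The following is a sketch under that assumption; the variable names (`boot`, `one_split`, `uniq`, `dups`) are illustrative only.

```r
library(rsample)

set.seed(4121)
boot <- bootstraps(mtcars, times = 1)
one_split <- boot$splits[[1]]

# Default (unique_ind = TRUE): each sampled row index is listed once
uniq <- tidy(one_split)

# unique_ind = FALSE: one entry per draw, so rows drawn repeatedly repeat
dups <- tidy(one_split, unique_ind = FALSE)

# With unique indices, analysis and assessment rows together cover the data once
nrow(uniq)

# Without them, the analysis set has as many entries as the bootstrap sample
sum(dups$Data == "Analysis")

# Rows that were drawn more than once
unique(dups$Row[duplicated(dups$Row)])
```

Keeping the duplicates can be useful when the number of times a row was drawn matters, for instance when shading a plot by sampling frequency.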

Value

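A tibble with columns Row and Data. The latter has possible values "Analysis" or "Assessment". For rset inputs, identification columns are also returned, but their names and values depend on the type of resampling. vfold_cv contains a column "Fold" and, if repeats are used, another called "Repeat". bootstraps and mc_cv use the column "Resample".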
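One way to check which identification columns a given resampling scheme produces is to inspect the column names directly. This is a small sketch; the object names are hypothetical and the resampling settings are arbitrary.

```r
library(rsample)

set.seed(4121)
# Column names returned for several resampling schemes
cv_cols   <- names(tidy(vfold_cv(mtcars, v = 5)))
rcv_cols  <- names(tidy(vfold_cv(mtcars, v = 5, repeats = 2)))
mc_cols   <- names(tidy(mc_cv(mtcars, times = 5)))
boot_cols <- names(tidy(bootstraps(mtcars, times = 5)))

cv_cols    # Row, Data and a fold identifier
rcv_cols   # repeated CV adds a second identifier for the repeat
mc_cols    # Monte Carlo CV uses a single resample identifier
boot_cols  # so do bootstraps
```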

Details

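Note that for nested resampling, the rows of the inner resample, named inner_Row, are relative row indices and do not correspond to the rows in the original data set.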
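The relative indexing can be illustrated without tidying the whole nested object. The sketch below tidies one inner resample directly (so the index column is still called Row rather than inner_Row) and maps it back to the original data through the outer split. The `original_row` column and the resampling settings are assumptions added for illustration, not part of what tidy() returns.

```r
library(rsample)

set.seed(4121)
nested <- nested_cv(
  mtcars,
  outside = vfold_cv(v = 4),
  inside = bootstraps(times = 3)
)

# Row indices (into mtcars) of the first outer analysis set
outer_rows <- as.integer(nested$splits[[1]], data = "analysis")

# Tidy the inner resamples that were built on that analysis set
inner <- tidy(nested$inner_resamples[[1]])

# Inner indices are positions within the outer analysis set ...
range(inner$Row)
length(outer_rows)

# ... so translate them through outer_rows to recover rows of mtcars
inner$original_row <- outer_rows[inner$Row]
head(inner)
```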

Examples

library(rsample)
library(ggplot2)
theme_set(theme_bw())

set.seed(4121)
cv <- tidy(vfold_cv(mtcars, v = 5))
ggplot(cv, aes(x = Fold, y = Row, fill = Data)) +
  geom_tile() +
  scale_fill_brewer()


set.seed(4121)
rcv <- tidy(vfold_cv(mtcars, v = 5, repeats = 2))
ggplot(rcv, aes(x = Fold, y = Row, fill = Data)) +
  geom_tile() +
  facet_wrap(~Repeat) +
  scale_fill_brewer()


set.seed(4121)
mccv <- tidy(mc_cv(mtcars, times = 5))
ggplot(mccv, aes(x = Resample, y = Row, fill = Data)) +
  geom_tile() +
  scale_fill_brewer()


set.seed(4121)
bt <- tidy(bootstraps(mtcars, times = 5))
ggplot(bt, aes(x = Resample, y = Row, fill = Data)) +
  geom_tile() +
  scale_fill_brewer()


dat <- data.frame(day = 1:30)
# Resample by week instead of day
ts_cv <- rolling_origin(dat,
  initial = 7, assess = 7,
  skip = 6, cumulative = FALSE
)
ts_cv <- tidy(ts_cv)
ggplot(ts_cv, aes(x = Resample, y = factor(Row), fill = Data)) +
  geom_tile() +
  scale_fill_brewer()

Source: R/tidy.R

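Note: This article was compiled by 纯净天空 from the original English work Tidy Resampling Object by Hannah Frick and others. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.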