R rsample nested_cv: Nested or Double Resampling

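nested_cv can be used to take the results of one resampling procedure and conduct further resamples within each split. Any type of resampling used in rsample can be used.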
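To make the idea concrete, here is a minimal base R sketch of what "resampling within each split" means, using only row indices. The helpers make_folds() and make_boots() are hypothetical illustrations, not rsample functions. The outer procedure is 3-fold cross-validation, and each outer analysis set (never the outer assessment set) is bootstrapped 5 times, mirroring the first example below.

```r
# Hypothetical helper (not part of rsample): split row indices 1..n into v folds.
make_folds <- function(n, v) {
  fold_id <- sample(rep_len(seq_len(v), n))  # balanced, randomly permuted fold labels
  lapply(seq_len(v), function(k) {
    list(analysis = which(fold_id != k), assessment = which(fold_id == k))
  })
}

# Hypothetical helper (not part of rsample): bootstrap a set of row indices.
# The out-of-bag rows serve as the assessment set.
make_boots <- function(idx, times) {
  lapply(seq_len(times), function(b) {
    in_bag <- idx[sample.int(length(idx), replace = TRUE)]
    list(analysis = in_bag, assessment = setdiff(idx, in_bag))
  })
}

set.seed(1)
outer_splits <- make_folds(nrow(mtcars), v = 3)

# Inner resampling only ever sees the outer analysis rows.
nested <- lapply(outer_splits, function(split) {
  list(outer = split, inner = make_boots(split$analysis, times = 5))
})
```

Keeping the inner resamples strictly inside each outer analysis set means the outer assessment rows stay untouched for a final, unbiased performance estimate.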

Usage

nested_cv(data, outside, inside)

Arguments

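data: A data frame.
outside: The initial resampling specification. This can be an already created object or an expression for a new one (see the examples below). If an expression is used, its data argument does not need to be specified and, if given, is ignored.
inside: An expression for the type of resampling to be conducted within each split of the initial procedure.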
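The "expression" form of outside and inside can be understood as an unevaluated call that is completed with the data before it runs. The base R sketch below shows that general pattern with substitute() and eval(); it is an assumption for illustration, not rsample's actual code, and fake_resampler() and run_with_data() are hypothetical stand-ins for a function such as vfold_cv() and for nested_cv() itself.

```r
# Hypothetical stand-in for a resampling function such as vfold_cv():
# it only reports how many rows it received and the requested number of folds.
fake_resampler <- function(data, v = 10) {
  list(n_rows = nrow(data), v = v)
}

# Hypothetical wrapper: capture the spec unevaluated, inject `data`, then run it.
run_with_data <- function(data, spec) {
  spec_call <- substitute(spec)       # e.g. the call fake_resampler(v = 3)
  spec_call$data <- data              # adds, or overwrites, the data argument
  eval(spec_call, envir = parent.frame())
}

result  <- run_with_data(mtcars, fake_resampler(v = 3))
# A data argument written inside the expression is replaced, i.e. ignored:
ignored <- run_with_data(mtcars, fake_resampler(data = iris, v = 5))
```

Because the wrapper overwrites the data argument, any data given inside the expression has no effect, which matches the behavior described for outside above.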

Value

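A tibble with class nested_cv and any other classes that the outer resampling process normally contains. The result includes a column of outer data split objects, one or more id columns, and a column of nested tibbles named inner_resamples that holds the additional resamples.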

Details

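Using the bootstrap as the outer resampling procedure is a bad idea: a bootstrap sample contains duplicated rows, so the inner resample can place copies of the same data point in both its analysis and assessment sets (see the example below).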
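The leak can be reproduced without rsample. This base R sketch draws one bootstrap sample of mtcars row indices and then assigns the sample's positions to 3 inner folds; any original row that ends up on both sides of inner fold 1 is leaked. As a contrast, an outer split drawn without replacement (as cross-validation does) cannot leak. The fold assignment here is a simplified assumption and will not reproduce rsample's exact draws, even with the same seed.

```r
set.seed(2222)
n <- nrow(mtcars)

# Outer bootstrap: row indices drawn with replacement, so some rows repeat.
boot_rows <- sample.int(n, replace = TRUE)

# Inner 3-fold CV assigns *positions* of the bootstrap sample to folds,
# so two copies of one row can land in different folds.
fold_id <- sample(rep_len(1:3, n))
analysis_rows   <- boot_rows[fold_id != 1]
assessment_rows <- boot_rows[fold_id == 1]
leaked <- intersect(analysis_rows, assessment_rows)  # rows seen in both sets

# Contrast: an outer analysis set without duplicates (e.g. from CV).
cv_rows <- sample.int(n, 21)                          # 21 distinct rows
cv_fold <- sample(rep_len(1:3, length(cv_rows)))
cv_leaked <- intersect(cv_rows[cv_fold != 1], cv_rows[cv_fold == 1])
```

Only rows duplicated by the bootstrap can appear in leaked, while cv_leaked is always empty.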

Examples

library(rsample)

## Using expressions for the resampling procedures:
nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))
#> # Nested resampling:
#> #  outer: 3-fold cross-validation
#> #  inner: Bootstrap sampling
#> # A tibble: 3 × 3
#>   splits          id    inner_resamples
#>   <list>          <chr> <list>         
#> 1 <split [21/11]> Fold1 <boot [5 × 2]> 
#> 2 <split [21/11]> Fold2 <boot [5 × 2]> 
#> 3 <split [22/10]> Fold3 <boot [5 × 2]> 

## Using an existing object:
folds <- vfold_cv(mtcars)
nested_cv(mtcars, folds, inside = bootstraps(times = 5))
#> # Nested resampling:
#> #  outer: `folds`
#> #  inner: Bootstrap sampling
#> # A tibble: 10 × 3
#>    splits         id     inner_resamples
#>    <list>         <chr>  <list>         
#>  1 <split [28/4]> Fold01 <boot [5 × 2]> 
#>  2 <split [28/4]> Fold02 <boot [5 × 2]> 
#>  3 <split [29/3]> Fold03 <boot [5 × 2]> 
#>  4 <split [29/3]> Fold04 <boot [5 × 2]> 
#>  5 <split [29/3]> Fold05 <boot [5 × 2]> 
#>  6 <split [29/3]> Fold06 <boot [5 × 2]> 
#>  7 <split [29/3]> Fold07 <boot [5 × 2]> 
#>  8 <split [29/3]> Fold08 <boot [5 × 2]> 
#>  9 <split [29/3]> Fold09 <boot [5 × 2]> 
#> 10 <split [29/3]> Fold10 <boot [5 × 2]> 

## The dangers of outer bootstraps:
set.seed(2222)
bad_idea <- nested_cv(mtcars,
  outside = bootstraps(times = 5),
  inside = vfold_cv(v = 3)
)
#> Warning: Using bootstrapping as the outer resample is dangerous since the inner resample might have the same data point in both the analysis and assessment set.

first_outer_split <- bad_idea$splits[[1]]
outer_analysis <- as.data.frame(first_outer_split)
sum(grepl("Volvo 142E", rownames(outer_analysis)))
#> [1] 0

## For the 3-fold CV used inside of each bootstrap, how are the replicated
## `Volvo 142E` data partitioned?
first_inner_split <- bad_idea$inner_resamples[[1]]$splits[[1]]
inner_analysis <- as.data.frame(first_inner_split)
inner_assess <- as.data.frame(first_inner_split, data = "assessment")

sum(grepl("Volvo 142E", rownames(inner_analysis)))
#> [1] 0
sum(grepl("Volvo 142E", rownames(inner_assess)))
#> [1] 0
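The check above searches for a single car with grepl(). That works because subsetting a data frame with repeated row indices makes the row names unique by appending suffixes such as ".1". The hypothetical helpers below (not part of rsample) generalize the check to every row by stripping those suffixes. They assume the original row names are unique and do not already end in "." followed by digits, which holds for mtcars.

```r
# Repeated rows get suffixed names, e.g. "Volvo 142E", "Volvo 142E.1", "Mazda RX4".
dup <- mtcars[c(32, 32, 1), ]

# Hypothetical helper: recover the original row name by dropping a ".<digits>" suffix.
original_rows <- function(df) sub("\\.[0-9]+$", "", rownames(df))

# Hypothetical helper: original rows present in both the analysis and assessment data.
leaked_rows <- function(analysis, assessment) {
  intersect(original_rows(analysis), original_rows(assessment))
}
```

With the objects from the example above, leaked_rows(inner_analysis, inner_assess) would list every car that the first inner split both fits on and assesses.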

Source code: R/nest.R

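Note: This page was compiled by 純淨天空 from the original English documentation Nested or Double Resampling by Hannah Frick et al. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.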