R rsample int_pctl: Bootstrap confidence intervals

Calculate bootstrap confidence intervals using various methods.

Usage

int_pctl(.data, ...)

# S3 method for bootstraps
int_pctl(.data, statistics, alpha = 0.05, ...)

int_t(.data, ...)

# S3 method for bootstraps
int_t(.data, statistics, alpha = 0.05, ...)

int_bca(.data, ...)

# S3 method for bootstraps
int_bca(.data, statistics, alpha = 0.05, .fn, ...)

Arguments

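.data: A data frame containing the bootstrap resamples created using bootstraps(). For t- and BCa-intervals, the apparent argument should be set to TRUE. Even if the apparent argument is set to TRUE for the percentile method, the apparent data is never used in calculating the percentile confidence interval.
...: Arguments to pass to .fn (int_bca() only).
statistics: An unquoted column name or dplyr selector that identifies a single column in the data set containing the individual bootstrap estimates. This must be a list column of tidy tibbles (with columns term and estimate). For t-intervals, a standard tidy column (usually called std.err) is required. See the examples below.
alpha: Level of significance.
.fn: A function to calculate the statistic of interest. The function should take an rsplit as the first argument, and the ... are required.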
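
The shape of a function suitable for .fn, and of the tibble it returns, can be sketched as follows. This is a minimal illustration, not code from the package: the helper name mean_est and the term label "mean_mpg" are hypothetical, and the standard error of the mean is used only because it has a simple analytical form.

```r
library(rsample)
library(tibble)

# Hypothetical helper: takes an rsplit first and accepts `...`,
# and returns one tidy row with term, estimate, and std.err.
mean_est <- function(split, ...) {
  dat <- analysis(split)
  tibble(
    term = "mean_mpg",
    estimate = mean(dat$mpg),
    # analytical standard error of the mean, needed only for t-intervals
    std.err = sd(dat$mpg) / sqrt(nrow(dat))
  )
}

set.seed(1)
rs <- bootstraps(mtcars, times = 5, apparent = TRUE)

# Apply the helper to a single resample to inspect its output
res <- mean_est(rs$splits[[1]])
res
```

Mapping such a function over the splits column produces the list column that the statistics argument expects.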

Value

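Each function returns a tibble with columns .lower, .estimate, .upper, .alpha, .method, and term. .method is the type of interval (e.g. "percentile", "student-t", or "BCa"). term is the name of the estimate. Note that the .estimate returned from int_pctl() is the mean of the estimates from the bootstrap resamples, not the estimate from the apparent model.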

Details

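Percentile intervals are the standard method of obtaining confidence intervals but require thousands of resamples to be accurate. T-intervals may need fewer resamples but require a corresponding variance estimate. Bias-corrected and accelerated (BCa) intervals require the original function that was used to create the statistics of interest and are computationally taxing.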
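
To make the percentile idea concrete, here is a hand-rolled sketch in base R. It is an illustration of the general technique only and is not the rsample implementation, which may differ in details such as the quantile type. The statistic (the mean of mtcars$mpg) and the number of resamples are arbitrary choices for the example.

```r
# Percentile bootstrap interval for a mean, using base R only.
set.seed(1)
x <- mtcars$mpg
n_boot <- 2000  # illustrative; percentile intervals need many resamples
alpha <- 0.05

# Recompute the statistic on resamples drawn with replacement
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# The interval ends are the alpha/2 and 1 - alpha/2 quantiles of the
# bootstrap estimates; the point estimate is their mean.
pctl_interval <- c(
  .lower    = unname(quantile(boot_means, alpha / 2)),
  .estimate = mean(boot_means),
  .upper    = unname(quantile(boot_means, 1 - alpha / 2))
)
pctl_interval
```

Only the bootstrap estimates enter this calculation, which is why the percentile method does not need the apparent (original) sample.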

References

https://rsample.tidymodels.org/articles/Applications/Intervals.html

Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843

See also

reg_intervals()

Examples

library(rsample)
library(broom)
library(dplyr)
library(purrr)
library(tibble)

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(car_rs, results)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`.
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method   
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>     
#> 1 (Intercept) 27.5      30.7    33.6       0.05 percentile
#> 2 disp        -0.0440   -0.0300 -0.0162    0.05 percentile
#> 3 hp          -0.0572   -0.0260 -0.00840   0.05 percentile
int_t(car_rs, results)
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method  
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>    
#> 1 (Intercept) 28.1      30.7    34.6       0.05 student-t
#> 2 disp        -0.0446   -0.0300 -0.0170    0.05 student-t
#> 3 hp          -0.0449   -0.0260 -0.00337   0.05 student-t
int_bca(car_rs, results, .fn = lm_est)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`.
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>  
#> 1 (Intercept) 27.7      30.7    33.7       0.05 BCa    
#> 2 disp        -0.0446   -0.0300 -0.0172    0.05 BCa    
#> 3 hp          -0.0576   -0.0260 -0.00843   0.05 BCa    

# putting results into a tidy format
rank_corr <- function(split) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$sqft, dat$price, method = "spearman"),
    # don't know the analytical std.err so no t-intervals
    std.err = NA_real_
  )
}

set.seed(69325)
data(Sacramento, package = "modeldata")
bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(correlations = map(splits, rank_corr)) %>%
  int_pctl(correlations)
#> # A tibble: 1 × 6
#>   term  .lower .estimate .upper .alpha .method   
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 corr   0.737     0.768  0.796   0.05 percentile

Source: R/bootci.R

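Note: This article was selected and compiled by 纯净天空 from the original English work "Bootstrap confidence intervals" by Hannah Frick and others. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.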