R broom augment.rqs: Augment data with information from a(n) rqs object

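Augment accepts a model object and a dataset and adds information about each observation in the dataset. Most commonly, this includes predicted values in the .fitted column, residuals in the .resid column, and standard errors for the fitted values in a .se.fit column. New columns always begin with a . prefix to avoid overwriting columns in the original dataset.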
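As a minimal sketch of the column convention, the snippet below fits an rqs model on the built-in stackloss data frame and lists the columns that augment() added. The formula and the choice of quantiles are illustrative assumptions, not part of the original documentation. The check covers only .tau, .resid and .fitted, the columns that appear in the rqs output in the Examples section.

```r
library(quantreg)
library(broom)

# Passing more than one tau makes rq() return an rqs object.
# The formula and tau values here are illustrative choices.
fit <- rq(stack.loss ~ Air.Flow + Water.Temp,
          tau = c(0.25, 0.5), data = stackloss)

aug <- augment(fit)

# Columns added by augment() are the ones whose names start with a dot.
added <- grep("^\\.", names(aug), value = TRUE)
added
```

The original model variables keep their names, so the dot prefix is enough to tell the added columns apart from the data.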

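Users may pass data to augment via either the data argument or the newdata argument. If the user passes data to the data argument, it must be exactly the data that was used to fit the model object. Pass datasets to newdata to augment data that was not used during model fitting. This still requires that at least all predictor variable columns used to fit the model are present. If the original outcome variable used to fit the model is not included in newdata, then no .resid column will be included in the output.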
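The following sketch shows the newdata path under stated assumptions: the data frame new_obs is made up for illustration and contains only the predictor columns, so no residuals can be computed for it.

```r
library(quantreg)
library(broom)

fit <- rq(stack.loss ~ Air.Flow + Water.Temp,
          tau = c(0.25, 0.5), data = stackloss)

# Hypothetical new observations: predictors only, no outcome column.
new_obs <- data.frame(Air.Flow = c(60, 70, 80),
                      Water.Temp = c(20, 22, 26))

aug_new <- augment(fit, newdata = new_obs)

# Fitted quantiles are returned; .resid is absent because the
# outcome variable was not supplied.
names(aug_new)
```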

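Augment will often behave differently depending on whether data or newdata is given. This is because there is often information associated with training observations (such as influence or related measures) that is not meaningfully defined for new observations.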

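For convenience, many augment methods provide default data arguments, so that augment(fit) will return the augmented training data. In these cases, augment tries to reconstruct the original data based on the model object, with varying degrees of success.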
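One way to see the limits of that reconstruction is to fit a model with a transformed outcome. In this sketch (the log transform is an illustrative assumption), the default data argument is the model frame, which holds only the variables that appear in the formula, while passing the training data explicitly keeps every original column.

```r
library(quantreg)
library(broom)

# A transformed outcome, chosen only to illustrate the reconstruction.
fit <- rq(log(stack.loss) ~ Air.Flow, tau = c(0.25, 0.5), data = stackloss)

# Default: data = model.frame(fit), so columns follow the formula terms.
aug_default <- augment(fit)
names(aug_default)

# Explicit: pass exactly the data used for fitting to keep all columns.
aug_full <- augment(fit, data = stackloss)
names(aug_full)
```

With the default, the outcome appears under the name `log(stack.loss)` and unused columns such as Water.Temp are missing; the explicit call retains them.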

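The augmented dataset is always returned as a tibble::tibble with the same number of rows as the passed dataset. This means that the passed data must be coercible to a tibble. If a predictor enters the model as part of a matrix of covariates, such as when the model formula uses splines::ns(), stats::poly(), or survival::Surv(), it is represented as a matrix column.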
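For rqs objects specifically, the output in the Examples section (42 rows for 21 observations and two quantiles) suggests that the result holds one row per observation per quantile rather than one row per observation. The sketch below checks this with three illustrative tau values.

```r
library(quantreg)
library(broom)

taus <- c(0.25, 0.5, 0.75)  # illustrative quantiles
fit <- rq(stack.loss ~ Air.Flow, tau = taus, data = stackloss)

aug <- augment(fit)

# The result is a tibble ...
inherits(aug, "tbl_df")

# ... with each training row repeated once per quantile.
nrow(aug) == nrow(stackloss) * length(taus)
table(aug$.tau)
```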

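We are in the process of defining behaviors for models fit with various na.action arguments, but make no guarantees about behavior when data is missing at this time.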

Usage

# S3 method for rqs
augment(x, data = model.frame(x), newdata, ...)

Arguments

x

An rqs object returned from quantreg::rq().

data

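A base::data.frame or tibble::tibble() containing the original data that was used to produce the object x. Defaults to stats::model.frame(x) so that augment(my_fit) returns the augmented original data. Do not pass new data to the data argument. Augment will report information such as influence and Cook's distance for data passed to the data argument. These measures are only defined for the original training data.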

newdata

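A base::data.frame() or tibble::tibble() containing all the original predictors used to create x. Defaults to NULL, indicating that nothing has been passed to newdata. If newdata is specified, the data argument will be ignored.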

...

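Arguments passed on to quantreg::predict.rq

object: an object of class rq, rqs or rq.process produced by rq
interval: type of interval desired. The default is 'none'; when set to 'confidence', the function returns a matrix of predictions with a point prediction for each of the 'newdata' points as well as lower and upper confidence limits.
level: coverage probability for the 'confidence' intervals.
type: For predict.rq, the method for 'confidence' intervals, if desired. If 'percentile', one of the bootstrap methods is used to generate percentile intervals for each prediction; if 'direct', a version of the Portnoy and Zhou (1998) method is used; otherwise an estimated covariance matrix for the parameter estimates is used. Further arguments that determine the choice of bootstrap method or covariance matrix estimate can be passed via the ... argument. For predict.rqs and predict.rq.process when stepfun = TRUE, type is "Qhat", "Fhat" or "fhat", depending on whether the user would like estimates of the conditional quantile, distribution or density functions, respectively. As noted below, the first two estimates can be monotonized with the function rearrange. When the "fhat" option is invoked, a list of conditional density functions is returned, based on Silverman's adaptive kernel method as implemented in akj and approxfun.
na.action: function determining what should be done with missing values in 'newdata'. The default is to predict 'NA'.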

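Details

Depending on the arguments passed on to predict.rq via ..., a confidence interval is also calculated on the fitted values, resulting in columns .lower and .upper. Confidence intervals are not provided when data is specified via the newdata argument.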

See also

augment, quantreg::rq(), quantreg::predict.rqs()

Other quantreg tidiers: augment.nlrq(), augment.rq(), glance.nlrq(), glance.rq(), tidy.nlrq(), tidy.rqs(), tidy.rq()

Examples


# load modeling libraries and data
library(quantreg)
library(broom)

data(stackloss)

# median (l1) regression fit for the stackloss data.
mod1 <- rq(stack.loss ~ stack.x, .5)

# weighted sample median
mod2 <- rq(rnorm(50) ~ 1, weights = runif(50))

# summarize model fit with tidiers
tidy(mod1)
#> # A tibble: 4 × 5
#>   term              estimate conf.low conf.high   tau
#>   <chr>                <dbl>    <dbl>     <dbl> <dbl>
#> 1 (Intercept)       -39.7     -53.8    -24.5      0.5
#> 2 stack.xAir.Flow     0.832     0.509    1.17     0.5
#> 3 stack.xWater.Temp   0.574     0.272    3.04     0.5
#> 4 stack.xAcid.Conc.  -0.0609   -0.278    0.0153   0.5
glance(mod1)
#> # A tibble: 1 × 5
#>     tau logLik      AIC   BIC df.residual
#>   <dbl> <logLik>  <dbl> <dbl>       <int>
#> 1   0.5 -50.15272  108.  112.          17
augment(mod1)
#> # A tibble: 21 × 5
#>    stack.loss stack.x[,"Air.Flow"] [,"Water.Temp"]    .resid .fitted  .tau
#>         <dbl>                <dbl>           <dbl>     <dbl>   <dbl> <dbl>
#>  1         42                   80              27  5.06e+ 0    36.9   0.5
#>  2         37                   80              27 -1.42e-14    37     0.5
#>  3         37                   75              25  5.43e+ 0    31.6   0.5
#>  4         28                   62              24  7.63e+ 0    20.4   0.5
#>  5         18                   62              22 -1.22e+ 0    19.2   0.5
#>  6         18                   62              23 -1.79e+ 0    19.8   0.5
#>  7         19                   62              24 -1.00e+ 0    20     0.5
#>  8         20                   62              24 -7.11e-15    20     0.5
#>  9         15                   58              23 -1.46e+ 0    16.5   0.5
#> 10         14                   58              18 -2.03e- 2    14.0   0.5
#> # ℹ 11 more rows
#> # ℹ 1 more variable: stack.x[3] <dbl>

tidy(mod2)
#> # A tibble: 1 × 5
#>   term        estimate conf.low conf.high   tau
#>   <chr>          <dbl> <lgl>    <lgl>     <dbl>
#> 1 (Intercept)   0.0744 NA       NA          0.5
glance(mod2)
#> # A tibble: 1 × 5
#>     tau logLik     AIC   BIC df.residual
#>   <dbl> <logLik> <dbl> <dbl>       <int>
#> 1   0.5 -72.9869  148.  150.          49
augment(mod2)
#> # A tibble: 50 × 5
#>    `rnorm(50)` `(weights)` .resid .fitted  .tau
#>          <dbl>       <dbl>  <dbl>   <dbl> <dbl>
#>  1       1.25       0.192   1.18   0.0744   0.5
#>  2       0.458      0.321   0.383  0.0744   0.5
#>  3       0.765      0.0297  0.691  0.0744   0.5
#>  4       0.392      0.870   0.317  0.0744   0.5
#>  5      -0.547      0.647  -0.622  0.0744   0.5
#>  6      -0.468      0.319  -0.542  0.0744   0.5
#>  7      -1.11       0.293  -1.18   0.0744   0.5
#>  8       0.786      0.669   0.711  0.0744   0.5
#>  9      -0.648      0.408  -0.722  0.0744   0.5
#> 10       1.07       0.664   1.00   0.0744   0.5
#> # ℹ 40 more rows

# varying tau to generate an rqs object
mod3 <- rq(stack.loss ~ stack.x, tau = c(.25, .5))

tidy(mod3)
#> # A tibble: 8 × 5
#>   term               estimate conf.low conf.high   tau
#>   <chr>                 <dbl>    <dbl>     <dbl> <dbl>
#> 1 (Intercept)       -3.6 e+ 1  -59.0     -7.84    0.25
#> 2 stack.xAir.Flow    5.00e- 1    0.229    0.970   0.25
#> 3 stack.xWater.Temp  1.00e+ 0    0.286    2.26    0.25
#> 4 stack.xAcid.Conc. -4.58e-16   -0.643    0.0861  0.25
#> 5 (Intercept)       -3.97e+ 1  -53.8    -24.5     0.5 
#> 6 stack.xAir.Flow    8.32e- 1    0.509    1.17    0.5 
#> 7 stack.xWater.Temp  5.74e- 1    0.272    3.04    0.5 
#> 8 stack.xAcid.Conc. -6.09e- 2   -0.278    0.0153  0.5 
augment(mod3)
#> # A tibble: 42 × 5
#>    stack.loss stack.x[,"Air.Flow"] [,"Water.Temp"] .tau     .resid .fitted
#>         <dbl>                <dbl>           <dbl> <chr>     <dbl>   <dbl>
#>  1         42                   80              27 0.25   1.10e+ 1    31.0
#>  2         42                   80              27 0.5    5.06e+ 0    36.9
#>  3         37                   80              27 0.25   6.00e+ 0    31.0
#>  4         37                   80              27 0.5   -1.42e-14    37  
#>  5         37                   75              25 0.25   1.05e+ 1    26.5
#>  6         37                   75              25 0.5    5.43e+ 0    31.6
#>  7         28                   62              24 0.25   9.00e+ 0    19  
#>  8         28                   62              24 0.5    7.63e+ 0    20.4
#>  9         18                   62              22 0.25   1.00e+ 0    17.0
#> 10         18                   62              22 0.5   -1.22e+ 0    19.2
#> # ℹ 32 more rows
#> # ℹ 1 more variable: stack.x[3] <dbl>

# glance cannot handle rqs objects like `mod3`--use a purrr
# `map`-based workflow instead
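The map-based workflow mentioned in the comment above can be sketched as follows. This is one possible arrangement, not necessarily the one the package authors have in mind: it fits a separate single-tau rq model per quantile on the stackloss data frame, so that glance() applies to each fit. The formula and the names taus, fits, glances and aics are illustrative.

```r
library(quantreg)
library(broom)
library(purrr)

taus <- c(0.25, 0.5)

# Fit one rq model per quantile instead of a single rqs object.
fits <- map(taus, function(tau_i) {
  rq(stack.loss ~ Air.Flow + Water.Temp, tau = tau_i, data = stackloss)
})

# glance() accepts each single-tau fit; keep the one-row summaries in a list.
glances <- map(fits, glance)

# Pull one statistic out of every summary, for example the AIC.
aics <- map_dbl(glances, "AIC")
aics
```

Keeping the summaries in a list avoids assumptions about how the logLik column combines across rows; they can be bound into one table afterwards if that is needed.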

Source: R/quantreg-rqs-tidiers.R

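Note: This article was selected and compiled by 純淨天空 from the original English work "Augment data with information from a(n) rqs object". Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.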