R yardstick mase: Mean absolute scaled error

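Calculate the mean absolute scaled error. This metric is scale independent and symmetric. It is generally used for comparing forecast error in time series settings. Because the metric is computed from lagged values of the series, observations must be ordered in ascending order by time.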
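The effect of row order can be illustrated with a minimal base R sketch. The helper mase_sketch() below is hypothetical and is not yardstick's source. It assumes the default scaling is the mean absolute lag-m difference of truth, which is how the mae_train argument is described, and it ignores NA handling and case weights.

```r
# Hypothetical helper, not part of yardstick: MAE of the predictions divided
# by the MAE of a lag-m naive forecast built from `truth` itself.
mase_sketch <- function(truth, estimate, m = 1L) {
  n <- length(truth)
  naive_error <- truth[(m + 1):n] - truth[1:(n - m)]
  mean(abs(truth - estimate)) / mean(abs(naive_error))
}

truth <- c(1, 2, 3, 4, 5, 6)   # observations in ascending time order
estimate <- truth + 0.5        # every prediction is off by 0.5

in_order <- mase_sketch(truth, estimate)

# The same rows in a scrambled order: the prediction errors are unchanged,
# but the lag-1 differences of `truth` are larger, so the score shrinks.
idx <- c(1, 6, 2, 5, 3, 4)
scrambled <- mase_sketch(truth[idx], estimate[idx])

in_order   # 0.5
scrambled  # 0.5 / 3, about 0.167
```

Only the denominator depends on row order, so under this assumption the same predictions can receive very different scores when the rows are not sorted by time.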

Usage

mase(data, ...)

# S3 method for data.frame
mase(
  data,
  truth,
  estimate,
  m = 1L,
  mae_train = NULL,
  na_rm = TRUE,
  case_weights = NULL,
  ...
)

mase_vec(
  truth,
  estimate,
  m = 1L,
  mae_train = NULL,
  na_rm = TRUE,
  case_weights = NULL,
  ...
)

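Arguments

data: A data.frame containing the columns specified by the truth and estimate arguments.
...: Not currently used.
truth: The column identifier for the true results (which must be numeric). This should be an unquoted column name, although the argument is passed by expression and supports quasiquotation (you can unquote column names). For _vec() functions, a numeric vector.
estimate: The column identifier for the predicted results (also numeric). As with truth, this can be specified in different ways, but the primary method is to use an unquoted variable name. For _vec() functions, a numeric vector.
m: An integer giving the number of lags used to calculate the in-sample seasonal naive error. The default is intended for non-seasonal time series. If each observation is at the daily level and the data show weekly seasonality, then m = 7L would be a reasonable choice for a 7-day seasonal naive calculation.
mae_train: A numeric value that lets the user supply the in-sample seasonal naive mean absolute error. If this value is not provided, the out-of-sample seasonal naive mean absolute error is calculated from truth and used instead.
na_rm: A logical value indicating whether NA values should be stripped before the computation proceeds.
case_weights: The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in data. For _vec() functions, a numeric vector.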
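To make the role of m more concrete, the base R sketch below compares a lag-1 and a lag-7 naive error on made-up daily data with a weekly pattern. The data and the helper naive_mae() are illustrative assumptions, not part of yardstick.

```r
# Three weeks of made-up daily data: a fixed weekly pattern plus a small
# week-over-week increase of 1.
weekly_pattern <- c(10, 12, 11, 13, 15, 20, 18)
truth <- rep(weekly_pattern, times = 3) + rep(c(0, 1, 2), each = 7)

# Hypothetical helper: mean absolute error of a naive forecast that repeats
# the value observed m steps earlier.
naive_mae <- function(x, m) mean(abs(diff(x, lag = m)))

lag1 <- naive_mae(truth, m = 1)  # 2.8: day-to-day changes include the weekly swings
lag7 <- naive_mae(truth, m = 7)  # 1: compares each day with the same weekday a week earlier

# With yardstick installed, the matching call would be along the lines of
# yardstick::mase_vec(truth, estimate, m = 7L) for a numeric `estimate`.
```

On data like this, the lag-7 naive forecast is the more natural baseline, which is why m = 7L is suggested for daily observations with weekly seasonality.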

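Value

A tibble with columns .metric, .estimator, and .estimate and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the number of groups.

For mase_vec(), a single numeric value (or NA).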

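Details

mase() is different from most numeric metrics. The original implementation of mase() calls for using the in-sample naive mean absolute error to compute the scaled errors. It uses this rather than the out-of-sample error because the out-of-sample error may be impossible to compute when forecasting a very short horizon (i.e. the out-of-sample size is only 1 or 2). However, yardstick only knows about the out-of-sample truth and estimate values. Because of this, the out-of-sample error is used in the computation by default. If the in-sample naive mean absolute error is required and known, it can be passed through the mae_train argument and will be used instead. If the in-sample data is available, the naive mean absolute error can easily be computed with mae(data, truth, lagged_truth).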
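A minimal sketch of the mae_train workflow, using made-up numbers and base R for the in-sample part. The variable names are illustrative, and the manual calculation assumes the score is the out-of-sample mean absolute error divided by the supplied scaling value.

```r
# Made-up training (in-sample) series and a short two-step forecast horizon.
train <- c(100, 102, 101, 105, 107, 110)
test_truth <- c(112, 111)
test_estimate <- c(111, 113)

# In-sample naive MAE with lag 1: each value is "forecast" by the one before it.
# For a seasonal series, diff(train, lag = m) would give the lag-m version.
naive_mae_train <- mean(abs(diff(train)))

# Manual scaled error under the assumption stated above.
mase_manual <- mean(abs(test_truth - test_estimate)) / naive_mae_train
mase_manual  # 1.5 / 2.4 = 0.625

# If yardstick is installed, pass the same scaling value through mae_train.
if (requireNamespace("yardstick", quietly = TRUE)) {
  print(yardstick::mase_vec(test_truth, test_estimate, mae_train = naive_mae_train))
}
```

Supplying mae_train in this way keeps the scaling tied to the training period, which is useful when the forecast horizon is too short for a stable out-of-sample naive error.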

References

Rob J. Hyndman (2006). Another look at forecast-accuracy metrics for intermittent demand. Foresight, 4, 46.

See also

Other numeric metrics: ccc(), huber_loss_pseudo(), huber_loss(), iic(), mae(), mape(), mpe(), msd(), poisson_log_loss(), rmse(), rpd(), rpiq(), rsq_trad(), rsq(), smape()

Other accuracy metrics: ccc(), huber_loss_pseudo(), huber_loss(), iic(), mae(), mape(), mpe(), msd(), poisson_log_loss(), rmse(), smape()

Author

Alex Hallam

Examples

# Supply truth and predictions as bare column names
mase(solubility_test, solubility, prediction)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mase    standard        3.56

library(dplyr)

set.seed(1234)
size <- 100
times <- 10

# create 10 resamples
solubility_resampled <- bind_rows(
  replicate(
    n = times,
    expr = sample_n(solubility_test, size, replace = TRUE),
    simplify = FALSE
  ),
  .id = "resample"
)

# Compute the metric by group
metric_results <- solubility_resampled %>%
  group_by(resample) %>%
  mase(solubility, prediction)

metric_results
#> # A tibble: 10 × 4
#>    resample .metric .estimator .estimate
#>    <chr>    <chr>   <chr>          <dbl>
#>  1 1        mase    standard       0.256
#>  2 10       mase    standard       0.240
#>  3 2        mase    standard       0.238
#>  4 3        mase    standard       0.219
#>  5 4        mase    standard       0.229
#>  6 5        mase    standard       0.261
#>  7 6        mase    standard       0.217
#>  8 7        mase    standard       0.267
#>  9 8        mase    standard       0.216
#> 10 9        mase    standard       0.251

# Resampled mean estimate
metric_results %>%
  summarise(avg_estimate = mean(.estimate))
#> # A tibble: 1 × 1
#>   avg_estimate
#>          <dbl>
#> 1        0.240

Source: R/num-mase.R

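Note: This page was compiled by 纯净天空 from the original English documentation "Mean absolute scaled error" by Max Kuhn and co-authors. Unless otherwise stated, copyright in the original code belongs to the original authors.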