R shash Sinh-arcsinh 位置比例和形状模型族

R语言 shash 位于 mgcv 包(package)。

说明

shash 系列实现了 Jones 和 Pewsey (2009) 的 four-parameter sinh-arcsinh(哈希)分布。密度的位置、尺度、偏度和峰度可以取决于加性平滑预测因子。仅可与 gam 一起使用，线性预测变量通过公式列表指定。在使用这种灵活模型之前，值得仔细考虑数据是否足以支持估计。

用法

shash(link = list("identity", "logeb", "identity", "identity"), 
      b = 1e-2, phiPen = 1e-3)

参数

`link`	由四个字符组成的向量，指示位置、尺度、偏度和峰度参数的链接函数。
`b`	logeb 链接函数的正参数，请参阅详细信息。
`phiPen`	峰度参数的脊罚分的正乘数。除非您知道自己在做什么，否则请勿触摸它，请参阅详细信息。

细节

shash族的密度函数是

其中 C(z)=\{1+S(z)^2\}^{1/2} 、 S(z)=\sinh\{\delta \sinh^{-1}(z)-\epsilon\} 和 z = (y - \mu)/(\sigma \delta) 。这里\mu和\sigma > 0分别控制位置和比例，\epsilon确定偏度，而\delta > 0控制尾重。 shash 可以对任一侧的偏度进行建模，具体取决于 \epsilon 的符号。此外，shash 的尾部可以比普通尾部更轻 ( \delta>1 ) 或更重 ( 0<\delta<1 )。出于拟合目的，这里我们使用 \tau = \log(\sigma) 和 \phi = \log(\delta) 。

密度基于第 4.1 节第二行给出的表达式以及 Jones 和 Pewsey (2009) 的方程 (2)，并使用第 4.3 节给出的简单重新参数化。

用于 \tau 的链接函数是 logeb ，其中是 \eta = \log \{\exp(\tau)-b\} ，因此反向链接是 \tau = \log(\sigma) = \log\{\exp(\eta)+b\} 。关键是我们不允许 \sigma 变得小于一个小常数 b。可能性包括岭罚分 - phiPen * \phi^2 ，它将 \phi 缩小到零。当有足够的数据可用时，岭罚分不会对拟合产生太大影响，但在将模型拟合到小数据集时将其包含在内很有用，以避免 \phi 发散到无穷大(Jones 和 Pewsey 已经确定了这个问题( 2009))。

值

继承自类 general.family 的对象。

例子


###############
# Shash dataset
###############
##  Simulate some data from shash
set.seed(847)
n <- 1000
x <- seq(-4, 4, length.out = n)

X <- cbind(1, x, x^2)
beta <- c(4, 1, 1)
mu <- X %*% beta 

sigma =  .5+0.4*(x+4)*.5            # Scale
eps = 2*sin(x)                      # Skewness
del = 1 + 0.2*cos(3*x)              # Kurtosis

dat <-  mu + (del*sigma)*sinh((1/del)*asinh(qnorm(runif(n))) + (eps/del))
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")
plot(x, dat, xlab = "x", ylab = "y")

## Fit model
fit <- gam(list(y ~ s(x), # <- model for location 
                  ~ s(x),   # <- model for log-scale
                  ~ s(x),   # <- model for skewness
                  ~ s(x, k = 20)), # <- model for log-kurtosis
           data = dataf, 
           family = shash, # <- new family 
           optimizer = "efs")

## Plotting truth and estimates for each parameters of the density 
muE <- fit$fitted[ , 1]
sigE <- exp(fit$fitted[ , 2])
epsE <- fit$fitted[ , 3]
delE <- exp(fit$fitted[ , 4])

par(mfrow = c(2, 2))
plot(x, muE, type = 'l', ylab = expression(mu(x)), lwd = 2)
lines(x, mu, col = 2, lty = 2, lwd = 2)
legend("top", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)

plot(x, sigE, type = 'l', ylab = expression(sigma(x)), lwd = 2)
lines(x, sigma, col = 2, lty = 2, lwd = 2)

plot(x, epsE, type = 'l', ylab = expression(epsilon(x)), lwd = 2)
lines(x, eps, col = 2, lty = 2, lwd = 2)

plot(x, delE, type = 'l', ylab = expression(delta(x)), lwd = 2)
lines(x, del, col = 2, lty = 2, lwd = 2)

## Plotting true and estimated conditional density
par(mfrow = c(1, 1))
plot(x, dat, pch = '.', col = "grey", ylab = "y", ylim = c(-35, 70))
for(qq in c(0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999)){
  est <- fit$family$qf(p=qq, mu = fit$fitted)
  true <- mu + (del * sigma) * sinh((1/del) * asinh(qnorm(qq)) + (eps/del))
  lines(x, est, type = 'l', col = 1, lwd = 2)
  lines(x, true, type = 'l', col = 2, lwd = 2, lty = 2)
}
legend("topleft", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)

#####################
## Motorcycle example
#####################

# Here shash is overkill, in fact the fit is not good, relative
# to what we would get with mgcv::gaulss
library(MASS)

b <- gam(list(accel~s(times, k=20, bs = "ad"), ~s(times, k = 10), ~1, ~1),
         data=mcycle, family=shash)

par(mfrow = c(1, 1))
xSeq <- data.frame(cbind("accel" = rep(0, 1e3),
                   "times" = seq(2, 58, length.out = 1e3)))
pred <- predict(b, newdata = xSeq)
plot(mcycle$times, mcycle$accel, ylim = c(-180, 100))
for(qq in c(0.1, 0.3, 0.5, 0.7, 0.9)){
  est <- b$family$qf(p=qq, mu = pred)
  lines(xSeq$times, est, type = 'l', col = 2)
}

plot(b, pages = 1, scale = FALSE)

作者

Matteo Fasiolo <matteo.fasiolo@gmail.com> and Simon N. Wood.

参考

Jones, M. and A. Pewsey (2009). Sinh-arcsinh distributions. Biometrika 96 (4), 761-780. doi:10.1093/biomet/asp053

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986

相关用法

注：本文由纯净天空筛选整理自R-devel大神的英文原创作品 Sinh-arcsinh location scale and shape model family。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。