R smooth.construct.bs.smooth.spec GAM 中的懲罰 B 樣條

R語言 smooth.construct.bs.smooth.spec 位於 mgcv 包(package)。

說明

gam 可以使用基於單變量 B-spline 基數的平滑樣條以及基於導數的懲罰，通過 s(x,bs="bs",m=c(3,2)) 等術語指定。 m[1] 控製樣條線順序，m[1]=3 是三次樣條線，m[1]=2 是二次樣條線，依此類推。 m[2] 階導數的積分平方用作懲罰。所以m=c(3,2) 是傳統的三次樣條。 m 的任何其他元素(在前 2 個元素之後)定義進一步懲罰中導數的順序。如果 m 作為單個數字提供，則它被視為 m[1] 和 m[2]=m[1]-1 ，這隻是 m=3 三次樣條情況下的傳統平滑樣條。請注意，m[1] 的樣條階數定義很直觀，但與 tprs 和 p.spline 基數所使用的定義不同。請參閱用於控製評估懲罰的時間間隔的選項的詳細信息(如果需要推斷，這可能很重要)。

用法

## S3 method for class 'bs.smooth.spec'
smooth.construct(object, data, knots)
## S3 method for class 'Bspline.smooth'
Predict.matrix(object, data)

參數

`object`	平滑規範對象，通常由術語 `s(x,bs="bs",...)` 生成
`data`	僅包含該術語所需的數據(包括任何 `by` 變量)的列表，其名稱對應於 `object$term` (和 `object$by` )。 `by` 變量是最後一個元素。
`knots`	包含為基礎設置提供的任何結的列表 - 與 `data` 具有相同的順序和相同的名稱。可以是 `NULL` 。請參閱詳細信息以獲取更多信息。

細節

基礎和懲罰是稀疏的(盡管不使用稀疏矩陣來表示它們)。 m[2]>m[1] 將生成錯誤，因為在這種情況下，懲罰將基於基的未定義導數，這是沒有意義的。這些項可以具有不同階數的多個懲罰，例如s(x,bs="bs",m=c(3,2,1,0))指定具有3個懲罰的三次基：常規三次樣條懲罰、一階導數懲罰的積分平方和函數值懲罰的積分平方。

默認基本尺寸 k 是 10 和 m[1] 中的較大者。 m[1]是基礎尺寸的下限。如果提供了結，則提供的結的數量應為 k + m[1] + 1 ，並且中間 k-m[1]+1 結的範圍應包括所有協變量值。或者，可以提供 2 個結，表示可以評估樣條線的下限和上限(使該範圍太寬意味著沒有有關某些基係數的信息，因為相應的基函數具有不包含數據的跨度)。與 P-splines 不同，具有基於導數的懲罰的樣條線可以具有不均勻的結間距，沒有問題。

另一種選擇是提供 4 節。然後，外部 2 定義了要評估懲罰的區間，而內部 2 則定義了一個區間，除了最外麵的 2 個結之外的所有結都應位於該區間內。通常，外部 2 個節是可能需要預測的區間，而內部 2 個節定義數據所在的區間。此選項允許在比數據更寬的區間內應用懲罰，同時仍將大部分基函數放置在數據所在的位置。這在需要用平滑稍微進行推斷的情況下非常有用。僅在包含數據的區間上應用懲罰相當於一個模型，其中函數在區間外的平滑度可能低於區間內的平滑度，並導致非常寬的外推置信區間。然而，評估整個實數線上的懲罰的替代方法相當於斷言函數具有一些遠離數據的導數，這同樣是不合理的。最好建立一個模型，其中相同的平滑度假設適用於數據和外推間隔，但不適用於整個實線。請參閱示例代碼以獲取實際說明。

線性外推用於需要外推的預測(即內部 k-m[1]+1 結範圍之外的預測 - 評估懲罰的區間)。這種外推法在基礎構建中是不允許的，但在預測時是允許的。

可以在平滑規範或平滑對象中設置 deriv 標誌，以便模型或預測矩陣生成所需的樣條曲線導數，而不是對其進行評估。

值

類 "Bspline.smooth" 的對象。有關該對象將包含的元素，請參閱smooth.construct。

警告

m[1]在這裏直接控製樣條線順序，直觀上合理，但與其他基不同。

例子

  require(mgcv)
  set.seed(5)
  dat <- gamSim(1,n=400,dist="normal",scale=2)
  bs <- "bs"
  ## note the double penalty on the s(x2) term...
  b <- gam(y~s(x0,bs=bs,m=c(4,2))+s(x1,bs=bs)+s(x2,k=15,bs=bs,m=c(4,3,0))+
           s(x3,bs=bs,m=c(1,0)),data=dat,method="REML")
  plot(b,pages=1)

  ## Extrapolation example, illustrating the importance of considering
  ## the penalty carefully if extrapolating...
  f3 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * 
              (1 - x)^10 ## test function
  n <- 100;x <- runif(n)
  y <- f3(x) + rnorm(n)*2
  ## first a model with first order penalty over whole real line (red)
  b0 <- gam(y~s(x,m=1,k=20),method="ML")
  ## now a model with first order penalty evaluated over (-.5,1.5) (black)
  op <- options(warn=-1)
  b <- gam(y~s(x,bs="bs",m=c(3,1),k=20),knots=list(x=c(-.5,0,1,1.5)),
           method="ML")
  options(op)
  ## and the equivalent with same penalty over data range only (blue)
  b1 <- gam(y~s(x,bs="bs",m=c(3,1),k=20),method="ML")
  pd <- data.frame(x=seq(-.7,1.7,length=200))
  fv <- predict(b,pd,se=TRUE)
  ul <- fv$fit + fv$se.fit*2; ll <- fv$fit - fv$se.fit*2
  plot(x,y,xlim=c(-.7,1.7),ylim=range(c(y,ll,ul)),main=
  "Order 1 penalties: red tps; blue bs on (0,1); black bs on (-.5,1.5)")
  ## penalty defined on (-.5,1.5) gives plausible predictions and intervals
  ## over this range...
  lines(pd$x,fv$fit);lines(pd$x,ul,lty=2);lines(pd$x,ll,lty=2)
  fv <- predict(b0,pd,se=TRUE)
  ul <- fv$fit + fv$se.fit*2; ll <- fv$fit - fv$se.fit*2
  ## penalty defined on whole real line gives constant width intervals away
  ## from data, as slope there must be zero, to avoid infinite penalty:
  lines(pd$x,fv$fit,col=2)
  lines(pd$x,ul,lty=2,col=2);lines(pd$x,ll,lty=2,col=2)
  fv <- predict(b1,pd,se=TRUE)
  ul <- fv$fit + fv$se.fit*2; ll <- fv$fit - fv$se.fit*2
  ## penalty defined only over the data interval (0,1) gives wild and wide
  ## extrapolation since penalty has been `turned off' outside data range:
  lines(pd$x,fv$fit,col=4)
  lines(pd$x,ul,lty=2,col=4);lines(pd$x,ll,lty=2,col=4)

  ## construct smooth of x. Model matrix sm$X and penalty 
  ## matrix sm$S[[1]] will have many zero entries...
  x <- seq(0,1,length=100)
  sm <- smoothCon(s(x,bs="bs"),data.frame(x))[[1]]

  ## another example checking penalty numerically...
  m <- c(4,2); k <- 15; b <- runif(k)
  sm <- smoothCon(s(x,bs="bs",m=m,k=k),data.frame(x),
                  scale.penalty=FALSE)[[1]]
  sm$deriv <- m[2]
  h0 <- 1e-3; xk <- sm$knots[(m[1]+1):(k+1)]
  Xp <- PredictMat(sm,data.frame(x=seq(xk[1]+h0/2,max(xk)-h0/2,h0)))
  sum((Xp%*%b)^2*h0) ## numerical approximation to penalty
  b%*%sm$S[[1]]%*%b  ## `exact' version

  ## ...repeated with uneven knot spacing...
  m <- c(4,2); k <- 15; b <- runif(k)
  ## produce the required 20 unevenly spaced knots...
  knots <- data.frame(x=c(-.4,-.3,-.2,-.1,-.001,.05,.15,
        .21,.3,.32,.4,.6,.65,.75,.9,1.001,1.1,1.2,1.3,1.4))
  sm <- smoothCon(s(x,bs="bs",m=m,k=k),data.frame(x),
        knots=knots,scale.penalty=FALSE)[[1]]
  sm$deriv <- m[2]
  h0 <- 1e-3; xk <- sm$knots[(m[1]+1):(k+1)]
  Xp <- PredictMat(sm,data.frame(x=seq(xk[1]+h0/2,max(xk)-h0/2,h0)))
  sum((Xp%*%b)^2*h0) ## numerical approximation to penalty
  b%*%sm$S[[1]]%*%b  ## `exact' version

作者

Simon N. Wood simon.wood@r-project.org. Extrapolation ideas joint with David Miller.

參考

Wood, S.N. (2017) P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Statistics and Computing. 27(4) 985-989 https://arxiv.org/abs/1605.02446 doi:10.1007/s11222-016-9666-x

也可以看看

p.spline

相關用法

注：本文由純淨天空篩選整理自R-devel大神的英文原創作品 Penalized B-splines in GAMs。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。