R cox.ph 附加 Cox 比例风险模型

R语言 cox.ph 位于 mgcv 包(package)。

说明

cox.ph 系列实现了 Cox 比例风险模型，其中包含 Peto 对关系的校正、可选分层以及通过惩罚部分似然最大化进行的估计，与 gam 一起使用。在模型公式中，事件时间是响应。在分层下，响应有两列：时间和层的数字索引。 weights 向量提供审查信息(0 表示审查，1 表示事件)。 cox.ph 处理每个受试者都有一个事件/审查时间和一行协变量值的情况。当每个受试者有多个时间相关协变量时，请参阅cox.pht。

请参阅下面的条件逻辑回归示例。

用法

cox.ph(link="identity")

参数

link

目前(可能永远)仅支持"identity"。

细节

与 gam 一起使用，使 Cox 比例风险模型适合生存数据。模型公式的左侧将包含事件/审查时间，右侧将包含线性预测器规范。审查信息由 weights 参数提供给 gam ，其中 1 表示事件，0 表示审查。

分层是可能的，允许不同层存在不同的基线危险。在这种情况下，响应有两列：第一列是事件/审查时间，第二列是数字层索引。请参阅下面的示例。

使用 type="response" 从拟合模型对象(使用 predict 方法)进行的预测将在幸存者函数尺度上进行预测。这需要提供评估时间以及协变量(参见示例)。另请参阅下面的示例代码，以直接提取累积基线危险/生存率。模型对象中存储的 fitted.values 是每个受试者在事件/审查时间的生存函数估计。

可以提取 deviance 、 martingale 、 score 或 schoenfeld 残差。参见 Klein amd Moeschberger (2003) 的说明。分数残差以与模型矩阵相同维度的矩阵形式返回，具有 "terms" 属性，该属性是一个列表，指示哪些模型矩阵列属于哪些模型项。分数残差被缩放。对于参数项，这是由相关模型系数的标准偏差决定的。对于平滑项，该项的分数残差子矩阵后乘以该项系数的协方差矩阵的转置 Cholesky 因子。这是一种使系数近似独立的变换，根据需要使分数残差与事件时间的关系图可解释以检查比例风险假设(参见 Klein amd Moeschberger，2003，p376)。惩罚会导致分数残差发生漂移，该漂移也被删除，以允许残差近似解释为未惩罚的分数残差。 Schoenfeld 和分数残差按分层计算。有关详细信息，请参阅通过绘制分数残差进行简单 PH 假设检查的示例，以及 Klein amd Moeschberger(2003 年，第 11.4 节)。请注意，术语之间的高度相关性可能会破坏这些检查。

模型系数的估计是通过最大化受平滑惩罚惩罚的 log-partial 似然来实现的。参见例如Hastie 和 Tibshirani，1990 年，第 8.3 节。对于所使用的部分似然(使用 Peto 近似关系)，但请注意，部分似然的优化并不遵循 Hastie 和 Tibshirani。有关残差、累积基线风险、生存函数和相关标准误差的估计，请参阅 Klein amd Moeschberger (2003)(生存标准误差表达式有一个拼写错误)。

Cox PH 模型报告的偏差百分比解释基于偏差残差的平方和(作为模型偏差)以及协变量效应设置为零时的偏差残差的平方和(作为零偏差)。两者使用相同的基线危险估计。

该系列有效地处理每个受试者具有一个事件/审查时间和一行协变量值的情况。对于每个受试者有多个时变协变量测量的研究，则应使用 bam(...,discrete=TRUE) 将等效泊松模型拟合到合适的伪数据。请参阅cox.pht。

值

继承自类 general.family 的对象。

例子

library(mgcv)
library(survival) ## for data
col1 <- colon[colon$etype==1,] ## concentrate on single event
col1$differ <- as.factor(col1$differ)
col1$sex <- as.factor(col1$sex)

b <- gam(time~s(age,by=sex)+sex+s(nodes)+perfor+rx+obstruct+adhere,
         family=cox.ph(),data=col1,weights=status)

summary(b) 

plot(b,pages=1,all.terms=TRUE) ## plot effects

plot(b$linear.predictors,residuals(b))

## plot survival function for patient j...

np <- 300;j <- 6
newd <- data.frame(time=seq(0,3000,length=np))
dname <- names(col1)
for (n in dname) newd[[n]] <- rep(col1[[n]][j],np)
newd$time <- seq(0,3000,length=np)
fv <- predict(b,newdata=newd,type="response",se=TRUE)
plot(newd$time,fv$fit,type="l",ylim=c(0,1),xlab="time",ylab="survival")
lines(newd$time,fv$fit+2*fv$se.fit,col=2)
lines(newd$time,fv$fit-2*fv$se.fit,col=2)

## crude plot of baseline survival...

plot(b$family$data$tr,exp(-b$family$data$h),type="l",ylim=c(0,1),
     xlab="time",ylab="survival")
lines(b$family$data$tr,exp(-b$family$data$h + 2*b$family$data$q^.5),col=2)
lines(b$family$data$tr,exp(-b$family$data$h - 2*b$family$data$q^.5),col=2)
lines(b$family$data$tr,exp(-b$family$data$km),lty=2) ## Kaplan Meier

## Checking the proportional hazards assumption via scaled score plots as
## in Klein and Moeschberger Section 11.4 p374-376... 

ph.resid <- function(b,stratum=1) {
## convenience function to plot scaled score residuals against time,
## by term. Reference lines at 5% exceedance prob for Brownian bridge
## (see KS test statistic distribution).
  rs <- residuals(b,"score");term <- attr(rs,"term")
  if (is.matrix(b$y)) {
    ii <- b$y[,2] == stratum;b$y <- b$y[ii,1];rs <- rs[ii,]
  }
  oy <- order(b$y)
  for (i in 1:length(term)) {
    ii <- term[[i]]; m <- length(ii)
    plot(b$y[oy],rs[oy,ii[1]],ylim=c(-3,3),type="l",ylab="score residuals",
         xlab="time",main=names(term)[i])
    if (m>1) for (k in 2:m) lines(b$y[oy],rs[oy,ii[k]],col=k);
    abline(-1.3581,0,lty=2);abline(1.3581,0,lty=2) 
  }  
}
par(mfrow=c(2,2))
ph.resid(b)

## stratification example, with 2 randomly allocated strata
## so that results should be similar to previous....
col1$strata <- sample(1:2,nrow(col1),replace=TRUE) 
bs <- gam(cbind(time,strata)~s(age,by=sex)+sex+s(nodes)+perfor+rx+obstruct
          +adhere,family=cox.ph(),data=col1,weights=status)
plot(bs,pages=1,all.terms=TRUE) ## plot effects

## baseline survival plots by strata...

for (i in 1:2) { ## loop over strata
## create index selecting elements of stored hazard info for stratum i...
ind <- which(bs$family$data$tr.strat == i)
if (i==1) plot(bs$family$data$tr[ind],exp(-bs$family$data$h[ind]),type="l",
      ylim=c(0,1),xlab="time",ylab="survival",lwd=2,col=i) else
      lines(bs$family$data$tr[ind],exp(-bs$family$data$h[ind]),lwd=2,col=i)
lines(bs$family$data$tr[ind],exp(-bs$family$data$h[ind] +
      2*bs$family$data$q[ind]^.5),lty=2,col=i) ## upper ci
lines(bs$family$data$tr[ind],exp(-bs$family$data$h[ind] -
      2*bs$family$data$q[ind]^.5),lty=2,col=i) ## lower ci
lines(bs$family$data$tr[ind],exp(-bs$family$data$km[ind]),col=i) ## KM
}


## Simple simulated known truth example...
ph.weibull.sim <- function(eta,gamma=1,h0=.01,t1=100) { 
  lambda <- h0*exp(eta) 
  n <- length(eta)
  U <- runif(n)
  t <- (-log(U)/lambda)^(1/gamma)
  d <- as.numeric(t <= t1)
  t[!d] <- t1
  list(t=t,d=d)
}
n <- 500;set.seed(2)
x0 <- runif(n, 0, 1);x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1);x3 <- runif(n, 0, 1)
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0*x
f <- f0(x0) + f1(x1)  + f2(x2)
g <- (f-mean(f))/5
surv <- ph.weibull.sim(g)
surv$x0 <- x0;surv$x1 <- x1;surv$x2 <- x2;surv$x3 <- x3

b <- gam(t~s(x0)+s(x1)+s(x2,k=15)+s(x3),family=cox.ph,weights=d,data=surv)

plot(b,pages=1)

## Another one, including a violation of proportional hazards for
## effect of x2...

set.seed(2)
h <- exp((f0(x0)+f1(x1)+f2(x2)-10)/5)
t <- rexp(n,h);d <- as.numeric(t<20)

## first with no violation of PH in the simulation...
b <- gam(t~s(x0)+s(x1)+s(x2)+s(x3),family=cox.ph,weights=d)
plot(b,pages=1)
ph.resid(b) ## fine

## Now violate PH for x2 in the simulation...
ii <- t>1.5
h1 <- exp((f0(x0)+f1(x1)+3*f2(x2)-10)/5)
t[ii] <- 1.5 + rexp(sum(ii),h1[ii]);d <- as.numeric(t<20)

b <- gam(t~s(x0)+s(x1)+s(x2)+s(x3),family=cox.ph,weights=d)
plot(b,pages=1)
ph.resid(b) ## The checking plot picks up the problem in s(x2) 


## conditional logistic regression models are often estimated using the 
## cox proportional hazards partial likelihood with a strata for each
## case-control group. A dummy vector of times is created (all equal). 
## The following compares to 'clogit' for a simple case. Note that
## the gam log likelihood is not exact if there is more than one case
## per stratum, corresponding to clogit's approximate method.
library(survival);library(mgcv)
infert$dumt <- rep(1,nrow(infert))
mg <- gam(cbind(dumt,stratum) ~ spontaneous + induced, data=infert,
          family=cox.ph,weights=case)
ms <- clogit(case ~ spontaneous + induced + strata(stratum), data=infert,
             method="approximate")
summary(mg)$p.table[1:2,]; ms

参考

Hastie and Tibshirani (1990) Generalized Additive Models, Chapman and Hall.

Klein, J.P and Moeschberger, M.L. (2003) Survival Analysis: Techniques for Censored and Truncated Data (2nd ed.) Springer.

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986

也可以看看

cox.pht , cnorm

相关用法

注：本文由纯净天空筛选整理自R-devel大神的英文原创作品 Additive Cox Proportional Hazard Model。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。