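R censboot: Bootstrap for Censored Data

The R function censboot is in the boot package.

Description

This function applies the types of bootstrap resampling that have been suggested for right-censored data. It can also do model-based resampling using a Cox regression model.

Usage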

censboot(data, statistic, R, F.surv, G.surv, strata = matrix(1,n,2),
         sim = "ordinary", cox = NULL, index = c(1, 2), ...,
         parallel = c("no", "multicore", "snow"),
         ncpus = getOption("boot.ncpus", 1L), cl = NULL)

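Arguments

`data`	The data frame or matrix containing the data. It must have at least two columns, one containing the times and the other the censoring indicators. It may have as many other columns as desired (although efficiency is reduced for large numbers of columns), except when `sim = "weird"`, in which case it should have only two columns: the times and the censoring indicators. The columns of `data` referenced by the components of `index` are taken to be the times and censoring indicators.
`statistic`	A function that operates on the data frame and returns the required statistic. Its first argument must be the data. Any other arguments it requires can be passed using the `...` argument. When `sim = "weird"`, the data passed to `statistic` contain only the times and censoring indicators, regardless of the actual number of columns in `data`. In all other cases the data passed to `statistic` have the same form as the original data. When `sim = "weird"`, the number of observations in a resampled data set may differ from the number in `data`. For this reason, if `sim = "weird"` and `strata` is supplied, `statistic` should also take a numeric vector indicating the strata. This allows the statistic to depend on the strata if required.
`R`	The number of bootstrap replicates.
`F.surv`	An object returned from a call to `survfit` giving the survivor function for the data. This argument is required unless `sim = "ordinary"`, or `sim = "model"` and `cox` is missing.
`G.surv`	Another object returned from a call to `survfit`, but with the censoring indicators reversed, giving the product-limit estimate of the censoring distribution. Note that, for consistency, the uncensored times should be reduced by a small amount in the call to `survfit`. This argument is required whenever `sim = "cond"`, or when `sim = "model"` and `cox` is supplied.
`strata`	The strata used in the calls to `survfit`. It can be a vector or a matrix with 2 columns. If it is a vector, it is taken to be the strata for the survival distribution, and the censoring distribution is assumed to be the same for all observations. If it is a matrix, the first column gives the strata for the survival distribution and the second column the strata for the censoring distribution. When `sim = "weird"`, only the strata for the survival distribution are used, since the censoring times are considered fixed. When `sim = "ordinary"`, only one set of strata is used to stratify the observations; this is taken to be the first column of `strata` when it is a matrix.
`sim`	The simulation type. Possible types are `"ordinary"` (case resampling), `"model"` (equivalent to `"ordinary"` if `cox` is missing, otherwise model-based resampling), `"weird"` (the weird bootstrap; this cannot be used if `cox` is supplied) and `"cond"` (the conditional bootstrap, in which censoring times are resampled from the conditional censoring distribution).
`cox`	An object returned from `coxph`. If it is supplied, `F.surv` should have been generated by a call of the form `survfit(cox)`.
`index`	A vector of length two giving the positions of the columns in `data` that correspond to the times and the censoring indicators, respectively.
`...`	Other named arguments, which are passed unchanged to `statistic` each time it is called. Any such arguments to `statistic` must follow the arguments that `statistic` is required to have for the simulation. Beware of partial matching to the arguments of `censboot` listed above, and note that arguments named `X` and `FUN` cause conflicts in some versions of `boot` (but not this one).
`parallel` , `ncpus` , `cl`	See the help for `boot`.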
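
As an illustration of how `index` and `...` fit together, here is a minimal sketch. The statistic `km_at` and its extra argument `at` are hypothetical names chosen for this example; they are not part of the boot package. The name `at` was picked because it does not partially match any `censboot` argument.

```r
library(boot)
library(survival)

data(aml, package = "boot")  # columns: time, cens, group

# Hypothetical statistic: Kaplan-Meier survival probability at time `at`.
# `at` is our own extra argument; censboot() forwards it through `...`.
km_at <- function(d, at) {
  fit <- survfit(Surv(time, cens) ~ 1, data = d)
  summary(fit, times = at, extend = TRUE)$surv
}

set.seed(1)
# index = c(1, 2) says: column 1 holds the times, column 2 the indicators.
b <- censboot(aml, km_at, R = 50, index = c(1, 2), at = 20)
b$t0       # statistic on the original data
dim(b$t)   # one row per bootstrap replicate
```

Because `sim` is left at its default of `"ordinary"`, neither `F.surv` nor `G.surv` is needed here.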

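Details

The various types of resampling are described in Sections 3.5 and 7.3 of Davison and Hinkley (1997). The simplest is case resampling, which simply resamples with replacement from the observations.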
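
Case resampling can be sketched by hand as follows. This is an illustration of the idea only, stratified by `group` so that each resample keeps the original group sizes; it is not a copy of the code `censboot` runs internally.

```r
library(boot)
data(aml, package = "boot")

set.seed(1)
n <- nrow(aml)

# Resample row indices with replacement, separately within each group.
idx_by_group <- lapply(split(seq_len(n), aml$group), function(i) {
  i[sample.int(length(i), replace = TRUE)]
})
idx <- sort(unlist(idx_by_group, use.names = FALSE))

# Whole rows are kept together, so each time stays paired with its
# censoring indicator.
resample <- aml[idx, ]
```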

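The conditional bootstrap simulates failure times from the estimate of the survival distribution. Then, for each observation, the simulated censoring time is equal to the observed censoring time if the observation was censored, and is generated from the estimated censoring distribution, conditional on being greater than the observed failure time, if the observation was uncensored. If the largest value is censored, it is given a nominal failure time of `Inf`; conversely, if it is uncensored, it is given a nominal censoring time of `Inf`. This is necessary to allow the largest observation to appear in the resamples.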
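
The censoring step of the conditional bootstrap can be sketched by hand on toy data. The data values and the helper `draw_cens_after` are made up for this illustration, and the code is an assumption about one straightforward way to carry out the step, not the internal implementation of `censboot`.

```r
library(survival)

# Toy data (invented for illustration): 1 = failure observed, 0 = censored.
time <- c(2, 3, 5, 7, 11, 13)
cens <- c(1, 0, 1, 0, 1, 1)

# Product-limit estimate of the censoring distribution: indicators reversed
# and uncensored times reduced slightly, as described for G.surv.
G <- survfit(Surv(time - 0.001 * cens, 1 - cens) ~ 1)

# Probability mass at each time point. The largest value is uncensored
# here, so the leftover mass is assigned to a nominal time of Inf.
support <- c(G$time, Inf)
mass <- c(-diff(c(1, G$surv)), G$surv[length(G$surv)])

# Draw one censoring time conditional on exceeding the failure time y.
draw_cens_after <- function(y) {
  keep <- which(support > y & mass > 0)
  support[keep[sample.int(length(keep), 1, prob = mass[keep])]]
}

set.seed(1)
# Censored observations keep their observed censoring time;
# uncensored ones get a draw from the conditional distribution.
sim_cens <- ifelse(cens == 0, time,
                   vapply(time, draw_cens_after, numeric(1)))
```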

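If a Cox regression model is fitted to the data and supplied, the failure times are generated from the survival distribution using that model. In this case the censoring times can be simulated either from the estimated censoring distribution (`sim = "model"`) or from the conditional censoring distribution described in the previous paragraph (`sim = "cond"`).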

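The weird bootstrap holds the censored observations fixed, and also the observed failure times. It then generates the number of events at each failure time using a binomial distribution with mean 1 and denominator equal to the number of failures that could have occurred at that time in the original data set. In our implementation we insist that there is at least one simulated event in each stratum for every bootstrap data set.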
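
The event-count step of the weird bootstrap can be sketched in base R. The toy data are invented, there are no tied failure times (so the binomial mean is exactly 1), and there is a single stratum; this is an illustration under those assumptions rather than the package's own code.

```r
# Toy data (invented for illustration): 1 = failure observed, 0 = censored.
time <- c(2, 3, 5, 7, 11, 13)
cens <- c(1, 0, 1, 0, 1, 1)

# Observed failure times stay fixed.
fail_times <- sort(unique(time[cens == 1]))

# Number at risk at each failure time: the number of failures that could
# have occurred then. This is the binomial denominator.
at_risk <- vapply(fail_times, function(t) sum(time >= t), integer(1))

set.seed(1)
repeat {
  # Binomial(at_risk, 1 / at_risk) has mean 1 at every failure time.
  sim_events <- rbinom(length(fail_times), size = at_risk, prob = 1 / at_risk)
  # Insist on at least one simulated event in the (single) stratum.
  if (sum(sim_events) >= 1) break
}
sim_events
```

A simulated count may be 0 or greater than 1 at a given failure time, which is why the number of observations in a resampled data set can differ from the number in the original data.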

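When strata are involved and `sim` is either `"model"` or `"cond"`, the situation becomes more difficult. Since the strata for the survival and censoring distributions are not the same, it is possible that, for some observations, both the simulated failure time and the simulated censoring time are infinite. To see this, consider an observation in stratum 1F for the survival distribution and stratum 1G for the censoring distribution. If the largest value in stratum 1F is censored, it is given a nominal failure time of `Inf`; likewise, if the largest value in stratum 1G is uncensored, it is given a nominal censoring time of `Inf`. Both the simulated failure time and the simulated censoring time could therefore be infinite. When this happens, the simulated value is taken to be a failure at the largest observed failure time in the stratum for the survival distribution.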

When `parallel = "snow"` and `cl` is not supplied, `library(survival)` is run in each of the worker processes.

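Value

An object of class "boot" containing the following components:

`t0`	The value of `statistic` when applied to the original data.
`t`	A matrix of bootstrap replicates of the values of `statistic`.
`R`	The number of bootstrap replicates performed.
`sim`	The simulation type used. This will usually be the input value of `sim`, unless that was `"model"` and `cox` was not supplied, in which case it will be `"ordinary"`.
`data`	The data used for the bootstrap. This will generally be the input value of `data`, unless `sim = "weird"`, in which case it will be just the columns containing the times and the censoring indicators.
`seed`	The value of `.Random.seed` when `censboot` started work.
`statistic`	The input value of `statistic`.
`strata`	The strata used in the resampling. When `sim = "ordinary"` this is a vector that stratifies the observations; when `sim = "weird"` it is the strata for the survival distribution; in all other cases it is a matrix containing the strata for the survival distribution and for the censoring distribution.
`call`	The original call to `censboot`.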
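
A short sketch of inspecting the returned object follows. The statistic `prop_failed` is a hypothetical, deliberately trivial function chosen so that the example runs quickly; it is not part of the boot package.

```r
library(boot)
library(survival)
data(aml, package = "boot")

# Hypothetical statistic: proportion of observations that are failures.
prop_failed <- function(d) mean(d$cens)

set.seed(1)
# sim = "model" without cox falls back to case resampling.
b <- censboot(aml, prop_failed, R = 50, sim = "model")

class(b)   # "boot"
names(b)   # the components listed above
b$sim      # reported as "ordinary" because cox was not supplied
dim(b$t)   # R rows, one column per element of the statistic
```

Since the result has class "boot", it can be passed on to the usual tools for such objects, for example `envelope`, as in the melanoma example in the Examples section.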

Examples

library(boot)      # provides censboot and the aml and melanoma data
library(survival)
# Example 3.9 of Davison and Hinkley (1997) does a bootstrap on some
# remission times for patients with a type of leukaemia.  The patients
# were divided into those who received maintenance chemotherapy and 
# those who did not.  Here we are interested in the median remission 
# time for the two groups.
data(aml, package = "boot") # not the version in survival.
aml.fun <- function(data) {
     surv <- survfit(Surv(time, cens) ~ group, data = data)
     out <- NULL
     st <- 1
     for (s in 1:length(surv$strata)) {
          inds <- st:(st + surv$strata[s]-1)
          md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
          st <- st + surv$strata[s]
          out <- c(out, md)
     }
     out
}
aml.case <- censboot(aml, aml.fun, R = 499, strata = aml$group)

# Now we will look at the same statistic using the conditional 
# bootstrap and the weird bootstrap.  For the conditional bootstrap 
# the survival distribution is stratified but the censoring 
# distribution is not. 

aml.s1 <- survfit(Surv(time, cens) ~ group, data = aml)
aml.s2 <- survfit(Surv(time-0.001*cens, 1-cens) ~ 1, data = aml)
aml.cond <- censboot(aml, aml.fun, R = 499, strata = aml$group,
     F.surv = aml.s1, G.surv = aml.s2, sim = "cond")


# For the weird bootstrap we must redefine our function slightly since
# the data will not contain the group number.
aml.fun1 <- function(data, str) {
     surv <- survfit(Surv(data[, 1], data[, 2]) ~ str)
     out <- NULL
     st <- 1
     for (s in 1:length(surv$strata)) {
          inds <- st:(st + surv$strata[s] - 1)
          md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
          st <- st + surv$strata[s]
          out <- c(out, md)
     }
     out
}
aml.wei <- censboot(cbind(aml$time, aml$cens), aml.fun1, R = 499,
     strata = aml$group,  F.surv = aml.s1, sim = "weird")

# Now for an example where a Cox regression model has been fitted.
# The data we will look at are the melanoma data of Example 7.6 from 
# Davison and Hinkley (1997).  The fitted model assumes that there
# is a different survival distribution for the ulcerated and 
# non-ulcerated groups but that the thickness of the tumour has a
# common effect.  We will also assume that the censoring distribution
# is different in different age groups.  The statistic of interest
# is the linear predictor.  This is returned as the values at a
# number of equally spaced points in the range of interest.
data(melanoma, package = "boot")
library(splines) # for ns
mel.cox <- coxph(Surv(time, status == 1) ~ ns(thickness, df=4) + strata(ulcer),
                 data = melanoma)
mel.surv <- survfit(mel.cox)
agec <- cut(melanoma$age, c(0, 39, 49, 59, 69, 100))
mel.cens <- survfit(Surv(time - 0.001*(status == 1), status != 1) ~
                    strata(agec), data = melanoma)
mel.fun <- function(d) { 
     t1 <- ns(d$thickness, df=4)
     cox <- coxph(Surv(d$time, d$status == 1) ~ t1+strata(d$ulcer))
     ind <- !duplicated(d$thickness)
     u <- d$thickness[!ind]
     eta <- cox$linear.predictors[!ind]
     sp <- smooth.spline(u, eta, df=20)
     th <- seq(from = 0.25, to = 10, by = 0.25)
     predict(sp, th)$y
}
mel.str <- cbind(melanoma$ulcer, agec)

# this is slow!
mel.mod <- censboot(melanoma, mel.fun, R = 499, F.surv = mel.surv,
     G.surv = mel.cens, cox = mel.cox, strata = mel.str, sim = "model")
# To plot the original predictor and a 95% pointwise envelope for it
mel.env <- envelope(mel.mod)$point
th <- seq(0.25, 10, by = 0.25)
plot(th, mel.env[1, ],  ylim = c(-2, 2),
     xlab = "thickness (mm)", ylab = "linear predictor", type = "n")
lines(th, mel.mod$t0, lty = 1)
matlines(th, t(mel.env), lty = 2)

Author(s)

Angelo J. Canty. Parallel extensions by Brian Ripley.

References

Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. Springer-Verlag.

Burr, D. (1994) A comparison of certain bootstrap confidence intervals in the Cox model. Journal of the American Statistical Association, 89, 1290-1302.

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.

Efron, B. (1981) Censored data and the bootstrap. Journal of the American Statistical Association, 76, 312-319.

Hjort, N.L. (1985) Bootstrapping Cox's regression model. Technical report NSF-241, Dept. of Statistics, Stanford University.

See Also

boot, coxph, survfit

Note: this page was compiled by 纯净天空 from the R-devel English original, "Bootstrap for Censored Data". Unless otherwise stated, copyright in the original code belongs to the original authors.