R censboot 用於審查數據的引導程序

R語言 censboot 位於 boot 包(package)。

說明

此函數應用建議處理right-censored數據的引導重采樣類型。它還可以使用 Cox 回歸模型進行基於模型的重采樣。

用法

censboot(data, statistic, R, F.surv, G.surv, strata = matrix(1,n,2),
         sim = "ordinary", cox = NULL, index = c(1, 2), ...,
         parallel = c("no", "multicore", "snow"),
         ncpus = getOption("boot.ncpus", 1L), cl = NULL)

參數

`data`	包含數據的 DataFrame 或矩陣。它必須至少有兩列，其中一列包含時間，另一列包含審查指標。允許根據需要擁有任意數量的其他列(盡管大量列的效率會降低)，但 `sim = "weird"` 除外，因為它應該隻有兩列 - 時間和審查指標。 `index` 的組件引用的 `data` 的列被視為時間和審查指標。
`statistic`	對 DataFrame 進行操作並返回所需統計數據的函數。它的第一個參數必須是數據。它需要的任何其他參數都可以使用 `...` 參數傳遞。對於 `sim = "weird"` ，傳遞到 `statistic` 的數據僅包含時間和審查指示符，而不管 `data` 中的實際列數。在所有其他情況下，傳遞給統計的數據將與原始數據具有相同的形式。當 `sim = "weird"` 時，重采樣數據集中的實際觀測值數量可能與 `data` 中的數量不同。因此，如果提供`sim = "weird"` 和`strata`，`statistic` 還應采用指示層的數值向量。如果需要，這允許統計數據取決於層。
`R`	引導程序重複的數量。
`F.surv`	調用 `survfit` 返回的對象，提供數據的幸存者函數。這是必需的參數，除非缺少 `sim = "ordinary"` 或 `sim = "model"` 和 `cox`。
`G.surv`	調用 `survfit` 返回的另一個對象，但審查指標相反，以給出審查分布的 product-limit 估計值。請注意，為了保持一致性，在調用 `survfit` 時，未經審查的時間應減少少量。每當提供`sim = "cond"` 或`sim = "model"` 和`cox` 時，這都是必需的參數。
`strata`	調用 `survfit` 時使用的層。它可以是向量或具有 2 列的矩陣。如果它是一個向量，則假設它是生存分布的層，並且假設所有觀測值的審查分布都相同。如果它是一個矩陣，那麽第一列是生存分布的層，第二列是審查分布的層。當 `sim = "weird"` 時，僅使用生存分布的層，因為審查時間被認為是固定的。當 `sim = "ordinary"` 時，僅使用一組層對觀測值進行分層，當它是矩陣時，這被視為 `strata` 的第一列。
`sim`	模擬類型。可能的類型是`"ordinary"`(大小寫重采樣)、`"model"`(如果缺少`cox`，則相當於`"ordinary"`，否則是基於模型的重采樣)、`"weird"`(奇怪的引導程序。如果提供了`cox`)和`"cond"`(條件引導程序，其中審查時間是從條件審查分布中重新采樣的)。
`cox`	從 `coxph` 返回的對象。如果提供了它，則 `F.surv` 應該由 `survfit(cox)` 形式的調用生成。
`index`	長度為 2 的向量，給出 `data` 中的列位置，分別對應於時間和審查指標。
`...`	其他命名參數在每次調用時都會原封不動地傳遞給 `statistic`。 `statistic` 的任何此類參數必須遵循模擬所需的 `statistic` 參數。請注意與上麵列出的 `censboot` 參數的部分匹配，並且名為 `X` 和 `FUN` 的參數會導致 `boot` 的某些版本(但不是這個)發生衝突。
`parallel` , `ncpus` , `cl`	請參閱 `boot` 的幫助。

細節

Davison 和 Hinkley (1997) 的 3.5 和 7.3 節說明了各種類型的重采樣。最簡單的是案例重采樣，它隻是通過觀察值的替換進行重采樣。

條件引導程序根據生存分布的估計來模擬故障時間。然後，對於每個觀察，如果觀察被審查，則其模擬審查時間等於觀察到的審查時間，並且根據估計審查分布生成，條件是大於觀察到的失敗時間(如果觀察未經審查)。如果最大值被審查，則其名義審查時間為 Inf ，相反，如果未經審查，則其名義審查時間為 Inf 。為了使最大的觀察結果出現在重采樣中，這是必要的。

如果 Cox 回歸模型適合數據並提供，則使用該模型根據生存分布生成故障時間。在這種情況下，審查時間可以根據估計的審查分布(sim = "model")或從上一段(sim = "cond")中的條件審查分布來模擬。

奇怪的引導程序將經過審查的觀察結果以及觀察到的故障時間保持為固定狀態。然後，它使用均值為 1 的二項式分布生成每個故障時間的事件數，分母為原始數據集中當時可能發生的故障數。在我們的實現中，我們堅持認為每個引導數據集的每個層中至少有一個模擬事件。

當涉及地層並且sim是"model"或"cond"時，情況變得更加困難。由於生存分布和審查分布的層不同，因此對於某些觀察，模擬故障時間和模擬審查時間可能都是無限的。要了解這一點，請考慮在 1F 層中觀察生存分布，在 1G 層中觀察審查分布。現在，如果層 1F 中的最大值被審查，則給出標稱故障時間 Inf ，同樣，如果層 1G 中的最大值未經審查，則給出標稱審查時間 Inf ，因此模擬故障和審查時間可能是無限的。當發生這種情況時，模擬值被認為是生存分布層中觀察到的最大故障時間時的故障。

當未提供parallel = "snow"和cl時，library(survival)在每個工作進程中運行。

值

"boot" 類的對象包含以下組件：

`t0`	應用於原始數據時`statistic` 的值。
`t`	`statistic` 值的引導複製矩陣。
`R`	執行的引導複製次數。
`sim`	使用的模擬類型。這通常是 `sim` 的輸入值，除非是 `"model"` 但未提供 `cox`，在這種情況下它將是 `"ordinary"` 。
`data`	用於引導程序的數據。這通常是 `data` 的輸入值，除非 `sim = "weird"` ，在這種情況下，它隻是包含時間和審查指標的列。
`seed`	`censboot`開始工作時`.Random.seed`的值。
`statistic`	`statistic` 的輸入值。
`strata`	重采樣中使用的地層。當`sim = "ordinary"`時，這將是對觀測值進行分層的向量，當`sim = "weird"`時，它是生存分布的層，在所有其他情況下，它是包含生存分布和審查分布的層的矩陣。
`call`	對 `censboot` 的原始調用。

例子

library(survival)
# Example 3.9 of Davison and Hinkley (1997) does a bootstrap on some
# remission times for patients with a type of leukaemia.  The patients
# were divided into those who received maintenance chemotherapy and 
# those who did not.  Here we are interested in the median remission 
# time for the two groups.
data(aml, package = "boot") # not the version in survival.
aml.fun <- function(data) {
     surv <- survfit(Surv(time, cens) ~ group, data = data)
     out <- NULL
     st <- 1
     for (s in 1:length(surv$strata)) {
          inds <- st:(st + surv$strata[s]-1)
          md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
          st <- st + surv$strata[s]
          out <- c(out, md)
     }
     out
}
aml.case <- censboot(aml, aml.fun, R = 499, strata = aml$group)

# Now we will look at the same statistic using the conditional 
# bootstrap and the weird bootstrap.  For the conditional bootstrap 
# the survival distribution is stratified but the censoring 
# distribution is not. 

aml.s1 <- survfit(Surv(time, cens) ~ group, data = aml)
aml.s2 <- survfit(Surv(time-0.001*cens, 1-cens) ~ 1, data = aml)
aml.cond <- censboot(aml, aml.fun, R = 499, strata = aml$group,
     F.surv = aml.s1, G.surv = aml.s2, sim = "cond")


# For the weird bootstrap we must redefine our function slightly since
# the data will not contain the group number.
aml.fun1 <- function(data, str) {
     surv <- survfit(Surv(data[, 1], data[, 2]) ~ str)
     out <- NULL
     st <- 1
     for (s in 1:length(surv$strata)) {
          inds <- st:(st + surv$strata[s] - 1)
          md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
          st <- st + surv$strata[s]
          out <- c(out, md)
     }
     out
}
aml.wei <- censboot(cbind(aml$time, aml$cens), aml.fun1, R = 499,
     strata = aml$group,  F.surv = aml.s1, sim = "weird")

# Now for an example where a cox regression model has been fitted
# the data we will look at the melanoma data of Example 7.6 from 
# Davison and Hinkley (1997).  The fitted model assumes that there
# is a different survival distribution for the ulcerated and 
# non-ulcerated groups but that the thickness of the tumour has a
# common effect.  We will also assume that the censoring distribution
# is different in different age groups.  The statistic of interest
# is the linear predictor.  This is returned as the values at a
# number of equally spaced points in the range of interest.
data(melanoma, package = "boot")
library(splines)# for ns
mel.cox <- coxph(Surv(time, status == 1) ~ ns(thickness, df=4) + strata(ulcer),
                 data = melanoma)
mel.surv <- survfit(mel.cox)
agec <- cut(melanoma$age, c(0, 39, 49, 59, 69, 100))
mel.cens <- survfit(Surv(time - 0.001*(status == 1), status != 1) ~
                    strata(agec), data = melanoma)
mel.fun <- function(d) { 
     t1 <- ns(d$thickness, df=4)
     cox <- coxph(Surv(d$time, d$status == 1) ~ t1+strata(d$ulcer))
     ind <- !duplicated(d$thickness)
     u <- d$thickness[!ind]
     eta <- cox$linear.predictors[!ind]
     sp <- smooth.spline(u, eta, df=20)
     th <- seq(from = 0.25, to = 10, by = 0.25)
     predict(sp, th)$y
}
mel.str <- cbind(melanoma$ulcer, agec)

# this is slow!
mel.mod <- censboot(melanoma, mel.fun, R = 499, F.surv = mel.surv,
     G.surv = mel.cens, cox = mel.cox, strata = mel.str, sim = "model")
# To plot the original predictor and a 95% pointwise envelope for it
mel.env <- envelope(mel.mod)$point
th <- seq(0.25, 10, by = 0.25)
plot(th, mel.env[1, ],  ylim = c(-2, 2),
     xlab = "thickness (mm)", ylab = "linear predictor", type = "n")
lines(th, mel.mod$t0, lty = 1)
matlines(th, t(mel.env), lty = 2)

作者

Angelo J. Canty. Parallel extensions by Brian Ripley

參考

Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. Springer-Verlag.

Burr, D. (1994) A comparison of certain bootstrap confidence intervals in the Cox model. Journal of the American Statistical Association, 89, 1290-1302.

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.

Efron, B. (1981) Censored data and the bootstrap. Journal of the American Statistical Association, 76, 312-319.

Hjort, N.L. (1985) Bootstrapping Cox's regression model. Technical report NSF-241, Dept. of Statistics, Stanford University.

也可以看看

boot、coxph、survfit

相關用法

注：本文由純淨天空篩選整理自R-devel大神的英文原創作品 Bootstrap for Censored Data。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。