Estimate calibration curves for a multistate model using pseudo-values.
Creates the underlying data for the calibration curves. calib_pv estimates the observed event probabilities for a given set of predicted transition probabilities in a cohort of interest. This is done using techniques for assessing calibration of binary logistic regression models, in combination with inverse probability of censoring weights and landmarking.
Usage
calib_pv(
  data.mstate,
  data.raw,
  j,
  s,
  t,
  tp.pred,
  curve.type = "rcs",
  rcs.nk = 3,
  loess.span = 0.75,
  loess.degree = 2,
  group.vars = NULL,
  n.pctls = NULL,
  CI = FALSE,
  CI.type = "parametric",
  CI.R.boot = NULL,
  data.pred.plot = NULL,
  transitions.out = NULL
)
Arguments
- data.mstate
Validation data in msdata format
- data.raw
Validation data in data.frame (one row per individual)
- j
Landmark state at which predictions were made
- s
Landmark time at which predictions were made
- t
Follow up time at which calibration is to be assessed
- tp.pred
Matrix of predicted transition probabilities at time t, if in state j at time s. There must be a separate column for the predicted transition probabilities into every state, even if these predicted transition probabilities are 0.
- curve.type
Whether calibration curves are estimated using restricted cubic splines ('rcs') or loess smoothers ('loess')
- rcs.nk
Number of knots when curves are estimated using restricted cubic splines
- loess.span
Span when curves are estimated using loess smoothers
- loess.degree
Degree when curves are estimated using loess smoothers
- group.vars
Baseline variables to define groups within which to estimate pseudo-values
- n.pctls
Number of percentiles to group individuals by with respect to predicted transition probabilities when estimating pseudo-values
- CI
Size of confidence intervals as a %
- CI.type
Method for estimating confidence interval (bootstrap or parametric)
- CI.R.boot
Number of bootstrap replicates when estimating the confidence interval for the calibration curve using bootstrapping
- data.pred.plot
Data frame or matrix of predicted risks for each possible transition over which to plot the calibration curves. Must have one column for every possible transition.
- transitions.out
Transitions for which to calculate calibration curves. Will do all possible transitions if left as NULL.
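The requirement that tp.pred has one column per state, including states that cannot be reached, can be met by padding with zero columns. The following is a minimal sketch in base R, not code from the package: the probabilities are made up, and the pstate1, ..., pstate6 column names are an assumption borrowed from the example at the end of this page.

```r
# Hypothetical predicted probabilities for three individuals in a six-state
# model where state 4 cannot be reached from the landmark state.
tp.reachable <- data.frame(
  pstate1 = c(0.11, 0.14, 0.12),
  pstate2 = c(0.41, 0.39, 0.43),
  pstate3 = c(0.40, 0.37, 0.35),
  pstate5 = c(0.03, 0.04, 0.03),
  pstate6 = c(0.05, 0.06, 0.07)
)

# Add a zero column for every state that is missing, then order the columns
# by state number so that there is exactly one column per state.
all.states <- paste0("pstate", 1:6)
missing.states <- setdiff(all.states, names(tp.reachable))
tp.reachable[missing.states] <- 0
tp.pred.example <- tp.reachable[, all.states]

# Sanity check: the probabilities for each individual should sum to 1
rowSums(tp.pred.example)
```

The resulting object has six columns, with pstate4 equal to 0 for every individual.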
Value
calib_pv returns a list containing two elements: plotdata and metadata. The plotdata element contains the data for the calibration curves. This will itself be a list with each element containing calibration plot data for the transition probabilities into each of the possible states. Each list element contains patient ids (id) from data.raw, the predicted transition probabilities (pred) and the estimated observed event probabilities (obs). If a confidence interval is requested, upper (obs.upper) and lower (obs.lower) bounds for the observed event probabilities are also returned. If data.pred.plot is defined manually, column (id) is not returned.
The metadata element contains metadata including: a vector of the possible transitions, a vector of which transitions calibration curves have been estimated for, the size of the confidence interval, the method for estimating the calibration curve and other user specified information.
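To illustrate how the returned structure might be handled downstream, the sketch below builds a small mock object with the documented layout and stacks the per-state data frames into one. The object mock.calib and all of its values are hypothetical; only the element names (plotdata, metadata, id, pred, obs) are taken from the description above.

```r
# Mock object mimicking the documented structure (all values are made up)
mock.calib <- list(
  plotdata = list(
    state1 = data.frame(id = 1:4,
                        pred = c(0.10, 0.20, 0.30, 0.40),
                        obs = c(0.08, 0.22, 0.27, 0.45)),
    state2 = data.frame(id = 1:4,
                        pred = c(0.90, 0.80, 0.70, 0.60),
                        obs = c(0.92, 0.78, 0.73, 0.55))
  ),
  metadata = list(assessed.transitions = c(1, 2),
                  curve.type = "loess",
                  CI = FALSE)
)

# Stack the per-state data frames, recording which state each row refers to
stacked <- do.call(
  rbind,
  Map(function(d, nm) data.frame(state = nm, d),
      mock.calib$plotdata, names(mock.calib$plotdata))
)

# Mean absolute difference between observed and predicted risk, by state
tapply(abs(stacked$obs - stacked$pred), stacked$state, mean)
```

A long format like this is one convenient starting point for custom plots or summaries.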
Details
Observed event probabilities at time t are estimated for predicted transition probabilities tp.pred out of state j at time s.
calib_pv estimates the observed event probabilities using pseudo-values (Andersen PK, Pohar Perme M, 2010) calculated using the Aalen-Johansen estimator (Aalen OO, Johansen S, 1978). Calibration curves are generated by regressing the pseudo-values on the predicted transition probabilities. Currently calibration curves can be produced using loess smoothers or restricted cubic splines. Landmarking (van Houwelingen HC, 2007) is applied to only assess calibration in individuals who are uncensored and in state j at time s.
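The idea of regressing pseudo-values on predicted probabilities can be sketched in base R. This is a simplified illustration rather than the package's implementation: the data are simulated without censoring, and the sample proportion stands in for the Aalen-Johansen estimator (an assumption made purely to keep the example self-contained). The jackknife pseudo-value for individual i is n * theta - (n - 1) * theta_(-i), where theta_(-i) is the estimate with individual i left out.

```r
set.seed(1)
n <- 200
pred <- runif(n, 0.05, 0.95)   # hypothetical predicted probabilities
event <- rbinom(n, 1, pred)    # simulated outcomes, no censoring

# Stand-in estimator of the state occupation probability (sample proportion).
# In the multistate setting this role is played by the Aalen-Johansen estimator.
theta.hat <- function(y) mean(y)

# Jackknife pseudo-values: n * theta - (n - 1) * theta with individual i removed
theta.full <- theta.hat(event)
pseudo <- vapply(seq_len(n), function(i) {
  n * theta.full - (n - 1) * theta.hat(event[-i])
}, numeric(1))

# Without censoring, the pseudo-values reduce to the event indicators
all.equal(pseudo, as.numeric(event))

# Calibration curve: regress the pseudo-values on the predicted probabilities
fit <- loess(pseudo ~ pred, span = 0.75, degree = 2)
obs <- predict(fit, newdata = data.frame(pred = pred))
head(data.frame(pred = pred, obs = obs))
```

With censored data the pseudo-values are no longer 0 or 1, which is why an estimator that accounts for censoring is needed in practice.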
Two datasets for the same cohort of individuals must be provided. Firstly, data.mstate must be a dataset of class msdata, generated using the mstate package. This dataset is used to apply the landmarking. Secondly, data.raw must be a data.frame with one row per individual, containing the baseline variables that define the groups within which pseudo-values are calculated (no baseline variables are required if group.vars = NULL).
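As a rough illustration of how landmarking can be applied to data in this long format, the sketch below selects individuals who are still under observation in state j at time s. It is an assumption-laden simplification, not the package's code: ms.mock is a hand-made data frame that only mimics the column layout of an msdata object (id, from, to, Tstart, Tstop, status), landmark_ids is a hypothetical helper, and the handling of ties at exactly time s is a choice made for this example.

```r
# Hand-made data in an msdata-like layout (one row per possible transition)
ms.mock <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3),
  from   = c(1, 1, 2, 1, 1, 1, 1),
  to     = c(2, 3, 3, 2, 3, 2, 3),
  Tstart = c(0, 0, 100, 0, 0, 0, 0),
  Tstop  = c(100, 100, 300, 50, 50, 400, 400),
  status = c(1, 0, 0, 0, 0, 0, 1)
)

# Hypothetical helper: ids of individuals observed in state j at time s.
# An individual censored before s has no interval covering s, so is excluded.
landmark_ids <- function(data, j, s) {
  in.state <- data$from == j & data$Tstart <= s & s < data$Tstop
  unique(data$id[in.state])
}

# Individual 2 is censored at time 50, so is dropped at landmark time 60
landmark_ids(ms.mock, j = 1, s = 60)

# Only individual 1 has entered state 2 by time 150
landmark_ids(ms.mock, j = 2, s = 150)
```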
Confidence intervals for the calibration curves can be estimated using bootstrapping.
The calibration curves can be plotted using plot.calib_pv.
References
Aalen OO, Johansen S. An Empirical Transition Matrix for Non-Homogeneous Markov Chains Based on Censored Observations. Scand J Stat. 1978;5(3):141-150.
Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat Methods Med Res. 2010;19(1):71-99. doi:10.1177/0962280209105020
van Houwelingen HC. Dynamic Prediction by Landmarking in Event History Analysis. Scand J Stat. 2007;34(1):70-85.
Examples
# Using competing risks data out of initial state.
# See vignette: comparison-with-graphical-calibration-curves-in-competing-risk-setting.
# Estimate pseudo-value calibration curves for the predicted transition
# probabilities at time t = 1826, when predictions were made at time
# s = 0 in state j = 1. These predicted transition probabilities are stored in tp.cmprsk.j0.
# To minimise example time we reduce the datasets to 50 individuals.
# Extract the predicted transition probabilities out of state j = 1 for first 50 individuals
tp.pred <- tp.cmprsk.j0 |>
  dplyr::filter(id %in% 1:50) |>
  dplyr::select(any_of(paste("pstate", 1:6, sep = "")))
# Reduce ebmtcal to first 50 individuals
ebmtcal <- ebmtcal |> dplyr::filter(id %in% 1:50)
# Reduce msebmtcal.cmprsk to first 50 individuals
msebmtcal.cmprsk <- msebmtcal.cmprsk |> dplyr::filter(id %in% 1:50)
# Now estimate the observed event probabilities for each possible transition.
dat.calib.pv <- calib_pv(data.mstate = msebmtcal.cmprsk,
                         data.raw = ebmtcal,
                         j = 1,
                         s = 0,
                         t = 1826,
                         tp.pred = tp.pred,
                         curve.type = "loess",
                         loess.span = 1,
                         loess.degree = 1)
# The data for each calibration curve are stored in the "plotdata" list
# element.
str(dat.calib.pv)
#> List of 2
#> $ plotdata:List of 5
#> ..$ state1:'data.frame': 50 obs. of 3 variables:
#> .. ..$ id : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ pred: num [1:50] 0.114 0.114 0.113 0.138 0.123 ...
#> .. ..$ obs : num [1:50] 0.0123 0.0123 0.0121 0.0271 0.0184 ...
#> ..$ state2:'data.frame': 50 obs. of 3 variables:
#> .. ..$ id : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ pred: num [1:50] 0.409 0.411 0.412 0.387 0.428 ...
#> .. ..$ obs : num [1:50] 0.395 0.396 0.397 0.381 0.406 ...
#> ..$ state3:'data.frame': 50 obs. of 3 variables:
#> .. ..$ id : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ pred: num [1:50] 0.396 0.394 0.394 0.373 0.35 ...
#> .. ..$ obs : num [1:50] 0.671 0.653 0.655 0.474 0.315 ...
#> ..$ state5:'data.frame': 50 obs. of 3 variables:
#> .. ..$ id : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ pred: num [1:50] 0.0269 0.0269 0.0268 0.0403 0.0313 ...
#> .. ..$ obs : num [1:50] 0.0727 0.0727 0.0728 0.0597 0.0689 ...
#> ..$ state6:'data.frame': 50 obs. of 3 variables:
#> .. ..$ id : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ pred: num [1:50] 0.0538 0.0538 0.0537 0.0621 0.0685 ...
#> .. ..$ obs : num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#> $ metadata:List of 11
#> ..$ valid.transitions : num [1:5] 1 2 3 5 6
#> ..$ assessed.transitions: num [1:5] 1 2 3 5 6
#> ..$ curve.type : chr "loess"
#> ..$ CI : logi FALSE
#> ..$ CI.type : chr "parametric"
#> ..$ CI.R.boot : NULL
#> ..$ j : num 1
#> ..$ s : num 0
#> ..$ t : num 1826
#> ..$ group.vars : NULL
#> ..$ n.pctls : NULL
#> - attr(*, "class")= chr "calib_pv"