Main Content

Cox Proportional Hazards Model with Time-Dependent Covariates

This example shows how to convert survival data to counting process form and then construct a Cox proportional hazards model with time-dependent covariates.

Step 1. Compare standard layout and counting process form.

A Cox model with time-dependent covariates requires survival data to be in counting process form and not in standard layout. To see the difference between survival data in standard layout and in counting process form, load the following sample data.

load simplesurvivaldata

This sample data contains two tables: relapseS and relapseCP. These two tables represent the same simple survival data in standard layout and in counting process form, respectively.

Display the data in standard layout.

relapseS
relapseS=2×5 table
    ID    Time    Censoring    Age    StopTreatment
    __    ____    _________    ___    _____________

    1       5         0        20          NaN     
    2      20         1        30           12     

This data represents two patients whose treatment status changes over time. Patient 1 was not taking treatment for the interval from week 0 to 5 and relapsed at the end of the interval. Patient 2 was taking treatment for the interval from week 0 to 12, but not for the interval from week 12 to 20. Patient 2 did not relapse and left the study after week 20.

Now display the same data in counting process form.

relapseCP
relapseCP=3×6 table
    ID    tStart    tStop    Censoring    Age    TreatmentStatus
    __    ______    _____    _________    ___    _______________

    1        0        5          0        20            0       
    2        0       12          1        30            1       
    2       12       20          1        30            0       

In counting process form, each row represents the risk interval (tStart,tStop] instead of a single value of an event time. Censoring is 0 if the event is observed at the end of the risk interval, and 1 if it is not. TreatmentStatus corresponds to a time-dependent covariate, which represents the same information with StopTreatment in standard layout. Note that a Cox model assumes time-dependent covariates to be constant in each risk interval.

Step 2. Load sample data.

Next, load sample data to convert.

load survivaldatacp

This sample data contains a table labS, which is simulated survival data including repeated measurement for each patient in standard layout.

Display the simulated survival data in standard layout.

labS
labS=6×7 table
    ID    Time    Censoring    Sex    Lab_0    Lab_50    Lab_100
    __    ____    _________    ___    _____    ______    _______

    1      46         0         1      0.3       NaN       NaN  
    2     138         1         0      0.2      0.23      0.39  
    3      94         0         1     0.18      0.22       NaN  
    4      50         0         0     0.21       0.2       NaN  
    5     106         0         0     0.25      0.21      0.42  
    6      98         0         0     0.21      0.22       NaN  

In standard layout, each row of the table shows information for one patient.

  • ID indicates the ID of a patient. You do not include ID as an input of a Cox model. Include ID in a data set to confirm that the data set is correctly converted to counting process form.

  • Time represents time to event in days, which corresponds to a response variable.

  • Censoring has the censorship information for each patient, where 1 indicates censored data and 0 indicates that the exact time to event is observed at the end of the observation period.

  • Sex is a time-independent predictor where 1 indicates female, and 0 indicates male.

  • Lab_0, Lab_50, and Lab_100 represent three consecutive laboratory results measured at day 0, 50, and 100, which correspond to a time-dependent predictor.

Step 3. Convert survival data to counting process form.

To convert the survival data labS to counting process form, execute the code below. This code converts Time to a risk interval (tStart,tStop] and combines three vectors of the time-dependent predictor, Lab_0, Lab_50, and Lab_100, into one vector, Lab.

mTime = [0 50 100]; % Measurement time
threeLabs = [labS.Lab_0 labS.Lab_50 labS.Lab_100]; 
nLabMeasure = sum(sum(~isnan(threeLabs))); % Number of lab measurements
data = zeros(nLabMeasure,6); % One row for each observation
oID = 0; % Observation ID 
for i = 1 : size(labS,1)
    idx = find(mTime <= labS.Time(i));
    for j = 1 : length(idx)-1
        oID = oID + 1;
        data(oID,:) = [labS.ID(i) mTime(j:j+1) 1 labS.Sex(i) threeLabs(i,j)];
    end
    oID = oID + 1;
    data(oID,:) = [labS.ID(i) mTime(length(idx)) labS.Time(i) ... 
            labS.Censoring(i) labS.Sex(i) threeLabs(i,length(idx))];
end
labCP = table(data(:,1),data(:,2),data(:,3),data(:,4),data(:,5),data(:,6), ...
    'VariableNames', {'ID','tStart','tStop','Censoring','Sex','Lab'});

Display the survival data in counting process form.

labCP
labCP=13×6 table
    ID    tStart    tStop    Censoring    Sex    Lab 
    __    ______    _____    _________    ___    ____

    1        0        46         0         1      0.3
    2        0        50         1         0      0.2
    2       50       100         1         0     0.23
    2      100       138         1         0     0.39
    3        0        50         1         1     0.18
    3       50        94         0         1     0.22
    4        0        50         1         0     0.21
    4       50        50         0         0      0.2
    5        0        50         1         0     0.25
    5       50       100         1         0     0.21
    5      100       106         0         0     0.42
    6        0        50         1         0     0.21
    6       50        98         0         0     0.22

In counting process form, each row of table labCP shows information of one observation corresponding to one risk interval. Note that a Cox model assumes Lab to be constant in the risk interval (tStart,tStop]. The value in Censoring is 0 if an event is observed at the end of the risk interval, and 1 if an event is not observed.

For example, patient 3 has two laboratory measurements at day 0 and 50, so there are two rows of data for patient 3 in counting process form. A Cox model assumes the lab results 0.18 and 0.22 to be constant in the interval (0,50] and (50,94], respectively. Censoring is 1 in (0,50] and 0 in (50,94] because the exact event time of patient 3 is observed at day 94.

Step 4. Adjust zero-length risk interval.

Find a patient who has a zero-length risk interval.

idxInvalid = labCP.ID(find(labCP.tStart == labCP.tStop))
idxInvalid = 
4

Review the data for patient 4.

labCP(find(labCP.ID==idxInvalid),:)
ans=2×6 table
    ID    tStart    tStop    Censoring    Sex    Lab 
    __    ______    _____    _________    ___    ____

    4        0       50          1         0     0.21
    4       50       50          0         0      0.2

The time to event of patient 4 coincides with the measurement day 50. However, (50,50] is an invalid risk interval for a Cox model because the model does not accept a zero length interval. Adjust the risk interval to be valid. You can choose any value less than the time unit as an adjustment amount. The choice of an adjustment amount is arbitrary, and it does not change the result.

idxAdjust = find(labCP.ID==idxInvalid);
labCP.tStop(idxAdjust(1)) = labCP.tStop(idxAdjust(1))-0.5;
labCP.tStart(idxAdjust(2)) = labCP.tStart(idxAdjust(2))-0.5;
labCP(idxAdjust,:)
ans=2×6 table
    ID    tStart    tStop    Censoring    Sex    Lab 
    __    ______    _____    _________    ___    ____

    4         0     49.5         1         0     0.21
    4      49.5       50         0         0      0.2

Step 5. Construct a Cox proportional hazards model.

Fit a Cox proportional hazards model with the time-independent variable Sex and time-dependent variable Lab.

X = [labCP.Sex labCP.Lab];
T = [labCP.tStart labCP.tStop];
b = coxphfit(X,T,'Censoring',labCP.Censoring,'Baseline',0)
b = 2×1

    2.0054
   29.7530

For details on how to assess a Cox proportional hazards model, see Cox Proportional Hazards Model for Censored Data.

See Also

Related Examples

More About