When applied to people in a medical context, life data analysis is often referred to as survival
analysis. Here we present such an example.
This is a hypothetical case study of the influence of various patient characteristics on survival rates for
breast cancer. The survival analysis technique employed is Cox Regression. This technique is useful in situations
where we have censored observations, that is, where some of the patients do not die during the observation period.
(If all patients had died during the observation period, then we could have used another technique, such as linear
regression, to generate a predictive model of survival times.)
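Here is how Cox Regression makes use of censored patients, shown as a minimal Python sketch of the log partial likelihood that the method maximizes for a single predictor. The sketch assumes that no two deaths occur at the same time (statistical packages such as SPSS handle tied times with approximations). The function name and the toy data are hypothetical. A patient who dies contributes a term of their own. A censored patient contributes no term, but still counts in the "risk set" for every death that occurs while they are under observation.

```python
import math

def cox_log_partial_likelihood(beta, times, died, x):
    """Log partial likelihood of a one-predictor Cox model (assumes no tied death times).

    times: follow-up time for each patient
    died:  True if the patient died at that time, False if censored
    x:     predictor value for each patient
    """
    total = 0.0
    n = len(times)
    for i in range(n):
        if not died[i]:
            continue  # censored patients contribute no event term of their own
        # Risk set: everyone still under observation at this patient's death time
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        total += beta * x[i] - math.log(denom)
    return total

# Hypothetical toy data: 4 patients, deaths at t=2 and t=5, two censored
times = [2.0, 4.0, 5.0, 7.0]
died = [True, False, True, False]
x = [1.0, 0.0, 3.0, 2.0]

for beta in (0.0, 0.5):
    neg2ll = -2 * cox_log_partial_likelihood(beta, times, died, x)
    print(f"beta = {beta}: -2 log partial likelihood = {neg2ll:.4f}")
```

Maximizing this quantity over beta gives the coefficient estimate. The maximized value multiplied by -2 is the "-2 log likelihood" statistic mentioned under Results.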
Data and Method
The observation period runs for 133.8 months. The modeling sample contains 746 patients, including 50 patients who
died during the observation period and 696 who survived beyond the end of the observation period.
Our dependent variable (or "status" variable) has two values: "survived" vs. "died." In this simple
example, we are testing only four predictors:
- Age, in years, at the start of the observation period
- Pathological tumor size, in centimeters
- Number of positive axillary lymph nodes
- Estrogen receptor status (positive vs. negative)
Here are the value ranges for the predictor variables:
- Age: 22 to 88
- Pathological tumor size: 0.10 to 7.00 centimeters
- Number of positive lymph nodes: zero to 35
- Estrogen receptor status: positive vs. negative
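The following is one possible way to hold a single patient's data for this kind of analysis. The field names are hypothetical, and the range checks simply restate the value ranges listed above. The status variable is stored as a `died` flag, where `False` means the patient was censored.

```python
from dataclasses import dataclass

OBSERVATION_MONTHS = 133.8  # length of the observation period

@dataclass(frozen=True)
class PatientRecord:
    """One patient in the modeling sample (field names are hypothetical)."""
    months_observed: float  # time until death, or until observation of this patient ended
    died: bool              # status variable: True = died, False = survived (censored)
    age: int                # years, at the start of the observation period
    tumor_size_cm: float    # pathological tumor size
    positive_nodes: int     # number of positive axillary lymph nodes
    er_positive: bool       # estrogen receptor status

    def __post_init__(self):
        # Reject values outside the ranges reported for this sample
        checks = [
            (0 <= self.months_observed <= OBSERVATION_MONTHS, "months_observed"),
            (22 <= self.age <= 88, "age"),
            (0.10 <= self.tumor_size_cm <= 7.00, "tumor_size_cm"),
            (0 <= self.positive_nodes <= 35, "positive_nodes"),
        ]
        for ok, name in checks:
            if not ok:
                raise ValueError(f"{name} is outside the reported range")

    def covariates(self):
        """Numeric predictor vector; estrogen receptor status coded 1 = positive, 0 = negative."""
        return [float(self.age), self.tumor_size_cm,
                float(self.positive_nodes), 1.0 if self.er_positive else 0.0]
```

Coding the two-level estrogen receptor status as 1/0 gives it a single coefficient in the model, just like the three numeric predictors.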
Results
For readers with a statistical background: the Cox Regression used a backward stepwise likelihood-ratio
variable selection method, based on maximum partial likelihood estimates (-2 log likelihood). Significance
criteria were set at 0.05 for inclusion in the model and 0.10 for removal from the model.
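The decision rule behind likelihood-ratio selection can be sketched in a few lines of Python. This is not SPSS's internal procedure. The sketch only shows the test applied at each step, and it assumes each predictor has one degree of freedom, as all four predictors here do. The statistic is the increase in -2 log likelihood when a predictor is dropped, and its p-value comes from a chi-square distribution with 1 df. The function names and the -2 log likelihood values are hypothetical.

```python
import math

ENTRY_P = 0.05    # significance criterion for inclusion in the model
REMOVAL_P = 0.10  # significance criterion for removal from the model

def lr_p_value(neg2ll_without, neg2ll_with):
    """p-value of a likelihood-ratio test for one predictor (1 degree of freedom).

    For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    chi2 = max(neg2ll_without - neg2ll_with, 0.0)
    return math.erfc(math.sqrt(chi2 / 2.0))

def should_remove(p):
    """A predictor already in the model is dropped only if p exceeds the removal criterion."""
    return p > REMOVAL_P

def should_enter(p):
    """A predictor outside the model is (re-)entered only if p is below the inclusion criterion."""
    return p < ENTRY_P

# Hypothetical -2 log likelihood values for one predictor (not from this study)
p = lr_p_value(neg2ll_without=103.84, neg2ll_with=100.0)
print(round(p, 3), should_remove(p), should_enter(p))
```

Because the two criteria differ, a predictor with p between 0.05 and 0.10 stays in the model if it is already there but is not brought back once removed. This gap keeps the stepwise procedure from cycling.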
Here is some of the actual computer printout from the final step of the stepwise regression analysis:

Since this is intended to be a non-technical discussion, we will not explain all the statistics in this
table. But some key things to note are:
- Estrogen receptor status was removed as a predictor because it did not reach the 0.05 significance criterion for
inclusion, and it showed no appreciable correlation with the dependent variable. (The column labeled
"Sig" shows the statistical significance of included variables; the column labeled "R" shows the degree of
unique correlation with the dependent variable.)
- Number of positive axillary lymph nodes was the strongest predictor of survival rates over the course of
the observation period (R = .1443 / Sig. = .0001).
- Pathological tumor size was the second-best predictor (R = .1259 / Sig. = .0007) and is nearly as strong a
predictor as number of positive axillary lymph nodes.
- Age, although significant, is somewhat less influential than the other two predictors
(R = -.0893 / Sig. = .0094).
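The ranking above can be reproduced from the reported figures with a short Python sketch. The sketch uses only the R and Sig values quoted in the bullets. It ranks predictors by the absolute value of R, treating the sign as direction rather than strength, and checks that each retained predictor meets the 0.05 criterion.

```python
# (R, Sig.) as reported for the final step of the stepwise analysis
reported = {
    "positive axillary lymph nodes": (0.1443, 0.0001),
    "pathological tumor size": (0.1259, 0.0007),
    "age": (-0.0893, 0.0094),
}

# Strength of association is the magnitude of R; its sign only gives the direction
ranking = sorted(reported, key=lambda name: abs(reported[name][0]), reverse=True)

# Each retained predictor should meet the 0.05 significance criterion
significant = [name for name, (r, sig) in reported.items() if sig < 0.05]

for rank, name in enumerate(ranking, start=1):
    r, sig = reported[name]
    print(f"{rank}. {name}: R = {r:+.4f}, Sig. = {sig:.4f}")
```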
Note that both the number of positive axillary lymph nodes and the pathological tumor size are positively
correlated with the dependent variable, which means that higher values are associated with more rapid mortality. In
contrast, age is negatively correlated with the dependent variable, which means that older age is predictive of
somewhat longer survival (equivalently, younger age is associated with somewhat more rapid mortality).
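In a Cox model, the hazard for a patient with predictor values x1 ... xk is h(t) = h0(t) * exp(B1*x1 + ... + Bk*xk). The sign of each coefficient B therefore decides whether larger values of that predictor raise or lower the hazard. The B values from the printout are not reproduced here, so the coefficients below are hypothetical. They were chosen only so that their signs match the reported R values, assuming, as in SPSS output, that R carries the sign of its coefficient. The function name `relative_hazard` is also hypothetical.

```python
import math

def relative_hazard(coefs, x_a, x_b):
    """Hazard of patient A relative to patient B under a Cox model.

    The baseline hazard h0(t) cancels, so the ratio is
    exp(sum of B_k * (x_a[k] - x_b[k])) and does not depend on time.
    """
    return math.exp(sum(b * (a - c) for b, a, c in zip(coefs, x_a, x_b)))

# Hypothetical coefficients (NOT from the study's printout), signs matching the reported R values:
# [age in years (negative), tumor size in cm (positive), positive nodes (positive)]
coefs = [-0.03, 0.4, 0.1]

# Identical patients except for node count: 5 vs 0 positive nodes
print(relative_hazard(coefs, [50, 2.0, 5], [50, 2.0, 0]))  # exp(0.5), about 1.65: higher hazard
# Identical patients except for age: 60 vs 40 years
print(relative_hazard(coefs, [60, 2.0, 0], [40, 2.0, 0]))  # exp(-0.6), about 0.55: lower hazard
```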
The following chart shows the cumulative survival function during the observation period:

Several things are immediately apparent from this chart:
- All patients survive through the tenth month of the observation period, at which time we begin to observe a
fairly constant mortality rate which runs through the fortieth month
- At the fortieth month, the mortality rate increases and continues at this fairly constant increased rate
through the forty-fifth month
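- At the forty-fifth month, there is a five-month period without additional mortality, after which time the
mortality continues at a fairly constant rate until the end of the observation period, by which time the
estimated cumulative mortality is approximately 11%. (This is a censoring-adjusted estimate, which is why it is
higher than the crude proportion of patients actually observed to die: 50 of 746, or about 6.7%.)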
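The chart comes from the Cox model itself. The simpler Kaplan-Meier estimator, substituted here purely for illustration, shows why an estimated cumulative mortality can exceed the crude share of patients seen to die. Each censored patient leaves the risk set, so any later death removes a larger fraction of the patients still being followed. The function name and the toy data are hypothetical.

```python
def kaplan_meier(times, died):
    """Kaplan-Meier estimate of the survival function S(t).

    times: follow-up time for each patient
    died:  True if the patient died at that time, False if censored
    Returns (time, S(t)) pairs at each distinct death time.
    """
    data = sorted(zip(times, died))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # True counts as 1
            removed += 1
            i += 1
        if deaths:
            # S(t) drops by the fraction of at-risk patients who died at t
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set after time t
        n_at_risk -= removed
    return curve

# Hypothetical toy data: 10 patients, 2 deaths, 8 censored at various times
times = [5, 8, 10, 12, 15, 20, 25, 30, 40, 40]
died = [False, False, False, False, False, False, True, False, True, False]

curve = kaplan_meier(times, died)
crude = sum(died) / len(died)
print("crude proportion died:", crude)
print("estimated cumulative mortality:", 1 - curve[-1][1])
```

The toy data exaggerate the effect on purpose. With 746 patients and censoring spread across 133.8 months, the gap between the crude and estimated figures is much smaller.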
Conclusions and Implications
The case study presented here is relatively simple, and is for illustrative purposes only. However, with the
addition of more candidate predictors (e.g., progesterone receptor status, histologic grade, etc.), an even more
powerful model could emerge.
By understanding the influence of patient characteristics on mortality rates over time, we are in a better
position to estimate survival times for individual patients, and to justify using different or more aggressive
therapeutic approaches for some patients.
The foregoing case study is an edited version of one originally furnished by SPSS, and is used with
their permission.