DATA MINING
Desktop Survival Guide by Graham Williams
Consider the situation of customer churn. We note, though, that customers who have not churned have, in fact, only not yet churned! They may churn in the future; we don't know. Data of this kind is called censored data, and survival analysis is used to handle it.
Survival analysis can be thought of as a regression in which the response is a time, and associated with each time is an event indicator recording whether the event was actually observed.
Survival analysis is the analysis of the time to an event. Methods used for survival analysis take into account the fact that we only have partial information available to us. The partial information for customer 2, for example, is that we know they have been with us for 5 months, but we don't know whether or not they are just about to churn.
Time-to-event modelling often uses survival analysis. See Klein and
Moeschberger (2003), Survival Analysis: Techniques for Censored and
Truncated Data, Second Edition, Springer. The examples below illustrate
steps from Applied Survival Analysis, by Hosmer and Lemeshow, 2008.
Survival analysis models the time to the occurrence of an event (e.g.,
time to death, time to failure, time to lodgment, time to churn). It is
particularly useful when we have censored observations. The general
approach introduces a survival function $S(t)$ and a hazard rate
function $h(t)$. These describe the status of an entity's survival
during the period of observation. The survival function gives the
probability of surviving beyond a certain point $t$. The hazard rate
function gives the instantaneous risk of non-survival (i.e., death,
churn, lodgment, failure) at time $t$, given survival to time $t$.
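The survival function and the hazard rate function are usually written down as follows. This is the standard textbook formulation for a continuous time to event, not notation taken from the text above: the random variable $T$ (the time to the event), its density $f(t)$, and the small interval $\Delta t$ are symbols introduced here for illustration.

```latex
\begin{align*}
S(t) &= P(T > t) \\
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}
      = -\frac{d}{dt} \log S(t)
\end{align*}
```

The last equality shows that either function determines the other, so a model for the hazard rate is also a model for survival.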
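The data usually records, for each entity, a start time, a stop time, and an event status (1 = event occurred, 0 = event did not occur). An alternative format records just the time to event (or to the end of observation, for censored entities) and the status. This latter format is generally used here.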
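As a minimal sketch of the two formats, the following R code builds a small data frame in the start/stop/status layout and derives the time/status layout from it. The data frame name `customers`, its column names, and all of the values are hypothetical, made up purely for illustration, with times measured in months.

```r
# Hypothetical customer records (illustrative values only).
# start/stop are months since the start of the study.
customers <- data.frame(
  id     = 1:4,
  start  = c(0, 3, 1, 5),
  stop   = c(6, 8, 3, 8),
  status = c(1, 0, 1, 0)  # 1 = churned, 0 = not yet churned (censored)
)

# Second format: observed time plus status.
# For censored customers this is the time observed so far,
# not the (unknown) time to churn.
customers$time <- customers$stop - customers$start

customers[, c("id", "time", "status")]
```

Here customers with status 0 contribute only the knowledge that they survived at least `time` months.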
In R we first create a Surv object using the Surv() function from the survival package.
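A minimal sketch of this step, assuming the time/status format and the survival package being installed. The vectors `time` and `status` hold hypothetical values chosen only for illustration.

```r
library(survival)

# Hypothetical data: months observed, and whether churn was seen.
time   <- c(6, 5, 2, 3)
status <- c(1, 0, 1, 0)  # 1 = event occurred, 0 = censored

# Surv() pairs each time with its event indicator
# (right censoring is assumed when only time and status are given).
s <- Surv(time, status)

# Censored observations are printed with a trailing "+".
print(s)
```

The resulting object can then be used as the response in the modelling functions of the survival package.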