Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Lung

A second dataset is the Mayo Clinic Lung Cancer data, available from the survival package. We can load it using the data command or directly within Rattle using the Library option of the Data tab. The target (in Rattle's terms, though usually referred to as the time variable) is the variable time, and the risk (again in Rattle's terms, though usually referred to as the event or status indicator) is the variable status:



> library(survival)
> data(lung)
> names(lung)



 [1] "inst"      "time"      "status"    "age"      
 [5] "sex"       "ph.ecog"   "ph.karno"  "pat.karno"
 [9] "meal.cal"  "wt.loss"



> table(lung$status)



  1   2 
 63 165
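The two status codes are easier to read once labelled. The following is a minimal sketch, assuming the coding given in the survival package's help page for lung (1 = censored, 2 = dead). The names outcome, counts and censored.share are illustrative only and do not appear elsewhere in this guide.

```r
library(survival)

# lung is lazy-loaded with the survival package.
lung <- survival::lung

# Label the status codes (assumed coding: 1 = censored, 2 = dead).
outcome <- factor(lung$status, levels = c(1, 2),
                  labels = c("censored", "dead"))

# Same counts as table(lung$status), but with readable labels.
counts <- table(outcome)
print(counts)

# Share of observations that are censored.
censored.share <- counts[["censored"]] / sum(counts)
print(round(censored.share, 3))
```

Roughly a quarter of the observations are censored, which is why a survival object, rather than the raw time variable, is needed for modelling.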

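A simple plot of survival time against age, with the plotting symbol and colour distinguishing the two status values:



> with(lung, plot(age, time, pch=status, col=status+1))
> legend("topleft", c("alive", "dead"), pch=1:2, col=2:3)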

We create a survival object using the Surv function.



> l.Surv <- with(lung, Surv(time, status))
> head(lung[2:3])



  time status
1  306      2
2  455      2
3 1010      1
4  210      2
5  883      2
6 1022      1



> head(l.Surv)



[1]  306   455  1010+  210   883  1022+



> l.Surv[1:6,1:2]



     time status
[1,]  306      1
[2,]  455      1
[3,] 1010      0
[4,]  210      1
[5,]  883      1
[6,] 1022      0

Again, observations of entities that are still alive (that is, censored) are marked with a + when printed. Internally, the survival object stores the information in a matrix, with the original status codes 1 and 2 recoded to 0 (censored) and 1 (dead).
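The matrix storage can be inspected directly. The sketch below is illustrative rather than definitive: it compares the implicit 1/2 coding used above with an explicit logical event indicator, and then drops the Surv class with unclass to view the underlying matrix. The names s.implicit, s.explicit, m.implicit, m.explicit, same.status and n.censored are introduced here only for the example.

```r
library(survival)

# lung is lazy-loaded with the survival package.
lung <- survival::lung

# Implicit coding: Surv() treats a 1/2 status as censored/event.
s.implicit <- Surv(lung$time, lung$status)

# Explicit coding: TRUE marks an event (death), FALSE a censored time.
s.explicit <- Surv(lung$time, lung$status == 2)

# Drop the Surv class to work with the plain two-column matrix.
m.implicit <- unclass(s.implicit)
m.explicit <- unclass(s.explicit)

print(dim(m.implicit))
print(colnames(m.implicit))

# Both codings should yield the same 0/1 status column.
same.status <- all(m.implicit[, "status"] == m.explicit[, "status"])
print(same.status)

# Censored observations have status 0; these print with a trailing "+".
n.censored <- sum(m.implicit[, "status"] == 0)
print(n.censored)
```

Writing the event indicator explicitly, as in status == 2, avoids relying on the automatic recoding and makes the intended event clear to a reader.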



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010