Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Saving Data

Any R object can be saved to a file using the save function and restored at a later time using the load function. The objects are stored in R's binary format, conventionally in a file with the .RData extension. To illustrate this we use iris, a standard dataset supplied with R.

We first create a random sample of 20 entities from the dataset. The sample function randomly selects 20 row numbers between 1 and the number of rows (nrow) in iris. We then use these numbers to index the iris dataset, supplying them as the first argument within the square brackets to select the corresponding rows. The second argument within the square brackets is left blank, indicating that all columns are to be kept in the new dataset. Finally, we save the new dataset to file using the save function, asking it to compress the data for storage:

> rows <- sample(nrow(iris), 20)
> myiris <- iris[rows,]
> dim(myiris)
[1] 20  5
> save(myiris, file="myiris.RData", compress=TRUE)
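Because sample draws at random, each run selects different rows. A minimal sketch, assuming you want a repeatable sample: fixing the random seed with set.seed before sampling makes the same rows be chosen every time. The seed value 42 is an arbitrary choice for illustration. Since sample draws without replacement by default, the 20 row numbers are also guaranteed to be distinct.

```r
# Fix the random number generator so the sample is repeatable.
# The seed value 42 is arbitrary.
set.seed(42)
rows <- sample(nrow(iris), 20)

# sample() draws without replacement by default, so the row numbers
# are distinct and all lie between 1 and nrow(iris).
stopifnot(length(unique(rows)) == 20,
          all(rows >= 1 & rows <= nrow(iris)))

# Leaving the second index blank keeps every column.
myiris <- iris[rows, ]
dim(myiris)

# Re-seeding with the same value reproduces exactly the same rows.
set.seed(42)
stopifnot(identical(rows, sample(nrow(iris), 20)))
```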

At a later date you can load your dataset back into R with the load function:

> load("myiris.RData")
> dim(myiris)
[1] 20  5
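To confirm that nothing is lost between save and load, the round trip can be checked directly. This is a sketch that writes to a temporary file (via tempfile) so that it leaves nothing behind; the names original and restored are illustrative only. It relies on load returning, invisibly, the names of the objects it restores.

```r
# Work in a temporary file so the example leaves nothing behind.
f <- tempfile(fileext = ".RData")
myiris <- iris[sample(nrow(iris), 20), ]
original <- myiris
save(myiris, file = f, compress = TRUE)

# Remove the object, then restore it from disk.
rm(myiris)
exists("myiris")          # FALSE
restored <- load(f)       # load() invisibly returns the restored names
restored                  # [1] "myiris"

# The reloaded data frame is identical to the one that was saved.
identical(myiris, original)   # TRUE
unlink(f)
```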

The compress option reduces the disk space required to store the dataset. Current versions of R compress binary save files by default, so compress=TRUE simply makes this choice explicit.
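Besides TRUE and FALSE, the compress argument of save also accepts the strings "gzip", "bzip2" and "xz" to select a compression method. A rough sketch for comparing the resulting file sizes on the iris dataset follows; the exact numbers depend on the data and the R version, so none are shown here. It assumes R 3.2.0 or later for file.size.

```r
# Save the same object with each compression setting to a temporary
# file and record the size of the resulting file in bytes.
settings <- list(none = FALSE, gzip = "gzip", bzip2 = "bzip2", xz = "xz")
sizes <- sapply(settings, function(method) {
  f <- tempfile(fileext = ".RData")
  on.exit(unlink(f))          # remove the file once its size is known
  save(iris, file = f, compress = method)
  file.size(f)
})
sizes
```

On larger datasets "xz" and "bzip2" often produce smaller files than "gzip" at the cost of slower saving, but it is worth measuring on your own data before choosing.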

You can save any R object in an R binary file, not just datasets. For example, suppose you have built a model and want to save it for later exploration:

> library(rpart)
> iris.rp <- rpart(Species ~ ., data=iris)
> save(iris.rp, file="irisrp.RData", compress=TRUE)

At a later stage, perhaps in a fresh R session, you can load the model and print it:

> load("irisrp.RData")
> iris.rp
n= 150 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)  
  2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)  
    6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
    7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *
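Because save accepts any number of objects, the sampled data and the model can also be stored together in a single file. A minimal sketch, assuming the rpart package is installed; the file name both.RData is hypothetical. Here the objects are named through save's list argument, which takes a character vector of object names instead of the objects themselves.

```r
library(rpart)

# Build the two objects used in the text.
myiris  <- iris[sample(nrow(iris), 20), ]
iris.rp <- rpart(Species ~ ., data = iris)

# Save both objects into one (hypothetical) file, naming them
# via the character vector argument 'list'.
f <- file.path(tempdir(), "both.RData")
save(list = c("myiris", "iris.rp"), file = f, compress = TRUE)

# Remove them from the session, then restore both with a single load.
rm(myiris, iris.rp)
loaded <- load(f)
loaded
unlink(f)
```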

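To see which objects are stored in an RData file, you can attach the file and list its contents. By default attach places the file at position 2 of the search path, which is why ls(2) lists its objects and detach(2) removes it again: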

> attach("irisrp.RData")
> ls(2)
...
> detach(2)
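An alternative that leaves the search path untouched is to load the file into a separate, empty environment and list that environment instead. A sketch, assuming the rpart package is installed; it recreates irisrp.RData first so that it runs on its own, and the environment name contents is illustrative.

```r
library(rpart)

# Recreate the file from the text so the example is self-contained.
iris.rp <- rpart(Species ~ ., data = iris)
save(iris.rp, file = "irisrp.RData", compress = TRUE)

# Load into a new environment rather than the workspace, so no
# existing object in the session can be overwritten.
contents <- new.env()
load("irisrp.RData", envir = contents)
ls(contents)
```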



Copyright © 2004-2010 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010