Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Manipulating Data

Whilst this book does not provide a systematic guide to using R itself, there are times when we find ourselves wanting to manipulate our data in ways not supported directly by Rattle. It is therefore useful to know some basic data manipulation operations in R. The alternative is to use tools you are familiar with, such as a spreadsheet or database.

In this section we will review some basic data manipulation operations, which may in fact be sufficient for our everyday needs.

A dataset is generally stored as a data frame in R. A data frame is formally a list of vectors.
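
To make the "list of vectors" idea concrete, here is a minimal sketch. It assumes a small hypothetical data frame called toy, built inline so that the example runs without Rattle; it is not the weather dataset used in this section.

```r
# Hypothetical toy data, standing in for a dataset such as weather.
toy <- data.frame(
  Location = c("Canberra", "Sydney", "Perth"),
  MinTemp  = c(8.0, 14.0, 12.5),
  Rainfall = c(0.0, 3.6, 1.2),
  stringsAsFactors = TRUE
)

# A data frame is a list with one element per column.
is.list(toy)
length(toy)          # the number of columns, here 3
names(toy)           # the column names are the names of the list elements

# Each column is a vector, and all columns have the same length (one entry per row).
sapply(toy, length)
nrow(toy)
```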

When we index a data frame with single brackets, as in weather[2] or weather[4:7], we are retrieving a "subset" of the list, and hence the resulting object is also a data frame (i.e., a list). Compare this to weather[[2]], which returns an element of the list, in this case a vector.
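
The difference between single and double brackets can be sketched as follows. The toy data frame is again a hypothetical stand-in, so its second column is numeric rather than the factor found in weather.

```r
# Hypothetical toy data, standing in for a dataset such as weather.
toy <- data.frame(
  Location = c("Canberra", "Sydney", "Perth"),
  MinTemp  = c(8.0, 14.0, 12.5),
  Rainfall = c(0.0, 3.6, 1.2),
  stringsAsFactors = TRUE
)

one_col  <- toy[2]     # single brackets: a data frame with one column
two_cols <- toy[2:3]   # single brackets with a range: still a data frame
values   <- toy[[2]]   # double brackets: the column itself, a numeric vector

class(one_col)
class(two_cols)
class(values)

# With an exact column name, $ retrieves the same element as [[ ]].
identical(toy$MinTemp, toy[["MinTemp"]])
```

Single brackets are the natural choice when selecting several columns to pass on as a dataset, whilst double brackets (or $) are what we want when a function expects the values of a single column.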

We can confirm this with the is function and its relatives:

> is(weather)
[1] "data.frame" "list"       "oldClass"   "vector"
> class(weather)
[1] "data.frame"
> is.list(weather)
[1] TRUE
> is.vector(weather)
[1] FALSE
> is.list(weather[2])
[1] TRUE
> is.vector(weather[2])
[1] FALSE
> is.list(weather[[2]])
[1] FALSE
> is.vector(weather[[2]])
[1] FALSE
> is(weather[2])
[1] "data.frame" "list"       "oldClass"   "vector"
> is(weather[[2]])
[1] "factor"   "integer"  "oldClass" "numeric"  "vector"
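
The FALSE results from is.vector may be surprising, given that is lists "vector" among the classes. The reason is that is.vector returns TRUE only for an object that has no attributes other than names, whereas data frames and factors both carry further attributes. A minimal sketch, using a hypothetical factor f rather than the weather data:

```r
# A hypothetical factor, similar in kind to a categoric column of a dataset.
f <- factor(c("Canberra", "Sydney", "Canberra"))

# A factor carries attributes (including its levels and class),
# so is.vector reports FALSE.
names(attributes(f))
is.vector(f)

# Converting to a plain character vector drops those attributes.
is.vector(as.character(f))

# A plain numeric vector has no attributes at all.
x <- c(8.0, 14.0, 12.5)
is.vector(x)
```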

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010