Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Loading Our First Data Set

Using Rattle we will now load a sample dataset in preparation for modelling.

The first task is to start up R as described above. We will be presented with an R console displaying the > prompt with a blinking cursor. The R console is awaiting our commands.

Rattle is available as an R package, like the more than 2000 other packages that exist for R. If you have followed the instructions in Appendix [*], we start Rattle by loading the package into the R library and then calling the rattle() function with an empty argument list:



> library(rattle)
> rattle()

We will then see the Rattle GUI displayed, as in Figure 2.2.

Figure 2.2: The initial Rattle window displays a welcome message and a little introduction to Rattle and R.

Now, simply click the Execute button in the toolbar. Rattle will notice that no CSV file has been specified (notice the "(None)" in the Filename: chooser) and will ask whether we wish to use one of the sample datasets supplied with the package. Click on Yes to do so. We will then see the data summarised, as shown in Figure 2.3.

Figure 2.3: The sample weather.csv file has been loaded into memory as a dataset ready for mining. The dataset consists of 366 observations and 23 variables. Notice some variables have roles other than the default Input role. Rattle uses heuristics to initialise the roles.

The dataset summary provides a list of the variables, their data types, default roles, and other useful information. The types will generally be Numeric or Categoric. We also see an Ident (identifier).
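
The same kind of summary can be approximated from the R console. The following is a minimal sketch, not what Rattle itself runs: it assumes the sample file is installed under the package's csv directory, and describe_variables is a hypothetical helper written for this illustration. Its Numeric/Categoric labels are cruder than Rattle's heuristics; for example, it does not detect identifiers.

```r
# Hypothetical helper (not part of Rattle): report each variable's R class
# and a coarse Numeric/Categoric label similar to the types shown in the GUI.
describe_variables <- function(ds) {
  is_num <- unname(vapply(ds, is.numeric, logical(1)))
  data.frame(
    variable = names(ds),
    class    = unname(vapply(ds, function(x) class(x)[1], character(1))),
    type     = ifelse(is_num, "Numeric", "Categoric"),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Assumption: the sample CSV ships in the "csv" directory of the installed
# rattle package. system.file() returns "" if the file cannot be found.
csv_path <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(csv_path)) {
  weather <- read.csv(csv_path)
  print(dim(weather))                  # number of observations and variables
  print(describe_variables(weather))   # one row per variable
} else {
  message("weather.csv not found; is the rattle package installed?")
}
```

The number of variables reported may differ from Figure 2.3 if the installed version of the package ships a revised weather.csv.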

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 13 September 2009