DATA MINING
Desktop Survival Guide
by Graham Williams

The Initial Interface

We generally start up Rattle from a running instance of R. Packaged versions of Rattle (including RStat) may provide an icon or button that hides the initiation of R and simply appears to display the Rattle application. Nonetheless, they must all do the following:

> library(rattle)
> rattle()
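As a minimal sketch, the same two commands can be wrapped so that a missing package gives a helpful message rather than an error. The helper name start_rattle is hypothetical and not part of Rattle; only library(rattle) and rattle() come from the text above.

```r
# Hypothetical helper (not part of Rattle): load the package and start
# the GUI, or explain how to install the package if it is missing.
start_rattle <- function() {
  if (!requireNamespace("rattle", quietly = TRUE)) {
    message("The rattle package is not installed; ",
            "try install.packages(\"rattle\").")
    return(invisible(FALSE))
  }
  library(rattle)  # the same two steps as shown above
  rattle()
  invisible(TRUE)
}

# Only launch the graphical interface from an interactive R session.
if (interactive()) start_rattle()
```

The interactive() guard is a design choice for scripts run in batch mode, where a graphical interface cannot usefully be displayed.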

The user interface for Rattle follows a typical data mining process. The idea is to progress through the Tabs, which form the primary mechanism for operating Rattle. We work our way from the leftmost tab (the Data tab, where we identify the source of the data to be mined) to the rightmost tab (the Log tab, where we can review all the steps of our mining and save them to file as a script that can be rerun at a later time).

We introduce data mining in this book using the simple interface provided by Rattle for the common case of what we call the Two Class paradigm in Section . We limit ourselves to just two classes to ensure we develop a good understanding of the technology, but the ideas and algorithms generalise to the Multiple Class paradigm.
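To make the two paradigms concrete, here is a small sketch in plain R (outside of Rattle) that recasts a multiple class target as a two class target. The iris dataset, which ships with R, and the choice of virginica as the class of interest are assumptions made purely for illustration.

```r
# iris ships with R; its Species variable has three classes.
multi <- iris$Species

# Collapse to two classes: the class of interest versus everything else.
two_class <- factor(ifelse(multi == "virginica", "virginica", "other"),
                    levels = c("other", "virginica"))

# Cross-tabulate the original classes against the two class target.
print(table(multi, two_class))
```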

It is well reported that a data mining project involves a lot more time than just the time spent building models. The commonly recognised six phases are Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment.

The typical work flow process for data mining, in the context of Rattle, can be summarised as:

Load a Dataset;
Select variables and entities for exploring and mining;
Explore the data to understand how it is distributed or spread;
Transform the data to suit our data mining purposes;
Build our Models;
Evaluate the models;
Review the Log of the data mining process.
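The steps above can also be sketched as a short R script. This is only an illustration under stated assumptions, not the code that Rattle itself generates: it uses the infert dataset that ships with R as a stand-in for our own data, a 70/30 split of the entities, and a logistic regression as the model.

```r
# 1. Load a dataset (infert ships with R and stands in for our own data).
ds <- infert

# 2. Select the variables and the entities (rows) to use.
target <- "case"                                 # two class target (0 or 1)
inputs <- c("age", "spontaneous", "induced")
set.seed(42)                                     # make the split repeatable
train <- sample(nrow(ds), floor(0.7 * nrow(ds)))
test  <- setdiff(seq_len(nrow(ds)), train)

# 3. Explore how the data is distributed.
print(summary(ds[train, c(inputs, target)]))

# 4. Transform: record the two classes as a labelled factor.
ds[[target]] <- factor(ds[[target]], levels = c(0, 1),
                       labels = c("control", "case"))

# 5. Build a model (logistic regression is an assumption made here).
form  <- reformulate(inputs, response = target)
model <- glm(form, data = ds[train, ], family = binomial)

# 6. Evaluate the model on the held-out rows with a confusion matrix.
prob      <- predict(model, newdata = ds[test, ], type = "response")
predicted <- factor(ifelse(prob > 0.5, "case", "control"),
                    levels = levels(ds[[target]]))
confusion <- table(actual = ds[test, target], predicted = predicted)
print(confusion)

# 7. Review: in script form, the script itself serves as the record of
#    the steps taken, and can be rerun at a later time.
```

Fixing the random seed means the same entities fall into the training set each time the script is rerun, so results can be compared across runs.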

Pictorially, we illustrate a typical work flow that is embodied in the Rattle interface in Figure 3.1.

**Figure 3.1:** Initial steps of the data mining process (Tony Nolan)

**Figure 3.2:** The data mining process

Brought to you by Togaware. This page generated: Sunday, 13 September 2009