Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


The Initial Interface

The user interface for Rattle follows a typical data mining process. The idea is to progress through the tabs, which form the primary mechanism for operating Rattle. We work our way from the leftmost tab (the Data tab, where we identify the source of the data to be mined) to the rightmost tab (the Log tab, where we can review all the steps of our mining and save them to a file as a script that can be rerun at a later time).

We introduce data mining in this book using the simple interface provided by Rattle for the common case of what we call the Two Class paradigm (see Section 1.2). We limit ourselves to just two classes to ensure we develop a good understanding of the technology, but the ideas and algorithms generalise to the Multiple Class paradigm.

It is well reported that a data mining project involves a lot more time than just the time spent building models. The six commonly recognised phases are Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment.

The typical work flow process for data mining, in the context of Rattle, can be summarised as:

  1. Load a Dataset;
  2. Select variables and entities for exploring and mining;
  3. Explore the data to understand how it is distributed or spread;
  4. Transform the data to suit our data mining purposes;
  5. Build our Models;
  6. Evaluate the models;
  7. Review the Log of the data mining process.

Pictorially, we illustrate a typical work flow that is embodied in the Rattle interface in Figure 2.1.

Figure 2.1: Initial steps of the data mining process (Tony Nolan)

Figure 2.2: The data mining process
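
The seven steps of the work flow listed above can also be sketched as a short script. This is only an illustration of the sequence of steps, not how Rattle works internally: Rattle is a graphical interface, whereas the sketch below uses plain Python. The toy dataset, the variable names (income, target) and the simple threshold model are all hypothetical, chosen to keep the example self-contained for the two class case.

```python
import csv
import io
import statistics

# Hypothetical toy data: two numeric inputs and a two-class target.
RAW_CSV = """age,income,target
25,30000,no
32,42000,no
47,88000,yes
51,95000,yes
38,52000,no
45,79000,yes
29,35000,no
56,99000,yes
"""

log = []  # a record of each step, reviewed in step 7

# 1. Load a dataset.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
log.append(f"loaded {len(rows)} rows")

# 2. Select variables: one input and the target (an assumed choice).
input_var, target_var = "income", "target"
log.append(f"selected input={input_var}, target={target_var}")

# 3. Explore: summarise how the input is distributed.
values = [float(r[input_var]) for r in rows]
log.append(f"mean={statistics.mean(values):.0f}, "
           f"stdev={statistics.stdev(values):.0f}")

# 4. Transform: rescale the input to the range 0..1.
lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]
labels = [r[target_var] == "yes" for r in rows]
log.append("rescaled input to 0..1")

# 5. Build a model: a single threshold halfway between the class means.
pos_mean = statistics.mean(s for s, y in zip(scaled, labels) if y)
neg_mean = statistics.mean(s for s, y in zip(scaled, labels) if not y)
threshold = (pos_mean + neg_mean) / 2
log.append(f"built threshold model at {threshold:.3f}")


def predict(x):
    """Predict the positive class when the scaled input exceeds the threshold."""
    return x > threshold


# 6. Evaluate: accuracy on the same data used for building
#    (optimistic; a real project would hold out a test set).
correct = sum(predict(s) == y for s, y in zip(scaled, labels))
accuracy = correct / len(labels)
log.append(f"accuracy={accuracy:.2f}")

# 7. Review the log of the whole process.
for number, entry in enumerate(log, start=1):
    print(f"{number}. {entry}")
```

Keeping a running log of each step mirrors the role of the Log tab described earlier: the record of what was done can be reviewed afterwards and repeated later.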

Rattle supports a number of paradigms for data mining. The collection of paradigms, displayed as radio buttons to the right of the buttons on the toolbar, allows much of Rattle's functionality to be shared while supporting the different types of tasks associated with each paradigm. For example, selecting the Unsupervised paradigm will expose the Cluster and Associate tabs, suitable for descriptive data mining, whilst hiding the Model and Evaluation tabs, which are most useful for predictive model building.
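
The way a paradigm selects which tabs are visible can be pictured as a simple lookup. The sketch below is hypothetical and is not Rattle's code. It uses only the tabs named in this section (the real interface has more), and it assumes that the Two Class paradigm is the one that shows the Model and Evaluation tabs.

```python
# Tabs shared by every paradigm: the leftmost and rightmost tabs.
COMMON_TABS = ["Data", "Log"]

# Tabs specific to each paradigm (only those named in this section).
PARADIGM_TABS = {
    "Two Class": ["Model", "Evaluation"],      # assumption: predictive tabs
    "Unsupervised": ["Cluster", "Associate"],  # descriptive data mining
}


def visible_tabs(paradigm):
    """Return the tabs shown for a paradigm, in left-to-right order."""
    first, last = COMMON_TABS
    return [first, *PARADIGM_TABS[paradigm], last]


print(visible_tabs("Unsupervised"))
# prints ['Data', 'Cluster', 'Associate', 'Log']
```

Sharing the common tabs across paradigms is what lets most of the interface stay the same while only the task-specific tabs change.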

We will present more on paradigms in Section 2.3. Before we get there, we need to understand how to interact with Rattle.

Copyright © 2004-2008 Togaware Pty Ltd