DATA MINING
Desktop Survival Guide by Graham Williams
The user interface for Rattle follows a typical data mining process. The idea is to progress through the tabs, which form the primary mechanism for operating Rattle. We work our way from the leftmost tab (the Data tab, where we identify the source of the data to be mined) to the rightmost tab (the Log tab, where we can review all the steps of our mining and save them to a file as a script that can be rerun at a later time).
In this book we introduce data mining using the simple interface provided by Rattle, beginning with the common case that we call the Two Class paradigm (Section 1.2). We limit ourselves to just two classes to ensure we develop a good understanding of the technology, but the ideas and algorithms generalise to the Multiple Class paradigm.
It is widely reported that a data mining project involves far more time than is spent building models alone. The six commonly recognised phases, as set out in the CRISP-DM process model, are Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment.
The typical workflow for data mining, in the context of Rattle, can be summarised as:
Rattle supports a number of paradigms for data mining. The collection of paradigms, displayed as radio buttons to the right of the buttons on the toolbar, allows much of Rattle's functionality to be shared while supporting the different types of tasks associated with each paradigm. For example, selecting the Unsupervised paradigm will expose the Cluster and Associate tabs, suitable for descriptive data mining, whilst hiding the Model and Evaluation tabs, which are most useful for predictive model building.
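The way a paradigm selects which tabs are visible can be pictured with a small sketch. The following Python fragment is purely illustrative and is not how Rattle itself is implemented: the names `COMMON_TABS`, `PARADIGM_TABS` and `visible_tabs` are hypothetical, and the assumption that the Two Class paradigm shows the Model and Evaluation tabs is inferred from the text rather than stated by it. Only the Unsupervised mapping is taken directly from the paragraph above.

```python
# Illustrative model of paradigm-dependent tab visibility.
# This is a sketch for explanation only, not Rattle's actual code.

# Tabs assumed to be shared by every paradigm: Data is leftmost, Log is rightmost.
COMMON_TABS = ["Data", "Log"]

# Tabs exposed by each paradigm (hypothetical structure).
PARADIGM_TABS = {
    # Assumption: predictive paradigms expose the model building tabs.
    "Two Class": ["Model", "Evaluation"],
    # From the text: Unsupervised exposes Cluster and Associate.
    "Unsupervised": ["Cluster", "Associate"],
}


def visible_tabs(paradigm):
    """Return the tabs shown for a paradigm, Data first and Log last."""
    specific = PARADIGM_TABS[paradigm]
    return [COMMON_TABS[0], *specific, COMMON_TABS[-1]]


if __name__ == "__main__":
    for name in PARADIGM_TABS:
        print(name, "->", visible_tabs(name))
```

The shared tabs stay in place whichever paradigm is selected, which mirrors the idea that common functionality is reused while the task-specific tabs change.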
We will present more on paradigms in Section 2.3. Before we get there, we need to understand how to interact with Rattle.
Copyright © 2004-2008 Togaware Pty Ltd