DATA MINING
Desktop Survival Guide by Graham Williams
We generally start Rattle from a running instance of R. Packaged versions of Rattle (including RStat) may provide an icon or button that hides the initiation of R and simply appears to display the Rattle application. Nonetheless, they all must do the following:
> library(rattle)
> rattle()
The user interface for Rattle follows a typical data mining process. The idea is to progress through the tabs that form the primary mechanism for operating Rattle. We work our way from the leftmost tab (the Data tab, where we identify the source of the data to be mined) to the rightmost tab (the Log tab, where we can review all the steps of our mining and save them to file as a script that can be rerun at a later time).
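Because the Log tab records the session as ordinary R commands, a saved log can be replayed with base R's source() function. The sketch below only illustrates that idea: the file name and the two commands written to it are stand-ins chosen for this example (assumptions), not actual Rattle log output.

```r
# Stand-in for a script saved from the Log tab.
# The file name and its contents are illustrative assumptions only.
log_file <- file.path(tempdir(), "rattle_log.R")
writeLines(c(
  "ds <- iris           # stand-in for the data-loading step",
  "n_rows <- nrow(ds)   # stand-in for a later step"
), log_file)

# Replay the saved steps; echo = TRUE prints each command as it runs.
source(log_file, echo = TRUE)
```

Replaying a script this way repeats the recorded steps without going back through the tabs, which is the point of saving the log.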
We introduce data mining in this book using the simple interface provided by Rattle for the common case of what we call the Two Class paradigm in Section . We limit ourselves to just two classes to ensure we develop a good understanding of the technology, but the ideas and algorithms generalise to the Multiple Class paradigm.
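To make the Two Class idea concrete outside the Rattle interface, here is a minimal sketch in base R. It assumes the built-in mtcars dataset, whose am column takes exactly two values, and it uses logistic regression purely as one example of a two-class model. Neither the dataset nor the choice of model comes from the text above.

```r
# Two-class target: mtcars$am is 0 (automatic) or 1 (manual).
# The dataset and the model are illustrative assumptions.
ds <- mtcars
ds$am <- factor(ds$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Logistic regression as one example of a two-class model.
# With a factor response, glm() treats the first level as "failure",
# so fitted probabilities refer to the second level ("manual").
model <- glm(am ~ wt, data = ds, family = binomial)

# Predicted probability of "manual", thresholded into a class label.
prob <- predict(model, type = "response")
pred <- factor(ifelse(prob > 0.5, "manual", "automatic"),
               levels = levels(ds$am))

# Confusion matrix: actual classes against predicted classes.
table(actual = ds$am, predicted = pred)
```

With only two classes, every prediction is either right or wrong about a single yes/no outcome, which keeps the confusion matrix to a 2 x 2 table.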
It is well reported that a data mining project involves a lot more time than just the time spent building models. The commonly recognised six phases are Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment.
The typical workflow for data mining, in the context of Rattle, can be summarised as:
Copyright © Togaware Pty Ltd