DATA MINING
Desktop Survival Guide
by Graham Williams

Two Class Models

This chapter focuses on the common data mining task of binary (or two class) classification. This is the task of distinguishing between two classes of entities, such as high risk and low risk insurance clients, productive and unproductive audits, responsive and non-responsive customers, or successful and unsuccessful security breaches.

Rattle provides a straightforward interface to the collection of model builders commonly used in data mining. For each model builder, a basic set of the commonly used tuning parameters is exposed through the interface for fine tuning the model's performance. Where possible, Rattle presents good default values so that the user can build a model with little or no tuning. This may not always be the right approach, but it is certainly a good place to start.

The model builders provided by Rattle are: Decision Trees, Boosted Decision Trees, Random Forests, Support Vector Machines, and Logistic Regression. Whilst a model is being built you will see the cursor image change to indicate the system is busy, and the status bar will report that a model is being built.
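To make the two class task concrete, the following is a minimal sketch of one of the model types listed above, logistic regression, written in plain Python. It is not how Rattle builds its models, and it is only an illustration under stated assumptions: the toy data (claims in the last year and years as a client, labelled high or low risk) is invented for this example, and the function names, learning rate, and number of epochs are hypothetical choices rather than Rattle defaults.

```python
import math


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this observation: p(class 1) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    # Class 1 if the estimated probability reaches the threshold, else class 0.
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Invented toy data: [claims in last year, years as client].
# Label 1 = high risk client, label 0 = low risk client.
X = [[0, 9], [1, 8], [0, 6], [1, 7], [4, 1], [5, 2], [3, 1], [4, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The `threshold`, `lr`, and `epochs` arguments play the role of the tuning parameters discussed above: the defaults are a reasonable starting point for this toy data, but a real data set would normally call for evaluation on data held out from training.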


Brought to you by Togaware.