Data Mining Survivor: Random_Forests

DATA MINING
Desktop Survival Guide
by Graham Williams

Tuning Options

For the Two Class paradigm of Rattle, the random forest model builder builds a classification model. Each tree in the resulting ensemble is used to predict the class of an entity, and the proportion of trees predicting the positive class is taken as the probability that the entity belongs to the positive class.
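The voting rule described above can be illustrated with a minimal Python sketch. It is not Rattle's own code: the function name `positive_class_probability` and the example vote counts are assumptions made purely for illustration.

```python
def positive_class_probability(tree_votes):
    """Return the fraction of trees that voted for the positive class.

    tree_votes is a list with one entry per tree: 1 for the positive
    class, 0 for the negative class (a hypothetical encoding).
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    positive = sum(1 for vote in tree_votes if vote == 1)
    return positive / len(tree_votes)


# Illustrative only: an ensemble of 500 trees, 350 of which vote positive.
votes = [1] * 350 + [0] * 150
prob = positive_class_probability(votes)
print(prob)
```

With these made-up votes, 350 of the 500 trees predict the positive class, so the entity is scored as positive with probability 350/500.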

Rattle provides access to just three parameters (Figure 7.1) for tuning the models built by the random forest model builder: the number of trees, the sample size, and the number of variables. As is generally the case with Rattle, the defaults are a very good starting point! The defaults are to build 500 trees, to not do any sampling of the training dataset, and, at each split, to choose from a random subset of the input variables whose size is the square root of the number of variables available. In Figure 7.1 we see that the number of variables has automatically been set to 3 for the audit_auto.csv dataset, which has 9 input variables (the square root of 9 is 3).
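The arithmetic behind the default number of variables can be sketched as follows. This is a minimal Python illustration rather than Rattle's implementation: the helper names and the placeholder variable names `var1` to `var9` are hypothetical, and rounding the square root down is an assumption for cases where it is not a whole number.

```python
import math
import random


def default_num_variables(n_inputs):
    """Square root of the number of input variables, rounded down
    (assumed rounding), and never less than 1."""
    return max(math.isqrt(n_inputs), 1)


def candidate_variables(variables, rng):
    """Draw the random subset of variables considered at one split."""
    k = default_num_variables(len(variables))
    return rng.sample(variables, k)


# Hypothetical placeholder names standing in for the 9 input variables.
inputs = [f"var{i}" for i in range(1, 10)]

n_try = default_num_variables(len(inputs))
subset = candidate_variables(inputs, random.Random(42))
print(n_try, subset)
```

For 9 input variables the sketch gives a subset size of 3, matching the value shown in Figure 7.1. A fresh random subset of that size would be drawn each time a tree considers a split.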


Brought to you by Togaware.