DATA MINING
Desktop Survival Guide by Graham Williams
For the Two Class paradigm of Rattle, the random forest model builder builds a classification model. Each tree in the resulting ensemble is used to predict the class of an entity, and the proportion of trees predicting the positive class is taken as the probability that the entity belongs to the positive class.
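The voting step described above can be sketched in a few lines. This is only an illustration in plain Python, not Rattle's or R's implementation: the function name `positive_class_probability` and the vote counts are hypothetical, and each tree's prediction is assumed to be encoded as 1 (positive class) or 0 (negative class).

```python
def positive_class_probability(tree_votes):
    """Return the fraction of trees that voted for the positive class (1)."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    positive_votes = sum(1 for vote in tree_votes if vote == 1)
    return positive_votes / len(tree_votes)


# Hypothetical votes for one entity from a 500-tree forest:
# 340 trees predict the positive class, 160 predict the negative class.
votes = [1] * 340 + [0] * 160
probability = positive_class_probability(votes)
print(probability)  # 0.68
```

With these made-up votes, 340 of 500 trees predict the positive class, so the entity is scored at 0.68.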
Rattle provides access to just three parameters (Figure 7.1) for tuning the models built by the random forest model builder: the number of trees, the sample size, and the number of variables. As is generally the case with Rattle, the defaults are a very good starting point! The defaults are to build 500 trees, to not do any sampling of the training dataset, and, at each split, to choose from a random subset of variables whose size is the square root of the number of variables available. In Figure 7.1 we see that the number of variables has automatically been set to 3 for the audit_auto.csv dataset, which has 9 input variables.
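The arithmetic behind the default number of variables is easy to check. The sketch below is plain Python rather than Rattle code, and the names `default_num_variables` and `defaults` are hypothetical. It assumes the square root is rounded down to a whole number, which is how the underlying R randomForest package documents its classification default; for 9 input variables the square root is exactly 3, so the rounding does not matter here.

```python
import math


def default_num_variables(num_inputs):
    """Square root of the number of input variables, rounded down (assumed), at least 1."""
    if num_inputs < 1:
        raise ValueError("need at least one input variable")
    return max(1, math.isqrt(num_inputs))


# The three defaults described in the text, for a dataset with 9 input variables.
# A sample size of None stands for "no sampling of the training dataset".
defaults = {
    "trees": 500,
    "sample_size": None,
    "variables": default_num_variables(9),
}
print(defaults["variables"])  # 3
```

For a dataset with, say, 20 input variables the same rule would give 4, since the square root of 20 is about 4.47.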