Togaware DATA MINING
Desktop Survival Guide
by Graham Williams
Google

Sample Tab

Image rattle-audit-sample
Here we specify how we might partition the dataset for exploratory and modelling purposes. The default for Rattle is to build two subsets of the dataset: one is a training set from which to build models, while the other is used for testing the performance of the model. The default for Rattle is to use a 70% training and a 30% testing split, but you are welcome to turn sampling off, or choose other samplings. A very small sampling may be required to perform some explorations of the smaller dataset, or to build models using the more computationally expensive algorithms (like support vector machines).



Copyright © Graham.Williams.com
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.