Here we specify how we might partition the dataset for exploratory and
modelling purposes. The default for Rattle is to build two subsets of
the dataset: one is a training set from which to build models, while
the other is used for testing the performance of the model. By
default Rattle uses a 70% training and 30% testing split, but you are
welcome to turn sampling off or to choose a different split. A very
small sample may be required so that some explorations can be
performed on the resulting smaller dataset, or to build models using
the more computationally expensive algorithms (such as support vector
machines).
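
To make the idea of a 70%/30% partition concrete, here is a minimal sketch in Python. It is an illustration only and is not the code Rattle runs: the function name `partition_rows`, its parameters, and the seed value 42 are all assumptions made for this example. It shuffles the row indices with a fixed seed, so the split is repeatable, and then cuts the shuffled list at the requested fraction.

```python
import random


def partition_rows(n_rows, train_fraction=0.7, seed=42):
    """Split row indices 0..n_rows-1 into disjoint training and testing lists.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")

    rng = random.Random(seed)      # fixed seed so the same split can be reproduced
    indices = list(range(n_rows))
    rng.shuffle(indices)           # random order, so the split is a random sample

    n_train = round(n_rows * train_fraction)
    train = sorted(indices[:n_train])   # rows used to build the model
    test = sorted(indices[n_train:])    # held-out rows used to test the model
    return train, test


if __name__ == "__main__":
    train, test = partition_rows(1000)
    print(len(train), len(test))
```

Setting `train_fraction=1.0` corresponds to turning sampling off, since every row then lands in the training list. A much smaller fraction, such as 0.1, gives the kind of small sample that suits quick exploration or a computationally expensive algorithm.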