DATA MINING
Desktop Survival Guide by Graham Williams
The Transform tab provides numerous options for transforming our datasets. Cleaning our data and creating new features from it occupies much of our time as data miners. There are myriad approaches, and a programming language like R supports them all. Through the Rattle user interface we can perform some of the more common transformations. These include normalising our data, filling in missing values, turning numeric variables into categorical variables (and vice versa), dealing with outliers, and removing variables or entities with missing values.
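To make two of these transformations concrete, the following is a minimal sketch in plain Python rather than in Rattle or R. It assumes one simple form of normalisation (rescaling to the range 0 to 1) and one simple way of turning a numeric variable into a categorical one (equal-width binning). The function names `rescale_01` and `equal_width_bins` and the sample `ages` values are hypothetical and chosen only for illustration; they are not Rattle's own implementation.

```python
def rescale_01(values):
    """Rescale a list of numbers linearly so they span the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each number to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric variable.
ages = [20, 30, 40, 60]

scaled = rescale_01(ages)          # numeric, now between 0 and 1
binned = equal_width_bins(ages, 2)  # categorical, bin index per entity

print(scaled)
print(binned)
```

Equal-width binning is only one possible choice here; bins based on quantiles would spread the entities more evenly across categories.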
In this chapter we introduce the various transformations supported by Rattle. Transformations are not always appropriate, so we indicate where they might be applicable and provide warnings about the different approaches, particularly in the context of imputation, which can significantly alter the distribution of our datasets.
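The warning about imputation can be illustrated with a small sketch, again in plain Python and assuming the simplest strategy of replacing each missing value with the mean of the observed values. The `impute_mean` helper and the `income` values are hypothetical. The mean of the variable is preserved, but its spread shrinks because every imputed value sits exactly at the centre of the distribution.

```python
import statistics


def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


# Hypothetical variable with two missing values.
income = [10.0, 20.0, None, 30.0, None, 40.0]

observed = [v for v in income if v is not None]
imputed = impute_mean(income)

# Sample standard deviation before and after imputation.
sd_before = statistics.stdev(observed)
sd_after = statistics.stdev(imputed)

print("mean before:", statistics.mean(observed), "after:", statistics.mean(imputed))
print("sd before:", sd_before, "after:", sd_after)
```

Comparing the two standard deviations shows how an apparently harmless fill-in narrows the distribution, which is why imputed data is worth checking before modelling.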
In tuning our dataset to suit our needs, we often transform it in many different ways. Once we have transformed our dataset, we will want to save the new version. After working on our dataset through the Transform tab we can save the data through the Export button. We will be prompted for a CSV file into which the current transformation of the dataset will be saved. This is the same save operation that is available through the Export button on the Data and Select tabs.