Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Indicator Variables

Some model builders cannot handle categoric variables directly. Neural networks and regression models are two examples. A simple solution in this case is to convert the categoric variable into a numeric form. If the categoric variable is not ordered, the usual approach is to replace the single variable with a collection of so-called indicator variables. For each value of the categoric variable a new indicator variable is created, which has the value 1 for any entity with that categoric value, and 0 otherwise. The result is a collection of numeric variables.
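As a minimal sketch of the idea (plain Python, not Rattle's own R code), the hypothetical function indicator_variables below creates one 0/1 column per distinct value of a categoric variable. The toy Gender values are made up for illustration.

```python
def indicator_variables(values):
    """Turn a list of categoric values into one 0/1 indicator column per distinct value."""
    levels = sorted(set(values))  # one new indicator variable per distinct value
    return {
        level: [1 if v == level else 0 for v in values]  # 1 where the entity has this value
        for level in levels
    }


gender = ["Male", "Female", "Female", "Male"]
indicators = indicator_variables(gender)
print(indicators)
# {'Female': [0, 1, 1, 0], 'Male': [1, 0, 0, 1]}
```

Each entity has exactly one indicator set to 1, so the indicator values for any single entity always sum to 1.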

Rattle's Transform tab provides an option to transform one or more categoric variables into a collection of indicator variables. Each new variable name is prefixed by INDI_, followed by the name of the categoric variable (e.g., Gender) and the particular value (e.g., Female), to give INDI_Gender_Female. Figure 5.9 shows the result of turning the variable Gender into two indicator variables.
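The naming convention can be imitated in a short sketch. This is an assumption-laden illustration in Python, not Rattle's implementation: the function add_indicator_columns, its prefix parameter, and the toy records are all hypothetical, and the sketch simply keeps the original categoric column alongside the new INDI_ columns.

```python
def add_indicator_columns(rows, var, prefix="INDI_"):
    """Return copies of rows with one <prefix><var>_<value> column per distinct value of var.

    Hypothetical helper imitating the INDI_Gender_Female naming style; not Rattle code.
    """
    levels = sorted({row[var] for row in rows})
    names = {level: f"{prefix}{var}_{level}" for level in levels}
    result = []
    for row in rows:
        new_row = dict(row)  # copy, keeping the original categoric variable
        for level, name in names.items():
            new_row[name] = 1 if row[var] == level else 0
        result.append(new_row)
    return result


records = [  # toy data for illustration only
    {"ID": 1, "Gender": "Male"},
    {"ID": 2, "Gender": "Female"},
]
for row in add_indicator_columns(records, "Gender"):
    print(row)
# {'ID': 1, 'Gender': 'Male', 'INDI_Gender_Female': 0, 'INDI_Gender_Male': 1}
# {'ID': 2, 'Gender': 'Female', 'INDI_Gender_Female': 1, 'INDI_Gender_Male': 0}
```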

There is not always a need to transform a categoric variable. Some model builders, such as the regression models in Rattle, do this transformation for us automatically.

Figure 5.9: Turning Gender into an Indicator Variable.

Copyright © 2004-2008 Togaware Pty Ltd