Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Binning

Rattle provides a binning function, coded by Daniele Medri. The Rattle interface offers a choice between Quantile binning, KMeans binning, and Equal Width binning. For each option the default number of bins is 4, which we can change to suit our needs. The generated variables are prefixed with BIN_QUn_, BIN_KMn_, or BIN_EWn_ respectively, with n replaced by the number of bins. Because the prefix records both the method and the number of bins, we can create multiple binnings for any variable.
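As a rough illustration of the Equal Width option and the naming pattern, here is a minimal Python sketch. It is not Rattle's own code (Rattle is an R package): the function names, the sample ages, and the rule of clamping the maximum into the last bin are assumptions made for this example.

```python
def equal_width_bins(values, n_bins=4):
    """Assign each value to one of n_bins intervals of equal width (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    labels = []
    for v in values:
        # The maximum would fall just past the last interval, so clamp it in.
        labels.append(min(int((v - lo) / width), n_bins - 1))
    return labels


def binned_name(method_code, n_bins, variable):
    """Build a name following the prefix pattern in the text, e.g. BIN_EW4_Age."""
    return f"BIN_{method_code}{n_bins}_{variable}"


ages = [17, 22, 25, 31, 38, 44, 52, 57, 63, 90]  # illustrative values only
column = binned_name("EW", 4, "Age")             # "BIN_EW4_Age"
bins = equal_width_bins(ages, n_bins=4)
print(column, bins)
```

Note that equal-width bins can hold very different numbers of observations: here the single large value stretches the range, so the first bin holds four ages and the last bin only one.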

One reason to bin is to visualise data. A mosaic plot, for example, is only useful for categoric data, so we could turn Sunshine into a categoric variable by binning. To do: also talk about binning to show box plots for different targets.
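To make the mosaic plot case concrete, the sketch below bins a numeric Sunshine variable into four equal-width categories and counts each combination of bin and target class, which are the counts a mosaic plot would draw as tile areas. The data values, the RainTomorrow target, and the bin labels are all made up for illustration, and this is plain Python rather than anything Rattle runs.

```python
from collections import Counter

# Made-up observations: hours of sunshine and a categoric target.
sunshine = [0.0, 2.5, 4.1, 6.3, 7.8, 9.0, 10.2, 12.0]
rain_tomorrow = ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No"]

n_bins = 4
lo, hi = min(sunshine), max(sunshine)
width = (hi - lo) / n_bins  # 3.0 hours per bin for this data


def bin_label(value):
    """Return a categoric label (bin1..bin4) for a numeric value."""
    idx = min(int((value - lo) / width), n_bins - 1)
    return f"bin{idx + 1}"


# Contingency counts of (Sunshine bin, target class).
table = Counter((bin_label(s), r) for s, r in zip(sunshine, rain_tomorrow))
for key in sorted(table):
    print(key, table[key])
```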

Note that quantile binning is the same as equal count binning.
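The equal count property can be seen in a small sketch. The version below assigns bins by rank, a simplification chosen for this example: a real quantile binning cuts at quantile values, so tied values stay in the same bin and the counts may then be only approximately equal. The function name and sample ages are hypothetical.

```python
from collections import Counter


def quantile_bins(values, n_bins=4):
    """Assign 0-based bin indices by rank so each bin gets about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


ages = [17, 22, 25, 31, 38, 44, 52, 57, 63, 90, 19, 71]  # illustrative values only
labels = quantile_bins(ages, n_bins=4)
counts = Counter(labels)
print(sorted(counts.items()))
```

With 12 values and 4 bins, every bin receives exactly 3 observations, regardless of how unevenly the ages are spread.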

Figure 23.7: Binning Age.

Figure 23.8: Distributions of binned Age.

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010