DATA MINING
Desktop Survival Guide by Graham Williams
Number of Clusters
Choosing the number of clusters is often quite a tricky exercise. Sometimes it is a matter of just trying it and seeing. Other times you have some heuristics that help you to decide. Rattle provides an iterative approach. There is no definitive statistical answer to this issue.
In deciding on a size for a robust cluster we need to note that the larger the number of clusters relative to the size of the sample, the smaller our clusters will be. Perhaps there is a cluster size below which we don't want to go.
Different clustering algorithms (and even different random seeds) result in different clusters, and how much they differ is a measure of cluster stability.
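One way to put a number on how much two clusterings differ is the Rand index: the fraction of pairs of observations that both clusterings treat the same way (together in both, or apart in both). This is only an illustrative choice and is not something Rattle is stated to compute. The function name rand_index and the three label vectors below are hypothetical, made up for this sketch, which is written in Python using only the standard library.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair counts as agreement when both clusterings put the two
    points in the same cluster, or both put them in different clusters.
    Cluster label values themselves do not matter, only the grouping.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for six observations from three runs
# (for example, the same algorithm with three different random seeds).
run_1 = [0, 0, 0, 1, 1, 1]
run_2 = [1, 1, 1, 0, 0, 0]  # same grouping as run_1, labels swapped
run_3 = [0, 0, 1, 1, 2, 2]  # a genuinely different grouping

print(rand_index(run_1, run_2))
print(rand_index(run_1, run_3))
```

A value of 1 means the two runs produced the same grouping. Values that fall well below 1 across repeated runs suggest the clusters are not stable.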
One approach to identifying a good number of clusters is to iterate over a range of cluster counts and observe the total within-cluster sum of squares for each. Rattle supports this with the Iterate Clusters option (see Figure 9.1), where a plot is also always generated (see Figure 9.2). A heuristic is to choose the number of clusters where we see the largest drop in the total within-cluster sum of squares. In Figure 9.2 we might choose 12, 17 or perhaps even 26.
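The iteration described above can be sketched outside Rattle as follows. This is a minimal illustration in Python using only the standard library, not Rattle's own implementation: the k-means routine is a plain Lloyd's algorithm, the helper names (kmeans, total_withinss, iterate_clusters, largest_drop) are hypothetical, and the data are two synthetic, well-separated groups generated purely for the example.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k observed points
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels

def total_withinss(points, centers, labels):
    """Sum over all clusters of the within-cluster sum of squares."""
    return sum(sqdist(p, centers[l]) for p, l in zip(points, labels))

def iterate_clusters(points, ks, seed=0, restarts=5):
    """For each k, keep the lowest total within SS over a few restarts."""
    return {
        k: min(total_withinss(points, *kmeans(points, k, seed=seed + r))
               for r in range(restarts))
        for k in ks
    }

def largest_drop(wss):
    """The k at which total within SS falls most relative to the previous k."""
    ks = sorted(wss)
    drops = {k: wss[prev] - wss[k] for prev, k in zip(ks, ks[1:])}
    return max(drops, key=drops.get)

# Synthetic data (an assumption for this sketch): two groups of 30 points.
rng = random.Random(42)
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] +
          [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)])

wss = iterate_clusters(points, range(1, 7))
for k in sorted(wss):
    print(k, round(wss[k], 1))
print("largest drop at k =", largest_drop(wss))
```

Because k-means depends on its random starting centers, the sketch takes the best of several restarts for each k. On real data the drops are rarely as clear cut as on this toy example, which is why the heuristic can suggest several candidate values.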
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.