Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Hierarchical Clustering

Agglomerative clustering is used to build a hierarchical clustering. The complete hierarchy is built when you click the Execute button. You do not need to re-execute after changing the Number of Clusters: that setting only extracts the relevant information from the fully built hclust. Users will nonetheless tend to re-execute after changing it, because that is how everything else in the interface works. An alternative is being considered to make it more obvious that re-executing is unnecessary.
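
To illustrate why changing the number of clusters needs no rebuild, here is a minimal Python sketch of the idea; it is not the code behind the interface. It assumes complete linkage and Euclidean distance, and the names build_hierarchy and cut are hypothetical. The naive pair search is only suitable for small examples. In R terms the split is roughly that between hclust and cutree.

```python
from itertools import combinations
from math import dist


def build_hierarchy(points):
    """Agglomerative clustering with complete linkage (an assumed choice).

    Returns the full merge history as a list of (members_a, members_b, height)
    tuples, one per merge, in the order the merges happened.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Complete linkage: cluster distance is the largest pairwise distance.
        def linkage(pair):
            a, b = pair
            return max(dist(points[i], points[j]) for i in a for j in b)

        # Merge the two closest clusters and record the merge.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(merges, n_points, k):
    """Recover k clusters by replaying only the first n_points - k merges."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, _ in merges[: n_points - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 0.0)]
merges = build_hierarchy(points)      # the expensive step, done once
two = cut(merges, len(points), 2)     # cheap: reuses the stored merges
three = cut(merges, len(points), 3)   # cheap: reuses the stored merges
```

Each call to cut only replays the stored merges, so trying several values of k costs almost nothing compared with building the hierarchy.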

Once the clustering has been built, look at the dendrogram to get a visual idea of the "natural" number of clusters, set the Number of Clusters accordingly, and then look at the stats and the plot.
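
Reading the dendrogram by eye amounts to looking for a tall stretch with no merges. The sketch below is one simple heuristic for that, offered as an assumption and not as something the interface does: cut in the widest gap between consecutive merge heights. The function name suggest_k and the heights are hypothetical, and the list needs at least two heights.

```python
def suggest_k(heights):
    """Suggest a cluster count from ascending merge heights.

    heights[i] is the height of merge i in a hierarchy over len(heights) + 1
    entities. Cutting in the widest gap between consecutive heights keeps
    the merges below the gap and undoes those above it.
    """
    n = len(heights) + 1
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    # Merges 0..widest are kept, so widest + 1 merges have been applied.
    return n - (widest + 1)


heights = [1.0, 1.5, 8.2, 20.0]   # hypothetical merge heights for 5 entities
k = suggest_k(heights)
```

A suggestion like this is only a starting point; the dendrogram, the stats and the plot remain the real check.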

Building a hierarchical cluster from a large number of entities is not sensible. Certainly 20,000 entities would not produce anything interpretable, and even 10,000 is perhaps beyond what is usefully interpretable. There are, however, techniques that help with interpretation. See http://www.ncbi.nlm.nih.gov/projects/geo/gds/analyze/analyze.cgi?datadir=UCorrelationUPGMA&ID=GDS3254&myType=0
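
Interpretability aside, size also matters computationally, because agglomerative clustering typically starts from the distance between every pair of entities. The short sketch below counts those pairs. The memory figure assumes each distance is stored once as an 8-byte floating point number, which is an assumption about the implementation.

```python
def pairwise_distances(n):
    """Number of distinct entity pairs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (1_000, 10_000, 20_000):
    pairs = pairwise_distances(n)
    gib = pairs * 8 / 2**30   # assumes 8 bytes per stored distance
    print(f"{n:>6} entities: {pairs:>11,} distances, about {gib:.2f} GiB")
```

Doubling the number of entities roughly quadruples the number of distances.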

The amap package includes standard hierarchical clustering with a choice of distances, such as Euclidean and Spearman, and a parallel implementation.
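
To show how the choice of distance changes what counts as "close", here is a small Python sketch of a Euclidean distance and a rank-based dissimilarity. The rank-based one is 1 minus Spearman's rho, which is one common choice; the amap package's own definition of its Spearman distance may differ, so treat this as an illustration only. All function names are hypothetical.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for position in range(i, j + 1):
            result[order[position]] = average
        i = j + 1
    return result


def spearman_dissimilarity(x, y):
    """1 - Spearman's rho, i.e. Pearson correlation computed on the ranks.

    Undefined (division by zero) when either vector is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return 1 - cov / sqrt(var_x * var_y)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 400.0]   # same ordering as a, very different scale
far_apart = euclidean(a, b)                 # large: the magnitudes differ
same_shape = spearman_dissimilarity(a, b)   # zero: the rank order is identical
```

The two vectors are far apart by Euclidean distance yet identical by rank, so the two measures would group them very differently.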

Copyright © Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Saturday, 16 January 2010