DATA MINING
Desktop Survival Guide by Graham Williams
Hierarchical Clustering
Agglomerative clustering is used to build a hierarchical clustering. Clicking the Execute button builds the complete hierarchy, from every entity in its own cluster up to a single cluster containing all entities. There is no need to re-execute after changing the Number of Clusters: the clusters for the new number are simply extracted from the already built hclust object. Users will nonetheless tend to re-execute after changing this setting, since that is how everything else in the interface works, so an alternative that makes it more obvious that re-executing is unnecessary is being considered.
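Changing the Number of Clusters needs no re-execution because an agglomerative hierarchy is stored as the full sequence of merges: it is built once, and any number of clusters k is recovered by replaying only the first n - k merges. The Python sketch below illustrates this with a naive complete-linkage implementation on Euclidean distances (complete linkage is the default method of R's hclust). It is an illustration under those assumptions, not the code behind the Execute button, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def complete_linkage(a, b, points):
    """Distance between clusters a and b: the largest member-to-member distance."""
    return max(euclidean(points[i], points[j]) for i in a for j in b)


def build_hierarchy(points):
    """Merge clusters pairwise until one remains, recording every merge.

    Returns a list of (a, b, height) tuples in merge order, where a and b are
    frozensets of point indices. Naive and slow: suitable for tiny data only.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters under complete linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: complete_linkage(pair[0], pair[1], points))
        merges.append((a, b, complete_linkage(a, b, points)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges


def cut(merges, n, k):
    """Recover k clusters by replaying the first n - k merges; nothing is rebuilt."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges[: n - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


points = [(0, 0), (0, 1), (5, 5), (5, 7), (12, 0)]
merges = build_hierarchy(points)  # built once, like a single Execute
for k in (1, 2, 3, 5):
    print(k, cut(merges, len(points), k))  # changing k only re-cuts the tree
```

Only cut() runs when k changes; build_hierarchy() does the expensive work once. For k = 3 this sketch yields [[0, 1], [2, 3], [4]]. In R, the analogous step is cutree() applied to an hclust result.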
Once the hierarchy has been built, inspect the dendrogram to get a visual sense of the "natural" number of clusters, set the Number of Clusters accordingly, and then review the statistics and the plot.
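Reading the "natural" number of clusters off a dendrogram is often approximated numerically by looking for the widest jump between successive merge heights (R's hclust records these n - 1 values in the height component of its result). The Python sketch below implements that common heuristic. It is an assumption for illustration, not something the interface is known to do, the function name is hypothetical, and the heights in the usage lines are made up.

```python
def suggest_k(heights):
    """Suggest a number of clusters from ascending merge heights.

    With n entities there are n - 1 merges. Cutting between the i-th and
    (i+1)-th merge (1-based) leaves n - i clusters, so the widest gap
    between consecutive heights marks the suggested cut.
    """
    if len(heights) < 2:
        raise ValueError("need at least two merge heights")
    n = len(heights) + 1
    best_i, best_gap = 1, heights[1] - heights[0]
    for i in range(2, len(heights)):
        gap = heights[i] - heights[i - 1]
        if gap > best_gap:
            best_i, best_gap = i, gap
    return n - best_i


heights = [0.1, 0.2, 0.3, 5.0, 5.2]  # made-up merge heights for 6 entities
print(suggest_k(heights))            # 3
```

On these heights the widest gap falls between 0.3 and 5.0, after three of the five merges, leaving 3 clusters. The heuristic is only a starting point; the dendrogram itself remains the better guide.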
Building a hierarchical clustering from a large number of entities is rarely sensible. A dendrogram of 20,000 entities would certainly not be interpretable, and even 10,000 is probably beyond what can usefully be interpreted. There are, however, techniques that help with interpretation; see http://www.ncbi.nlm.nih.gov/projects/geo/gds/analyze/analyze.cgi?datadir=UCorrelationUPGMA&ID=GDS3254&myType=0
The amap package provides standard hierarchical clustering with a choice of distance measures, such as Euclidean and Spearman, as well as a parallel implementation.
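The choice of distance can change which entities look similar. The Python sketch below contrasts Euclidean distance with a rank-based Spearman distance, here taken as 1 minus Spearman's rank correlation. That definition is an assumption for illustration: amap computes its own rank-based distance, whose exact formula may differ, so consult its documentation before relying on specific values. The function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_distance(x, y):
    """Assumed definition: 1 - Spearman's rho (0 = same ordering, 2 = reversed)."""
    return 1 - pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]         # same ordering, very different scale
print(euclidean(x, y))          # large: driven by the magnitudes
print(spearman_distance(x, y))  # 0.0: the rankings agree exactly
```

Because a rank-based distance depends only on the ordering of values, any increasing transformation of a variable (here, cubing) leaves it unchanged, whereas Euclidean distance is dominated by the scale of the values.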
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.