DATA MINING
Desktop Survival Guide by Graham Williams
Export KMeans Clusters
The export functionality for kmeans clusters saves the actual model as PMML, with the centroids recorded in the PMML model specification.
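To make the idea of centroids recorded in the model specification concrete, here is a minimal Python sketch that reads them back out of a clustering model. The fragment is hand-written and heavily simplified for illustration; it is not actual Rattle output. Real PMML files declare an XML namespace on the root element and carry further elements such as a data dictionary and mining schema, all omitted here. The field names (age, income), the centroid values, and the helper read_centroids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment for illustration only (not Rattle output).
# Assumption: the XML namespace and surrounding PMML elements are omitted.
PMML_FRAGMENT = """
<ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
  <ComparisonMeasure kind="distance">
    <squaredEuclidean/>
  </ComparisonMeasure>
  <ClusteringField field="age" compareFunction="absDiff"/>
  <ClusteringField field="income" compareFunction="absDiff"/>
  <Cluster name="1">
    <Array n="2" type="real">30.0 40000.0</Array>
  </Cluster>
  <Cluster name="2">
    <Array n="2" type="real">55.0 90000.0</Array>
  </Cluster>
</ClusteringModel>
"""


def read_centroids(pmml_text):
    """Return (field names, {cluster name: centroid coordinates})."""
    root = ET.fromstring(pmml_text.strip())
    # Each ClusteringField names one variable used in the distance calculation.
    fields = [f.get("field") for f in root.iter("ClusteringField")]
    centroids = {}
    for cluster in root.iter("Cluster"):
        # The centroid is stored as a whitespace-separated Array of reals.
        values = cluster.find("Array").text.split()
        centroids[cluster.get("name")] = [float(v) for v in values]
    return fields, centroids


fields, centroids = read_centroids(PMML_FRAGMENT)
print(fields)
print(centroids)
```

With a namespaced file as written by an actual export, the element lookups would need to include the namespace; the sketch leaves that out to keep the structure visible.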
To save a CSV file that records the cluster to which each entity belongs, go to the Evaluate tab and select Score.
The PMML specification provides for quite a bit of generality in how a cluster model is saved. To use a kmeans cluster model to score a new data point we, in general, calculate the distance between the new data point and each centroid, and assign the point to the cluster whose centroid is nearest. The variables may be of very different types, so for each variable we might use a different mechanism for calculating the distance. The default is simply to calculate the absolute difference (this is called absDiff in the resulting PMML), which is appropriate for numeric data. For categoric data we might use the Jaccard index. We then combine the per-variable distances into a single distance by adding them in some appropriate form. For this we use the squared Euclidean distance as the comparison measure, as specified in the ComparisonMeasure element of the PMML.
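The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the code Rattle or any PMML consumer actually runs. It assumes all variables are numeric, so that absDiff applies to every field, and that every field has a weight of 1. The function names and the example centroids are hypothetical.

```python
def abs_diff(x, y):
    """Per-variable comparison for numeric data: the absolute difference."""
    return abs(x - y)


def squared_euclidean(point, centroid, compare=abs_diff):
    """Combine the per-variable comparisons by summing their squares.

    Assumption: every field has weight 1.
    """
    return sum(compare(x, c) ** 2 for x, c in zip(point, centroid))


def score(point, centroids):
    """Return the name of the nearest centroid and the distance to it."""
    distances = {
        name: squared_euclidean(point, centroid)
        for name, centroid in centroids.items()
    }
    nearest = min(distances, key=distances.get)
    return nearest, distances[nearest]


# Hypothetical centroids over two numeric variables.
example_centroids = {"1": [1.0, 2.0], "2": [5.0, 6.0]}
cluster, distance = score([2.0, 2.0], example_centroids)
print(cluster, distance)
```

Because the square root is monotonic, ranking centroids by squared Euclidean distance gives the same nearest cluster as ranking by Euclidean distance. A categoric variable would need a different compare function in place of abs_diff.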
Copyright © Togaware Pty Ltd. Support further development through the purchase of the PDF version of the book.