Data Mining Survivor: Rescale_Data - Peer Relativity Profiling Index

DATA MINING
Desktop Survival Guide
by Graham Williams

Peer Relativity Profiling Index

The peer relativity profiling index (Nolan Groups) works by finding the minimum value of a variable for the observations in a group and allocating it a value of 0.01, and finding the maximum value and allocating it a value of 99.9. All values in between are allocated a new value that represents their position and rank in the original variable. When used in conjunction with a categoric variable, the observations are first segmented by the values of the categoric variable, and the same transformation is then applied separately within each group.
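The single-group transformation can be sketched in a few lines of Python. This is only an illustration: the text fixes the end points (0.01 and 99.9) but does not say how the in-between values are computed, so the linear interpolation used here is an assumption, and the function name peer_index and the sample speeds are hypothetical.

```python
def peer_index(values, low=0.01, high=99.9):
    """Rescale one group's values so min -> low and max -> high.

    Assumption: values in between are placed by linear interpolation.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Every observation is identical; place them all mid-scale.
        return [(low + high) / 2.0] * len(values)
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


# Hypothetical average speeds (km/h) for one group of riders.
speeds = [22.0, 25.0, 31.0, 28.0]
print(peer_index(speeds))
```

The slowest rider lands at 0.01 and the fastest at 99.9, whatever the original units were.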

Nolan Groups is a methodology that applies the idea of relativity to analytics by transforming measurement variables to values between 0.01 and 99.9 within a specific set of bins, each representing a sub-population identified by its association with a value of a categoric variable. With this transformation, each observation is assigned a position that reflects where it stands relative only to the other observations in the same bin.

Let's consider an example from the author of Nolan Groups. Consider measurement data for bike riders at an event. This might include the riders' height, age at the event, weight at the event, average speed, average terrain angle and time to complete the event. We will also have descriptive data (often as categoric variables) such as the cycle club, sex, the type of bike, their home state, and years riding.

To normalise this data we take each individual and place them within their specific sub-population. A sub-population is defined in terms of a specific categoric variable, like sex or type of bike. Each rider is then ranked within their sub-population with respect to each of the numeric variables. This records their relative position within their sub-population (or peer group). They are ranked on a scale from 0 to 100 (more precisely, from 0.01 to 99.9, as described above). This mapping is performed for every measurement (numeric) variable by every descriptive (categoric) variable.
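As a rough sketch of this step, the Python below remaps every numeric variable within every categoric grouping. The rider records, the column names and the helper functions rescale and profile are all hypothetical, and the linear placement of in-between values is again an assumption rather than something the text specifies.

```python
from collections import defaultdict


def rescale(values, low=0.01, high=99.9):
    """Map min -> low and max -> high; linear in between (an assumption)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(low + high) / 2.0] * len(values)
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def profile(riders, numeric, categoric):
    """Return one dict per rider with a '<numeric>_by_<categoric>' entry
    for every pairing of a numeric and a categoric variable."""
    out = [{} for _ in riders]
    for cat in categoric:
        # Segment the riders by the values of this categoric variable.
        groups = defaultdict(list)
        for i, rider in enumerate(riders):
            groups[rider[cat]].append(i)
        for num in numeric:
            for idx in groups.values():
                scaled = rescale([riders[i][num] for i in idx])
                for i, s in zip(idx, scaled):
                    out[i][f"{num}_by_{cat}"] = s
    return out


# Hypothetical riders, for illustration only.
riders = [
    {"sex": "F", "bike": "road", "speed": 24.0, "age": 31},
    {"sex": "F", "bike": "mtb", "speed": 28.0, "age": 45},
    {"sex": "M", "bike": "road", "speed": 30.0, "age": 22},
    {"sex": "M", "bike": "mtb", "speed": 26.0, "age": 38},
    {"sex": "M", "bike": "road", "speed": 34.0, "age": 27},
]
result = profile(riders, ["speed", "age"], ["sex", "bike"])
print(result[0])
```

With two numeric and two categoric variables each rider ends up with four remapped values, one for each pairing, so the number of derived variables grows quickly as variables are added.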

We then normalise over all of the remapped numeric variables so they are all scaled appropriately.

A clustering can then be performed over these normalised, remapped variables to compare each bike rider against every other bike rider, and to group those who are most similar.
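The normalise-then-cluster steps might look something like the following Python sketch. The text does not name a normalisation or a clustering algorithm, so the z-score standardisation and the very small k-means used here are assumptions chosen for brevity, and the remapped values are made up. Taking the first k points as starting centres is a naive choice that keeps the example deterministic; a real analysis would use a library implementation with better initialisation.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each column so all remapped variables share one scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Minimal k-means; the first k points are the starting centres."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [mean(c) for c in zip(*members)]
    return labels


# Hypothetical remapped values (two variables) for six riders.
remapped = [[5, 10], [8, 12], [6, 9], [90, 95], [92, 88], [95, 99]]
labels = kmeans(standardise(remapped), k=2)
print(labels)
```

Riders who sit at similar positions within their peer groups end up in the same cluster, even if their raw measurements were on quite different scales.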

Effectively, we have remapped a person's individual performance with respect to their peer group, and then mapped that performance against all the other riders. If we have identified key riders, and find that they belong to particular clusters more so than to other clusters, then we can use a classification algorithm to identify others who are quite similar, and perhaps yet to be recognised as key riders. The classification algorithm will also identify the key characteristics that differentiate the key riders from the others.

The yet-to-be-discovered key riders might be much younger or less experienced. As a coach we may investigate the opportunity for an enhanced training programme. They may be the key riders of the future.

Brought to you by Togaware. This page generated: Sunday, 22 August 2010