DATA MINING Desktop Survival Guide by Graham Williams

Histogram
 
[...] which means [...], or 10,000), and so on. Thus,
we can see that the most frequent range of values is in the [...]
partition. However, each partition spans quite a large range (a range
of $5,000).
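To make the idea of partitions concrete, the following is a minimal sketch of how values can be counted into fixed-width partitions of $5,000. The function name `bin_counts` and the income values are hypothetical, invented for illustration; they are not the book's data, and this is not necessarily how the plotting tool computes its bars.

```python
from collections import Counter


def bin_counts(values, width):
    """Count the values falling in each half-open partition [k*width, (k+1)*width)."""
    counts = Counter(int(v // width) for v in values)
    return {(k * width, (k + 1) * width): n for k, n in sorted(counts.items())}


# Hypothetical incomes, invented for illustration only.
incomes = [1200.0, 4800.0, 5100.0, 6300.0, 7400.0, 9900.0, 10000.0, 14250.0]

counts = bin_counts(incomes, 5000)
for (low, high), n in counts.items():
    print(f"[{low}, {high}): {n}")

# The partition with the tallest bar.
most_frequent = max(counts, key=counts.get)
```

Each bar only tells us how many values fall somewhere within its $5,000 span, not where within that span they sit.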
The plot also includes a line showing the so-called
density estimate,
which is a more accurate display of the actual (or at least the estimated true)
distribution of the data (the values of Income). It allows
us to see that, rather than values in the range [...] occurring
frequently, there is in fact a much smaller range (perhaps
[...]) that occurs very frequently.
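One common way to obtain such a density estimate is a Gaussian kernel density estimate, sketched below. Whether the plot uses exactly this kernel or bandwidth is an assumption; the function name `gaussian_kde`, the bandwidth of 300, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x using a Gaussian kernel."""
    n = len(values)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Each observation contributes a small bell curve centred on itself.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


# Hypothetical incomes: all but one fall in the same $5,000 partition,
# yet most are clustered in a much narrower band.
incomes = [6000.0, 6200.0, 6400.0, 6500.0, 6700.0, 9000.0]
density = gaussian_kde(incomes, bandwidth=300.0)

print(density(6400.0) > density(8000.0))
```

Although 6,400 and 8,000 would land in the same histogram bar here, the estimate is much higher near 6,400, which is the kind of detail the density line reveals and the bars hide.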
The third element of the plot is the so-called rug along the bottom of the plot. The rug is a one-dimensional plot of the data along the number line. It is useful for seeing exactly where the data points actually lie. For large collections of data with a relatively even spread of values the rug ends up being quite black, as is the case here, up to about $25,000. Above about $35,000 we can see that there is only a scattering of entities with such values. In fact, from the Summary option, using the Describe check box, we can see that the highest values are actually $361,092.60, $380,018.10, $391,436.70, $404,420.70, and $421,362.70.
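Outside the Summary option, the highest values can also be picked out directly from the data. The sketch below assumes the five figures quoted above are read as 361,092.60 and so on (an assumption about the intended digit grouping); the remaining, smaller values are hypothetical fillers invented for illustration.

```python
import heapq

# The five largest values are taken from the text (digit grouping assumed);
# the smaller values are hypothetical fillers.
incomes = [
    361092.60, 421362.70, 380018.10, 404420.70, 391436.70,
    18250.00, 9400.00, 23100.00,
]

# The five highest values, largest first.
top5 = heapq.nlargest(5, incomes)
print(top5)
```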
 
Copyright © Togaware Pty Ltd