Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Explore

A key task in any data mining project is exploratory data analysis (often abbreviated as EDA), which generally involves getting a basic understanding of a dataset. Statistics, the fundamental tool here, is essentially about uncertainty: understanding it and thereby making allowance for it. It also provides a framework for understanding the discoveries made in data mining. Discoveries need to be statistically sound and statistically significant, and the uncertainty associated with modelling needs to be understood.

We explore the shape or distribution of our data before we begin mining. Through this exploration we begin to understand the "lay of the land," just as a miner works to understand the terrain rather than blindly digging for gold. We may also identify problems with the data, including missing values, noise and erroneous data, and skewed distributions. These findings then drive our choice of tools for preparing and transforming our data and for mining it.
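As a rough illustration of this kind of first look, the sketch below summarises a single numeric column, counting missing values and measuring skewness. It is not Rattle's own output: it uses only the Python standard library, the `summarise` function and the `income` data are hypothetical, and it assumes missing entries are recorded as `None`.

```python
import statistics


def summarise(values):
    """Return basic exploratory statistics for one numeric column.

    Missing entries are assumed to be recorded as None, and at least
    one value must be present.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)
    # Population skewness: mean cubed deviation divided by sd cubed.
    if sd > 0:
        skew = sum((v - mean) ** 3 for v in present) / (n * sd ** 3)
    else:
        skew = 0.0
    return {
        "count": n,
        "missing": len(values) - n,
        "min": min(present),
        "median": statistics.median(present),
        "mean": mean,
        "max": max(present),
        "skewness": skew,
    }


# Hypothetical income column with two missing values and one very large value.
income = [21, 25, 27, None, 30, 32, 35, None, 38, 250]
summary = summarise(income)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

In this made-up column the mean sits well above the median and the skewness is positive, which is the sort of signal that might prompt a closer look at the large value or a transformation before modelling.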

Rattle's Explore tab offers several ways to understand our data. Its tools range from textual summaries to visually appealing graphical summaries, and include tools for identifying correlations between variables and a link to the very sophisticated GGobi tool for visualising data.



Copyright © Graham.Williams@togaware.com