DATA MINING
Desktop Survival Guide
by Graham Williams

Agile Data Mining

Building models, in the context of the framework we have just presented, is but one task of the data miner, albeit perhaps the most important. Almost as important, though, are all the other tasks associated with data mining. We must ensure our data mining activities are tackling the right business problem. We must understand the data that is available and turn noisy data into data from which we can build robust models. We must evaluate and demonstrate the performance of our models. And we must ensure the effective deployment of our models.

Whilst we can easily describe these steps, it is important to be aware that data mining is really what we might call an agile activity. The concept of agility comes from the principles of agile software engineering, which include the incremental development of the business requirements, regular client input and feedback, the testing of our models as they are being developed, and frequent rebuilding of the models to improve their performance. An allied aspect is the concept of pair programming, where two data miners work together on the same data, in a friendly, competitive and collaborative approach to building models. The agile approach also emphasises the importance of face-to-face communication over all the effort that is otherwise expended, and often wasted, on written documents. This is not to remove the need to write documents, but to identify what really needs to be documented.

This book provides a practical guide to data mining, showing practitioners how to deliver successful data mining projects. It does this by stepping through the stages of an idealised data mining project. We say "idealised" because every project is different, offering different challenges, and often requiring different approaches to the model building. Nonetheless, we build from the commonality presented here, to form a solid foundation for successful data mining.

We identify the steps in a data mining project, and the following chapters then walk us through them, one step at a time.

The chapters in this book follow this step-by-step process of a data mining project, and so does Rattle, the open source and freely available tool used here to illustrate data mining. Rattle has a tab-based interface in which each tab represents one of the steps, and we proceed through the tabs as we work our way through a data mining project. One noticeable exception is the first step, business understanding. That is something that needs study, discussion, thought, and brain power, and practical tools to help in this process are not common.

Brought to you by Togaware. This page generated: Sunday, 22 August 2010