Desktop Survival Guide
by Graham Williams
Todo: This preface is still to be finalised. This draft is not for distribution. Comments are most welcome.
Knowledge leads to wisdom and better understanding. Data mining builds knowledge from information, adding value to the tremendous stores of data that abound today, stores that are ever increasing in size and availability. Emerging from the database community in the late 1980s, the discipline of data mining grew quickly to encompass researchers and technologies from Machine Learning, High Performance Computing, Visualisation, and Statistics, recognising the growing opportunity to add value to data. Today, this multi-disciplinary effort continues to deliver new techniques and tools for the analysis of very large collections of data. Searching through databases measured in gigabytes and terabytes, data mining delivers discoveries that can improve the way an organisation does business. It can enable companies to remain competitive in this modern data-rich, knowledge-hungry, wisdom-scarce world. Data mining delivers knowledge to drive the getting of wisdom.
The range of techniques and algorithms used in data mining may appear daunting. In performing data mining for a data-rich client, many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of application.
Data Mining with Rattle
In this book we introduce the basic concepts and algorithms of data mining, using the Free and Open Source Software package Rattle, which is built on top of the R system. As Free Software, the source code of Rattle and R is available to anyone, and anyone is permitted, and indeed encouraged, to read the source code, to learn from it, and to extend the software. R itself is supported by a worldwide network that includes some of the world's leading statisticians. This book guides you through the various options that Rattle provides and serves as a user guide both to Rattle and to data mining. Some extensions into using R directly are presented where they will help with the migration to using the full capacity of the R system.
We use the Free and Open Source Software statistical programming language R to illustrate data mining technology in practice. As with Rattle, the source code of R is available to anyone to read, to learn from, and to extend. We introduce the R language and guide you through the R packages that are essential for the data miner.