Desktop Survival Guide
by Graham Williams
For the keen data miner: Chapter 2 provides a quick-start entrée to data mining with Rattle, covering how to load a dataset and build a model. Come back here when you are ready for some foundations.
We live in a time when data is collected and stored in unprecedented volumes. Large and small enterprises collect data about their businesses, their customers, their human resources, their products, their manufacturing processes, their suppliers, their business partners, their local and international markets and their competitors. Turning this data into information, and that information into knowledge, has become a key component of the success of a business. Data contains valuable information that can support managers in making the decisions needed to run a business effectively and efficiently. Information is the basis for identifying new opportunities. Knowledge is the linchpin of society!
Data mining is about building models from data. We build models to gain insights into the world and how the world works, so that we can predict how things will behave in the future. A data miner, in building models, deploys many different data analysis and model building techniques. Our choices depend on the business problems to be solved. Although data mining is not the only approach, it is becoming very widely used because it is well suited to the data environments we find in today's enterprises. These environments are characterised by the volume of data available, commonly in the gigabytes and fast approaching the terabytes, and by the complexity of that data, both in terms of the relationships awaiting discovery in the data and the data types available today, including text, image, audio, and video. Business environments are also changing rapidly, so analyses need to be performed regularly and models updated regularly to keep up with today's dynamic world.
Modelling is what people often think of when they think of data mining. Modelling is the process of turning data into some structured form, or model, that reflects that data in some useful way. Overall, the aim is to address a specific problem by modelling the world in some way and, from that model, to develop a better understanding of the world.
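As a small illustration of turning data into a model, the sketch below fits a straight line to a handful of observations and then uses that line to predict an unseen case. It is written in plain Python rather than Rattle, and everything in it is an assumption made for illustration: the function names fit_line and predict are hypothetical, the advertising and sales figures are made up, and least-squares line fitting is just one simple technique chosen to stand in for modelling in general.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply the fitted model (intercept, slope) to a new value of x."""
    intercept, slope = model
    return intercept + slope * x


# Made-up observations, for illustration only.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

# The "model" is just two numbers that summarise the data.
model = fit_line(advertising, sales)

# Use the model to say something about a case we have not observed.
forecast = predict(model, 6.0)
print(model, forecast)
```

Here five observations are reduced to an intercept and a slope. That pair is the structured form: it is far smaller than the data, it reflects the relationship in the data, and it can be used to predict beyond the data.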
There is a bewildering array of tools and techniques at the disposal of the data miner for gaining insights into data and for building models.
In this chapter we introduce a modelling framework within which we can present the various algorithms that we use in data mining for building models of the world.