DATA MINING
Desktop Survival Guide by Graham Williams
Data mining can be put to many different uses. For identifying fraud, or assessing the likelihood that a client will take up a particular product, we might think of the task as deciding on one of two outcomes. This is a two class problem. Or the task may be to decide which type of item, from amongst a collection of items, a client may have a propensity to purchase. This is then a multi class problem. Perhaps we wish to predict how much someone might overstate an insurance claim, understate their income for taxation purposes, or overstate their income when seeking approval for a credit card. Here we are predicting a continuous outcome, and refer to this as regression.
In all of these situations, or paradigms, we might think of ourselves as having a teacher who has supplied us with examples of the outcomes--whether examples of fraudulent and non-fraudulent cases, or examples of different types of clients, or examples of clients and their declared income and actual income. In such cases we refer to the task as supervised modelling.
Perhaps though we know little about the individual, specific targets, but instead have general information about our population or their purchasing patterns. We might think of what we need to do in this case as building a model without the help of a teacher--or unsupervised modelling.
Alternatively, our data may have some special characteristics, as is the case with time series and text data. In these cases we have different tasks we wish to perform.
We refer to these different types of tasks in data mining as different paradigms, and Rattle provides different subsets of functionality for the different paradigms.
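As a rough illustration of how the supervised and unsupervised paradigms relate to the outcome we are modelling, the following Python sketch guesses a paradigm from a list of target values. The function `guess_paradigm` is hypothetical and is not part of Rattle, where the paradigm is chosen by the user. The rule it applies (two distinct values means two class, numeric values mean regression, anything else means multi class) is a simplifying assumption that ignores the time series and text cases.

```python
from numbers import Real


def guess_paradigm(target=None):
    """Guess a modelling paradigm from example outcomes (hypothetical helper).

    target: a list of known outcomes, or None when no outcomes are available.
    """
    if target is None:
        # No teacher has supplied outcomes: unsupervised modelling.
        return "unsupervised"
    distinct = set(target)
    if not distinct:
        raise ValueError("target must contain at least one value")
    if len(distinct) == 2:
        # Exactly two outcomes, e.g. fraudulent / non-fraudulent.
        return "two class"
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct):
        # Assumption: a numeric target with more than two values is continuous.
        return "regression"
    # Several categorical outcomes, e.g. types of product.
    return "multi class"


print(guess_paradigm(["fraud", "ok", "ok"]))       # two class
print(guess_paradigm(["car", "home", "life"]))     # multi class
print(guess_paradigm([120.5, 80.0, 310.25]))       # regression
print(guess_paradigm())                            # unsupervised
```

Note that a target coded as integer class labels (for example 1, 2, 3) would be treated as regression by this sketch, so in practice the analyst's knowledge of the data should decide the paradigm.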
The paradigms are listed at the right end of the toolbar, as seen in Figure 2.3, and are selectable as radio buttons. The paradigms provided by Rattle are:
Selecting the Unsupervised paradigm, as in Figure 2.4, removes the Model and the Evaluate tabs, replacing them with a Cluster and an Associate tab, for cluster analysis and association analysis, respectively.
Copyright © Togaware Pty Ltd