DATA MINING
Desktop Survival Guide
by Graham Williams

Paradigms

Data mining can be put to many different uses. When identifying fraud, or assessing the likelihood that a client will take up a particular product, the task is to decide between two outcomes. We refer to this as a two class problem. Or the task may be to decide which type of item, from amongst a collection of items, a client may have a propensity to purchase. This is then a multi class problem. Perhaps we wish to predict how much someone might overstate an insurance claim, understate their income for taxation purposes, or overstate their income when seeking approval for a credit card. Here we are predicting a continuous outcome, and refer to this as regression.

In all of these situations, or paradigms, we might think of ourselves as having a teacher who has supplied us with examples of the outcomes, whether examples of fraudulent and non-fraudulent cases, examples of different types of clients, or examples of clients with their declared income and actual income. In such cases we refer to the task as supervised modelling.

Perhaps, though, we know little about the individual, specific targets, but instead have general information about our population or their purchasing patterns. We might think of what we need to do in this case as building a model without the help of a teacher, which we refer to as unsupervised modelling.
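The distinctions drawn above can be sketched as a small decision rule over the target variable. This is a minimal illustration in Python, not part of Rattle: the function name `suggest_paradigm` and its rules (two distinct values means two class, numeric values mean regression, anything else means multi class) are simplifying assumptions made for this example.

```python
from numbers import Real


def suggest_paradigm(target=None):
    """Suggest a paradigm name from the values of a target variable.

    Hypothetical helper: the rules below are simplifying assumptions
    for illustration, not Rattle's own logic.
    """
    # No target at all: there is no "teacher", so the task is unsupervised.
    if target is None:
        return "Unsupervised"

    values = list(target)
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("a target needs at least two distinct values")

    # Exactly two distinct outcomes, e.g. fraudulent / non-fraudulent.
    if len(distinct) == 2:
        return "Two Class"

    # Assumption: numeric targets with more than two distinct values are
    # continuous. Integer-coded categories would be misread by this rule.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "Regression"

    # More than two non-numeric outcomes, e.g. types of product.
    return "Multi Class"
```

For example, `suggest_paradigm(["fraud", "ok", "ok"])` yields the two class paradigm, while calling the function with no target falls through to unsupervised modelling. Time series and text data are not covered, since they are distinguished by the nature of the data rather than by the target.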

Alternatively, our data may have some special characteristics, such as being a time series or consisting of text. In these cases we have different tasks we wish to perform.

We refer to these different types of tasks in data mining as different paradigms, and Rattle provides different subsets of functionality for the different paradigms.

**Figure 2.3:** Paradigms as radio buttons to the right of the toolbar icons

The paradigms are listed at the right end of the toolbar, as seen in Figure 2.3, and are selectable as radio buttons. The paradigms provided by Rattle are:

Two Class for binary classification predictive modelling;
Multi Class for multi-way classification predictive modelling;
Regression for continuous variable predictive modelling;
Unsupervised for learning without a target (descriptive modelling);
Time Series for temporal data mining; and
Text Mining for mining of unstructured text data.

Selecting a paradigm will change the tabs that are available in the main body of Rattle. For example, the default paradigm is the Two Class paradigm, which displays a Model tab and an Evaluate tab, as well as the other common tabs. The Model tab exposes a collection of techniques for building two class (binary) models. The Evaluate tab provides a collection of tools for evaluating the performance of those models.

Selecting the Unsupervised paradigm, as in Figure 2.4, removes the Model and the Evaluate tabs, replacing them with a Cluster and an Associate tab, for cluster analysis and association analysis, respectively.

**Figure 2.4:** Changing paradigms changes some of the displayed tabs
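The tab switching described above can be modelled as a simple lookup. The sketch below is illustrative Python, not Rattle's implementation: the names `PARADIGM_TABS`, `paradigm_tabs`, and `tabs_changed` are hypothetical, and only the two paradigms whose tabs are named in the text are included.

```python
# Paradigm-specific tabs as named in the text. Other paradigms are left
# out because the text does not say which tabs they display.
PARADIGM_TABS = {
    "Two Class": ("Model", "Evaluate"),
    "Unsupervised": ("Cluster", "Associate"),
}


def paradigm_tabs(paradigm):
    """Return the paradigm-specific tabs, or None if the text does not list them."""
    return PARADIGM_TABS.get(paradigm)


def tabs_changed(old, new):
    """Return (removed, added) tab lists when switching between two listed paradigms."""
    old_tabs = PARADIGM_TABS[old]
    new_tabs = PARADIGM_TABS[new]
    removed = [tab for tab in old_tabs if tab not in new_tabs]
    added = [tab for tab in new_tabs if tab not in old_tabs]
    return removed, added
```

Switching from `"Two Class"` to `"Unsupervised"` reports the Model and Evaluate tabs as removed and the Cluster and Associate tabs as added, matching the behaviour shown in Figure 2.4. The common tabs are deliberately not modelled, since they do not change with the paradigm.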

Brought to you by Togaware.