DATA MINING
Desktop Survival Guide
by Graham Williams

Data mining tasks

These business problems require different techniques, different data and different data mining processes to solve. Regardless of the specific data mining techniques used, in enterprise data mining a business problem to be solved can be defined as a data mining task. Common data mining tasks in enterprise applications include the following:

Data cleansing and preprocessing: Although they are not the final goal of data mining, data cleansing and preprocessing are important data mining tasks. Business data in enterprises contain errors, missing values, out-of-date data, inconsistent data and wrongly formatted data. The data cleansing task is to correct or remove as many of these errors as possible. This is particularly important for customer demographic data because these data are collected by humans, and errors frequently occur in data items such as names, addresses, birth dates, income and household. Data preprocessing transforms the source data into formats that are acceptable to the data mining algorithms. Common tasks include changing data types and variable distributions, filling in missing values, aggregating data records such as transactions, and deriving new variables from existing variables.
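As a rough illustration of these tasks, the following Python sketch cleans a name field, fills in a missing value, aggregates transactions per customer and derives a new variable. The records, the field names and the choice of the median as the fill-in value are all assumptions made for the example, not a prescribed method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "name": " alice SMITH ", "income": 52000},
    {"id": 2, "name": "Bob Jones", "income": None},
    {"id": 3, "name": "carol  white", "income": 48000},
]

# Hypothetical transactions as (customer id, amount) pairs.
transactions = [(1, 20.0), (1, 35.5), (2, 12.0), (3, 80.0), (3, 5.0), (3, 15.0)]


def clean_name(raw):
    """Cleansing: collapse stray whitespace and normalise capitalisation."""
    return " ".join(raw.split()).title()


def fill_missing(records, field):
    """Preprocessing: replace missing values with the median of known values."""
    known = [r[field] for r in records if r[field] is not None]
    fallback = median(known)
    for r in records:
        if r[field] is None:
            r[field] = fallback


def aggregate(rows):
    """Preprocessing: total amount and transaction count per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in rows:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return totals, counts


for r in customers:
    r["name"] = clean_name(r["name"])
fill_missing(customers, "income")

totals, counts = aggregate(transactions)
for r in customers:
    r["total_spend"] = totals[r["id"]]
    r["n_transactions"] = counts[r["id"]]
    # Derived variable: average spend per transaction.
    r["avg_spend"] = (
        r["total_spend"] / r["n_transactions"] if r["n_transactions"] else 0.0
    )
```

In practice the fill-in strategy depends on the variable and the algorithm; the median is used here only because it is simple and robust to outliers.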

Knowledge discovery: The goal of knowledge discovery is to extract explicit business knowledge from data using data mining techniques. One example is customer profiling. Using a rule induction algorithm such as a decision tree learner, profiles of customers in different profitability groups can be extracted from a customer database. These profiles are represented as a set of IF-THEN rules that express explicit customer knowledge to the users. Explicit knowledge can also be represented as patterns or graphical distributions that users can interpret visually, so visualization techniques are often used to extract explicit knowledge from data.
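To make the idea of IF-THEN profiles concrete, here is a minimal sketch that uses the OneR algorithm, a much simpler rule learner than a decision tree, swapped in only to keep the example short. The customer records, attribute names and profitability groups are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical customer records with a known profitability group.
records = [
    {"age": "young", "region": "north", "profit": "low"},
    {"age": "young", "region": "south", "profit": "low"},
    {"age": "middle", "region": "north", "profit": "high"},
    {"age": "middle", "region": "south", "profit": "high"},
    {"age": "senior", "region": "north", "profit": "high"},
    {"age": "senior", "region": "south", "profit": "low"},
    {"age": "young", "region": "north", "profit": "low"},
    {"age": "middle", "region": "south", "profit": "high"},
    {"age": "senior", "region": "north", "profit": "high"},
]


def one_r(rows, attributes, target):
    """Return (attribute, rules, errors) for the single attribute whose
    value -> majority-class rules misclassify the fewest training rows."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        rules = {}
        errors = 0
        for value, classes in counts.items():
            majority, hits = classes.most_common(1)[0]
            rules[value] = majority
            errors += sum(classes.values()) - hits
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


attr, rules, errors = one_r(records, ["age", "region"], "profit")
for value, group in rules.items():
    print(f"IF {attr} = {value} THEN profit = {group}")
```

A decision tree learner would combine several attributes in each rule; OneR tests only one, which is enough to show the rule format.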

Clustering: Clustering partitions a set of data objects, such as a set of customer records, into groups called clusters or segments, in which objects in the same group are similar to each other according to a defined similarity measure. After customers are divided into groups based on their similarity, each group can be treated differently. For example, different products can be promoted to different customer groups, and customers in different groups can be offered different services based on their lifetime value to the company.
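One common way to do this is k-means, sketched below in plain Python. The choice of k-means, Euclidean distance as the similarity measure, the naive initialisation from the first k points and the two-dimensional customer data are all assumptions of this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal k-means: returns (cluster index per point, centroids)."""
    # Naive deterministic initialisation: the first k points.
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical customers described by two scaled measures,
# e.g. (spend score, visit score).
customers = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
assignment, centroids = kmeans(customers, k=2)
```

Real applications usually scale the variables first and try several initialisations, since k-means is sensitive to both.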

Classification: The goal of classification in data mining is to build a model from a given set of training data records in which the classes of the objects are known, and to use the model to assign classes to new records. For example, a risk classification model can be built from a dataset of previous credit card customers and applied to classify the risk levels of new customers. Classification is a frequently encountered data mining task in enterprise applications, and several data mining techniques are available for building classification models. The training dataset must include a target variable that indicates the class each training record belongs to.
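The train-then-apply pattern can be sketched with a nearest-centroid classifier, one of the simplest techniques available and chosen here only for brevity. The two features, their scaling to the range 0 to 1 and the risk labels are hypothetical.

```python
import math
from collections import defaultdict


def train(training_records):
    """Build the model: one mean feature vector (centroid) per class."""
    by_class = defaultdict(list)
    for features, label in training_records:
        by_class[label].append(features)
    return {
        label: tuple(sum(xs) / len(rows) for xs in zip(*rows))
        for label, rows in by_class.items()
    }


def classify(model, features):
    """Assign the class whose centroid is closest to the new record."""
    return min(model, key=lambda label: math.dist(features, model[label]))


# Hypothetical previous customers: (income score, missed-payment rate),
# with the known risk class as the target variable.
training = [
    ((0.9, 0.1), "low"),
    ((0.8, 0.2), "low"),
    ((0.2, 0.9), "high"),
    ((0.3, 0.7), "high"),
]

model = train(training)
new_customer_a = classify(model, (0.7, 0.3))
new_customer_b = classify(model, (0.1, 0.8))
```

The second element of each training pair plays the role of the target variable described above; without it the model cannot be built.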

Prediction: Prediction is similar to classification except that the model predicts a future event that may happen to an object, for instance a customer. To achieve this, the target variable represents a future event. In mobile communication service applications, a prediction model is built to identify the customers who are likely to switch to a competitor in the next two months. Marketing actions can then be taken to prevent these customers from leaving, because retaining existing customers is more cost-effective than recruiting new ones. Similar applications can be found in other sectors such as banking and insurance.
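What distinguishes prediction in practice is how the target variable is constructed: input variables are taken from before a cutoff date and the target records what happened afterwards. The sketch below labels customers for a churn model; the field names, the dates and the use of 60 days to approximate two months are assumptions of the example.

```python
from datetime import date, timedelta


def label_churn(customers, cutoff, horizon_days=60):
    """Attach a target that is True if the customer left within
    `horizon_days` after `cutoff`. Customers who had already left
    by the cutoff are dropped, since there is nothing to predict."""
    horizon_end = cutoff + timedelta(days=horizon_days)
    labelled = []
    for c in customers:
        left_on = c["left_on"]
        if left_on is not None and left_on <= cutoff:
            continue
        churned = left_on is not None and left_on <= horizon_end
        labelled.append(
            {
                "id": c["id"],
                # Input variable observed before the cutoff.
                "calls_last_month": c["calls_last_month"],
                # Target variable: a future event relative to the cutoff.
                "churned": churned,
            }
        )
    return labelled


# Hypothetical historical data; left_on is None for customers still active.
history = [
    {"id": 1, "calls_last_month": 120, "left_on": None},
    {"id": 2, "calls_last_month": 15, "left_on": date(2024, 2, 15)},
    {"id": 3, "calls_last_month": 0, "left_on": date(2023, 12, 1)},
    {"id": 4, "calls_last_month": 60, "left_on": date(2024, 6, 1)},
]

labelled = label_churn(history, cutoff=date(2024, 1, 1))
```

The labelled records can then be passed to any classification technique, and the resulting model applied to current customers to estimate who may leave in the coming period.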

Affinity analysis: In a supermarket, when a product is put on sale, the manager expects it to drive sales of other products and wants to know which products those are. More generally, supermarket managers want to know which products are usually sold together, that is, which products customers usually put into their shopping baskets together. This is called basket analysis. Affinity analysis is used for basket analysis: it discovers associations among the products and presents them in the form of association rules. For example, Chips ← Banana & Beer is an association rule stating that customers who buy bananas and beer are likely to buy chips as well. Affinity analysis can also be used in other applications such as cross-selling and online recommendation in e-commerce.
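The strength of such a rule is usually measured by its support and confidence, two standard measures that the text above does not name. A minimal sketch on invented baskets:

```python
# Hypothetical shopping baskets, each a set of product names.
baskets = [
    {"banana", "beer", "chips"},
    {"banana", "beer", "chips"},
    {"banana", "beer"},
    {"banana", "milk"},
    {"beer", "chips"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), baskets) / base


# The rule from the text: IF banana AND beer THEN chips.
rule_support = support({"banana", "beer", "chips"}, baskets)
rule_confidence = confidence({"banana", "beer"}, {"chips"}, baskets)
```

Association rule algorithms search for all rules whose support and confidence exceed user-chosen thresholds; this sketch only evaluates one given rule.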

Brought to you by Togaware.