Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Stratified Benford Plots

[Figure: Benford plot of Income stratified by Marital (audit dataset).]
We often want to stratify our data, that is, split it into subgroups in some way. In fraud investigations, for example, we might split the data into groups associated with different geographic regions, different auditors, and so on. Suppose we are considering accounts payable data, where each record is a payment and there are, say, ten individuals who sign off on the invoices. In the Select tab we can choose the variable that identifies the individual signing off each payment as the Target variable.

The plot here illustrates the idea using the audit dataset. We have chosen Marital to have the role of Target variable (in the Select tab) and then asked for a Benford plot of the Income variable. We can see that the plot is stratified over the possible values of the Marital variable.
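The following minimal Python sketch shows what a stratified Benford plot summarises. It is not Rattle's own implementation. For each value of the target variable, it computes the proportion of records whose Income has each leading digit 1 to 9. These proportions are compared with the Benford expectation P(d) = log10(1 + 1/d). The records, the field names `marital` and `income`, and the helper function names are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def first_digit(x):
    """Return the leading non-zero digit of a number, or None for zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scale x into [1, 10) so that its integer part is the leading digit.
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_expected():
    """Benford's law: expected proportion log10(1 + 1/d) for digits 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def stratified_first_digits(records, target, value):
    """Proportion of each leading digit of `value`, per level of `target`."""
    counts = defaultdict(Counter)
    for rec in records:
        d = first_digit(rec[value])
        if d is not None:
            counts[rec[target]][d] += 1
    result = {}
    for level, c in counts.items():
        total = sum(c.values())
        result[level] = {d: c[d] / total for d in range(1, 10)}
    return result

# Hypothetical records standing in for the audit dataset.
records = [
    {"marital": "Married", "income": 125000.0},
    {"marital": "Married", "income": 18500.0},
    {"marital": "Married", "income": 2900.0},
    {"marital": "Divorced", "income": 32000.0},
    {"marital": "Divorced", "income": 45000.0},
]

observed = stratified_first_digits(records, "marital", "income")
expected = benford_expected()
for level in sorted(observed):
    print(level)
    for d in range(1, 10):
        print(f"  {d}: observed {observed[level][d]:.2f}  Benford {expected[d]:.2f}")
```

A plot would draw one line of observed proportions per stratum against the single Benford curve. With strata as small as these, the observed proportions are very noisy. Real analyses need many records per stratum before a departure from Benford's law means much.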

Figure 4.2: Benford stratified by Marital and Gender.
Stratifying on more than one categoric variable requires a little extra work, because Rattle does not allow more than a single target variable. However, the Remap option of the Transform tab (see Section 5.3.3) can "join" two categoric variables into one. We can then set this combined categoric variable as the Target variable.
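The idea of joining two categoric variables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rattle's Remap code. Each record gets a new categoric whose value concatenates the two original levels. The `_` separator, the field names, and the helper `join_categorics` are hypothetical, and the level names Rattle itself produces may differ.

```python
def join_categorics(records, var1, var2, new_name, sep="_"):
    """Return copies of the records with a new categoric combining var1 and var2.

    The separator is an assumption for illustration; Rattle's Remap
    may name the combined levels differently.
    """
    joined = []
    for rec in records:
        new_rec = dict(rec)  # copy so the original records are left unchanged
        new_rec[new_name] = f"{rec[var1]}{sep}{rec[var2]}"
        joined.append(new_rec)
    return joined

# Hypothetical records with two categoric variables.
records = [
    {"marital": "Married", "gender": "Male", "income": 125000.0},
    {"marital": "Married", "gender": "Female", "income": 18500.0},
    {"marital": "Divorced", "gender": "Female", "income": 32000.0},
]

combined = join_categorics(records, "marital", "gender", "marital_gender")
print(sorted({r["marital_gender"] for r in combined}))
# → ['Divorced_Female', 'Married_Female', 'Married_Male']
```

The new `marital_gender` variable can then serve as the single stratifying target. Each of its levels corresponds to one combination of the two original variables, as in Figure 4.2.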

Returning to the accounts payable example, this is useful when one person signs off the invoices and another person issues them. We may wish to explore whether any patterns arise from particular combinations of the two. For instance, the person signing off invoices might be manipulating only those invoices issued by one specific individual. Remapping these two categoric variables into a single combined categoric variable allows us to explore this relationship.

Copyright © 2004-2008 Togaware Pty Ltd