DATA MINING
Desktop Survival Guide by Graham Williams
Rattle supports the building of support vector machine (SVM) models
using the kernlab package for R. This package
provides an extensive collection of kernel functions and a variety of
tuning options. The key to using support vector machines well is to find the
right combination of kernel function and kernel parameters, which
can take some trial and error. Some experimentation with the audit
dataset, exploring different kernel functions and parameter settings,
identified that a polynomial kernel function with the class weights
set to c("0"=4, "1"=1) resulted in the best risk chart. For a
50% caseload we are recovering 94% of the adjustments and 96% of
the revenue.
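A minimal sketch of what such a model call might look like in kernlab follows. The data frame `ds` and its columns are a small synthetic stand-in invented for illustration, not the audit dataset, so the risk chart figures quoted above will not be reproduced by it. Only the kernel choice and the class weights are taken from the text.

```r
library(kernlab)

# Synthetic two-class data standing in for the audit dataset
# (hypothetical columns; an assumption made for this sketch only).
set.seed(42)
n  <- 200
ds <- data.frame(Age = rnorm(n, 40, 12), Income = rnorm(n, 60000, 15000))
score <- ds$Age / 40 + ds$Income / 60000 + rnorm(n, 0, 0.3)
ds$Adjusted <- factor(as.integer(score > 2.2))  # levels "0" and "1"

# Polynomial kernel with the class weights mentioned in the text.
model <- ksvm(Adjusted ~ ., data = ds,
              kernel = "polydot",
              class.weights = c("0" = 4, "1" = 1))

print(model)
pred <- predict(model, ds)
print(table(predicted = pred, actual = ds$Adjusted))
```

The names in `class.weights` must match the levels of the target factor, which is why the weights are written as "0" and "1".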
There are other general parameters that can be set for ksvm, including the cost of constraint violation, C, which controls the trade-off between training error and margin width. Values might range from 1 to 1000, and the default is 1.
For best results it is often a good idea to scale the numeric variables to have a mean of 0 and a standard deviation of 1.
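As a rough illustration of these two settings, the sketch below tries several values of the cost parameter `C` and compares cross-validated error, with scaling switched on through the `scaled` argument (which is already the ksvm default). The data is again a synthetic stand-in with hypothetical columns, and the choice of 5-fold cross-validation and a radial basis kernel is an assumption made only for the example.

```r
library(kernlab)

# Synthetic stand-in data (hypothetical columns, not the audit dataset).
set.seed(42)
n  <- 200
ds <- data.frame(Age = rnorm(n, 40, 12), Income = rnorm(n, 60000, 15000))
score <- ds$Age / 40 + ds$Income / 60000 + rnorm(n, 0, 0.3)
ds$Adjusted <- factor(as.integer(score > 2.2))

# Candidate costs across the range suggested in the text.
costs <- c(1, 10, 100, 1000)

# For each cost, fit a model on scaled inputs and record 5-fold CV error.
cv_error <- sapply(costs, function(cost) {
  m <- ksvm(Adjusted ~ ., data = ds,
            kernel = "rbfdot",
            C = cost,
            scaled = TRUE,   # centre to mean 0 and scale to sd 1
            cross = 5)
  cross(m)                   # cross-validation error of the fitted model
})

print(data.frame(C = costs, cv_error = cv_error))
```

Because Age and Income sit on very different numeric ranges here, leaving the inputs unscaled would let Income dominate the kernel computation.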
For polynomial kernels you can choose the degree of the polynomial. The default is 1. For the audit data, as you increase the degree the model becomes much more accurate on the training data but generalises less well, as shown by its performance on the test dataset.
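One way to observe this effect is to fit the same model at several degrees and compare training and test error side by side. The sketch below does this on synthetic data with a curved class boundary. The data, the 70/30 split, and the range of degrees are all assumptions for illustration, so the pattern it shows may differ from the one seen on the audit data.

```r
library(kernlab)

# Synthetic data with a non-linear class boundary
# (hypothetical, not the audit dataset).
set.seed(42)
n  <- 300
ds <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
ds$Adjusted <- factor(as.integer(ds$x1^2 + ds$x2^2 + rnorm(n, 0, 0.5) > 1.5))

# Hold out 30% of the rows as a test set.
train <- sample(n, round(0.7 * n))

# Misclassification rate of a model on a given set of rows.
err <- function(m, rows) {
  mean(as.character(predict(m, ds[rows, ])) != as.character(ds$Adjusted[rows]))
}

degrees <- 1:3
results <- do.call(rbind, lapply(degrees, function(d) {
  m <- ksvm(Adjusted ~ ., data = ds[train, ],
            kernel = "polydot",
            kpar = list(degree = d))
  data.frame(degree    = d,
             train_err = err(m, train),
             test_err  = err(m, setdiff(seq_len(n), train)))
}))

print(results)
```

A widening gap between `train_err` and `test_err` as the degree grows is the sign of the poorer generalisation described above.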
Another parameter that often needs setting for a radial basis function
kernel is the sigma value. Rattle uses automatic sigma estimation
(sigest) for this kernel to find a good sigma, so the user need
not set a sigma value. If we want to experiment with various sigma
values we can copy the R code from the Log tab, paste
it into the R console, add the additional settings, and run the
model. Assuming the results are assigned to the variable
crs$ksvm, as in the Log, we can then evaluate the performance of
this new model using the Evaluate tab.
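The kind of console experiment described here might look like the following sketch. It does not reproduce the code from Rattle's Log tab: the data frame is a synthetic stand-in, and `crs` is created by hand as a plain environment only so that the assignment to `crs$ksvm` mirrors the naming used in the text. The candidate sigma values come from `sigest`, but any values of interest could be substituted.

```r
library(kernlab)

# Synthetic stand-in data (hypothetical columns, not the audit dataset).
set.seed(42)
n  <- 200
ds <- data.frame(Age = rnorm(n, 40, 12), Income = rnorm(n, 60000, 15000))
score <- ds$Age / 40 + ds$Income / 60000 + rnorm(n, 0, 0.3)
ds$Adjusted <- factor(as.integer(score > 2.2))

# Stand-in for Rattle's container object (an assumption for this sketch).
crs <- new.env()

# sigest() returns a small range of plausible sigma values for the data.
sig <- sigest(Adjusted ~ ., data = ds)
print(sig)

# Fit one model per candidate sigma and report its training error.
for (s in sig) {
  crs$ksvm <- ksvm(Adjusted ~ ., data = ds,
                   kernel = "rbfdot",
                   kpar = list(sigma = s))  # explicit sigma instead of "automatic"
  cat("sigma =", signif(s, 3), " training error =", error(crs$ksvm), "\n")
}
```

After the loop, `crs$ksvm` holds the model fitted with the last sigma tried. Inside a real Rattle session it is that variable which the Evaluate tab would pick up.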
Copyright © Graham.Williams@togaware.com