Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Text Mining

See http://crimeanalytics.blogspot.com/

A common procedure for text mining is to 'score' each document with a vector that records how often commonly used words and subject-matter-specific words and phrases occur in it. If the documents have already been assigned to classes (perhaps relevant versus not relevant), the scored documents form a 'training set' that can be used with any of the many supervised learning or classification tools in R (e.g., trees, logistic regression, boosting, random forests, support vector machines, and linear discriminant analysis).
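
The scoring-then-classifying procedure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a recommended pipeline: the eight documents, their labels, the two-word vocabulary, and the helper names (tokenise, score, make_dtm) are all invented for the example, and logistic regression via glm() stands in for whichever supervised learner is preferred.

```r
# Toy labelled corpus (made-up data): 4 relevant documents, then 4 others.
docs <- c(
  "burglary suspect fled after second burglary",
  "police investigate burglary",
  "burglary reported near the park",
  "assault reported in the park",
  "new park opens beside the old park",
  "council plants trees in the park",
  "community park fair includes burglary prevention stall",
  "film review: the great burglary"
)
labels <- factor(rep(c("relevant", "other"), each = 4))

# Hypothetical hand-picked vocabulary of subject-specific words.
vocab <- c("burglary", "park")

# Split one document into lower-case word tokens.
tokenise <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Score one document: a frequency count for each vocabulary word.
score <- function(x) {
  as.vector(table(factor(tokenise(x), levels = vocab)))
}

# Document-term matrix: one row per document, one column per word.
make_dtm <- function(d) {
  m <- t(sapply(d, score, USE.NAMES = FALSE))
  colnames(m) <- vocab
  m
}

dtm <- make_dtm(docs)

# Training set: the frequency vectors plus the known class of each document.
train <- data.frame(dtm, class = labels)

# Logistic regression; with factor levels (other, relevant) the model
# estimates the probability that a document is relevant.
fit <- glm(class ~ burglary + park, data = train, family = binomial)

# Score unseen documents with the same vocabulary and predict.
new_docs <- c("two burglary reports and another burglary attempt",
              "park run held at the park")
prob <- predict(fit, newdata = as.data.frame(make_dtm(new_docs)),
                type = "response")
print(round(prob, 2))
```

The toy labels deliberately overlap (some documents with identical word counts fall in different classes), which keeps the logistic regression well behaved and shows that word frequencies alone are an imperfect signal. In practice the vocabulary would be derived from the corpus rather than chosen by hand.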




Copyright © 2004-2010 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010