Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

R

R is a statistical and data mining package consisting of a programming language and a graphics system. It is used throughout this book to illustrate data mining procedures, and it is the programming language used to implement the Rattle graphical user interface for data mining. If you are moving to R from SAS or SPSS then you will find a great resource at http://RforSASandSPSSusers.com, where an early version is available.

R is among the most sophisticated statistical software available: it is easily installed, instructional, state of the art, and free and open source.

Learning by example is a powerful learning paradigm. Motivated by the programming paradigm of "programming by example", the intention is that you will be able to replicate the examples from the book and then fine-tune them to suit your own needs. This is one of the underlying principles of Rattle, where all of the R commands used under the graphical user interface are exposed to the user. This makes it a useful teaching tool for learning R for the specific task of data mining, and also a good memory aid!

R, then, is a language, and the basic modus operandi is to write sentences expressed in this language. After a while you will want to do more than issue single, simple commands (sentences): you will want to write paragraphs and full novels in the language! R script files (often with the .R filename extension) are the place to write them. You can re-run your scripts to transform, at will and automatically, your source data into information and knowledge. As we progress through this book we will become familiar with the common R commands.
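
As a minimal sketch of what such a script might look like, the following could be saved to a file and re-run whenever needed. The file name summarise.R is hypothetical, and the built-in iris dataset is used purely for illustration; only base R functions are assumed.

```r
# summarise.R (hypothetical file name): a small, re-runnable script.

# Load a dataset that ships with R, for illustration only.
data(iris)

# A single "sentence": the mean petal length over all observations.
overall <- mean(iris$Petal.Length)

# A "paragraph": the mean petal length for each species.
by.species <- aggregate(Petal.Length ~ Species, data=iris, FUN=mean)

print(overall)
print(by.species)
```

Running source("summarise.R") from the R prompt would repeat the whole analysis in one step.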

Whilst for data mining purposes we will use the Rattle graphical user interface, more advanced users may prefer the powerful Emacs editor, augmented with the ESS package. Both run under GNU/Linux, Mac OS X, and MS Windows.

We also note that direct interaction with R has a steeper learning curve than using GUI-based systems, but once you are comfortable with R, performing operations over the same or similar datasets becomes very easy using its programming language interface.
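
To illustrate the point about repeating operations, here is a small sketch in which one function is written once and then applied, unchanged, to two different datasets. The function name numeric.means is a hypothetical helper introduced for this example, and the built-in iris and mtcars datasets stand in for your own data.

```r
# A hypothetical helper: the mean of every numeric column in a data frame.
numeric.means <- function(ds)
{
  # Identify which columns hold numeric data.
  numeric.cols <- sapply(ds, is.numeric)
  # Compute the mean of each numeric column, keeping the column names.
  sapply(ds[, numeric.cols, drop=FALSE], mean)
}

# The same operation applied, unchanged, to two different datasets.
iris.means <- numeric.means(iris)
cars.means <- numeric.means(mtcars)

print(iris.means)
print(cars.means)
```

The design choice here is simply to wrap the repeated steps in a function, so that moving to a new but similar dataset means changing one argument rather than repeating a sequence of mouse clicks.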



Copyright © 2004-2010 Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 07 February 2010