Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Data Formats

Once we have a source of data, we need to load it into Rattle. This section reviews the various approaches to doing so. Rattle builds on a wealth of mature tools in R for loading data from various sources.

Comma separated data files are supported directly (i.e., files with a .csv filename extension, as might be exported from a spreadsheet, where the resulting text file uses commas to separate the variable values in an observation). Another format that is readily exported from spreadsheets is the tab separated file, which often has a .txt filename extension. A common data mining dataset format, used by the Weka data mining toolkit, is ARFF, with files having a .arff filename extension.
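As a minimal sketch of the difference between the two spreadsheet export formats, the following R code writes a tiny made-up dataset to temporary .csv and .txt files and reads each back. The data values and the variable names (obs, csv_file, txt_file) are illustrative assumptions, not taken from the book.

```r
# A tiny illustrative dataset (made-up values).
obs <- data.frame(
  day      = c("Mon", "Tue", "Wed"),
  min_temp = c(8.0, 14.0, 13.7),
  rain     = c("No", "Yes", "No")
)

# Write the same observations in both formats to temporary files.
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")
write.csv(obs, csv_file, row.names = FALSE)
write.table(obs, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

# read.csv expects comma separators; read.delim expects tabs.
# Both treat the first line as the header by default.
from_csv <- read.csv(csv_file)
from_txt <- read.delim(txt_file)

str(from_csv)
```

Both calls should yield the same data frame, since only the separator differs between the two files.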

Open Database Connectivity (ODBC) provides a standard method for accessing data in a variety of databases, and is fully supported by R. This allows direct connection to a vast collection of data sources, including MS/Excel, MS/Access, SQL Server, Oracle, IBM DB2, Teradata, MySQL, PostgreSQL, and SQLite.
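A rough sketch of what an ODBC load might look like from R is given below. It assumes the RODBC package is installed and that an ODBC data source has already been configured on the machine; the helper name load_from_odbc, the data source name, and the query are hypothetical placeholders.

```r
# Sketch only: needs the RODBC package and a configured ODBC data source.
# load_from_odbc is a hypothetical helper, not part of Rattle or RODBC.
load_from_odbc <- function(dsn, query) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("The RODBC package is not installed.")
  }
  channel <- RODBC::odbcConnect(dsn)
  # On failure odbcConnect returns -1 rather than an RODBC channel object.
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to data source: ", dsn)
  }
  # Make sure the connection is closed when the function exits.
  on.exit(RODBC::odbcClose(channel))
  # On success sqlQuery returns the result set as a data frame.
  RODBC::sqlQuery(channel, query)
}

# Hypothetical usage, assuming a data source named "mydsn" with a table "weather":
# ds <- load_from_odbc("mydsn", "SELECT * FROM weather")
```

The usage line is left commented out because it can only succeed where such a data source actually exists.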

We don't need to use the Rattle interface to load a dataset. We could simply use the underlying R commands to do the same. We can directly use functions like read.csv, read.delim, read.arff, and odbcConnect. In the following subsections we will illustrate loading data through the Rattle interface, and then review the underlying R commands.
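Of the functions named above, read.arff is the one not found in base R. The sketch below assumes the version provided by the foreign package (the RWeka package offers one as well) and reads a small hand-written ARFF file. The relation name, attributes, and values are made up for illustration.

```r
library(foreign)  # assumption: read.arff from the 'foreign' package

# Write a minimal ARFF file: a header describing each attribute,
# followed by comma separated observations after the @data line.
arff_file <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation weather_sample",
  "@attribute min_temp numeric",
  "@attribute rain {No,Yes}",
  "@data",
  "8.0,No",
  "14.0,Yes",
  "13.7,No"
), arff_file)

# Numeric attributes become numeric columns; nominal ones become factors.
ds <- read.arff(arff_file)
str(ds)
```

Unlike a CSV file, the ARFF header carries the variable types, so no guessing of column types is needed when the file is read.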

Once a dataset source has been identified and the Data tab executed, the data will be displayed in the textview. Figure 5.1 displays the Rattle window after loading the weather.csv file, which is supplied as a sample dataset with the Rattle package.
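The same sample file can also be read from the R console. The sketch below assumes the rattle package is installed and that weather.csv sits in a csv subdirectory of the installed package; that location is an assumption about the package layout, so the code checks whether the file was found before reading it.

```r
# Look for weather.csv inside the installed rattle package.
# The "csv" subdirectory is an assumption; system.file() returns ""
# when the package or the file cannot be found.
weather_path <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(weather_path)) {
  weather <- read.csv(weather_path)
  print(dim(weather))          # number of observations and variables
  print(head(names(weather)))  # first few variable names
} else {
  message("weather.csv not found; is the rattle package installed?")
}
```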

Figure 5.1: The file weather.csv, from the rattle package, has been loaded into Rattle.

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 13 September 2009