Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Loading Data

The Data tab is the starting point for Rattle, and is where we load a specific dataset into Rattle.

Rattle can load data from various sources. It directly supports comma separated data files (.csv files, as might be exported by a spreadsheet, which use commas to separate the variable values in a record; see Section 3.3), tab separated files (.txt files, also commonly exported from spreadsheets, which use the tab character rather than a comma to separate columns), a common data mining dataset format used by Weka (.arff files; see Section 3.4), and ODBC connections (which give access to an enormous collection of data sources, including MS/Excel, MS/Access, SQL Server, Oracle, IBM DB2, Teradata, MySQL, PostgreSQL, and SQLite; see Section 3.5).
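As a rough illustration of the two delimited file formats, the sketch below reads the same small table from a comma separated file and from a tab separated file using base R. The file contents and variable names are invented for the example, and the calls shown are simply the standard base R readers; this section does not say which calls Rattle itself issues, so treat them as an assumption.

```r
# Write a tiny comma separated file and a tab separated copy to temporary
# paths (the records and column names are made up for this illustration).
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("ID,Age,Income", "1,34,52000", "2,45,61000"), csv_path)
writeLines(c("ID\tAge\tIncome", "1\t34\t52000", "2\t45\t61000"), txt_path)

# read.csv() expects commas between values; read.delim() expects tabs.
# Both treat the first line as the variable names by default.
from_csv <- read.csv(csv_path)
from_txt <- read.delim(txt_path)

# Each call returns a data frame with one row per record.
str(from_csv)
str(from_txt)
```

Both calls yield the same data frame here, which is the point: the two formats differ only in the separator character.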

When loading data into Rattle, certain special strings in variable names are used to identify variable roles. For example, if a variable name starts with ID then the variable is marked as having an ID role. See Section 3.10 for details.
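The naming convention can be pictured with a small sketch. The helper guess_role() below is hypothetical and is not part of Rattle. It covers only the ID prefix mentioned above, it assumes the match is case sensitive, and it labels every other variable "Input" purely as a placeholder default; the actual set of special strings is described in Section 3.10.

```r
# Hypothetical helper, not Rattle's API: assign a role from the variable name.
# Only the "ID" prefix comes from the text; the "Input" default and the
# case-sensitive match are assumptions made for this sketch.
guess_role <- function(var_names) {
  ifelse(startsWith(var_names, "ID"), "ID", "Input")
}

# Made-up variable names for illustration.
roles <- guess_role(c("ID_customer", "Age", "Income"))
print(roles)
```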

Underneath Rattle, R is very flexible in where it obtains its data, and data from almost any source can be loaded. Consequently, Rattle is able to access this same variety of sources. Doing so does, however, require first loading the data in the R console and then loading it within Rattle as an R Dataset. All kinds of additional data sources can be loaded directly into R, including SAS, SPSS, Minitab, Oracle, MySQL, and SQLite, as well as data formats such as NCSA's HDF5 (Hierarchical Data Format) and UCAR's NetCDF (Network Common Data Form).
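The two-step route can be sketched as follows: first create a data frame in the R console, then check that it is visible as a candidate R Dataset. The data frame survey and the helper list_data_frames() are invented for this example. How Rattle actually discovers datasets is not stated here, so the helper is only an assumption about what "available as an R Dataset" amounts to, namely a data frame held in the R session.

```r
# Step 1: load (here, simply construct) a data frame in the R console.
# In practice this could come from any reader R supports.
survey <- data.frame(ID = 1:3,
                     Age = c(34, 45, 29),
                     Income = c(52000, 61000, 48000))

# Step 2 (hypothetical helper, not part of Rattle): list the names of all
# data frames in an environment, by default the global environment.
list_data_frames <- function(env = globalenv()) {
  nms <- ls(envir = env)
  nms[vapply(nms,
             function(nm) is.data.frame(get(nm, envir = env)),
             logical(1))]
}

# The names returned are the objects one would expect to be able to pick
# when choosing an R Dataset.
available <- list_data_frames()
print(available)
```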

Once a dataset has been identified, its name is displayed in the title of the Rattle window, as in Figure 3.1.

Figure 3.1: Rattle title bar showing the file name

The remainder of this chapter covers the loading of data from the sources directly supported by Rattle.

Copyright © 2004-2008 Togaware Pty Ltd