DATA MINING
Desktop Survival Guide by Graham Williams
Locating a File to Load
Using the CSV option of Rattle's Data tab we can directly load data from a .csv file. Click the Filename button (Figure 4.2) to display a file chooser dialogue (Figure ).
The file chooser dialogue allows us to browse our file system to find the file we wish to load into Rattle. By default, only files that have a .csv extension will be listed (together with folders). The pull down menu near the bottom right of the file chooser dialogue (above the Open button) allows us to select alternative filters for the files listed. We can list files that end with a .csv or a .txt, or else we can list all files.
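The effect of these filters can be imitated in R itself. The following is a minimal sketch, not how Rattle implements its file chooser: it creates a scratch folder containing three hypothetical files (the names are made up for illustration) and uses list.files with a pattern to mimic each filter.

```r
# Create a scratch folder holding a few hypothetical files.
folder <- tempfile("chooser")
dir.create(folder)
invisible(file.create(file.path(folder,
                                c("weather.csv", "notes.txt", "model.rds"))))

# Mimic the default filter: only files ending in .csv.
csv.only <- list.files(folder, pattern = "\\.csv$")

# Mimic the second filter: files ending in .csv or .txt.
csv.or.txt <- list.files(folder, pattern = "\\.(csv|txt)$")

# Mimic the "all files" filter: no pattern at all.
all.files <- list.files(folder)

print(csv.only)
print(csv.or.txt)
print(all.files)
```

The patterns are regular expressions, so the dot is escaped and the trailing $ anchors the match to the end of the file name.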
Browse to the .csv file you wish to load, highlight it, and click the Open button.
We have told Rattle the location and the name of the file to load. We now need to actually load the data with a click on the Execute button (or press the F2 key). This loads the contents of the file from the hard disk into the computer's memory, for processing by Rattle.
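Outside the GUI, the same step of reading a file from disk into memory can be sketched with R's read.csv function. This is an illustrative assumption about the kind of call involved, not a record of what Rattle does internally. The temporary file below is a hypothetical stand-in for the file chosen in the dialogue.

```r
# Hypothetical stand-in for the file selected in the file chooser.
fn <- tempfile(fileext = ".csv")
writeLines(c("Date,MinTemp,MaxTemp,Rainfall",
             "2007-11-01,8,24.3,0",
             "2007-11-02,14,26.9,3.6"),
           fn)

# Read the file from disk into a data frame held in memory.
ds <- read.csv(fn)

# Inspect what was loaded: dimensions and column types.
print(dim(ds))
str(ds)
```

Once loaded, the data frame lives in memory and later changes to the file on disk do not affect it.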
As mentioned above, rattle supplies a number of sample CSV files, in particular the weather.csv data file. The file is installed along with rattle. We can ask R for its actual location using the system.file function, which we type into the R Console:
> system.file("csv", "weather.csv", package = "rattle")
[1] "/usr/local/lib/R/site-library/rattle/csv/weather.csv"
The location reported will depend on your particular installation and operating system. Here the location is as on my own installation, which is a standard GNU/Linux system.
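Because the location differs between installations, it can be worth checking that the file was actually found. The system.file function returns an empty string when the package or file is not present, so a small check along the following lines (a sketch, not part of Rattle) avoids passing an empty path to later commands.

```r
# Ask R where the sample file lives; the result is "" if it is not found.
fn <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(fn)) {
  cat("Found weather.csv at:", fn, "\n")
} else {
  cat("weather.csv not found; is the rattle package installed?\n")
}
```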
We can review the contents of the file using the file.show function. This will pop up a window displaying the contents of the file.
> fn <- system.file("csv", "weather.csv", package = "rattle")
> file.show(fn)
The file contents can also be viewed directly outside of R and Rattle, with any simple text editor. If you aren't familiar with CSV files, it is instructive to do so. The top of the file will appear as:
Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine...
2007-11-01,8,24.3,0,3.4,6.3,NW,30,SW,NW,6,20,68...
2007-11-02,14,26.9,3.6,4.4,9.7,ENE,39,E,W,4,17,80...
2007-11-03,13.7,23.4,3.6,5.8,3.3,NW,85,N,NNE,6,6,82...
2007-11-04,13.3,15.5,39.8,7.2,9.1,NW,54,WNW,W,30,24,62...
2007-11-05,7.6,16.1,2.8,5.6,10.6,SSE,50,SSE,ESE,20,28,68...
2007-11-06,6.2,16.9,0,5.8,8.2,SE,44,SE,E,20,24,70...
A CSV file is actually a normal text file that begins with a header row, listing the names of the variables, each separated by a comma. The remainder of the file after the header is expected to consist of rows of data that record the observations, again with fields separated by commas recording the values of the variables for each observation.
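To make this structure concrete, the following sketch splits a few lines by hand using base R. The lines are a shortened, three-column version of the weather data (the truncation is for illustration only). In practice a function such as read.csv does this work for us.

```r
# A shortened, illustrative excerpt: one header row and two observations.
lines <- c("Date,MinTemp,MaxTemp",
           "2007-11-01,8,24.3",
           "2007-11-02,14,26.9")

# The header row lists the variable names, separated by commas.
header <- strsplit(lines[1], ",", fixed = TRUE)[[1]]

# Each remaining row holds one observation, with comma-separated values.
fields <- strsplit(lines[-1], ",", fixed = TRUE)

# Every observation should supply one value per variable.
same.width <- all(sapply(fields, length) == length(header))

# Pair the values of the first observation with the variable names.
first.obs <- setNames(fields[[1]], header)

print(header)
print(same.width)
print(first.obs)
```

Note that splitting by hand leaves every value as a character string. Converting columns to numbers or dates is a separate step.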
Copyright © Togaware Pty Ltd