DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Rattle supports obtaining a dataset from any database accessible through ODBC (Open Database Connectivity) with the ODBC Option (Figure 3.6).
The key to using ODBC is to know (or to set up) the data source name (DSN) for your databases. The setting up of DSNs is outside the scope of Rattle, being a configuration tosk through your operating system. Under GNU/Linux, for example, using the unixodbc package, the system DSNs are often defined in the file /etc/odbcinst.ini and in /etc/odbc.ini. Under MS/Windows the control panel provides access to a DSN tool.
Within Rattle we specify a known DSN by typing the name into the text entry. Once that is done, we press the Enter key and Rattle will attempt to connect. This may require a username and password to be supplied. For a Teradata Warehouse connection you will be presented with a dialog box like the one on the right (Figure 3.7. For a Netezza ODBC connection we will get a window like that in Figure 3.8.
If the connection is successful we will find a list of available tables in the Table combobox.
We can choose a Table, and also include a limit on the number of rows that we wish to load into Rattle. This allows us to get a smaller sample of the data for testing purposes before loading up a large dataset. If the Row Limit is set to 0 then all of the rows from the table are retrieved. Unfortunately there is now SQL standard for limiting the number of rows returned from a query. For the Teradata and Netezza warehouses the SQL keyword is LIMIT and this is what is used by Rattle.
The Believe Num Rows option is an oddity required for some ODBC drivers and appears to be associated with the pre-fetch behaviour of these drivers. The default is to activate the check box (i.e., Believe Num Rows is True). However, if you find that you are not retrieveing all rows from the source table, the the ODBC driver may be using a pre-fetch mechanism that does not ``correctly'' report the number of rows (it is probably only reporting the number of rows limited to the size of the pre-fetch). In these cases deactivate the Believe Num Rows check box. See Section 15.6 for more details. Another solution is to either disable the pre-fetch option of the driver, or to increase its count. For example, in connecting through the Netezza ODBC driver the configuration window is available, where you can change the default Prefetch Count value.
Copyright © Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.