DATA MINING
Desktop Survival Guide
by Graham Williams

Matrices

A dataset is usually more complex than a single vector. Often several vectors make up the dataset, and we can store them together as a matrix. A matrix is a two-dimensional data structure whose items are all of the same data type. We construct a matrix with the matrix function, typically supplying the values with the c function. Rows and columns of a matrix can have names, and the functions colnames and rownames will list the current names. You can also assign a new vector of names through these same functions:

 > ds <- matrix(c(52, 37, 59, 42, 36, 46, 38, 21, 18, 32, 10, 67), nrow=3, byrow=TRUE)
 > colnames(ds) <- c("Low", "Medium", "High", "VHigh")
 > rownames(ds) <- c("Married", "Prev.Married", "Single")
 > ds
              Low Medium High VHigh
 Married       52     37   59    42
 Prev.Married  36     46   38    21
 Single        18     32   10    67
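As a minimal sketch of the points above, the following rebuilds the same matrix and then lists and assigns its names. The variable names values, counts and by_col are illustrative only, and the final comparison without byrow is included on the assumption that it helps show how the values are laid out.

```r
# Rebuild the example matrix; the names `values` and `counts` are illustrative.
values <- c(52, 37, 59, 42, 36, 46, 38, 21, 18, 32, 10, 67)
counts <- matrix(values, nrow = 3, byrow = TRUE)

# With no names assigned yet, both functions return NULL.
colnames(counts)
rownames(counts)

# Assigning to colnames() and rownames() sets the names.
colnames(counts) <- c("Low", "Medium", "High", "VHigh")
rownames(counts) <- c("Married", "Prev.Married", "Single")

# Calling them again lists the current names.
colnames(counts)
rownames(counts)

# Without byrow = TRUE the values fill column by column instead,
# so the first column holds the first three values of the vector.
by_col <- matrix(values, nrow = 3)
by_col[, 1]
```

Filling by row matches the way the table is usually written down, which is why the example in the text sets byrow.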

Of course, manually creating datasets in this way is only practical for small data collections. A slightly easier approach is to modify and add to the dataset through a simple spreadsheet-like interface, using either the edit function or the fix function, which also assigns the result of the edit back to the variable being edited. Note that the edit function returns the edited dataset, and so prints it to the screen if the result is not assigned. If all you want to do is browse the dataset, without assigning the result of edit to a variable, wrap the call in the invisible function to avoid the dataset being printed.

 > ds <- edit(ds)
 > fix(ds)
 > invisible(edit(ds))

The cbind function combines each of its arguments, column-wise (the c in the name is for column), into a single data structure:

 > age <- c(35, 23, 56, 18)
 > gender <- c("m", "m", "f", "f")
 > people <- cbind(age, gender)
 > people
      age  gender
 [1,] "35" "m"
 [2,] "23" "m"
 [3,] "56" "f"
 [4,] "18" "f"

Because the elements of the resulting matrix must all be of the same data type, the variable age has been converted to the character data type (since the values of gender cannot sensibly be converted to numeric).
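The conversion can be checked directly. The following is a minimal sketch, not part of the original example: the name age_numeric is illustrative, and converting the column back with as.numeric is just one way to recover numbers for arithmetic.

```r
age <- c(35, 23, 56, 18)
gender <- c("m", "m", "f", "f")
people <- cbind(age, gender)

# The whole matrix is stored as character, including the age column.
mode(people)
is.numeric(people[, "age"])

# Convert the age column back to numeric before doing arithmetic on it.
# `age_numeric` is an illustrative name, not from the original text.
age_numeric <- as.numeric(people[, "age"])
mean(age_numeric)
```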

The rbind function similarly combines its arguments, but row-wise (the r in the name is for row). The result is the same as transposing the column-wise matrix with the t function:

 > t(people)
        [,1] [,2] [,3] [,4]
 age    "35" "23" "56" "18"
 gender "m"  "m"  "f"  "f"
 > people <- rbind(age, gender)
 > people
        [,1] [,2] [,3] [,4]
 age    "35" "23" "56" "18"
 gender "m"  "m"  "f"  "f"
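The equivalence can also be tested rather than judged by eye. This is a small sketch under the same definitions of age and gender; the names by_col and by_row are illustrative only.

```r
age <- c(35, 23, 56, 18)
gender <- c("m", "m", "f", "f")

by_col <- cbind(age, gender)   # one column per argument
by_row <- rbind(age, gender)   # one row per argument

# The dimensions are swapped between the two results.
dim(by_col)
dim(by_row)

# Compare the row-wise result with the transposed column-wise result.
identical(by_row, t(by_col))
```

Comparing with identical checks the row and column names as well as the values, so it is a stricter test than comparing the printed output.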
