DATA MINING
Desktop Survival Guide by Graham Williams 


The tree structure is a traditional computer science data structure. Whilst we think of a tree, we often present tree upside down, with the root at the top and the leaves at the bottom. Starting from the root the tree splits from the single trunk into two or more branches. Then each branch itself might further split into two or more branches. This continues until we reach a leaf. We refer to each place where a branch splits into other branches as a node of the tree. The root and leaf are also referred to as nodes.
A decision tree has this traditional structure. It starts with a single root node which splits into multiple branches leading to further nodes, each which may further split, or terminate as a leaf node.
Associated with each nonleaf node is a test or question. The answer to the test or question tells us which branch to follow, when we are interpreting the tree to represent knowledge.
Consider the decision tree drawn on the previous page (which is the same tree from Figure 2.5 on page ). This represents knowledge about observing weather conditions one day and the observation of rain on the following day (the No and Yes values at the leaves of the decision tree).
The root node of the example decision tree tests the atmospheric pressure at 3pm (Pressure3pm). When this variable, for a observation, has a value greater than 1012 hPa then we will continue down the left side of the tree. The next test, down this left side of the tree, is on the amount of cloud cover observed at 3pm. If this is less than 8 oktas (i.e., anything but a fully overcast sky) then it is observed that on the following day it generally does not rain (No). If we observe that it is overcast today at 3pm (i.e., Cloud3pm is 8 oktas, which is the maximum value of this variablesee Section , page ) then generally we have in the past observed that it rains the following day (Yes). Thus we would be inclined to think that it might rain tomorrow if we observe these conditions today.
Resuming our interpretation of the model from the root node of the tree, if Pressure3pm is equal to or less than 1012 hPa, and Sunshine is greater than 9 (greater than 9 hours of sunshine in the day) then we do not expect to observe rain tomorrow. If we record 9 or less hours of sunshine (with the pressure at 3pm being equal to or less than 1012 hPa) then we will be expecting rain tomorrow.
The decision tree is a very convenient and efficient representation of the knowledge. Generally, models expressed in one form or language can be translated to other forms or languages. So it is with a decision tree, and one simple and useful translation to be aware of is the translation into a rule set. The above decision tree translates to the following rules. In effect, each rule corresponds to one pathway through the decision tree, starting at the root node and terminating at a leaf node.
Rule number: 7 [yval=Yes cover=27 (11%) prob=0.74] Pressure3pm< 1012 Sunshine< 8.85 Rule number: 5 [yval=Yes cover=9 (4%) prob=0.67] Pressure3pm>=1012 Cloud3pm>=7.5 Rule number: 6 [yval=No cover=25 (10%) prob=0.20] Pressure3pm< 1012 Sunshine>=8.85 Rule number: 4 [yval=No cover=195 (76%) prob=0.05] Pressure3pm>=1012 Cloud3pm< 7.5 
A rule representation has its advantages. In reviewing the knowledge captured we can consider each rule separately, rather than being distracted by the sometimes complex structure of a large decision tree. Each rule can also be simply translated into statement in a progmamming language, like R, Python, C, VisalBasic, and SQL. The structure is simple, and clear, as an IfThen statement.
Although it is not shown in the tree representation at the begining of the chapter, we can see in the rules above the probabilities that are typically recorded for each leaf node of the decision tree. The probabilities can be used to provide an indication of the strength of the decision we derive from the model. Thus, rule 7 suggests 74% of the time, when the observed pressure at 3pm is less than 1012 hPa and the hours of sunshine is less than 8.85 hours, there is rainfall recorded on the following day (yval=Yes). The other information provided with the rule is that 27 observations from the training dataset (i.e., 11% of the training dataset observations) ar covered by this rulethey satisfy the two conditions.
Variations of the basic decision tree structure that we have presented here for representing knowledge exist. Some approaches limit trees to two splits at any one node to generate a binary decision tree. For categoric data this might involve partitioning the values (levels) of the variable into two groups. Another approach is to have a branch corresponding to each of the levels of a categoric variable. From a representation point of view, what can be represented using a multiway tree can also be represented as a binary tree, and viceversa.
Other variations allow multiple variables to be tested at a node. Once again, the simpler tree structures can represent the same knowledge. Thus, in decision tree induction, we generally stay with the simpler representation, though the resulting model represented as a tree may appear a little more complex than if we used a more complex decision tree structure.
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.