
Data Mining using Decision Trees



Presentation Transcript


  1. Data Mining using Decision Trees Professor J. F. Baldwin

  2. Decision Trees from a Database

     Ex Num   Size    Colour   Shape    Concept Satisfied
     1        med     blue     brick    yes
     2        small   red      wedge    no
     3        small   red      sphere   yes
     4        large   red      wedge    no
     5        large   green    pillar   yes
     6        large   red      pillar   no
     7        large   green    sphere   yes

     Choose target: Concept Satisfied. Use all attributes except Ex Num.

  3. CLS - Concept Learning System - Hunt et al.
     Tree structure: a parent node contains a mixture of +ve and -ve examples. Splitting on an attribute V with values v1, v2, v3 creates one child node per value.

  4. CLS Algorithm
     1. Initialise the tree T by setting it to consist of one node containing all the examples, both +ve and -ve, in the training set.
     2. If all the examples in T are +ve, create a YES node and HALT.
     3. If all the examples in T are -ve, create a NO node and HALT.
     4. Otherwise, select an attribute F with values v1, ..., vn. Partition T into subsets T1, ..., Tn according to the values of F. Create branches with F as parent and T1, ..., Tn as child nodes.
     5. Apply the procedure recursively to each child node.
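The steps above translate directly into a short recursion. A minimal Python sketch, assuming each example is a dict of attribute values with a boolean 'target' key; the attribute order is whatever the caller supplies, since plain CLS does not prescribe one.

```python
# Minimal CLS sketch: returns 'YES', 'NO', or (attribute, {value: subtree}).
# Assumes the attribute list never runs out while the examples are still mixed,
# as in the worked example on the following slides.
def cls(examples, attributes):
    if all(e['target'] for e in examples):
        return 'YES'                            # step 2: all examples +ve
    if not any(e['target'] for e in examples):
        return 'NO'                             # step 3: all examples -ve
    attr, rest = attributes[0], attributes[1:]  # step 4: select an attribute
    partition = {}
    for e in examples:                          # partition T by the attribute's values
        partition.setdefault(e[attr], []).append(e)
    return (attr, {value: cls(subset, rest)     # step 5: recurse on each child
                   for value, subset in partition.items()})
```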

  5. Database Example
     Using attribute SIZE on {1, 2, 3, 4, 5, 6, 7}:
       med   -> {1}            YES
       small -> {2, 3}         Expand
       large -> {4, 5, 6, 7}   Expand

  6. Expanding
     {1, 2, 3, 4, 5, 6, 7} split on SIZE:
       med   -> {1}  YES
       small -> {2, 3} split on SHAPE:
         wedge  -> {2}  NO
         sphere -> {3}  YES
       large -> {4, 5, 6, 7} split on SHAPE:
         wedge  -> {4}  NO
         sphere -> {7}  YES
         pillar -> {5, 6} split on COLOUR:
           red   -> {6}  NO
           green -> {5}  YES

  7. Rules from the Tree
     IF (SIZE = large AND (SHAPE = wedge OR (SHAPE = pillar AND COLOUR = red)))
        OR (SIZE = small AND SHAPE = wedge)
     THEN NO

     IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR SHAPE = sphere))
        OR (SIZE = small AND SHAPE = sphere)
        OR (SIZE = medium)
     THEN YES

  8. Disjunctive Normal Form - DNF
     IF (SIZE = medium)
        OR (SIZE = small AND SHAPE = sphere)
        OR (SIZE = large AND SHAPE = sphere)
        OR (SIZE = large AND SHAPE = pillar AND COLOUR = green)
     THEN CONCEPT = satisfied
     ELSE CONCEPT = not satisfied
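The DNF form reads off directly as a predicate. A small sketch, with attribute values as plain strings (the table writes 'med' where the rule says medium):

```python
# Direct transcription of the DNF rule above.
def concept_satisfied(size, colour, shape):
    return (size == 'med'
            or (size == 'small' and shape == 'sphere')
            or (size == 'large' and shape == 'sphere')
            or (size == 'large' and shape == 'pillar' and colour == 'green'))

print(concept_satisfied('large', 'green', 'pillar'))   # True  (example 5: yes)
print(concept_satisfied('large', 'red', 'wedge'))      # False (example 4: no)
```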

  9. ID3 - Quinlan
     In the CLS algorithm the attributes can be chosen in any order, which can result in large decision trees if the ordering is not optimal. An optimal ordering would give the smallest decision tree, but no method is known for determining the optimal ordering. We use a heuristic that provides an efficient ordering of the attributes and gives a near-optimal tree.
     ID3 = CLS + efficient ordering of attributes.
     Entropy is used to order the attributes.

  10. Entropy
      For a random variable V which can take values {v1, v2, ..., vn} with Pr(vi) = pi for all i, the entropy of V is
        S(V) = -Σ_i pi ln(pi)
      Entropy of a fair die: -6 · (1/6) ln(1/6) = ln 6 = 1.7917
      Entropy of a fair die known to show an even score: -3 · (1/3) ln(1/3) = ln 3 = 1.0986
      The difference between the entropies is the information gain from learning that the score is even:
        Information gain = 1.7917 - 1.0986 = 0.6931 (= ln 2)
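A quick numeric check of these values (natural logarithms, to match the figures above):

```python
import math

def entropy(probs):
    """Entropy of a distribution, using natural logarithms as in the die example."""
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_die = [1/6] * 6
even_die = [1/3] * 3                           # die known to show an even score: {2, 4, 6}
print(entropy(fair_die))                       # 1.7918 (= ln 6)
print(entropy(even_die))                       # 1.0986 (= ln 3)
print(entropy(fair_die) - entropy(even_die))   # 0.6931 (= ln 2), the information gain
```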

  11. Attribute Expansion
      To expand an attribute Ai, start from the joint probability Pr(A1, ..., Ai, ..., An, T) over all the attributes and the target T, taken as equally likely over the examples unless specified otherwise.
      For each value ai1, ..., aim of Ai, pass down the probabilities corresponding to that value and re-normalise, giving Pr(A1, ..., Ai-1, Ai+1, ..., An, T | Ai = aij); the result is again equally likely if the original distribution was equally likely.

  12. Expected Entropy for an Attribute
      For attribute Ai and target T: for each value aij of Ai, pass down the probabilities corresponding to each target value tk and re-normalise, giving Pr(T | Ai = aij), whose entropy is S(aij).
      Expected entropy for Ai:
        S(Ai) = Σ_j Pr(Ai = aij) S(aij)

  13. How to Choose an Attribute, and Information Gain
      Determine the expected entropy for each attribute, i.e. S(Ai) for all i.
      Choose s such that S(As) = MIN_i S(Ai) and expand attribute As.
      By choosing attribute As the information gain is S - S(As), where
        S = -Σ_k Pr(T = tk) log Pr(T = tk)
      is the entropy of the target. Minimising the expected entropy is equivalent to maximising the information gain.
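A short Python sketch of this selection step, assuming the examples are dicts carrying a target field and that each example has equal weight, as in the slides:

```python
import math
from collections import Counter

def entropy(values):
    """Entropy (base 2) of the empirical distribution of a list of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def expected_entropy(examples, attr, target):
    """S(Ai): entropy of the target within each value of attr, weighted by Pr(value)."""
    n = len(examples)
    s = 0.0
    for value in set(e[attr] for e in examples):
        subset = [e[target] for e in examples if e[attr] == value]
        s += (len(subset) / n) * entropy(subset)
    return s

def best_attribute(examples, attributes, target):
    """Minimum expected entropy, i.e. maximum information gain."""
    return min(attributes, key=lambda a: expected_entropy(examples, a, target))
```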

  14. Previous Example

      Ex Num   Size    Colour   Shape    Concept Satisfied   Pr
      1        med     blue     brick    yes                 1/7
      2        small   red      wedge    no                  1/7
      3        small   red      sphere   yes                 1/7
      4        large   red      wedge    no                  1/7
      5        large   green    pillar   yes                 1/7
      6        large   red      pillar   no                  1/7
      7        large   green    sphere   yes                 1/7

      Pr(Concept Satisfied): yes 4/7, no 3/7
      S = -(4/7) Log(4/7) - (3/7) Log(3/7) = 0.99   (Log to base 2)

  15. Entropy for Attribute Size

      med,   Pr(med) = 1/7:    Concept Satisfied: yes 1               S(med) = 0
      small, Pr(small) = 2/7:  Concept Satisfied: yes 1/2, no 1/2     S(small) = 1
      large, Pr(large) = 4/7:  Concept Satisfied: yes 1/2, no 1/2     S(large) = 1

      S(Size) = (2/7)·1 + (1/7)·0 + (4/7)·1 = 6/7 = 0.86
      Information gain for Size = 0.99 - 0.86 = 0.13

  16. First Expansion

      Attribute   Information Gain
      SIZE        0.13
      COLOUR      0.52
      SHAPE       0.7    <- max, so choose SHAPE

      {1, 2, 3, 4, 5, 6, 7} split on SHAPE:
        brick  -> {1}      YES
        wedge  -> {2, 4}   NO
        sphere -> {3, 7}   YES
        pillar -> {5, 6}   Expand
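Running the selection sketch from slide 13 over the seven-example database reproduces this gain table. A self-contained check (base-2 logarithms assumed):

```python
import math
from collections import Counter

examples = [
    {'Size': 'med',   'Colour': 'blue',  'Shape': 'brick',  'Satisfied': 'yes'},
    {'Size': 'small', 'Colour': 'red',   'Shape': 'wedge',  'Satisfied': 'no'},
    {'Size': 'small', 'Colour': 'red',   'Shape': 'sphere', 'Satisfied': 'yes'},
    {'Size': 'large', 'Colour': 'red',   'Shape': 'wedge',  'Satisfied': 'no'},
    {'Size': 'large', 'Colour': 'green', 'Shape': 'pillar', 'Satisfied': 'yes'},
    {'Size': 'large', 'Colour': 'red',   'Shape': 'pillar', 'Satisfied': 'no'},
    {'Size': 'large', 'Colour': 'green', 'Shape': 'sphere', 'Satisfied': 'yes'},
]

def entropy(values):
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(attr):
    target = [e['Satisfied'] for e in examples]
    gain = entropy(target)                      # S = 0.99
    for value in set(e[attr] for e in examples):
        subset = [e['Satisfied'] for e in examples if e[attr] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

for attr in ('Size', 'Colour', 'Shape'):
    print(attr, round(information_gain(attr), 2))   # Size 0.13, Colour 0.52, Shape 0.7
```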

  17. Complete Decision Tree

      {1, 2, 3, 4, 5, 6, 7} split on SHAPE:
        brick  -> {1}      YES
        wedge  -> {2, 4}   NO
        sphere -> {3, 7}   YES
        pillar -> {5, 6} split on COLOUR:
          red   -> {6}  NO
          green -> {5}  YES

      Rule: IF Shape is wedge OR (Shape is pillar AND Colour is red) THEN NO ELSE YES

  18. A New Case

      Size   Colour   Shape    Concept Satisfied
      med    red      pillar   ?

      Following the tree: SHAPE = pillar, then COLOUR = red, so ? = NO
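A one-line check of the slide 17 rule on this case (a sketch, with attribute values as plain strings):

```python
# NO exactly when the shape is a wedge, or a pillar that is red.
def satisfied(size, colour, shape):
    return not (shape == 'wedge' or (shape == 'pillar' and colour == 'red'))

print(satisfied('med', 'red', 'pillar'))   # False, i.e. Concept Satisfied = NO
```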

  19. Post Pruning

      Consider any node S with N examples, of which n belong to C, the class with the most examples (the majority class); C is one of {YES, NO}.
      Suppose we terminate this node and make it a leaf with classification C. What will be the expected error, E(S), if we use the tree for new cases and we reach this node?
        E(S) = Pr(class of new case ≠ C)

  20. Bayes Updating for Post Pruning

      Let p denote the probability of class C for a new case arriving at S. We do not know p. Let f(p) be a prior probability distribution for p on [0, 1]. We can update this prior using Bayes' updating with the information at node S, namely that n of the N cases in S are of class C:

        f(p | n C in S) = Pr(n C in S | p) f(p) / ∫[0,1] Pr(n C in S | p) f(p) dp

  21. Mathematics of Post Pruning

      Assume f(p) to be uniform over [0, 1]. Then

        f(p | n C in S) = p^n (1-p)^(N-n) / ∫[0,1] x^n (1-x)^(N-n) dx

      Evaluating the integral using Beta functions:

        ∫[0,1] p^n (1-p)^(N-n) dp = n! (N-n)! / (N+1)!

      The expected error is the posterior expectation of (1 - p):

        E(S) = E[1 - p]
             = ∫[0,1] p^n (1-p)^(N-n+1) dp / ∫[0,1] p^n (1-p)^(N-n) dp
             = [n! (N-n+1)! / (N+2)!] / [n! (N-n)! / (N+1)!]
             = (N - n + 1) / (N + 2)
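A tiny numeric sketch of the resulting formula (the values reappear in the worked pruning example two slides on):

```python
# Expected error of turning node S into a leaf: N examples, n in the majority class.
def expected_error(n_majority, n_total):
    return (n_total - n_majority + 1) / (n_total + 2)

print(round(expected_error(6, 10), 3))   # 0.417  -> the root node [6, 4]
print(round(expected_error(4, 6), 3))    # 0.375  -> a node with [4, 2]
print(round(expected_error(1, 1), 3))    # 0.333  -> a pure leaf [1, 0]
```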

  22. Post Pruning for the Binary Case

      For a leaf node Si:
        Error(Si) = E(Si)

      For any node S which is not a leaf, with children S1, ..., Sm, let
        Pi = (number of examples in Si) / (number of examples in S)
      Then
        BackUpError(S) = Σ_i Pi Error(Si)
        Error(S) = MIN{ E(S), BackUpError(S) }

      Decision: prune at S (cut the sub-tree below S) if BackUpError(S) ≥ Error(S).

  23. Example of Post Pruning

      [x, y] means x YES cases and y NO cases. For each internal node the first number is E(S) and the second is BackUpError(S); Error(S) is the smaller of the two (underlined on the original slide). PRUNE means cut the sub-tree below this point.

      Before pruning:
        a [6, 4]: 0.417, 0.378
          c [2, 2]: 0.5, 0.383
            [1, 0]: 0.333
            d [1, 2]: 0.4, 0.444   PRUNE
              [1, 1]: 0.5
              [0, 1]: 0.333
          b [4, 2]: 0.375, 0.413   PRUNE
            [3, 2]: 0.429
            [1, 0]: 0.333
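A self-contained Python sketch that walks this tree and reproduces the numbers above; the tuple encoding (yes_count, no_count, children) and the generated node names are illustration choices, not part of the original slides:

```python
# Each node is (yes_count, no_count, children); a leaf has an empty children list.
def leaf_error(yes, no):
    n_total, n_majority = yes + no, max(yes, no)
    return (n_total - n_majority + 1) / (n_total + 2)   # E(S) from slide 21

def prune(node, name='node'):
    """Return Error(node) = min(E, BackUpError), reporting where pruning happens."""
    yes, no, children = node
    e = leaf_error(yes, no)
    if not children:
        return e
    total = yes + no
    backup = 0.0
    for i, child in enumerate(children):
        child_yes, child_no, _ = child
        backup += ((child_yes + child_no) / total) * prune(child, f'{name}.{i}')
    if backup >= e:
        print(f'PRUNE below {name}: BackUpError {backup:.3f} >= E(S) {e:.3f}')
    return min(e, backup)

# The tree of this slide: root a with children c and b.
d = (1, 2, [(1, 1, []), (0, 1, [])])
c = (2, 2, [(1, 0, []), d])
b = (4, 2, [(3, 2, []), (1, 0, [])])
a = (6, 4, [c, b])
print(f'Error(a) = {prune(a, "a"):.3f}')   # prunes below a.0.1 (= d) and a.1 (= b); 0.378
```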

  24. Result of Pruning

      After pruning:
        a [6, 4]
          c [2, 2]
            [1, 0]
            [1, 2]   (leaf: the sub-tree below d has been cut)
          [4, 2]     (leaf: the sub-tree below b has been cut)

  25. Generalisation

      For the case in which we have k classes, the generalisation of E(S) is
        E(S) = (N - n + k - 1) / (N + k)
      Otherwise, the pruning method is the same.
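The binary formula is the k = 2 case; a one-line sketch:

```python
# Expected error for a node with N examples, n of them in the majority class, k classes.
def expected_error_k(n_majority, n_total, k):
    return (n_total - n_majority + k - 1) / (n_total + k)

print(expected_error_k(6, 10, 2))   # 0.4166..., the same as the binary E(S) for [6, 4]
```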

  26. Testing

      Split the database into a Training Set and a Test Set.
      Learn the rules from the Training Set and prune. Test the rules on the Training Set and record the % correct, then test the rules on the Test Set and record the % correct.
      The % accuracy on the Test Set should be close to that on the Training Set; this indicates good generalisation.
      Over-fitting can occur if noisy data is used or if attributes that are too specific are used. Pruning will overcome noise to some extent, but not completely. Attributes that are too specific must be dropped.
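The protocol itself is only a few lines. A minimal sketch reusing the seven-example database and the slide 17 rule as a stand-in for the learned classifier (far too small a set for a meaningful split; the shape of the check is the point):

```python
import random

data = [
    {'Size': 'med',   'Colour': 'blue',  'Shape': 'brick',  'Satisfied': 'yes'},
    {'Size': 'small', 'Colour': 'red',   'Shape': 'wedge',  'Satisfied': 'no'},
    {'Size': 'small', 'Colour': 'red',   'Shape': 'sphere', 'Satisfied': 'yes'},
    {'Size': 'large', 'Colour': 'red',   'Shape': 'wedge',  'Satisfied': 'no'},
    {'Size': 'large', 'Colour': 'green', 'Shape': 'pillar', 'Satisfied': 'yes'},
    {'Size': 'large', 'Colour': 'red',   'Shape': 'pillar', 'Satisfied': 'no'},
    {'Size': 'large', 'Colour': 'green', 'Shape': 'sphere', 'Satisfied': 'yes'},
]

def rule(e):  # stand-in for the learned and pruned rule (slide 17)
    return not (e['Shape'] == 'wedge' or (e['Shape'] == 'pillar' and e['Colour'] == 'red'))

def accuracy(examples):
    correct = sum(rule(e) == (e['Satisfied'] == 'yes') for e in examples)
    return 100.0 * correct / len(examples)

random.shuffle(data)
train, test = data[:5], data[5:]
print('training set % correct:', accuracy(train))
print('test set % correct:', accuracy(test))
```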
