Richard Jensen Qiang Shen Fuzzy-Rough Feature Significance for Fuzzy Decision Trees Advanced Reasoning Group Department of Computer Science The University of Wales, Aberystwyth
Outline • Utility of decision tree induction • Importance of attribute selection • Introduction of fuzzy-rough concepts • Evaluation of the fuzzy-rough metric • Results of F-ID3 vs FR-ID3 • Conclusions
Decision Trees • Popular classification algorithm in data mining and machine learning • Fuzzy decision trees (FDTs) follow similar principles to crisp decision trees • FDTs allow greater flexibility, e.g. in handling real-valued data • Trees partition the instance space; attributes are selected to derive the partitions • Hence, attribute selection is an important factor in decision tree quality
Fuzzy Decision Trees • Object membership • Traditionally, node membership of {0,1} • Here, membership is any value in the range [0,1] • Calculated from conjunction of membership degrees along path to the node • Fuzzy tests • Carried out within nodes to determine the membership of feature values to fuzzy sets • Stopping criteria • Measure of feature significance
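The node-membership idea above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it takes min as the conjunction (t-norm) over the membership degrees along the path, which is one common choice (product is another).

```python
# Membership of an object in an FDT node: the conjunction (t-norm) of its
# membership degrees in the fuzzy tests along the path from the root.
# min is used as the t-norm here; product is another common choice.

def node_membership(path_memberships):
    """path_memberships: degrees in [0,1] to which the object's feature
    values satisfy the fuzzy tests along the path to the node."""
    result = 1.0  # the root has membership 1 for every object
    for m in path_memberships:
        result = min(result, m)
    return result

# An object matching the path's fuzzy tests to degrees 0.9, 0.7 and 0.8
# belongs to the node to degree 0.7 (the weakest link on the path).
print(node_membership([0.9, 0.7, 0.8]))  # -> 0.7
```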
Decision Tree Algorithm • Input: training set S and (optionally) depth of decision tree l • Start to form the decision tree from the top level • Do loop until: the depth of the tree reaches l, or there is no node to expand • a) Gauge the significance of each attribute of S not already expanded in this branch • b) Expand the attribute with the most significance • c) Stop expansion of the leaf node of an attribute if maximum significance is obtained • End do loop
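The loop above can be sketched as follows. This is a minimal crisp-split sketch under stated assumptions, not the authors' code: `significance` is a placeholder for either fuzzy entropy (F-ID3) or the fuzzy-rough metric (FR-ID3), and a real FDT would partition objects by fuzzy membership rather than by exact value.

```python
# Generic induction loop: at each node, gauge the significance of each
# unexpanded attribute, expand the best one, and stop at the depth limit
# or when no attribute adds discriminating power.

def induce_tree(data, attributes, significance, depth_limit, depth=0):
    if depth >= depth_limit or not attributes:
        return {"leaf": True, "data": data}
    # a) gauge the significance of each attribute not yet expanded
    scored = {a: significance(a, data) for a in attributes}
    best = max(scored, key=scored.get)
    # c) stop expansion if the best attribute offers no improvement
    if scored[best] == 0:
        return {"leaf": True, "data": data}
    # b) expand the most significant attribute: one subtree per value
    children = {}
    for value in {row[best] for row in data}:
        subset = [row for row in data if row[best] == value]
        remaining = [a for a in attributes if a != best]
        children[value] = induce_tree(subset, remaining, significance,
                                      depth_limit, depth + 1)
    return {"leaf": False, "attribute": best, "children": children}

# Demo with a toy stand-in measure: distinct values minus one.
data = [{"a": 0, "b": 1, "cls": 0}, {"a": 1, "b": 1, "cls": 1}]
toy_sig = lambda attr, rows: len({r[attr] for r in rows}) - 1
tree = induce_tree(data, ["a", "b"], toy_sig, depth_limit=3)
print(tree["attribute"])  # -> a
```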
Feature Significance • Previous FDT inducers use fuzzy entropy • Little research in the area of alternatives • Fuzzy-rough feature significance has been used previously in feature selection with much success • This can also be used to gauge feature importance within FDT construction • The fuzzy-rough measure extends concepts from crisp rough set theory
Crisp Rough Sets • The equivalence class [x]B is the set of all points indiscernible from x in terms of feature subset B • Lower approximation of a set X: the union of equivalence classes wholly contained in X • Upper approximation of X: the union of equivalence classes that overlap X • (Figure: a set X with its lower and upper approximations built from equivalence classes [x]B)
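The crisp approximations can be computed directly from the partition into equivalence classes; a small sketch:

```python
# Crisp rough approximations of a set X, given the partition of the
# universe into equivalence classes [x]_B. Classes wholly inside X form
# the lower approximation (certain members); classes that overlap X form
# the upper approximation (possible members).

def approximations(partition, X):
    lower, upper = set(), set()
    for eq_class in partition:
        if eq_class <= X:    # wholly contained in X
            lower |= eq_class
        if eq_class & X:     # non-empty intersection with X
            upper |= eq_class
    return lower, upper

partition = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}
lower, upper = approximations(partition, X)
print(lower)  # -> {1, 2}
print(upper)  # -> {1, 2, 3, 4}
```

The boundary region `upper - lower` ({3, 4} here) contains the objects that can be neither certainly included in nor certainly excluded from X.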
Fuzzy Equivalence Classes • At the centre of Fuzzy-Rough Feature Selection • Incorporate vagueness • Handle real-valued data • Cope with noisy data • (Figure: a crisp equivalence class vs. a fuzzy equivalence class) Image: Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999
Fuzzy-Rough Significance • Deals with real-valued features via fuzzy sets • Fuzzy lower approximation: • Fuzzy positive region: • Evaluation function: • Feature importance is estimated with this
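The formulas on this slide appear as images in the original deck, so the sketch below follows the standard fuzzy-rough feature selection definitions from Jensen and Shen's published work: the fuzzy lower approximation μ_P̲X(x) = sup_F min(μ_F(x), inf_y max(1 − μ_F(y), μ_X(y))), the fuzzy positive region μ_POS(x) = sup over decision classes X of μ_P̲X(x), and the dependency degree γ' = Σ_x μ_POS(x) / |U|. The toy memberships in the demo are illustrative values, not data from the paper.

```python
# Fuzzy-rough dependency degree gamma' (a hedged sketch of the standard
# FRFS definitions). mu_F[f][x]: membership of object x in fuzzy
# equivalence class f (from conditional features P); mu_X[d][x]:
# membership of x in decision class d.

def lower_approx(mu_F, mu_X_d, x, universe):
    # mu_{P_lower X}(x) = sup_F min(mu_F(x), inf_y max(1-mu_F(y), mu_X(y)))
    return max(
        min(mu_f[x], min(max(1 - mu_f[y], mu_X_d[y]) for y in universe))
        for mu_f in mu_F
    )

def gamma_prime(mu_F, mu_X, universe):
    # positive region membership: sup over decision classes of the
    # lower approximation; gamma' normalises its sum by |U|
    pos = [max(lower_approx(mu_F, mu_x, x, universe) for mu_x in mu_X)
           for x in universe]
    return sum(pos) / len(universe)

# Toy example: two fuzzy equivalence classes over three objects,
# two (crisp, for simplicity) decision classes.
universe = range(3)
mu_F = [[1.0, 0.8, 0.1], [0.0, 0.2, 0.9]]
mu_X = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(round(gamma_prime(mu_F, mu_X, universe), 3))  # -> 0.833
```

A feature subset with a higher γ' value preserves more of the decision information, which is why γ' can rank candidate branching attributes.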
Evaluation • Is the γ’ metric a useful gauge of feature significance? • The γ’ metric was compared with leading feature rankers: • Information Gain, Gain Ratio, Chi2, Relief, OneR • Applied to test data: • 30 random feature values for 400 objects • 2 or 3 features used to determine the classification • Task: locate those features that affect the decision
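The test data described above can be generated along these lines. This is an illustrative reconstruction, not the authors' script: the assumption that the relevant features x, y, z sit at indices 0–2 and that values are uniform in [0,1] is mine.

```python
# Synthetic ranking test: 400 objects, 30 random feature values each,
# with only three features (taken here as indices 0-2) determining the
# class, e.g. via x*y*z^2 > 0.125. A good significance measure should
# rank those three features above the 27 irrelevant ones.
import random

random.seed(0)  # fixed seed for reproducibility
data = [[random.random() for _ in range(30)] for _ in range(400)]
labels = [int(row[0] * row[1] * row[2] ** 2 > 0.125) for row in data]
```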
Evaluation… • Results for x*y*z² > 0.125 • Results for (x + y)³ < 0.125 • FR, IG and GR perform best • FR metric locates the most important features
FDT Experiments • Fuzzy ID3 (F-ID3) compared with Fuzzy-Rough ID3 (FR-ID3) • The only difference between the methods is the choice of feature significance measure • Datasets were taken from the machine learning repository • Data split into two equal halves: training and testing • Resulting trees converted to equivalent rulesets
Results • Real-valued data • Average ruleset size • 56.7 for F-ID3 • 88.6 for FR-ID3 • F-ID3 performs marginally better than FR-ID3
Results… • Crisp data • Average ruleset size • 30.2 for F-ID3 • 28.8 for FR-ID3 • FR-ID3 performs marginally better than F-ID3
Conclusion • Decision trees are a popular means of classification • The selection of branching attributes is key to resulting tree quality • The use of a fuzzy-rough metric for this purpose looks promising • Future work • Further experimental evaluation • Fuzzy-rough feature reduction pre-processor