Feature Grouping-Based Fuzzy-Rough Feature Selection Richard Jensen Neil Mac Parthaláin Chris Cornelis
Outline • Motivation/Feature Selection (FS) • Rough set theory • Fuzzy-rough feature selection • Feature grouping • Experimentation
The problem: too much data • The amount of data is growing exponentially • Staggering 4300% annual growth in global data • The complexity of the problem is vast • (e.g. the powerset of features for FS) • Therefore, there is a need for FS and other data reduction methods • Curse of dimensionality: a problem for machine learning techniques
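To make the combinatorial scale concrete (an illustrative calculation, not from the slides): a dataset with $n$ features has $2^n$ candidate feature subsets, so even $n = 50$ already gives $2^{50} \approx 1.1 \times 10^{15}$ subsets, far too many to evaluate exhaustively.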
Feature selection • Remove features that are: • Noisy • Irrelevant • Misleading • Task: find a subset that • Optimises a measure of subset goodness • Has small/minimal cardinality • In rough set theory, this is a search for reducts • Much research in this area
Rough set theory (RST) • For a subset of features P, a target set X is approximated using the equivalence classes [x]P • (Diagram: equivalence class [x]P, set X, lower approximation, upper approximation)
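For reference, the standard crisp definitions that the diagram illustrates (assuming the usual Pawlak notation, where $[x]_P$ denotes the equivalence class of $x$ under the features in $P$):

$$\underline{P}X = \{x \mid [x]_P \subseteq X\}, \qquad \overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\}$$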
Rough set feature selection • By considering more features, concepts become easier to define…
Rough set theory • Problems: • Rough set methods (usually) require data discretization beforehand • Extensions require thresholds, e.g. tolerance rough sets • Also no flexibility in approximations • E.g. objects either belong fully to the lower (or upper) approximation, or not at all
Fuzzy-rough sets • Extends rough set theory • Uses fuzzy tolerance relations instead of crisp equivalence • Approximations are fuzzified • Collapses to traditional RST when data is crisp • New definitions: the fuzzy lower and upper approximations (given below)
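A minimal statement of the standard fuzzy lower and upper approximations (Radzikowska–Kerre style, as commonly used in fuzzy-rough feature selection; assumed here to match the slide's definitions), with $R_P$ the fuzzy tolerance relation induced by feature subset $P$, $I$ an implicator and $T$ a t-norm:

$$\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\big(\mu_{R_P}(x,y),\, \mu_X(y)\big), \qquad \mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\big(\mu_{R_P}(x,y),\, \mu_X(y)\big)$$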
Fuzzy-rough feature selection • Search for reducts • Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts • Traditional approach • Greedy hill-climbing algorithm used • Other search techniques have been applied (e.g. PSO) • Problems • Complexity is problematic for large data (e.g. over several thousand features) • No explicit handling of redundancy
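As an illustration of the greedy search mentioned above, here is a minimal Python sketch of a QuickReduct-style forward selection; the `dependency` function stands in for the fuzzy-rough dependency measure and is an assumption, not the authors' implementation.

```python
def greedy_reduct(features, dependency):
    """Greedy hill-climbing search for a reduct (sketch).

    features:   iterable of candidate features.
    dependency: assumed callable returning the fuzzy-rough dependency
                degree of the decision feature on a given subset
                (maximal value means the lower approximations are preserved).
    """
    subset, best = set(), 0.0
    full = dependency(set(features))          # dependency of the full feature set
    while best < full:
        candidate, cand_score = None, best
        # try each remaining feature; keep the one that improves dependency most
        for f in set(features) - subset:
            score = dependency(subset | {f})
            if score > cand_score:
                candidate, cand_score = f, score
        if candidate is None:                 # no single feature improves the measure
            break
        subset.add(candidate)
        best = cand_score
    return subset
```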
Feature grouping • Idea: we don't need to consider all features individually • Features that are highly correlated with each other carry the same or similar information • Therefore we can group them and work on a group-by-group basis • This paper: based on greedy hill-climbing • Group-then-rank approach • Relevancy and redundancy handled by: • Correlation: similar features grouped together • Internal ranking (correlation with the decision feature) • (Diagram: an example group F1 = {f4, f1, f7, f2, f9, f8})
Forming groups of features • Data → calculate pairwise correlations using a correlation measure • Features whose correlation exceeds a threshold τ are placed in the same group (redundancy) • Each group F1, F2, F3, …, Fn is internally ranked by correlation with the decision feature (relevancy) • Output: internally-ranked feature groups
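A minimal sketch of this group-forming step, assuming a Pearson-style correlation measure, a numeric decision feature, and an illustrative default for τ; the names and defaults are hypothetical rather than taken from the paper.

```python
import numpy as np

def form_groups(X, y, tau=0.8):
    """Group features whose pairwise |correlation| exceeds tau (redundancy),
    then rank each group internally by |correlation| with the decision
    feature y (relevancy).  X: (n_samples, n_features) array."""
    n_feats = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature correlations
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_feats)])

    groups = []
    for j in range(n_feats):
        # one group per seed feature: all features correlated with it above tau
        members = [k for k in range(n_feats) if corr[j, k] >= tau]
        # internal ranking: most relevant feature first
        members.sort(key=lambda k: relevance[k], reverse=True)
        groups.append(members)

    # order the groups themselves by the relevance of their top-ranked member
    groups.sort(key=lambda g: relevance[g[0]], reverse=True)
    return groups
```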
Selecting features • Internally-ranked feature groups → search mechanism (feature subset search and selection) → subset evaluation → selected subset(s)
Initial experimentation • Setup: • 10 datasets (9-2557 features) • 3 classifiers • Stratified 5 x 10-fold cross-validation • Performance evaluation in terms of • Subset size • Classification accuracy • Execution time • FRFG compared with • Traditional greedy hill-climber (GHC) • GA & PSO (200 generations, population size: 40)
Results: classification accuracy • (Figures: classification accuracy results for the JRip and IBk (k = 3) classifiers)
Conclusion FRFG • Motivation: reduce computational overhead; improve consideration of redundancy • Group-then-rank approach • Parameter determines granularity of grouping • Weka implementation available: http://bit.ly/1oic2xM Future work • Automatic determination of parameter τ • Experimentation using much larger data, other FS methods, etc • Clustering of features • Unsupervised selection?
Simple example • Dataset of six features • After initialisation, the following groups are formed (diagram): F1 = {f4, f3, f1}, F2 = {f2}, F3 = {f3, f1}, F4 = {f4, f1, f5} • Within each group, rank determines relevance: e.g. f4 is more relevant than f3 • Ordering of groups: F = {F4, F1, F3, F5, F2, F6} • A greedy hill-climber then operates over the ordered groups, etc.
Simple example... • First group to be considered: F4 • Feature f4 is preferable over the others, so add it to the current (initially empty) subset R • Evaluate M(R + {f4}): if this gives a better score than the current best evaluation, store f4 and set the current best evaluation = M(R + {f4}) • The set of features which appear in F4 ({f1, f4, f5}) is added to the set Avoids • Next, consider the first feature group with elements that do not appear in Avoids: F1 • And so on…
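Putting the example together, a hedged sketch of the overall group-then-rank selection loop: groups are visited in order, the top-ranked not-yet-avoided feature of each group is tried, and every member of a considered group is added to Avoids so later groups only contribute new features. The `evaluate` function stands in for the subset measure M and is assumed, not the authors' code.

```python
def frfg_select(groups, evaluate):
    """Group-then-rank feature selection (sketch).

    groups:   ordered list of internally-ranked feature groups,
              e.g. [['f4', 'f1', 'f5'], ['f4', 'f3', 'f1'], ...].
    evaluate: assumed subset quality measure M (higher is better).
    """
    selected, avoids = set(), set()
    best = evaluate(selected)                 # evaluation of the empty subset
    for group in groups:
        fresh = [f for f in group if f not in avoids]
        if not fresh:
            continue                          # every member already considered
        top = fresh[0]                        # highest-ranked unseen feature
        score = evaluate(selected | {top})
        if score > best:                      # keep the feature if it improves M
            selected.add(top)
            best = score
        avoids.update(group)                  # members of a considered group are avoided later
    return selected
```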