Feature Grouping-Based Fuzzy-Rough Feature Selection


Presentation Transcript


  1. Feature Grouping-Based Fuzzy-Rough Feature Selection Richard Jensen Neil Mac Parthaláin Chris Cornelis

  2. Outline • Motivation/Feature Selection (FS) • Rough set theory • Fuzzy-rough feature selection • Feature grouping • Experimentation

  3. The problem: too much data • The amount of data is growing exponentially • A staggering 4300% annual growth in global data • The complexity of the problem is vast • (e.g. FS must search the power set of features: 2^n candidate subsets for n features) • Therefore, there is a need for FS and other data reduction methods • Curse of dimensionality: a problem for machine learning techniques

  4. Feature selection • Remove features that are: • Noisy • Irrelevant • Misleading • Task: find a subset that • Optimises a measure of subset goodness • Has small/minimal cardinality • In rough set theory, this is a search for reducts • Much research in this area

  5. Rough set theory (RST) • For a subset of features P, a concept X is approximated using the equivalence classes [x]P [Diagram: the set X with its lower approximation (equivalence classes wholly inside X) and upper approximation (equivalence classes overlapping X)]
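The approximation definitions themselves were images in the original slides; for reference, these are Pawlak's standard formulations:

```latex
% Lower approximation: objects whose equivalence class under P
% lies wholly inside X (certain members of the concept).
\underline{P}X = \{\, x \mid [x]_P \subseteq X \,\}
% Upper approximation: objects whose equivalence class under P
% overlaps X (possible members of the concept).
\overline{P}X = \{\, x \mid [x]_P \cap X \neq \emptyset \,\}
```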

  6. Rough set feature selection • By considering more features, concepts become easier to define: the equivalence classes become finer, so they approximate each concept more tightly…

  7. Rough set theory • Problems: • Rough set methods (usually) require data discretization beforehand • Extensions require thresholds, e.g. tolerance rough sets • There is also no flexibility in the approximations • E.g. objects either belong fully to the lower (or upper) approximation, or not at all

  8. Fuzzy-rough sets • Extends rough set theory • Uses fuzzy tolerance relations instead of crisp equivalence • Approximations are fuzzified • Collapses to traditional RST when the data is crisp • New definitions: fuzzy lower and upper approximations (see below)
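The fuzzy approximation formulas were also images in the original slides; the standard definitions from the fuzzy-rough literature (Radzikowska and Kerre), with implicator I, t-norm T, fuzzy tolerance relation R_P and universe U, are:

```latex
% Fuzzy lower approximation: degree to which everything similar
% to x (under R_P) belongs to X.
\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\big(\mu_{R_P}(x,y),\, \mu_X(y)\big)
% Fuzzy upper approximation: degree to which something similar
% to x belongs to X.
\mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\big(\mu_{R_P}(x,y),\, \mu_X(y)\big)
```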

  9. Fuzzy-rough feature selection • Search for reducts • Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts • Traditional approach • Greedy hill-climbing algorithm used • Other search techniques have been applied (e.g. PSO) • Problems • Complexity is problematic for large data (e.g. over several thousand features) • No explicit handling of redundancy
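A minimal sketch of the traditional greedy hill-climber (often called QuickReduct), assuming a dependency function `gamma` is supplied by the fuzzy-rough machinery; `quickreduct` and `gamma` are illustrative names, not the paper's API:

```python
def quickreduct(features, gamma):
    """Greedy hill-climbing search for a (fuzzy-rough) reduct.

    `features` is a set of candidate features; `gamma(subset)` is
    assumed to return the fuzzy-rough dependency degree of the
    decision feature on `subset` (1.0 when the subset preserves the
    fuzzy lower approximations of all decision concepts).
    """
    reduct, best = set(), 0.0
    target = gamma(features)  # dependency of the full feature set
    while best < target:
        # Add the feature whose inclusion gives the largest gain.
        cand = max(features - reduct, key=lambda f: gamma(reduct | {f}))
        score = gamma(reduct | {cand})
        if score <= best:
            break  # no further improvement possible: stop
        reduct, best = reduct | {cand}, score
    return reduct
```

Each iteration evaluates every remaining feature, which is what makes the complexity problematic for data with thousands of features.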

  10. Feature grouping • Idea: we don't need to consider all features individually • Features that are highly correlated with each other carry the same or similar information • Therefore, we can group them and work on a group-by-group basis • This paper: based on greedy hill-climbing • Group-then-rank approach • Relevancy and redundancy handled by • Correlation: similar features grouped together • Internal ranking (correlation with the decision feature) [Diagram: correlated features f1, f2, f4, f7, f8, f9 collected into a single group F1]

  11. Forming groups of features [Diagram: data → calculate pairwise correlations (correlation measure) → features whose correlation exceeds the threshold τ are placed in the same group (redundancy) → feature groups F1, F2, F3, …, Fn → each group internally ranked #1…#m by correlation with the decision feature (relevancy)]
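A minimal sketch of this grouping step, under stated assumptions: Pearson correlation as the correlation measure, one group seeded per feature, and |correlation| ≥ τ as the grouping criterion (the paper's exact measure and seeding strategy may differ):

```python
import numpy as np

def form_groups(X, y, tau):
    """Group correlated features, then rank each group internally.

    X: (n_samples, n_features) array; y: decision feature values.
    Features whose pairwise |correlation| >= tau share a group
    (redundancy); within a group, features are ranked by their
    |correlation with y| (relevancy).
    """
    n = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)  # feature-feature correlations
    # Relevancy: absolute correlation of each feature with the decision.
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)])
    groups = []
    for seed in range(n):
        members = [j for j in range(n) if abs(corr[seed, j]) >= tau]
        members.sort(key=lambda j: rel[j], reverse=True)  # internal ranking
        groups.append(members)
    # Order the groups by the relevancy of their top-ranked member.
    groups.sort(key=lambda g: rel[g[0]], reverse=True)
    return groups
```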

  12. Selecting features [Diagram: a search mechanism drives feature subset search and selection over the groups; candidate subsets are passed to subset evaluation, yielding the selected subset(s)]

  13. Fuzzy-rough feature grouping

  14. Initial experimentation • Setup: • 10 datasets (9–2557 features) • 3 classifiers • Stratified 5×10-fold cross-validation • Performance evaluation in terms of • Subset size • Classification accuracy • Execution time • FRFG compared with • Traditional greedy hill-climber (GHC) • GA & PSO (200 generations, population size: 40)
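The evaluation protocol corresponds to repeated stratified cross-validation; a hypothetical re-creation in scikit-learn (the authors' experiments used Weka) looks like:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

X = np.random.rand(100, 9)        # toy stand-in: 100 objects, 9 features
y = np.random.randint(0, 2, 100)  # toy binary decision feature

# Stratified 5 x 10-fold cross-validation: 10 folds, repeated 5 times.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    pass  # run FS on X[train_idx], then train/evaluate a classifier
```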

  15. Results: average subset size

  16. Results: classification accuracy [Charts: accuracies for the JRip and IBk (k=3) classifiers]

  17. Results: execution times (s)

  18. Conclusion FRFG • Motivation: reduce computational overhead; improve the handling of redundancy • Group-then-rank approach • The parameter τ determines the granularity of grouping • Weka implementation available: http://bit.ly/1oic2xM Future work • Automatic determination of the parameter τ • Experimentation using much larger data, other FS methods, etc. • Clustering of features • Unsupervised selection?

  19. Thank you!

  20. Simple example • Dataset of six features • After initialisation, the following groups are formed (one per feature): F1 = {f4, f3, f1}, F2 = {f2}, F3 = {f3, f1}, F4 = {f4, f1, f5}, … • Within each group, the internal rank determines relevance: e.g. f4 is more relevant than f3 • Ordering of groups: F = {F4, F1, F3, F5, F2, F6} • A greedy hill-climber then works through the groups in this order…

  21. Simple example... • First group to be considered: F4 • Feature f4 is preferable over the others • So, add it to the current (initially empty) subset R • Evaluate M(R + {f4}): • If it scores better than the current best evaluation, store f4 • Current best evaluation = M(R + {f4}) • Add the set of features which appear in F4 ({f1, f4, f5}) to the set Avoids • Next feature group with elements that do not appear in Avoids: F1 • And so on… [Diagram: groups F4 = {f4, f1, f5} and F1 = {f4, f3, f1}]
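Putting the example together, a minimal sketch of the selection loop, reconstructed from these two slides rather than from the paper's pseudocode; `frfg_select` is an illustrative name, and `M` is assumed to be the fuzzy-rough subset evaluation measure:

```python
def frfg_select(groups, M):
    """Group-by-group greedy selection, reconstructed from the example.

    `groups` is the ordered list of internally-ranked feature groups
    (e.g. [F4, F1, F3, F5, F2, F6]); `M(subset)` evaluates a candidate
    subset (assumed: fuzzy-rough dependency).
    """
    R, best, avoids = set(), 0.0, set()
    for group in groups:
        # Only consider features not already covered by earlier groups.
        candidates = [f for f in group if f not in avoids]
        if not candidates:
            continue  # the whole group appears in Avoids: skip it
        top = candidates[0]  # highest internally-ranked new feature
        score = M(R | {top})
        if score > best:
            R, best = R | {top}, score  # keep the feature: it improves M
        avoids |= set(group)  # all group members now count as seen
    return R
```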
