
Identifying Feature Relevance Using a Random Forest


Presentation Transcript


  1. Identifying Feature Relevance Using a Random Forest Jeremy Rogers & Steve Gunn

  2. Overview • What is a Random Forest? • Why do Relevance Identification? • Estimating Feature Importance with a Random Forest • Node Complexity Compensation • Employing Feature Relevance • Extension to Feature Selection

  3. Random Forest • Combination of base learners using Bagging • Uses CART-based decision trees

  4. Random Forest (cont...) • Optimises each split using Information Gain • Selects a feature at random to perform each split • The implicit feature selection of CART is removed
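
The forest construction described on slides 3 and 4 can be sketched as follows. This is an illustrative Python outline of the variant described here (bagged CART-style trees, one randomly chosen feature per split, splits scored by Information Gain), not the authors' implementation; all names are placeholders.

import numpy as np

def entropy(y):
    """Shannon entropy of an integer class-label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Information Gain of splitting y into y[mask] and y[~mask]."""
    n, nl = len(y), mask.sum()
    nr = n - nl
    if nl == 0 or nr == 0:
        return 0.0
    return entropy(y) - (nl / n) * entropy(y[mask]) - (nr / n) * entropy(y[~mask])

def build_tree(X, y, rng, min_samples=2):
    """CART-style tree, except the split feature at each node is chosen at random."""
    if len(np.unique(y)) == 1 or len(y) < min_samples:
        return {"leaf": True, "label": np.bincount(y).argmax()}
    f = rng.integers(X.shape[1])              # random feature for this split
    best_gain, best_t = 0.0, None
    for t in np.unique(X[:, f])[:-1]:         # candidate thresholds along feature f
        gain = information_gain(y, X[:, f] <= t)
        if gain > best_gain:
            best_gain, best_t = gain, t
    if best_t is None:                        # no useful split found
        return {"leaf": True, "label": np.bincount(y).argmax()}
    mask = X[:, f] <= best_t
    return {"leaf": False, "feature": f, "threshold": best_t, "gain": best_gain,
            "left": build_tree(X[mask], y[mask], rng),
            "right": build_tree(X[~mask], y[~mask], rng)}

def build_forest(X, y, n_trees=100, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    return [build_tree(*_bootstrap(X, y, rng), rng) for _ in range(n_trees)]

def _bootstrap(X, y, rng):
    idx = rng.integers(len(y), size=len(y))   # sample with replacement
    return X[idx], y[idx]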

  5. Feature Relevance: Ranking • Analyse features individually • Measure correlation to the target • Feature is relevant if: • Assumes no feature interaction • Fails to identify relevant features in the parity problem
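
The relevance criterion on this slide did not survive transcription. The standard marginal (ranking-style) definition it presumably corresponds to, with $X_i$ the candidate feature and $Y$ the target, is

\[
  \exists\, x_i, y : \quad P(Y = y \mid X_i = x_i) \neq P(Y = y).
\]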

  6. Feature Relevance: Subset Methods • Use implicit feature selection of decision tree induction • Wrapper methods • Subset search methods • Identifying Markov Blankets • Feature is relevant if:
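
Again, the slide's formula was not transcribed. A standard formulation of subset-style (conditional) relevance, in the spirit of Kohavi and John, conditions on the set $S_i$ of all features other than $X_i$:

\[
  \exists\, x_i, y, s_i : \quad P(Y = y \mid X_i = x_i, S_i = s_i) \neq P(Y = y \mid S_i = s_i).
\]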

  7. Relevance Identification using Average Information Gain • Can identify feature interaction • Reliability is dependent upon node composition • Irrelevant features give non-zero relevance
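
A minimal sketch of turning average Information Gain into a per-feature relevance score, assuming the nested-dict tree representation from the earlier sketch (each internal node stores the "feature" it splits on and the "gain" it achieved):

from collections import defaultdict

def average_information_gain(forest):
    """Relevance score per feature: mean IG over all nodes that split on it."""
    totals, counts = defaultdict(float), defaultdict(int)
    stack = list(forest)
    while stack:
        node = stack.pop()
        if node["leaf"]:
            continue
        totals[node["feature"]] += node["gain"]
        counts[node["feature"]] += 1
        stack.extend([node["left"], node["right"]])
    return {f: totals[f] / counts[f] for f in totals}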

  8. Node Complexity Compensation • Some nodes are easier to split than others • Requires each sample to be weighted by some measure of node complexity • Data is projected onto a one-dimensional space • For Binary Classification:

  9. Unique & Non-Unique Arrangements • Some arrangements are reflections (non-unique) • Some arrangements are symmetrical about their centre (unique)

  10. Node Complexity Compensation (cont…) • Au: number of unique arrangements
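
The expression for Au was not transcribed, so the following is a reconstruction consistent with the reflection/symmetry remarks above rather than the authors' exact formula. For a node containing $a$ positive and $b$ negative examples projected onto one dimension, there are

\[
  A = \binom{a+b}{a}
\]

distinct left-to-right class arrangements. Pairing each arrangement with its reflection, and writing $A_s$ for the number of arrangements that are symmetrical about their centre, the number of unique arrangements is

\[
  A_u = \frac{A + A_s}{2}.
\]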

  11. Information Gain Density Functions • Node Complexity improves measure of average IG • The effect is visible when examining the IG density functions for each feature • These are constructed by building a forest and recording the frequencies of IG values achieved by each feature
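
A sketch of how these density functions could be collected, again assuming the nested-dict forest representation from the earlier sketches (the binning of IG values is an arbitrary illustrative choice):

import numpy as np

def ig_density_per_feature(forest, n_features, bins=50, max_ig=1.0):
    """Histogram of the IG values achieved by each feature across all forest nodes."""
    hists = np.zeros((n_features, bins))
    edges = np.linspace(0.0, max_ig, bins + 1)
    stack = list(forest)
    while stack:
        node = stack.pop()
        if node["leaf"]:
            continue
        b = min(np.searchsorted(edges, node["gain"], side="right") - 1, bins - 1)
        hists[node["feature"], b] += 1
        stack.extend([node["left"], node["right"]])
    row_sums = hists.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1          # features that never split stay all-zero
    return hists / row_sums              # normalise each row to a density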

  12. Information Gain Density Functions • RF used to construct 500 trees on an artificial dataset • IG density functions recorded for each feature

  13. Employing Feature Relevance • Feature Selection • Feature Weighting • Random Forest uses a Feature Sampling distribution to select each feature. • Distribution can be altered in two ways • Parallel: Update during forest construction • Two-stage: Fixed prior to forest construction
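
A sketch of what the two-stage variant might look like: relevance scores are estimated first and then frozen into a feature-sampling distribution used while growing the final forest. The normalisation and the floor value are illustrative choices, not taken from the slides.

import numpy as np

def sampling_distribution(relevance, n_features, floor=1e-6):
    """Turn per-feature relevance scores into a probability distribution over
    features; this replaces the uniform feature choice inside the tree builder."""
    w = np.full(n_features, floor)
    for f, r in relevance.items():
        w[f] = max(r, floor)
    return w / w.sum()

# Inside the tree-building sketch, the uniform draw
#     f = rng.integers(X.shape[1])
# would become a weighted draw from this distribution:
#     f = rng.choice(X.shape[1], p=dist)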

  14. Parallel • Control the update rate using confidence intervals • Assume Information Gain values have a normal distribution • The statistic has a Student’s t distribution with n-1 degrees of freedom • Maintain the most uniform distribution within the confidence bounds
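
Concretely, writing $\bar{g}$ and $s$ for the sample mean and standard deviation of the $n$ Information Gain values observed so far for a feature, and $\mu$ for its true mean gain, the statistic referred to above is the usual one-sample

\[
  t = \frac{\bar{g} - \mu}{s / \sqrt{n}} \;\sim\; t_{n-1},
\]

which yields the confidence interval $\bar{g} \pm t_{n-1,\,1-\alpha/2}\, s/\sqrt{n}$ within which the sampling distribution is kept as uniform as possible.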

  15. Convergence Rates

  16. Results • 90% of data used for training, 10% for testing • Forests of 100 trees were tested and averaged over 100 trials

  17. Irrelevant Features • Average IG is the mean of a non-negative sample. • Expected IG of an irrelevant feature is non-zero. • Performance is degraded when there is a high proportion of irrelevant features.

  18. Expected Information Gain • nL: number of examples in the left descendant • iL: number of positive examples in the left descendant

  19. Expected Information Gain (cont…) • Legend: number of positive examples; number of negative examples

  20. Bounds on Expected Information Gain • The upper bound can be approximated as: • The lower bound is given by:

  21. Irrelevant Features: Bounds • 100 trees built on artificial dataset • Average IG recorded and bounds calculated

  22. Friedman dataset: results for FS and CFS

  23. Simple dataset: results for FS and CFS

  24. Results • 90% of data used for training, 10% for testing • Forests of 100 trees were tested and averaged over 100 trials • 100 trees constructed for feature evaluation in each trial

  25. Summary • Node complexity compensation improves measure of feature relevance by examining node composition • Feature sampling distribution can be updated using confidence intervals to control the update rate • Irrelevant features can be removed by calculating their expected performance
