
Active Learning: Sampling Method


Presentation Transcript


  1. Active Learning: Sampling Method Meeting 6 — Jan 31, 2013 CSCE 6933 Rodney Nielsen

  2. Space of Active Learning

  3. Uncertainty Sampling • Select examples based on the model's confidence in its predictions • Least confident • Margin sampling • Entropy-based sampling

  4. If |Y| = 2, the three uncertainty methods are equivalent • If |Y| = 3, consider the following posterior distributions • 0.34, 0.33, 0.33 • 0.50, 0.50, 0.00 • 0.50, 0.49, 0.01 • 0.40, 0.30, 0.30 • 0.41, 0.40, 0.19
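
To make the comparison concrete, here is a small sketch (mine, not from the slides; numpy assumed) that scores each distribution above with all three measures. They disagree: least confident and entropy pick 0.34/0.33/0.33 as most uncertain, while margin sampling picks 0.50/0.50/0.00, since it only looks at the gap between the top two labels.

    import numpy as np

    def least_confident(p):
        return 1.0 - np.max(p)          # higher = more uncertain

    def margin(p):
        top2 = np.sort(p)[-2:]          # two most probable labels
        return top2[1] - top2[0]        # lower = more uncertain

    def entropy(p):
        p = p[p > 0]                    # treat 0 * log(0) as 0
        return -np.sum(p * np.log2(p))  # higher = more uncertain

    for row in ([0.34, 0.33, 0.33], [0.50, 0.50, 0.00], [0.50, 0.49, 0.01],
                [0.40, 0.30, 0.30], [0.41, 0.40, 0.19]):
        p = np.array(row)
        print(row, least_confident(p), margin(p), entropy(p))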

  5. Query by Committee • Train a committee of hypotheses • Representing different regions of the version space • Obtain some measure of (dis)agreement on the instances in the dataset (e.g., vote entropy) • Assume the most informative instance is the one on which the committee has the most disagreement • Goal: minimize the version space • No agreement on size of committee, but even 2-3 provides good results
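
A minimal sketch of the vote-entropy disagreement score, assuming a committee of fitted classifiers that expose a scikit-learn-style .predict() and integer labels 0..n_labels−1 (the function names are mine, not from the reading):

    import numpy as np

    def vote_entropy(committee, X_pool, n_labels):
        # Stack each member's hard predictions: shape (committee, pool)
        votes = np.stack([clf.predict(X_pool) for clf in committee])
        C = len(committee)
        scores = np.zeros(votes.shape[1])
        for y in range(n_labels):
            v = (votes == y).sum(axis=0) / C    # fraction voting for label y
            nz = v > 0
            scores[nz] -= v[nz] * np.log2(v[nz])
        return scores                           # higher = more disagreement

    # Query the instance the committee disagrees on most:
    # x_query = X_pool[np.argmax(vote_entropy(committee, X_pool, n_labels))]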

  6. Competing Hypotheses

  7. Expected Model Change • Query the instance that would result in the largest expected change in the hypothesis h, taking the expectation over labels under the current model • E.g., the instance that would produce the largest gradient descent step in the model parameters • Prefer the instance x that leads to the most significant change in the model

  8. Expected Model Change • What learning algorithms does this work for? • What are the issues? • Can be computationally expensive for large datasets and feature spaces • Can be led astray if features aren’t properly scaled • How do you properly scale the features?
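
One concrete instantiation is the expected gradient length heuristic for gradient-trained models. Below is a minimal sketch for binary logistic regression (numpy assumed; the function names are mine, not from the reading). For this model the score works out to 2p(1−p)·‖x‖, which makes the scaling question above concrete: large-magnitude features inflate the gradient norm regardless of how informative the instance is.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def expected_gradient_length(w, X_pool):
        # Expected norm of the log-likelihood gradient each candidate would
        # add, under the current model's own label distribution P(y | x).
        p = sigmoid(X_pool @ w)
        scores = np.zeros(len(X_pool))
        for i, x in enumerate(X_pool):
            for y, p_y in ((1, p[i]), (0, 1.0 - p[i])):
                grad = (y - p[i]) * x           # gradient of log P(y | x, w)
                scores[i] += p_y * np.linalg.norm(grad)
        return scores                           # query np.argmax(scores)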

  9. Admin • IR / Thursday’s meeting time

  10. ML Publication Venues • ML Journals • Machine Learning • Journal of Machine Learning Research • ML Conferences • NIPS – Neural Information Processing Systems • ICML – International Conference on ML • ECML – European Conference on ML • IROS – Intl Conf on Intelligent Robots and Systems • ICPR – Intl Conference on Pattern Recognition • ISNN – Intl Symposium on Neural Networks • COLT – Computational Learning Theory • UAI – Uncertainty in Artificial Intelligence (AI) • AAAI – Association for the Advancement of AI • IJCAI – International Joint Conference on AI • FLAIRS – Florida Artificial Intelligence Research Society Conference

  11. NLP Publication Venues • NLP Journals • Computational Linguistics • JNLE – Journal of Natural Language Engineering • Language Resources and Evaluation • NLP Conferences • ACL / NAACL / EACL / PAACL • ICASSP • COLING • HLT • LREC • EMNLP • Interspeech

  12. Projects • Set up meeting with me next week to discuss possible projects • Come prepared to discuss the concept you are most interested in pursuing (not the implementation details, just the high-level description) • Or if you don’t have a specific goal, send me an email describing your general interests

  13. Reading Responses • Skip this coming Monday/Tuesday reading response

  14. Estimated Error Reduction • Other models approximate the goal of minimizing future error by minimizing a proxy (e.g., uncertainty, variance, …) • Estimated Error Reduction attempts to minimize the expected error E[error] directly

  15. Estimated Error Reduction • Often computationally prohibitive • Binary logistic regression would be O(|U||L|G) • Where G is the number of gradient descent iterations to convergence • Conditional Random Fields would be O(T|Y|^(T+2)|U||L|G) • Where T is the number of instances in the sequence
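
To see where the O(|U||L|G) term comes from, here is a deliberately naive sketch for binary classification, assuming scikit-learn's LogisticRegression as the learner (the function names are mine): every candidate query triggers |Y| full retrainings, and the risk estimate sweeps the pool again for each one.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pool_risk(clf, X_pool):
        p = clf.predict_proba(X_pool)
        return np.sum(1.0 - p.max(axis=1))      # sum of expected 0/1 losses

    def error_reduction_query(X_lab, y_lab, X_pool):
        current = LogisticRegression().fit(X_lab, y_lab)
        probs = current.predict_proba(X_pool)   # P(y | x) for weighting
        risk = np.zeros(len(X_pool))
        for i, x in enumerate(X_pool):
            for j, y in enumerate(current.classes_):
                # Retrain as if the oracle had returned label y for x
                clf = LogisticRegression().fit(np.vstack([X_lab, x]),
                                               np.append(y_lab, y))
                risk[i] += probs[i, j] * pool_risk(clf, X_pool)
        return int(np.argmin(risk))             # index of the best query

The double loop over the pool and the label set, each iteration paying a full O(|L|G) fit, is exactly the cost the slide is pointing at.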

  16. Variance Reduction • Regression problems • E[error²] = noise + bias² + variance • The learner can’t change the noise or the bias, so minimize the variance • The Fisher Information Ratio is used for classification
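
For reference, the decomposition behind the second bullet, written for a regressor ŷ trained on a random dataset D, predicting a target y = f(x) + ε with noise variance σ²:

    E_{D,\varepsilon}\left[(y - \hat{y})^2\right]
      = \underbrace{\sigma^2}_{\text{noise}}
      + \underbrace{\left(f(x) - E_D[\hat{y}]\right)^2}_{\text{bias}^2}
      + \underbrace{E_D\left[(\hat{y} - E_D[\hat{y}])^2\right]}_{\text{variance}}

Only the variance term depends on how the learner reacts to the particular training sample, which is why it is the one worth minimizing.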

  17. Outlier Phenomenon • Uncertainty sampling and Query by Committee might be hindered by querying many outliers

  18. Density Weighted Methods • Uncertainty sampling and Query by Committee might be hindered by querying many outliers • Density weighted methods overcome this potential problem by also considering whether the example is representative of the input distribution • Tend to work better than the base informativeness measures on their own
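
A sketch in the spirit of the information-density idea: weight a base informativeness score by the candidate's average similarity to the pool, raised to a tunable power β (numpy assumed; names mine):

    import numpy as np

    def information_density(base_scores, X_pool, beta=1.0):
        Xn = X_pool / np.linalg.norm(X_pool, axis=1, keepdims=True)
        sim = Xn @ Xn.T                        # pairwise cosine similarity
        density = sim.mean(axis=1)             # how representative each x is
        return base_scores * density ** beta   # outliers get down-weighted

    # base_scores would be, e.g., the entropy scores from slide 3:
    # x_query = X_pool[np.argmax(information_density(base_scores, X_pool))]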

  19. Diversity • Naïvely applying the earlier methods tends to select many very similar examples • Must factor this in and look for diversity in the queries

  20. Active Learning Empirical Results • Appears to work well, barring publication bias (from Settles, 2009)

  21. Labeling Costs • Are all labels created equal? • Generating labels by experiments • Some instances are easier to label (e.g., shorter sentences) • Can pre-label data for a small savings • Experimental problems • Value of information (VOI) • Considers labeling & estimated misclassification costs • Critical to the goal of Active Learning • Divide informativeness by cost?
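
A tiny sketch (names mine) contrasting the two cost treatments the slide mentions: VOI nets the labeling cost out of an estimated reduction in misclassification cost, while the final bullet instead normalizes informativeness by cost.

    def value_of_information(est_risk_reduction, labeling_cost):
        # Worth querying only if the expected saving exceeds the label's price
        return est_risk_reduction - labeling_cost

    def cost_normalized_score(informativeness, labeling_cost):
        # "Divide informativeness by cost": rank per unit of annotation effort
        return informativeness / labeling_cost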

  22. Batch Mode Active Learning
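
The slide is only a heading, so here is one greedy way (mine, not the reading's) to select a batch of k queries while honoring the diversity concern from slide 19. It assumes scores is a numpy array of informativeness values on a scale comparable to the feature-space distances.

    import numpy as np

    def select_batch(scores, X_pool, k, lam=0.5):
        # lam trades off informativeness against distance to the growing batch
        batch = [int(np.argmax(scores))]        # seed with the top instance
        while len(batch) < k:
            # Distance from every candidate to its nearest selected neighbor
            diffs = X_pool[:, None, :] - X_pool[batch][None, :, :]
            d = np.min(np.linalg.norm(diffs, axis=2), axis=1)
            combined = lam * scores + (1.0 - lam) * d
            combined[batch] = -np.inf           # never re-pick a selected point
            batch.append(int(np.argmax(combined)))
        return batch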

  23. Questions • ???
