Active Learning. Meeting 5 — October 21, 2014 CSCE 6933 Rodney Nielsen. Active Learning. Usually an abundance of unlabeled data How much should you label? Which instances should you label? Does it matter? Can the learner benefit from selective labeling?
Active Learning Meeting 5 — October 21, 2014 CSCE 6933 Rodney Nielsen
Active Learning • Usually an abundance of unlabeled data • How much should you label? • Which instances should you label? • Does it matter? • Can the learner benefit from selective labeling? • Active Learning: incrementally request labels for instances believed to be informative
Learning Paradigms • Supervised Learning • Unsupervised Learning • Active Learning [Figure: one panel per paradigm; diagram labels: random, query, ?]
Active Learning Applications • Speech Recognition • 10 min to annotate the words in 1 min of speech • 7 hr to annotate the phonemes in 1 min of speech • Named Entity Recognition • Half an hour to annotate a simple newswire article • PhD-level expertise to annotate a bioinformatics article • Image annotation
Heuristic Active Learning Algorithm • Start with unlabeled data • Randomly pick a small number of examples to have labeled • Repeat • Train classifier on labeled data • Query the unlabeled example that: • Is closest to the boundary • Has the least certainty • Minimizes overall uncertainty
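The loop on this slide can be sketched in a few lines. The following is a minimal illustration, not the course's implementation: it assumes 1-D data in [0, 1], a toy threshold classifier, and the "closest to the boundary" query rule only. The names `train_threshold`, `active_learn`, and `oracle` are hypothetical.

```python
import random

def train_threshold(labeled, default=0.5):
    """Toy 1-D classifier: boundary is the midpoint between the two classes."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    if not neg or not pos:
        # Assumption: with only one class seen, fall back to a default boundary.
        return default
    return (max(neg) + min(pos)) / 2.0

def active_learn(pool, oracle, n_seed=2, n_queries=8, seed=0):
    """Pool-based loop: train, query the example closest to the boundary, repeat."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Randomly pick a small number of examples to have labeled.
    labeled = [(x, oracle(x)) for x in unlabeled[:n_seed]]
    unlabeled = unlabeled[n_seed:]
    for _ in range(n_queries):
        if not unlabeled:
            break
        boundary = train_threshold(labeled)                   # train on labeled data
        x = min(unlabeled, key=lambda u: abs(u - boundary))   # closest to the boundary
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))                        # request its label
    return train_threshold(labeled), labeled

# Usage: the oracle stands in for a human annotator.
pool = [i / 100 for i in range(100)]
boundary, labeled = active_learn(pool, lambda x: int(x >= 0.37), n_queries=40)
```

The other two query rules on the slide (least certainty, minimizing overall uncertainty) would replace only the line that selects `x`.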
Active Learning Performance Example • Document classification: baseball vs. hockey
Membership Query Synthesis • Dynamically construct query instances based on expected informativeness • Applications • Character recognition • Robot scientist: find optimal growth medium for a yeast • 3x cost decrease vs. running the cheapest experiment next • 100x cost decrease vs. random selection
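To make "dynamically construct query instances" concrete, here is a hedged toy sketch: the learner synthesizes its own 1-D query points by bisection instead of choosing from existing data. It assumes a single threshold between `lo` and `hi`, with `oracle(lo) == 0` and `oracle(hi) == 1`; the function name and parameters are illustrative only.

```python
def synthesize_queries(oracle, lo=0.0, hi=1.0, n_queries=10):
    """Locate a 1-D decision threshold using synthesized (not sampled) queries."""
    for _ in range(n_queries):
        x = (lo + hi) / 2.0   # construct the point that halves the remaining interval
        if oracle(x) == 1:
            hi = x            # threshold is at or below x
        else:
            lo = x            # threshold is above x
    return (lo + hi) / 2.0    # estimate: midpoint of the remaining interval

# Usage: each query halves the interval that can still contain the threshold.
estimate = synthesize_queries(lambda x: int(x >= 0.37), n_queries=20)
```

In real domains a synthesized instance may be hard for a human to label (for example, an artificial character image), which is one reason the other scenarios draw queries from actual data.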
Questions • Membership Query Synthesis
Stream-based Selective Sampling • Informativeness measure • Region of uncertainty / Version space • Applications • Part-of-speech tagging (POST) • Sensor scheduling • Information retrieval (IR) ranking • Word sense disambiguation (WSD)
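A small sketch of the region-of-uncertainty idea, under the same toy assumption of a 1-D threshold concept: instances arrive one at a time, and a label is requested only when hypotheses still in the version space disagree about the instance. The function name `selective_sampling` and its arguments are hypothetical.

```python
def selective_sampling(stream, oracle, lo=0.0, hi=1.0):
    """Version space: thresholds t with lo < t <= hi; label is 1 iff x >= t."""
    n_queries = 0
    for x in stream:
        if lo < x < hi:
            # x lies in the region of uncertainty: consistent hypotheses disagree.
            n_queries += 1
            if oracle(x) == 1:
                hi = x        # threshold is at or below x
            else:
                lo = x        # threshold is above x
        # Otherwise all consistent hypotheses agree on x, so it is discarded unlabeled.
    return lo, hi, n_queries

# Usage: only some of the streamed instances cost a label.
stream = [0.9, 0.1, 0.95, 0.5, 0.05, 0.3, 0.7, 0.4, 0.36, 0.37]
lo, hi, n = selective_sampling(stream, lambda x: int(x >= 0.37))
```

With an explicit informativeness measure instead of a version space, the `lo < x < hi` test would become a threshold on that measure.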
Pool-based Active Learning • Informativeness measure • Applications • Cancer diagnosis • Text classification • Information extraction (IE) • Image classification & retrieval • Video classification & retrieval • Speech recognition
Questions • Questions???
Uncertainty Sampling • Uncertainty sampling • Select examples based on confidence in prediction • Probabilistic models • Entropy-based models
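Two common ways to turn "confidence in prediction" into a score are sketched below, assuming the model outputs a class-probability list per instance. The helper names are illustrative; the measures themselves are the standard least-confidence score (1 minus the top class probability) and Shannon entropy.

```python
import math

def least_confidence(probs):
    """Uncertainty as 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def most_uncertain(pool_probs, measure=entropy):
    """Index of the pool instance whose predicted distribution scores highest."""
    return max(range(len(pool_probs)), key=lambda i: measure(pool_probs[i]))

# Usage: each row is a model's predicted class distribution for one instance.
pool_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
query_index = most_uncertain(pool_probs)
```

For binary problems the two measures rank instances identically; they can differ when there are three or more classes, because entropy takes the whole distribution into account.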
Questions • Entropy
Instance Selection Methods • Informativeness measures • Region of uncertainty • Information density
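Information density can be illustrated with a short sketch. It assumes one common formulation, which the slide itself does not spell out: a base informativeness score (for example, entropy) is multiplied by the instance's average similarity to the pool, raised to a power `beta`. Cosine similarity and all names here are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0.0 else 0.0

def information_density(pool, base_scores, beta=1.0):
    """Weight each base score by the instance's average similarity to the pool."""
    scores = []
    for x, phi in zip(pool, base_scores):
        avg_sim = sum(cosine(x, u) for u in pool) / len(pool)
        scores.append(phi * avg_sim ** beta)
    return scores

# Usage: the third instance is the most uncertain but sits far from the others.
pool = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
scores = information_density(pool, base_scores=[0.5, 0.5, 0.6])
```

The effect is to down-weight outliers: an uncertain instance in a sparse region tells the learner less about the rest of the data than a slightly less uncertain one in a dense region.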