The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?
Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research, 2004
Premise
• Intelligent machines need lots of knowledge, for
  • question-answering
  • intelligent search
  • information integration
  • natural language understanding
  • decision support
  • modeling
  • etc.
• Much of this knowledge can be drawn from some general repository of reusable knowledge
  • e.g., WordNet
• How does one build such a repository?
“No-one considers hand-building a large KB to be a realistic proposition these days” [paraphrase of Daphne Koller, 2004]
1. Build it by Hand
• “Let’s roll up our sleeves and get on with it!”
• But: It’s a daunting task
• Our own work
• Cyc
  + Lots in it, (relatively) well-designed ontology
  - 650 person-years of effort so far
  - Still patchy coverage (why?)
  - Difficult to use outside Cycorp
1. Build it by Hand (cont)
• WordNet
  + Easy to use
  + Comprehensive
  - Little inference-supporting knowledge in it
  - Ad hoc ontology
1. Build it by Hand (cont)
• The Component Library
  Claim: can bound the required knowledge by working at a coarse-grained level
  + Large, more doable
  - Hard to use, still very incomplete
2. Extract from Dictionaries
• MindNet
  + Automatically built
  - Unusable?
• Extended WordNet
  + Won TREC competition
  - Still somewhat incoherent
  - Lot of manual labor
3. Corpus-based Text/Web Mining
• Schubert’s system
  + Automatic
  + Lots of knowledge
  - Noisy
  - No word senses
  - Only grabs certain kinds of knowledge
  30M entries…
3. Corpus-based Text/Web Mining (cont)
• KnowItAll (Etzioni)
  + Automatic
  - Only factoids
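To make the trade-offs of corpus mining concrete, here is a minimal sketch of pattern-based factoid extraction. It is not the method of any system named on these slides; it assumes a single Hearst-style lexical pattern ("X such as Y and Z") and a toy corpus invented for this illustration. The function name `extract_isa` is hypothetical.

```python
import re
from collections import Counter

# Toy corpus invented for this illustration (not from any real system).
CORPUS = [
    "Aircraft such as jets and gliders need wings.",
    "Vehicles such as cars and trucks need fuel.",
    "Aircraft such as jets are expensive.",
    # A sentence that matches the pattern but yields junk facts.
    "He bought things such as that and left.",
]

# "<class> such as <member> [and <member>]", a Hearst-style lexical pattern.
PATTERN = re.compile(r"\b(\w+) such as (\w+)(?: and (\w+))?", re.IGNORECASE)


def extract_isa(sentences):
    """Count (member, class) word pairs matched by the pattern."""
    counts = Counter()
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            cls = match.group(1).lower()
            # The second member is optional, so it may be None.
            for member in match.group(2, 3):
                if member:
                    counts[(member.lower(), cls)] += 1
    return counts


facts = extract_isa(CORPUS)
for (member, cls), n in sorted(facts.items()):
    print(f"isa({member}, {cls})  seen {n}x")
```

The sketch shows the same pros and cons in miniature: extraction is fully automatic, but the last sentence produces junk pairs such as ("left", "things"), the output is word strings rather than word senses, and only knowledge that happens to fit the pattern is captured.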
4. Community-Based Acquisition
• Knowledge entry by the masses
• OpenMind
  + Large
  - Full of junk, unusable (?)
• Would this work with better acquisition tools? (see next slide for illustration)
5. Use Existing Resources
• e.g.,
  • databases
  • CIA World Factbook
  • Web data/services
• e.g., SRI/ISI’s ARDA QA system
+ Syntactically simple
+ Available
- Largely limited to factoids
- Information integration is a major challenge
  • different ontologies, contradictory data
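The integration challenge can be sketched in a few lines. This is an illustration under stated assumptions, not how the ARDA QA system works: the two sources, their field names, the country "Freedonia", and all values are invented, and `FIELD_MAP`, `normalize`, and `integrate` are hypothetical names.

```python
# Two hypothetical sources describing the same entity with different
# field names ("different ontologies") and one disagreeing value.
SOURCE_A = {"Freedonia": {"capital": "Fredville", "population_millions": 60}}
SOURCE_B = {"Freedonia": {"capital_city": "Fredville", "pop": 62}}

# Hand-written mapping from each source's field names to a shared vocabulary.
FIELD_MAP = {
    "capital": "capital",
    "capital_city": "capital",
    "population_millions": "population_millions",
    "pop": "population_millions",
}


def normalize(record):
    """Rename known fields to the shared vocabulary; drop unknown fields."""
    return {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}


def integrate(*sources):
    """Merge sources; keep the first value seen and report disagreements."""
    merged, conflicts = {}, []
    for source in sources:
        for entity, record in source.items():
            target = merged.setdefault(entity, {})
            for field, value in normalize(record).items():
                if field in target and target[field] != value:
                    conflicts.append((entity, field, target[field], value))
                else:
                    target[field] = value
    return merged, conflicts


merged, conflicts = integrate(SOURCE_A, SOURCE_B)
print(merged)
print(conflicts)
```

Even in this toy case the field mapping has to be written by hand, and the merge can only flag the contradictory population figures, not decide which source is right.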
Where to?
• Can we bound the knowledge needed
  • for a particular application
  • for a useful, sharable, general resource?
• Which of these approaches seems most realistic?
  • build by hand
  • extract from dictionaries
  • mine text corpora
  • community knowledge entry
  • use existing resources