Automatic Construction of Semantic Hierarchies
Rion L. Snow
rionsnow@fairisaac.com
AQUAINT Phase I Biannual Workshop
San Diego, CA, 9–12 June 2003
Fair Isaac
5935 Cornerstone Court
San Diego, CA 92121
Outline / Summary
• Notation and Terminology
• Similarity of Meaning and Usage
• Automatic Polysemy Discovery
• Constructing Semantic Hierarchies
• Applications to Query Expansion and Sentence Meaning Comparison
Language Representation in the Model
Incoming text is mapped into the universal token language by means of the token lexicon. Our experiments use a lexicon of 100,000 entries: the 30,000 most frequent words and 70,000 selected phrases.
[Figure: The Token Lexicon, applied word by word to “President George Bush visited San Jose last weekend.”]
Each word activates a fixed token of neurons on the input region. Our experiments typically use a nine-region network, which advances one word at a time over the input text.
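The following is a minimal sketch of the front end described above. It assumes a plain dictionary lookup from lowercased words to integer token ids and a window that spans nine consecutive words. The names `build_lexicon`, `to_token_ids` and `region_windows` are hypothetical. So are the period-splitting rule and the use of `-1` for out-of-lexicon words. The sketch also ignores the 70,000 multi-word phrases, so it should not be read as the model's actual interface.

```python
from typing import Dict, List

NUM_REGIONS = 9  # nine-region network, as described on the slide


def build_lexicon(entries: List[str]) -> Dict[str, int]:
    """Give each lexicon entry a fixed token id (hypothetical id scheme)."""
    return {entry: token_id for token_id, entry in enumerate(entries)}


def to_token_ids(text: str, lexicon: Dict[str, int], unknown: int = -1) -> List[int]:
    """Lowercase, split off periods, and look each word up; unknown words map to `unknown`."""
    words = text.lower().replace(".", " .").split()
    return [lexicon.get(word, unknown) for word in words]


def region_windows(token_ids: List[int], width: int = NUM_REGIONS) -> List[List[int]]:
    """Slide a `width`-region window over the input, advancing one word at a time."""
    return [token_ids[start:start + width]
            for start in range(max(0, len(token_ids) - width + 1))]


lexicon = build_lexicon(
    ["president", "george", "bush", "visited", "san", "jose", "last", "weekend", "."])
ids = to_token_ids("President George Bush visited San Jose last weekend.", lexicon)
print(ids)                       # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(len(region_windows(ids)))  # 1: the nine tokens exactly fill the nine regions
```

A longer input produces one more window for every additional word, which mirrors the network stepping forward one word at a time.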
Unsupervised Language Training
Cortical antecedent support fascicles are trained between pairs of regions. The connection strength between a pair of neurons is a function of those neurons’ occurrence and co-occurrence probabilities. In our experiments we train at most four fascicles forward and four backward from each region, for a total of 52 possible fascicles. For training we use an untagged newswire corpus of 1.4 billion words and 75 million sentences, which includes the AQUAINT newswire corpus.
[Figure: an antecedent support network in the cerebral cortex, linking a source region to a target region over the words “president george bush visited san jose last weekend.”]
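The fascicle count can be checked by enumeration. With nine regions and offsets of up to four in each direction, there are 2 × (8 + 7 + 6 + 5) = 52 directed region pairs. The sketch below performs that count. It also needs a strength function, which the slide leaves unspecified, so it estimates the conditional probability of the target word given the source word at a fixed offset, using raw counts. That strength measure, the function names and the toy corpus are assumptions for illustration, not the model's actual training rule.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

NUM_REGIONS = 9  # nine-region network
MAX_OFFSET = 4   # at most four fascicles forward and four backward per region


def fascicles(num_regions: int = NUM_REGIONS,
              max_offset: int = MAX_OFFSET) -> List[Tuple[int, int]]:
    """All directed (source, target) region pairs at most `max_offset` regions apart."""
    return [(s, t) for s, t in product(range(num_regions), repeat=2)
            if s != t and abs(s - t) <= max_offset]


def train_fascicle(sentences: List[str], offset: int) -> Dict[Tuple[str, str], float]:
    """Estimate p(target word | source word) for words `offset` positions apart.

    Conditional probability is only one hypothetical choice of
    "a function of occurrence and co-occurrence probabilities".
    """
    occurrences: Counter = Counter()
    co_occurrences: Counter = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, source in enumerate(words):
            j = i + offset
            if 0 <= j < len(words):  # the target position must exist in this sentence
                occurrences[source] += 1
                co_occurrences[(source, words[j])] += 1
    return {pair: count / occurrences[pair[0]] for pair, count in co_occurrences.items()}


print(len(fascicles()))  # 52
corpus = ["president george bush visited san jose", "president george bush spoke"]
forward = train_fascicle(corpus, 1)
print(forward[("bush", "visited")])  # 0.5
```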
Similarity of Meaning and Usage
[Figure: words with usage similar to “brazil”, “venezuela” and “ecuador”, including colombia, guatemala, nicaragua, bolivia, mexico, el salvador, honduras, costa rica and panama.]
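The similarity lists on this slide come from the trained network. The sketch below uses a conventional distributional stand-in instead of the network's own measure. It scores similarity of usage as the cosine between context-count vectors, where each feature is a (relative position, neighboring word) pair within two words. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def context_vectors(sentences: List[str], window: int = 2) -> Dict[str, Counter]:
    """Count (relative position, neighbor word) features for every word."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][(j - i, words[j])] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word: str, vectors: Dict[str, Counter], k: int = 3) -> List[str]:
    """Rank all other words by cosine similarity of usage to `word`."""
    target = vectors.get(word, Counter())  # .get avoids inserting into the defaultdict
    scored = sorted(((cosine(target, vec), other)
                     for other, vec in vectors.items() if other != word), reverse=True)
    return [other for _, other in scored[:k]]


corpus = [
    "exports from brazil rose sharply",
    "exports from venezuela rose sharply",
    "exports from ecuador fell sharply",
    "the president of brazil said",
    "the president of venezuela said",
    "the bank said profits rose",
]
vecs = context_vectors(corpus)
print(most_similar("brazil", vecs, k=2))  # ['venezuela', 'ecuador']
```

Position-tagged features keep “brazil” apart from words that merely appear nearby. Only words that fill the same slots, such as the other country names, score highly.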
Word Families: The Emergence of Abstraction
Example families:
• across, into, along, near, around, through, onto, toward, off, inside, down, over, from, on, in, out
• talks, accord, agreement, negotiations, deal, peace process, plan, peace talks, efforts
• comply with, abide by, compliance with, accordance with, violation of, line with, complying with
• red, blue, pink, gray, green, black, yellow, white
• passion, fascination, enthusiasm, desire, appetite, penchant, obsession, love, fondness, affection, sense
• newspaper, new york times, washington post, magazine, wall street journal, journal, daily, times, post, newspapers, daily news, associated press, paper, news, weekly
A simple automated process produced over 400,000 families for a lexicon of 30,000 words and 70,000 multi-word phrases. Each family is a subset of the synonymy set of its word. Families resemble word senses, but are more abstract and more useful. For example, matching word families between sentences can be used to evaluate their similarity of meaning.
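The last sentence above can be made concrete with a simple scoring rule. The sketch assumes that sentences are already split into lexicon tokens, with multi-word phrases kept as single tokens. For each token in one sentence, it checks whether the other sentence contains that token exactly or a token from a shared family, and it averages the two directions. The scoring rule, the names and the example sentences are hypothetical. Only the family contents come from the slide.

```python
from typing import Dict, List, Set

# A few of the example families above; multi-word phrases are single tokens.
FAMILIES: List[Set[str]] = [
    {"talks", "accord", "agreement", "negotiations", "deal", "peace process",
     "plan", "peace talks", "efforts"},
    {"comply with", "abide by", "compliance with", "accordance with",
     "violation of", "line with", "complying with"},
    {"red", "blue", "pink", "gray", "green", "black", "yellow", "white"},
]


def family_index(families: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each term to the ids of every family that contains it."""
    index: Dict[str, Set[int]] = {}
    for family_id, family in enumerate(families):
        for term in family:
            index.setdefault(term, set()).add(family_id)
    return index


def coverage(a: List[str], b: List[str], index: Dict[str, Set[int]]) -> float:
    """Fraction of terms in `a` that occur in `b` or share a family with a term of `b`."""
    if not a:
        return 0.0
    b_families: Set[int] = set()
    for term in b:
        b_families |= index.get(term, set())
    hits = sum(1 for term in a if term in b or index.get(term, set()) & b_families)
    return hits / len(a)


def family_similarity(a: List[str], b: List[str], index: Dict[str, Set[int]]) -> float:
    """Symmetric score: mean coverage in both directions."""
    return (coverage(a, b, index) + coverage(b, a, index)) / 2


index = family_index(FAMILIES)
s1 = ["rebels", "abide by", "peace talks"]
s2 = ["rebels", "comply with", "agreement"]
s3 = ["rebels", "reject", "plan"]
print(family_similarity(s1, s2, index))  # 1.0
print(family_similarity(s1, s3, index))  # about 0.667
```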
Automatic Polysemy Discovery
Plants
• “living plants”: plants, animals, birds, species, fish, dogs, animal, dog, cats, humans, insects
• “industrial plants”: plants, plant, facilities, facility, reactors, reactor, factories, factory, systems, equipment, system, pipeline, nuclear, station
Foot
• “body part”: foot, knee, ankle, shoulder, wrist, elbow, leg
• “unit of measurement”: foot, feet, ## feet, miles, inches, meters, ## miles, mile, kilometers, ## inches, ## meters, inch, yards, km
Win
• “verb: to win”: win, to win, winning, won, after winning, who won, wins
• “noun: a win”: win, victory, game, victory over, loss, games, season, win over, opener, series
• “common verbs”: win, get, see, do, play, go, take, make, hear, find
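Here is one way to picture how a single word's synonymy set could split into families such as “living plants” and “industrial plants”. Treat the similar words as graph nodes, connect two words when their context features overlap enough, and read off the connected components. This is a substituted single-linkage clustering sketch under assumed inputs, not the procedure behind the slide. The feature sets, the Jaccard measure and the 0.3 threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_senses(neighbors: Dict[str, Set[str]], threshold: float = 0.3) -> List[List[str]]:
    """Group similar words into connected components of the graph whose edges
    join words with Jaccard similarity >= threshold (single linkage)."""
    parent = {word: word for word in neighbors}

    def find(word: str) -> str:
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving keeps trees shallow
            word = parent[word]
        return word

    for a, b in combinations(neighbors, 2):
        if jaccard(neighbors[a], neighbors[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups: Dict[str, List[str]] = {}
    for word in neighbors:
        groups.setdefault(find(word), []).append(word)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical context features for some of the words similar to "plants".
plants_neighbors = {
    "factories": {"workers", "closed", "built", "production"},
    "reactors": {"nuclear", "closed", "built", "production"},
    "facilities": {"workers", "closed", "built", "production"},
    "animals": {"wild", "feed", "habitat", "grow"},
    "insects": {"wild", "feed", "habitat"},
    "species": {"wild", "habitat", "grow", "endangered"},
}
print(split_senses(plants_neighbors))
# [['animals', 'insects', 'species'], ['facilities', 'factories', 'reactors']]
```

Single linkage is easy to follow but can chain unrelated words together through one shared neighbor. A stricter linkage or a higher threshold trades recall for cleaner senses.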
Multi-Scale Semantic Similarity
Families in the “Companies” super-family:
• intel: intel, ibm, microsoft, compaq, hewlett-packard, apple, oracle, motorola
• at&t: at&t, bellsouth, gte, mci, bell atlantic, nynex, sprint, sbc, ameritech, bell
• wells fargo: wells fargo, bank, bank of america, first interstate, security pacific, chase manhattan, fargo
• merrill lynch: merrill lynch, morgan stanley, salomon brothers, lehman brothers, goldman sachs, smith barney, goldman, j.p. morgan, bear stearns
• corp.: corp., co., inc., ltd., plc, company, group, unit
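Super-families such as “Companies” suggest that the same grouping can be repeated at coarser scales. One standard way to obtain such nested groupings is average-linkage agglomerative clustering over family-level profiles, sketched below under stated assumptions. The merge history records the similarity at which each family joins, so cutting it at different thresholds gives different scales. The profiles, the Jaccard measure, the 0.5 cut and the outside “brazil” family are all assumptions, not the procedure used for the slide.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(profiles: Dict[str, Set[str]], threshold: float):
    """Repeatedly merge the two clusters with the highest mean pairwise similarity
    until that similarity falls below `threshold`.
    Returns the final clusters and the merge history (one entry per scale)."""
    clusters: List[List[str]] = [[name] for name in sorted(profiles)]
    history: List[Tuple[List[str], List[str], float]] = []

    def link(c1: List[str], c2: List[str]) -> float:
        total = sum(jaccard(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), score = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],  # ties go to the first pair found
        )
        if score < threshold:
            break
        history.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Hypothetical family-level context profiles.
family_profiles = {
    "intel": {"shares", "said", "profit", "ceo", "chips"},
    "at&t": {"shares", "said", "profit", "ceo", "phone"},
    "wells fargo": {"shares", "said", "profit", "ceo", "loans"},
    "merrill lynch": {"shares", "said", "profit", "ceo", "analyst"},
    "corp.": {"shares", "said", "profit", "ceo"},
    "brazil": {"said", "president", "economy", "exports"},
}
clusters, history = average_linkage(family_profiles, threshold=0.5)
print(sorted(sorted(c) for c in clusters))
# [['at&t', 'corp.', 'intel', 'merrill lynch', 'wells fargo'], ['brazil']]
print([round(score, 2) for _, _, score in history])  # [0.8, 0.73, 0.71, 0.7]
```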
Extrapolating this Process Yields… The Semantic Structure of the English Language
Semantic Analysis with Word Families
Sentence: “He was named executive vice president following the annual meeting.”
Segmented: “He | was named | executive vice president | following | the | annual meeting.”
Family of each segment:
• he: he, i, we, she, you, people, it, who
• was named: was named, will become, was appointed, became, was elected, serves as, he became, served as, is now
• executive vice president: executive vice president, vice president, senior vice president, chief executive, vice chairman, chief executive officer, president, chairman, director, chief operating officer
• following: following, during, shortly after, just before, prior to, after, soon after, shortly before, before, days before, hours after, on the eve of, days after, in the wake of
• the: the, their, his, its, her, our, my
• annual meeting: annual meeting, convention, meeting, conference, summit, annual convention, annual conference
Question: “Who | served as | vice president | in the wake of | the | meeting?”
Each segment of the question belongs to the family of the corresponding segment of the sentence.
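The sentence/question example above can be sketched as code under three assumptions. First, text is split into lexicon phrases by greedy longest match. Second, query expansion replaces each segment with an OR-group of its family. Third, a question “matches” a sentence when its segments line up one-to-one with family members of the sentence's segments. The segmentation rule, the Boolean query syntax and all function names are hypothetical. Only the family contents come from the slide.

```python
from typing import Dict, List, Set

# Families of the sentence's segments, copied from the slide.
FAMILIES: Dict[str, Set[str]] = {
    "he": {"he", "i", "we", "she", "you", "people", "it", "who"},
    "was named": {"was named", "will become", "was appointed", "became", "was elected",
                  "serves as", "he became", "served as", "is now"},
    "executive vice president": {"executive vice president", "vice president",
                                 "senior vice president", "chief executive", "vice chairman",
                                 "chief executive officer", "president", "chairman",
                                 "director", "chief operating officer"},
    "following": {"following", "during", "shortly after", "just before", "prior to", "after",
                  "soon after", "shortly before", "before", "days before", "hours after",
                  "on the eve of", "days after", "in the wake of"},
    "the": {"the", "their", "his", "its", "her", "our", "my"},
    "annual meeting": {"annual meeting", "convention", "meeting", "conference", "summit",
                       "annual convention", "annual conference"},
}
LEXICON: Set[str] = set().union(*FAMILIES.values())


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy longest-match segmentation into lexicon phrases; single words always pass."""
    words = text.lower().rstrip(".?!").split()
    segments: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in lexicon:
                segments.append(candidate)
                i += n
                break
    return segments


def expand_query(segments: List[str], families: Dict[str, Set[str]]) -> str:
    """Replace each segment with an OR-group of its family members."""
    groups = []
    for seg in segments:
        members = sorted(families.get(seg, {seg}))
        groups.append("(" + " OR ".join(f'"{m}"' for m in members) + ")")
    return " AND ".join(groups)


def matches(sentence_segs: List[str], question_segs: List[str],
            families: Dict[str, Set[str]]) -> bool:
    """True if every question segment lies in the family of the aligned sentence segment."""
    return len(sentence_segs) == len(question_segs) and all(
        q in families.get(s, {s}) for s, q in zip(sentence_segs, question_segs))


sentence = segment("He was named executive vice president following the annual meeting.", LEXICON)
question = segment("Who served as vice president in the wake of the meeting?", LEXICON)
print(sentence)  # ['he', 'was named', 'executive vice president', 'following', 'the', 'annual meeting']
print(question)  # ['who', 'served as', 'vice president', 'in the wake of', 'the', 'meeting']
print(matches(sentence, question, FAMILIES))  # True
```

The positional alignment is deliberately strict. A question with a different number of segments, or with its segments in a different order, does not match even when it shares vocabulary with the sentence.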
Conclusion
• Similarity of meaning is used to construct powerful hypernym-type semantic hierarchies by grouping words that occur in similar contexts.
• Domain-specific knowledge is seamlessly integrated into the overall semantic construction.
• This method may be applied directly to foreign languages, as well as to other information modalities such as sound and vision.
• We plan to build semantic hierarchies for Chinese, Arabic, and Spanish soon.
The Team
Other Team Members
• Dr. Robert Hecht-Nielsen - Project Leader
• Dr. Robert Means - Chief Technologist
• Kate Mark - Project Coordinator
• David Busby - Chief Brain Software Architect
• Dr. Syrus Nemat-Nasser - Scientist
• Dr. Shailesh Kumar - Scientist
• Adrian Fan - Researcher
Research Sponsors
• Fair Isaac
• ARDA