Indra: Emergent Ontologies from Text for Feeding Data to Simulations
Deborah Duong
Augustine Consulting
TRAC-Monterey
Indra • Uses Mutual Information to choose the parse, assign word sense, and form ontologies based on context • Iterative feedback finds a global consensus on meaning, for accurate role discovery • Flexible emergent ontologies form, combining data-driven with hypothesis-driven approaches • Feedback facilitates data fusion with other modalities • A way to feed higher-level information back to lower-level extraction, introducing feedback to data fusion
Language is Context Dependent • Language is deeply context dependent, but most natural language programs run as “pipelines” that complete each stage before the next starts • Indra uses a feedback loop to let the parse, word sense assignment, and ontological assignments inform each other • The result is a flexible data-driven ontology that can be aligned with other models
Making “Sense” of Text • “Word sense” of entities and their actions • Inter-Document Coreference Resolution • Many ways of Naming a Person • Different Persons may have the same name • Link Normalization • Many ways of referring to a Behavior • Different Behaviors referred to with the same words
General Roles and Role Relationships • Indra extracts general Roles and Role relationships from text • These Roles and Role relationships are arranged in ontological groupings • Iterative feedback allows different parts of the ontology to influence each other • Iterative feedback makes the system deeply adaptive, so outside data can have widespread influence
Global Consensus on Sense • Grouping of entities and links increases the information with each iteration • With each iteration, the unsupervised scatter-gather finds the “sense” of named entities, identifying which individual each one is based on its role • As information corrects the senses of links and entities, and neighbors correct their neighbors, a global consensus on sense forms • As links and entities are grouped, an emergent ontology is formed
Iterative Feedback Introduced in Stages • Stage 1: Upper-lower feedback (implemented) • Larger clusters and smaller clusters influence each other • Stage 2: Side-to-side feedback (implemented) • Node clusters and link clusters influence each other • Stage 3: More upper-lower feedback • Ontology and parse influence each other • Stage 4: Feedback with external systems • Seed hypotheses from analysts and inference engines have wide influence
Stage 1: Upper-Lower Feedback • Roles are clustered according to link contexts, and Role relations are clustered according to entity contexts • Two separate ontologies form • Clusters at higher levels split clusters at lower levels • Essential for word sense (and “entity sense”) • For example, clusters for factories and autotrophs split the word “plant” • Clustering algorithms are usually either agglomerative or divisive: “unsupervised scatter-gather” is both • Clusters merge and split until convergence
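The “plant” example can be sketched in a few lines. This is only an illustration under stated assumptions: the cluster names, the context sets, and the helper `split_by_cluster` are hypothetical, and a simple context-overlap count stands in for Indra's unsupervised scatter-gather, which the slides do not specify in detail.

```python
from collections import defaultdict

# Hypothetical higher-level clusters, each summarized by the link contexts of its members.
cluster_contexts = {
    "FACTORY": {"manufactures", "employs", "closes"},
    "AUTOTROPH": {"grows", "photosynthesizes", "wilts"},
}

# Occurrences of the surface word "plant", each with the link contexts seen around it.
occurrences = [
    ("plant", {"employs", "closes"}),
    ("plant", {"grows", "wilts"}),
    ("plant", {"manufactures"}),
]

def split_by_cluster(occurrences, cluster_contexts):
    """Assign each occurrence to the higher-level cluster sharing the most contexts,
    so one surface word is split into several senses."""
    senses = defaultdict(list)
    for word, contexts in occurrences:
        best = max(cluster_contexts, key=lambda c: len(contexts & cluster_contexts[c]))
        senses[f"{word}/{best}"].append(sorted(contexts))
    return dict(senses)

senses = split_by_cluster(occurrences, cluster_contexts)
for sense, contexts in senses.items():
    print(sense, contexts)
```

Here the higher-level clusters do the splitting: the single word “plant” ends up as two senses, one per cluster.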
Stage 2: Side-to-Side Feedback • Stage 1 clusters entities based on links and links based on entities • Stage 2 clusters entities based on link *clusters* and links based on entity *clusters* • The separate Role and Role relationship ontologies of Stage 1 become intertwined • Needed for data smoothing and more consensus
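A small sketch of why clustering on link *clusters* smooths the data. The link-to-cluster mapping, the entity names, and the helpers `profile` and `cosine` are all hypothetical; cosine similarity is an assumed stand-in for whatever similarity measure Indra actually uses.

```python
import math
from collections import Counter

# Hypothetical mapping from raw links to link clusters.
link_cluster = {"ate": "CONSUME", "consumed": "CONSUME", "drove": "TRAVEL"}

# Toy data: the links each entity participates in.
entity_links = {
    "Jane": ["ate", "ate"],
    "John": ["consumed"],
    "Ann": ["drove"],
}

def profile(links, mapping=None):
    """Count raw links, or link clusters when a mapping is given."""
    return Counter(mapping[l] if mapping else l for l in links)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

raw = cosine(profile(entity_links["Jane"]), profile(entity_links["John"]))
smoothed = cosine(profile(entity_links["Jane"], link_cluster),
                  profile(entity_links["John"], link_cluster))
print(raw, smoothed)  # 0.0 1.0
```

Jane and John share no raw link, so they look unrelated in Stage 1 terms; once “ate” and “consumed” fall into the same link cluster, their profiles coincide.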
Stage 3: More Upper-Lower Feedback • Choose the parse based on the ontology (the parse already influences the ontology in feedforward) • Choose the parse based on how common it is for similar words to be attached in that way • Example: • Jane ate the salad with a fork • “with” modifies “ate” because tools such as “forks” and “knives” are typically found to be used to “eat” or “consume” • Jane ate the salad with croutons • “with” modifies “salad” because things that are “eaten” or “consumed” are typically foods such as “croutons” or “tomatoes” • Later, instead of using a rule-based parser, use mutual information to parse (Yuret), making Indra purely statistical • Can be used with any language
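The fork/croutons example can be sketched as an attachment choice driven by cluster-level pointwise mutual information, PMI(h, o) = log2(p(h, o) / (p(h) p(o))). Everything here is an assumption for illustration: the `CLUSTER` labels, the toy `observed` attachments, and the helpers `cluster_pmi` and `attach` are hypothetical and not taken from Indra or from Yuret's parser.

```python
import math
from collections import Counter

# Hypothetical cluster labels standing in for emergent ontology classes.
CLUSTER = {"fork": "TOOL", "knife": "TOOL", "croutons": "FOOD", "tomatoes": "FOOD",
           "ate": "CONSUME", "consumed": "CONSUME", "salad": "FOOD", "soup": "FOOD"}

# Toy previously observed "with" attachments: (head word, object of "with").
observed = [("ate", "knife"), ("consumed", "fork"), ("ate", "fork"),
            ("salad", "tomatoes"), ("soup", "croutons"), ("salad", "tomatoes")]

def cluster_pmi(observed):
    """Build a scorer giving the PMI between the clusters of a head and an object."""
    pairs = [(CLUSTER[h], CLUSTER[o]) for h, o in observed]
    joint = Counter(pairs)
    heads = Counter(h for h, _ in pairs)
    objs = Counter(o for _, o in pairs)
    n = len(pairs)

    def score(head, obj):
        key = (CLUSTER[head], CLUSTER[obj])
        if joint[key] == 0:
            return float("-inf")  # attachment never seen between these clusters
        return math.log2(joint[key] * n / (heads[key[0]] * objs[key[1]]))

    return score

def attach(verb, noun, obj, score):
    """Attach the "with" phrase to whichever candidate head scores higher."""
    return verb if score(verb, obj) >= score(noun, obj) else noun

score = cluster_pmi(observed)
print(attach("ate", "salad", "fork", score))      # ate
print(attach("ate", "salad", "croutons", score))  # salad
```

Because the scores are computed over clusters rather than individual words, “fork” benefits from evidence gathered for “knife”, which is the sense in which the ontology feeds back into the parse.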
Stage 4: Feedback with External Systems • Purpose of feedback is deep adaptivity, so external data can influence the ontology and be easily fused • Hypothesis-Driven AND Data-Driven Ontologies • If an analyst groups concepts: • Collocated paths are found • These help develop the analyst’s concept • More consonant concepts and paths are found • RELATIVELY FEW points of correspondence needed
Example Cluster
• p:35805,n:34540.fes // Moroccan city
• p:35805,n:37114.tenerife // Spanish city
• p:35805,n:37344.zaragoza // Spanish city, with football club
• p:35805,n:37548.boavista // Portuguese island, with football club
• p:35805,n:38590.maritimo // Portuguese sports club known for football team
• p:43243,n:39997.
• p:39997,n:29474.saccoh
• p:39997,n:29612.spaho // small Bosnian town
• p:39997,n:33375.spartak // Moscow football club
• p:39997,n:34467.environmentastrit
• p:39997,n:34721.haxhi // Albanian football player
• p:43243,n:40629.tenerife
• p:43243,n:41043.boavista
• p:43243,n:42049.maritimo
• p:46477,n:44423.bilbao // Basque city
• p:46477,n:44563.centreleft // football position
• p:49912,n:48979.oviedo // Spanish city
Example Cluster
• p:49224,n:50682.tenerife
• p:56352,n:53348.
• p:53348,n:46799.
• p:46799,n:40301.rayo // football club in Madrid
• p:46799,n:41027.bilbao
• p:53348,n:47751.bilbao
• p:56352,n:53354.
• p:53354,n:47225.shelling
• p:56352,n:53766.shelling
• p:56352,n:53814.spartak
• p:56352,n:54104.youridjourkaeff
• p:56352,n:54108.zaragoza
• p:56352,n:54460.colo // Chilean football club
• p:56352,n:55076.kickoff // football term
• p:65663,n:62554.
• p:62554,n:60508.youridjourkaeff
• p:83660,n:85323.youridjourkaeff
• p:86579,n:84114.
• p:84114,n:81134.
• p:81134,n:75091.
• p:75091,n:73692.deportivo // Spanish football club
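The cluster listings can be read programmatically. The sketch below assumes that `p:` is a parent cluster id, `n:` is a node id, the text after the dot is an optional term label (empty for internal nodes), and anything after `//` is an annotation. The slides do not define this format, so that reading is an assumption, and `parse_dump` and `leaves_under` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed entry shape: p:<parent id>,n:<node id>.<optional label> // <optional note>
LINE = re.compile(r"p:(\d+),n:(\d+)\.(\w*)\s*(?://\s*(.*))?$")

def parse_dump(lines):
    """Return (children, labels): parent id -> child ids, and node id -> term label."""
    children = defaultdict(list)
    labels = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a cluster entry
        parent, node, label, _note = m.groups()
        children[int(parent)].append(int(node))
        if label:
            labels[int(node)] = label
    return dict(children), labels

def leaves_under(node, children, labels):
    """Collect the term labels found anywhere below a cluster node."""
    if node not in children:
        return [labels[node]] if node in labels else []
    out = []
    for child in children[node]:
        out.extend(leaves_under(child, children, labels))
    return out

dump = [
    "p:43243,n:39997.",
    "p:39997,n:29474.saccoh",
    "p:39997,n:33375.spartak //Moscow football club",
    "p:43243,n:40629.tenerife",
]
children, labels = parse_dump(dump)
print(leaves_under(43243, children, labels))  # ['saccoh', 'spartak', 'tenerife']
```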
Ontologies Are Problematic • Indra will approximate the most likely (highest mutual information) ontology • BUT analysts want their own ontologies • Different experts look at the same data • Data is stored as primitive entities and paths • Indra makes a semantic model on the fly, tailored to the ontology of whoever is looking at the data • Ontologies can also be tailored to those of particular simulation models
Hypothesis Driven AND Data Driven • Indra can flexibly take in analyst input • Indra can align its ontology to another with very few points of correspondence • Indra can fill in the gaps • Feedback gives Indra an advantage over other systems that generate ontologies: • Global consensus • Ability to adapt to any amount of user input
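One way to picture alignment from a few points of correspondence: an analyst labels a handful of terms, and each emergent cluster containing a labeled term passes that label to its remaining members. This is a deliberately simplified sketch; the cluster contents, the analyst concept names, and the `align` helper are hypothetical, and majority voting is an assumed stand-in for Indra's feedback-based alignment.

```python
from collections import Counter

# Hypothetical emergent clusters (cluster id -> member terms).
clusters = {
    "c1": {"tenerife", "zaragoza", "bilbao", "oviedo"},
    "c2": {"kickoff", "centreleft"},
}

# A few analyst-supplied correspondences: term -> analyst concept.
seeds = {"bilbao": "SpanishCity", "kickoff": "FootballTerm"}

def align(clusters, seeds):
    """Label each cluster with the majority analyst concept among its seeded
    members, then extend that label to the unseeded members."""
    labeled = {}
    for members in clusters.values():
        votes = Counter(seeds[t] for t in members if t in seeds)
        if not votes:
            continue  # no point of correspondence in this cluster
        concept = votes.most_common(1)[0][0]
        for term in members:
            labeled[term] = seeds.get(term, concept)
    return labeled

labeled = align(clusters, seeds)
print(labeled["oviedo"], labeled["centreleft"])  # SpanishCity FootballTerm
```

Two seed terms are enough to label all six terms here, which is the “fill in the gaps” behavior in miniature.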