Indra

Indra: Emergent Ontologies from Text for Feeding Data to Simulations
Deborah Duong
Augustine Consulting
TRAC-Monterey

Indra • Uses Mutual Information to choose the parse, assign word sense, and form ontologies based on context • Iterative feedback finds a global consensus on meaning, for accurate role discovery • Flexible emergent ontologies form, combining data-driven and hypothesis-driven approaches • Feedback facilitates data fusion with other modalities • Provides a way to feed higher-level information back to lower-level extraction, introducing feedback to data fusion
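As a rough illustration of the mutual information idea, the sketch below computes pointwise mutual information (PMI) between words and the contexts they occur in. It is a generic textbook calculation, not Indra's code; the function name `pmi_table` and the toy word/context pairs are assumptions made for this example.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return PMI in bits for every observed (word, context) pair.

    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ), estimated from raw counts.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    return {
        (w, c): math.log2(n * total / (word_counts[w] * ctx_counts[c]))
        for (w, c), n in pair_counts.items()
    }

# Hypothetical (word, link context) observations, invented for illustration.
pairs = [
    ("plant", "grows"), ("plant", "grows"), ("plant", "manufactures"),
    ("factory", "manufactures"), ("factory", "manufactures"),
    ("tree", "grows"),
]
scores = pmi_table(pairs)

# A higher score means the word and context co-occur more often than chance.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A positive score marks a context that is informative for the word, which is the kind of signal that could be used to prefer one parse, sense, or grouping over another.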

Language is Context Dependent • Language is deeply context dependent, but natural language programs are built as “pipelines” that complete each stage before the next starts • Indra uses a feedback loop to let the parse, word sense assignment, and ontological assignments inform each other • The result is a flexible, data-driven ontology that can be aligned with other models

Making “Sense” of Text • “Word sense” of entities and their actions • Inter-Document Coreference Resolution • Many ways of Naming a Person • Different Persons may have the same name • Link Normalization • Many ways of referring to a Behavior • Different Behaviors referred to with the same words

General Roles and Role Relationships • Indra extracts general Roles and Role relationships from text • These Roles and Role relationships are arranged in ontological groupings • Iterative feedback allows different parts of the ontology to influence each other • Iterative feedback makes the system deeply adaptive, so outside data can have widespread influence

Global Consensus on Sense • Grouping of entities and links increases the information with each iteration • With each iteration, the unsupervised scatter-gather finds the “sense” of named entities, identifying which individuals they are based on their roles • As information corrects the senses of links and entities, and neighbors correct their neighbors, a global consensus on sense forms • As links and entities are grouped, an emergent ontology is formed

Iterative Feedback Introduced in Stages • Stage 1: Upper-lower feedback (implemented) • Larger clusters and smaller clusters influence each other • Stage 2: Side-to-side feedback (implemented) • Node clusters and link clusters influence each other • Stage 3: More upper-lower feedback • Ontology and parse influence each other • Stage 4: Feedback with external systems • Seed hypotheses from analysts and inference engines have wide influence

Stage 1: Upper-Lower Feedback • Roles are clustered according to link contexts, and Role relations are clustered according to entity contexts • Two separate ontologies form • Clusters at higher levels split clusters at lower levels • Essential for word sense (and “entity sense”) • For example, clusters for factories and autotrophs split the word “plant” • Clustering algorithms are either agglomerative or divisive: “unsupervised scatter gather” is both • Clusters split and merge until convergence
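The “plant” example can be sketched as follows: each occurrence of an ambiguous word is assigned to the higher-level cluster whose typical link contexts it shares most. This is a simplified stand-in for the scatter-gather step, not Indra's algorithm; the function `split_by_sense`, the cluster names, and all context sets are hypothetical.

```python
from collections import defaultdict

def split_by_sense(occurrences, clusters):
    """Group occurrences of one word by the best-matching higher-level cluster.

    occurrences: occurrence id -> set of link contexts seen with it
    clusters:    cluster name  -> set of link contexts typical of the cluster
    Ties are broken alphabetically by cluster name (an arbitrary choice).
    """
    senses = defaultdict(list)
    for occ_id, contexts in occurrences.items():
        best = max(sorted(clusters),
                   key=lambda name: len(contexts & clusters[name]))
        senses[best].append(occ_id)
    return dict(senses)

# Hypothetical higher-level clusters and their typical link contexts.
clusters = {
    "factory": {"manufactures", "employs", "closes"},
    "autotroph": {"grows", "photosynthesizes", "wilts"},
}
# Hypothetical occurrences of the word "plant" with their observed links.
occurrences = {
    "plant#1": {"manufactures", "closes"},
    "plant#2": {"grows", "wilts"},
    "plant#3": {"employs"},
}
senses = split_by_sense(occurrences, clusters)
print(senses)
```

In an iterative setting, the resulting sense groups would feed back into the clusters and the assignment would be repeated until it stops changing.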

Stage 2: Side-to-Side Feedback • Stage 1 was clustering entities based on links and links based on entities • Stage 2 is clustering entities based on link *clusters* and links based on entity *clusters* • The separate Role and Role relationship ontologies of stage 1 become intertwined • Needed for data smoothing and more consensus
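The smoothing effect of moving from links to link clusters can be shown with a small sketch. Two entities that never share a literal link look unrelated in stage 1, but become similar once each link is replaced by its cluster. The cosine measure, the helper `lift`, and the sample data are assumptions for illustration; the slides do not say which similarity measure Indra uses.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def lift(link_counts, link_to_cluster):
    """Re-express an entity's link counts as link-cluster counts."""
    lifted = Counter()
    for link, n in link_counts.items():
        # Links without a cluster are kept under their own name.
        lifted[link_to_cluster.get(link, link)] += n
    return lifted

# Hypothetical link clustering and entity contexts.
link_to_cluster = {"eat": "CONSUME", "consume": "CONSUME", "devour": "CONSUME"}
jane = Counter({"eat": 3})
john = Counter({"devour": 2})

stage1 = cosine(jane, john)  # raw links: no overlap
stage2 = cosine(lift(jane, link_to_cluster), lift(john, link_to_cluster))
print(stage1, stage2)
```

The same lifting can be applied in the other direction, describing each link by the clusters of the entities it connects.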

Stage 3: More Upper-Lower Feedback • Choose parse based on ontology (parse already influences ontology in feedforward) • Choose parse based on how common it is for similar words to be attached in that way • Example: • Jane ate the salad with a fork • “with” modifies “ate” because tools such as “forks” and “knives” are typically found to be used to “eat” or “consume” • Jane ate the salad with croutons • “with” modifies “salad” because things that are “eaten” or “consumed” are typically foods such as “croutons” or “tomatoes” • Later, instead of using a rule-based parser, use mutual information to parse (Yuret), making Indra purely statistical • Can be used with any language
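The fork/croutons example can be sketched as a choice between two attachments, scored by how strongly the class of the “with” object is associated with the class of each candidate head. This is one plausible reading of the slide, not Indra's parser; the classes, the counts, and the function names `pmi` and `attach` are all invented for the example.

```python
import math

# Hypothetical counts of (head class, class of the "with" object),
# as they might be gathered from previously parsed text.
counts = {
    ("CONSUME", "TOOL"): 40,
    ("CONSUME", "FOOD"): 5,
    ("FOOD", "FOOD"): 30,
    ("FOOD", "TOOL"): 2,
}
# Hypothetical ontology lookup: word -> cluster.
word_class = {"ate": "CONSUME", "salad": "FOOD",
              "fork": "TOOL", "croutons": "FOOD"}

def pmi(head, obj):
    """Pointwise mutual information between a head class and an object class."""
    total = sum(counts.values())
    joint = counts.get((head, obj), 0) / total
    p_head = sum(n for (h, _), n in counts.items() if h == head) / total
    p_obj = sum(n for (_, o), n in counts.items() if o == obj) / total
    return math.log2(joint / (p_head * p_obj)) if joint else float("-inf")

def attach(verb, noun, obj):
    """Return the word that the 'with <obj>' phrase should modify."""
    verb_score = pmi(word_class[verb], word_class[obj])
    noun_score = pmi(word_class[noun], word_class[obj])
    return verb if verb_score >= noun_score else noun

print(attach("ate", "salad", "fork"))      # "with a fork" attaches to the verb
print(attach("ate", "salad", "croutons"))  # "with croutons" attaches to the noun
```

Because the scores are computed over clusters rather than individual words, an unseen word such as “knife” would inherit the behavior of its cluster once the ontology places it there.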

Stage 4: Feedback with External Systems • Purpose of feedback is deep adaptivity, so external data can influence and be easily fused • Hypothesis Driven AND Data Driven Ontologies • If an analyst groups concepts: • Collocated paths found • These help develop analyst’s concept • More consonant concepts and paths found • RELATIVELY FEW points of correspondence needed

Example Cluster
p:35805,n:34540.fes // Moroccan city
p:35805,n:37114.tenerife // Spanish city
p:35805,n:37344.zaragoza // Spanish city, with football club
p:35805,n:37548.boavista // Portuguese island, with football club
p:35805,n:38590.maritimo // Portuguese sports club known for football team
p:43243,n:39997.
p:39997,n:29474.saccoh
p:39997,n:29612.spaho // small Bosnian town
p:39997,n:33375.spartak // Moscow football club
p:39997,n:34467.environmentastrit
p:39997,n:34721.haxhi // Albanian football player
p:43243,n:40629.tenerife
p:43243,n:41043.boavista
p:43243,n:42049.maritimo
p:46477,n:44423.bilbao // Basque city
p:46477,n:44563.centreleft // football position
p:49912,n:48979.oviedo // Spanish city

Example Cluster
p:49224,n:50682.tenerife
p:56352,n:53348.
p:53348,n:46799.
p:46799,n:40301.rayo // football club in Madrid
p:46799,n:41027.bilbao
p:53348,n:47751.bilbao
p:56352,n:53354.
p:53354,n:47225.shelling
p:56352,n:53766.shelling
p:56352,n:53814.spartak
p:56352,n:54104.youridjourkaeff
p:56352,n:54108.zaragoza
p:56352,n:54460.colo // Chilean football club
p:56352,n:55076.kickoff // football term
p:65663,n:62554.
p:62554,n:60508.youridjourkaeff
p:83660,n:85323.youridjourkaeff
p:86579,n:84114.
p:84114,n:81134.
p:81134,n:75091.
p:75091,n:73692.deportivo // Spanish football club
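The cluster listings can be read into a tree with a few lines of code. The sketch below assumes that `p:` is a parent cluster id, `n:` is a node id, the text after the dot is the node's label (empty for internal clusters), and `//` starts an annotation. The slides do not define the format, so this interpretation, along with the names `parse_entries` and `leaves`, is an assumption.

```python
import re
from collections import defaultdict

# Assumed entry format: p:<parent id>,n:<node id>.<optional label>
ENTRY = re.compile(r"p:(\d+),n:(\d+)\.(\w*)")

def parse_entries(lines):
    """Map each parent id to a list of (node id, label) children."""
    children = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.split("//")[0].strip())  # drop annotations
        if match:
            parent, node = int(match.group(1)), int(match.group(2))
            children[parent].append((node, match.group(3)))
    return dict(children)

def leaves(children, root):
    """Collect the labels below root, descending through unlabeled nodes."""
    found = []
    for node, label in children.get(root, []):
        if label:
            found.append(label)
        found.extend(leaves(children, node))
    return found

# A few entries copied from the second example cluster.
lines = [
    "p:56352,n:53348.",
    "p:53348,n:46799.",
    "p:46799,n:40301.rayo // football club in Madrid",
    "p:46799,n:41027.bilbao",
    "p:53348,n:47751.bilbao",
    "p:56352,n:54108.zaragoza",
]
tree = parse_entries(lines)
labels = leaves(tree, 56352)
print(labels)
```

Under this reading, the repeated “bilbao” entries are separate occurrences of the name that ended up in nearby sub-clusters.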

Ontologies Are Problematic • Indra will approximate the most likely (highest mutual information) ontology • BUT analysts want their own ontologies • Different experts look at the same data • Data is stored as primitive entities and paths • Indra builds a semantic model on the fly, tailored to the ontology of whoever is looking at the data • Ontologies can also be tailored to the ontologies of particular simulation models

Hypothesis Driven AND Data Driven • Indra can flexibly take in analyst input • Indra can align its ontology to another with very few points of correspondence • Indra can fill in the gaps • Feedback gives Indra advantage over other systems that generate ontologies: • Global consensus • Ability to adapt to any amount of user input
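One simple way to picture alignment from a few points of correspondence: an analyst labels a handful of terms, and each label spreads to the other members of the emergent cluster that contains the labeled term. This is only a toy approximation of the idea on the slide; the function `propagate`, the clusters, and the analyst concepts are hypothetical.

```python
from collections import Counter

def propagate(clusters, seeds):
    """Spread analyst concepts from seeded terms to their cluster mates.

    clusters: list of sets of terms (the emergent, data-driven grouping)
    seeds:    term -> analyst concept (the few points of correspondence)
    Clusters with no seeded member are left unaligned.
    """
    aligned = {}
    for members in clusters:
        votes = Counter(seeds[m] for m in members if m in seeds)
        if not votes:
            continue
        concept = votes.most_common(1)[0][0]
        for member in members:
            # An explicit analyst label always wins over the cluster vote.
            aligned[member] = seeds.get(member, concept)
    return aligned

# Hypothetical emergent clusters and two analyst-supplied correspondences.
clusters = [
    {"tenerife", "zaragoza", "bilbao", "oviedo"},
    {"fork", "knife"},
    {"salad", "croutons"},
]
seeds = {"bilbao": "SpanishCity", "fork": "Tool"}
aligned = propagate(clusters, seeds)
print(sorted(aligned.items()))
```

Two seeds are enough here to label six terms, while the unseeded food cluster stays open for further analyst input.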
