The TANGO project transforms tables into a canonicalized form, generates mini-ontologies, and merges them into a growing ontology. The Mini-Ontology GeneratOr (MOGO) performs concept/value recognition, relationship discovery, and constraint discovery.
Mini-Ontology Generation from Canonicalized Tables • Stephen Lynn • Data Extraction Research Group • Department of Computer Science • Brigham Young University • Supported by the
TANGO Overview • TANGO: Table ANalysis for Generating Ontologies • The project consists of three components: • Transform tables into a canonicalized form • Generate mini-ontologies • Merge them into a growing ontology
[Figures: Sample Input; Sample Output]
Mini-Ontology GeneratOr (MOGO) • Concept/Value Recognition • Relationship Discovery • Constraint Discovery
Concept/Value Recognition • Lexical Clues • Labels as data values • Data value assignment • Data Frame Clues • Labels as data values • Data value assignment • Default • Classifies any unclassified elements according to a simple heuristic
[Figure: Concepts and Value Assignments. Concepts shown: Year, Location, Region, State, Population, Latitude, Longitude. Values shown: Year: 2002, 2003 • Region: Northeast, Northwest • State: Delaware, Maine, Oregon, Washington • Population: 2,122,869, 817,376, 1,305,493, 9,690,665, 3,559,547, 6,131,118 • Latitude: 45, 44, 45, 43 • Longitude: -90, -93, -120, -120]
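A minimal sketch of how data frame clues and a default rule might classify table cells, written in Python. The slides do not describe MOGO's actual recognizers, so everything here is an assumption made for illustration: the `DATA_FRAMES` dictionary, its regular expressions, the `classify` function, and the "contains a digit" fallback are all hypothetical.

```python
import re

# Hypothetical recognizers standing in for a data frame library.
# The patterns and the fallback rule below are assumptions, not MOGO's.
DATA_FRAMES = {
    "Year": re.compile(r"(19|20)\d{2}"),
    "Population": re.compile(r"\d{1,3}(,\d{3})+"),
    "Longitude": re.compile(r"-\d{1,3}"),
}


def classify(cell):
    """Return (kind, concept) for one table cell string."""
    text = cell.strip()
    # Data frame clue: a recognizer matches the whole cell.
    for concept, pattern in DATA_FRAMES.items():
        if pattern.fullmatch(text):
            return ("value", concept)
    # Assumed default heuristic: cells containing digits are treated as
    # values with no concept yet; everything else is treated as a label.
    if any(ch.isdigit() for ch in text):
        return ("value", None)
    return ("label", None)
```

With these assumed patterns, `classify("2002")` returns `("value", "Year")`, `classify("Region")` returns `("label", None)`, and `classify("45")` falls through to the default and returns `("value", None)`.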
Relationship Discovery • Dimension Tree Mappings • Lexical Clues • Generalization/Specialization • Aggregation • Data Frames • Ontology Fragment Merge
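One way to picture a dimension tree mapping is to walk the label nesting and propose a candidate relationship for every parent/child pair. The sketch below assumes a nested-dictionary representation and a hypothetical tree in which Location contains Region and State; neither the representation nor that nesting is given in the slides, and the function name `candidate_relationships` is invented for this example.

```python
def candidate_relationships(tree, parent=None):
    """Collect (parent, child) label pairs from a nested-dict dimension tree."""
    pairs = []
    for label, children in tree.items():
        if parent is not None:
            pairs.append((parent, label))
        # Recurse so that deeper nesting also yields candidate pairs.
        pairs.extend(candidate_relationships(children, label))
    return pairs


# Hypothetical dimension tree; the nesting is an assumption.
tree = {"Location": {"Region": {}, "State": {}}, "Year": {}}
pairs = candidate_relationships(tree)
```

Here `pairs` is `[("Location", "Region"), ("Location", "State")]`. Deciding whether a pair is an aggregation, a generalization/specialization, or a plain relationship set would need the lexical and data frame clues listed above, which this sketch does not attempt.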
Constraint Discovery • Generalization/Specialization • Computed Values • Functional Relationships • Optional Participation
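A functional relationship can be checked directly against the data: a set of columns functionally determines another column if no combination of left-hand values is ever paired with two different right-hand values. The following is a minimal sketch of that check, not MOGO's implementation. The `is_functional` helper is hypothetical, and the pairing of states with regions is an assumption read off the order of the sample values.

```python
def is_functional(rows, lhs, rhs):
    """True if the columns in lhs functionally determine column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = row[rhs]
        # setdefault stores the first value seen for this key; any later
        # row that disagrees violates the functional constraint.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Assumed state-to-region pairing, for illustration only.
rows = [
    {"State": "Delaware", "Region": "Northeast"},
    {"State": "Maine", "Region": "Northeast"},
    {"State": "Oregon", "Region": "Northwest"},
    {"State": "Washington", "Region": "Northwest"},
]
```

On these rows, `is_functional(rows, ["State"], "Region")` is `True`, while `is_functional(rows, ["Region"], "State")` is `False`, because each region is paired with more than one state.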
Validation • Concept/Value Recognition • Correctly identified concepts • Missed concepts • False positives • Data values assignment • Relationship Discovery • Valid relationship sets • Invalid relationship sets • Missed relationship sets • Constraint Discovery • Valid constraints • Invalid constraints • Missed constraints
Concept Recognition • What we counted: • Correct/Incorrect/Missing Concepts • Correct/Incorrect/Missing Labels • Data value assignments
Relationship Discovery • What we counted: • Correct/incorrect/missing relationship sets • Correct/incorrect/missing aggregations and generalization/specializations
Constraint Discovery • What we counted: • Correct/Incorrect/Missing: • Generalization/Specialization constraints • Computed value constraints • Functional constraints • Optional constraints
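Correct/incorrect/missing counts of this kind map onto the standard precision, recall, and F-measure definitions. The slides do not state which formulas were used, so the sketch below assumes the usual ones: precision = correct / (correct + incorrect), recall = correct / (correct + missing), and F-measure as their harmonic mean. The function name and the counts in the usage note are illustrative only and are not results from the study.

```python
def precision_recall_f(correct, incorrect, missing):
    """Return (precision, recall, f_measure) from raw counts."""
    found = correct + incorrect      # everything the tool reported
    expected = correct + missing     # everything it should have reported
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, with made-up counts of 8 correct, 2 incorrect, and 2 missing, precision, recall, and F-measure all come out to 0.8.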
Concept Recognition • Successes • 98% of concepts identified • Missing label identification • 97% of values assigned to correct concept • Common problems • Finding an appropriate label • Duplicate concepts
Relationship Discovery • Recall of 92% for relationship sets • Missing aggregations and generalizations/specializations • Only found in label nesting
Constraint Discovery • F-measure of 98% for functional relationship sets • Poor computed value discovery • Rows/Columns with totals
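Rows or columns with totals are the simplest case of a computed value: one cell equals the sum of a group of others. A minimal sketch of such a check follows, using the population figures from the sample table. The grouping of Delaware and Maine under Northeast, and Oregon and Washington under Northwest, is an assumption based on the order of the values, and `parse_count` and `is_total` are hypothetical helpers rather than part of MOGO.

```python
def parse_count(text):
    """Convert a string such as '2,122,869' to an integer."""
    return int(text.replace(",", ""))


def is_total(parent, children):
    """True if the parent cell equals the sum of the child cells."""
    return parse_count(parent) == sum(parse_count(c) for c in children)


# Assumed grouping of the sample population values.
northeast_ok = is_total("2,122,869", ["817,376", "1,305,493"])
northwest_ok = is_total("9,690,665", ["3,559,547", "6,131,118"])
```

Both checks come out `True` for the sample values, which is consistent with the region figures being totals of their states. A real table would also need tolerance for rounding and for totals placed in separate rows or columns.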
Conclusions • Tool to generate mini-ontologies • Assessment of accuracy of automatic generation
Future Work • Tool Enhancements • Linguistic processing • Data frame library • Domain specific heuristics • Alternate Uses • Annotation for the Semantic Web