Upper Ontology Symposium. For Your Eyes Only. Dr. Douglas B. Lenat, 3721 Executive Center Drive, Suite 100, Austin, TX 78731. Email: Lenat@cyc.com. Phone: (512) 342-4001. 2 July 2005.
Upper Ontology Symposium: For Your Eyes Only
• Tues. a.m.: issues talk, to the UO community
• Tues. p.m.: review my 1st public Wed. talk
• Wed. a.m.: semi-public: UO applications
• Wed. p.m.: public: value of formal ontology
• Wed. p.m.: public: about our communiqué
• Thu.: related talk at MITRE for S&T analysts
(Talk lengths noted on the slide: 15 min, 15 min, 25 min, 10 min.)
Upper Ontology Symposium
• To first order: I agree with the communiqué (apple pie)
• How/why I was forced into this field
• Upper ontology mostly just impacts efficiency:
  • of the vocabulary (lower ontology): fewer terms, simpler terms
  • of the axioms: fewer, terser, less ambiguous
  • of the various types of cross-ontology mapping axioms
• What needs to be shared
• No “correct” UO; and yet no need for separate independent UOs:
  • have contexts (“microtheories”) and an ist relation
  • ontologies at that point seem to be normal first-class objects
  • as with any important region of the ontology, facet it
  • 12 useful (categories of) facets or “dimensions” of ontology-space
• Just a few remarks about OpenCyc and ResearchCyc
How/why I was forced into this field
Goal: Amplify human beings via ubiquitous real AI.
Reality: Throughout the 1960s and 1970s, every subfield of AI kept hitting the same brick wall, the BRITTLENESS BOTTLENECK: NL understanding, speech understanding, robotics, learning, expert systems, search, semantic database integration, …
(Programs need massive amounts/coverage of common sense and general world knowledge.)
Representation mostly just impacts efficiency, e.g., choosing among:
• natural language sentences
• node & link diagrams
• high-resolution imagery
• nth-order logic formulae
• database tuples
• algebraic equations
UO mostly just impacts efficiency:
• Of the vocabulary (lower ontology): fewer and simpler terms
  • Ex: non-souled trees and souled trees
  • Ex: FranceIn1985
  • Ex: grue and bleen
• Of the axioms: fewer, terser, less ambiguous
  • Ex: things grue by day are usually bleen at night
  • Ex: when smurfing a car, first smurf the key
• Of the cross-ontology mapping axioms
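The grue/bleen example shows how a poorly chosen vocabulary makes an axiom clumsier without changing what it says. Below is a minimal Python sketch, assuming one reading of the slide (grue = green by day and blue at night; bleen = blue by day and green at night); all function names are hypothetical. Under that reading, "things grue by day are usually bleen at night" is just "things usually keep their color overnight" stated in a worse vocabulary.

```python
# Hypothetical reading of the slide's vocabulary:
#   grue  = green by day, blue at night
#   bleen = blue by day, green at night


def to_grue_bleen(color: str, is_day: bool) -> str:
    """Translate an ordinary color observation into grue/bleen terms."""
    if color == "green":
        return "grue" if is_day else "bleen"
    if color == "blue":
        return "bleen" if is_day else "grue"
    raise ValueError(f"unsupported color: {color}")


def from_grue_bleen(term: str, is_day: bool) -> str:
    """Translate a grue/bleen observation back into green/blue."""
    if term == "grue":
        return "green" if is_day else "blue"
    if term == "bleen":
        return "blue" if is_day else "green"
    raise ValueError(f"unsupported term: {term}")


def night_color(day_color: str) -> str:
    """Default 'colors persist overnight' in green/blue terms: the identity."""
    return day_color


def night_term(day_term: str) -> str:
    """The same default in grue/bleen terms: the word must flip."""
    return {"grue": "bleen", "bleen": "grue"}[day_term]


# Both statements of the default predict the same night-time color.
for color in ("green", "blue"):
    predicted = night_term(to_grue_bleen(color, is_day=True))
    assert from_grue_bleen(predicted, is_day=False) == night_color(color)
```

The content is identical in both vocabularies; only the length and clarity of the axiom differ, which is the efficiency point the slide makes.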
[Figure: the top of the Cyc upper ontology, a taxonomy rooted at Thing. Nodes shown: Thing; Individual; Intangible; SetOrCollection; IntangibleIndividual; SpatialThing; TemporalThing; Situation; Set-Mathematical; Collection; MathematicalObject; Event; Role; Relationship; TimeInterval; SomethingExisting; PhysicalEvent; Function-Denotational; AttributeValue; TruthFunctional; StaticSituation; LogicalConnective; Predicate; Quantifier; PartiallyIntangible; PartiallyTangible; ActorSlot; Configuration; CompositeTangibleAndIntangibleObject; IntangibleExistingThing.]
So if the UO mostly impacts efficiency, where is the power?
[Figure: a spectrum running from the upper ontology down to task-specific knowledge, illustrated by “Water is wet”, “Vehicles slow down in bad weather”, and “HUMMVs lose 18% traction in 4-inch-deep mud”.]
So if the UO mostly impacts efficiency, where is the power?
• The Upper Level need only be adequate
• The Lower Levels supply the minutiae
• The Intermediate Level is the locus of power
So the Upper + Intermediate levels are what we need to share.
Answering even an innocuous-sounding question, “Can vehicle X get from Y to Z by time t?”, may require intermediate-level knowledge about localized spatial things, pathways, earth sciences, weather, topography, oceanography (depth, temperature, biota), terrain, transportation, industry, vehicles, geopolitics (“international waters”), communications, the driver, holidays, ...
So the Upper + Intermediate levels are what we need to share.
What Needs to be Shared?
• bits/bytes/streams/network…
• alphabet, special characters,…
• words, morphological variants,…
• syntactic meta-level markups (HTML)
• semantic meta-level markups (SGML, XML)
• content (logical representation of doc/page/...)
• context (models of the user’s prior/tacit knowledge (incl. common sense, recent history), wants/needs, budget,… and n dimensions of metadata: time, space, level of granularity, the source’s purpose/ideology...)
[Diagram label: Semantic Web]
What Needs to be Shared? (continued)
• The Semantic Web standardizes only a tiny vocabulary (# distinctions) of standard relations: rdf:type, subclass, label, domain, range, comment, …; beyond that, diversity is tolerated, which means divergence is inevitable. (“What do you mean we have no standard? We have lots of standards!”)
• DAML+OIL and OWL add a few more distinctions: inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, …
• But to do the logical/arithmetic combination across information sources, we need tens of thousands of relations, not tens.
What Needs to be Shared?
• context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)
• To do the logical/arithmetic combination across information sources, we need tens of thousands of relations, not tens. Analogy: the # of words in the English language.
There is no “correct” UO
• Are apes monkeys?
• Are poinsettias red flowers?
• Do we need to distinguish instance & subtype?
• Are these two terms one and the same thing?
  • Black US Presidents in the 20th Century
  • Female US Presidents in the 20th Century
• Davidsonian reification of events or not?
Without reification: (marriedIn <groom> <bride> <wedding> <date>)
Events are rich (no limit to the number of arguments), so reify the event and give each participant its own assertion:
(groom Wedding0947 JoeSmith)
(bride Wedding0947 JaneDoe)
(dateOfEvent Wedding0947 (DayFn 13 (MonthFn May (YearFn 1999))))
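The slide contrasts a fixed-arity predicate with Davidsonian reification, where the event gets its own term and each participant is a binary role assertion. A minimal Python sketch, assuming assertions are plain tuples; `officiant` and `RevBrown` are hypothetical names added only to show that a new participant costs one more assertion rather than a new predicate arity.

```python
# Each assertion is a tuple: (predicate, arg1, arg2, ...).


def reify_marriage(groom, bride, wedding, date):
    """Replace one fixed-arity marriedIn fact with binary role assertions on the event."""
    return {
        ("groom", wedding, groom),
        ("bride", wedding, bride),
        ("dateOfEvent", wedding, date),
    }


def roles_of(event, kb):
    """Collect role -> filler for one reified event."""
    return {pred: filler for (pred, ev, filler) in kb if ev == event}


date = ("DayFn", 13, ("MonthFn", "May", ("YearFn", 1999)))
flat = ("marriedIn", "JoeSmith", "JaneDoe", "Wedding0947", date)  # non-reified form
facts = reify_marriage(*flat[1:])

# Adding a participant is one more assertion, not a new 5-argument predicate.
facts.add(("officiant", "Wedding0947", "RevBrown"))  # hypothetical role and constant

print(sorted(roles_of("Wedding0947", facts)))
```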
No need for separate ontologies
• The answer to each question on the “There is no ‘correct’ UO” slide is yes in some contexts and no in others: (ist <context> <assertion>)
• Contexts (microtheories) are themselves terms in the ontology, e.g., (genlMt HockeyMt SportsMt)
• 12 facets or dimensions (largely) characterize a Mt.
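The ist/genlMt machinery can be sketched as a graph of microtheories in which an assertion is visible in a context if it is stated there or in any more general context. A minimal Python sketch under stated assumptions: only (genlMt HockeyMt SportsMt) comes from the slide; CasualSpeechMt, BiologyMt, and every assertion string are hypothetical, and real Cyc inference does far more than this membership test.

```python
from collections import defaultdict

genl_mts = defaultdict(set)   # Mt -> Mts it directly inherits from (genlMt links)
contents = defaultdict(set)   # Mt -> assertions stated directly in that Mt


def genl_mt(spec, genl):
    genl_mts[spec].add(genl)


def assert_in(mt, assertion):
    contents[mt].add(assertion)


def accessible_mts(mt):
    """The Mt itself plus every Mt reachable by following genlMt links."""
    seen, stack = set(), [mt]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(genl_mts[current])
    return seen


def ist(mt, assertion):
    """True if the assertion is stated in mt or in any Mt it inherits from."""
    return any(assertion in contents[m] for m in accessible_mts(mt))


genl_mt("HockeyMt", "SportsMt")                          # from the slide
assert_in("SportsMt", "(genls Athlete Person)")          # hypothetical assertion
assert_in("HockeyMt", "(genls Puck SportsEquipment)")    # hypothetical assertion
assert_in("CasualSpeechMt", "(genls Ape Monkey)")        # hypothetical: stated here only
```

Here `ist` returning False means "not stated or inherited", which is weaker than "false in that context"; a fuller sketch would also store negated assertions per Mt.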
“If it’s raining, carry an umbrella”
This rule tacitly assumes that:
• the performer is a human being,
• the performer is sane,
• the performer can carry an umbrella; thus: the performer is not a baby, not unconscious, not dead,
• the performer is going to go outdoors now/soon,
• their actions permit them a free hand (e.g., not wheelbarrowing),
• their actions wouldn’t be unduly hampered by it (e.g., not marathon-running),
• the wind outside is not too fierce (e.g., not hurricane strength),
• the time period of the action is after the invention of the umbrella,
• the culture is one that uses umbrellas as a rain- (not just sun-) protection device,
• the performer has easy access to an umbrella; thus: not too destitute, not someone who lives where it practically never rains, not at the office/theater/… caught without an umbrella,
• the performer is going to be unsheltered for some period of time (the more waterproof their clothing, the gentler the rain, and the warmer the air, the longer that time period),
• the performer will not be wet anyway (e.g., swimming),
• the rain is annoying, but merely annoying; thus: not sulfuric-acid rain on Venus, not radioactive post-apocalyptic rain, not biblical rain (Noah’s-ark-sized, or frogs/blood as rained on Pharaoh),
• the performer is not a hydrophobic person, gingerbread man, etc., and not a hydrophilic person, someone dying of thirst, etc.
12 Dimensions of Ontology Contexts
• Anthropacity / Let’s
• Time
• GeoLocation
• TypeOfPlace
• TypeOfTime
• Culture
• Sophistication/Security
• Topic
• Granularity
• Modality/Disposition/Epistemology
• Argument-Preference
• Justification
How we evaluate proposed dimensions
Criteria:
• Do they separate out mutually-irrelevant (and esp. mutually-incompatible) portions of the KB?
• Is it easy for Cyc to mechanically compute the overlap or disjointness of regions of n-dimensional context-space?
• Cognitive assonance: Do they (esp. their extrema) correspond to familiar real-world notions?
• Using them, is it empirically faster to enter assertions?
• Using them, is it empirically faster to do inference?
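The second criterion asks whether overlap and disjointness of regions in n-dimensional context space can be computed mechanically. A minimal Python sketch, assuming each dimension can be encoded as a closed numeric interval (plausible for Time, much less so for Culture or Topic) and that a dimension a context leaves unconstrained overlaps everything; all context names and extents are hypothetical.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]    # closed interval [low, high]
Region = Dict[str, Interval]      # dimension name -> extent; absent = unconstrained


def intervals_overlap(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def regions_overlap(r1: Region, r2: Region) -> bool:
    """Regions overlap iff they intersect on every dimension both of them constrain."""
    shared = r1.keys() & r2.keys()
    return all(intervals_overlap(r1[d], r2[d]) for d in shared)


def regions_disjoint(r1: Region, r2: Region) -> bool:
    return not regions_overlap(r1, r2)


# Hypothetical contexts; Time in years, Granularity on an ordinal scale (0 = finest).
twenties = {"Time": (1920, 1929), "Granularity": (2, 4)}
postwar = {"Time": (1945, 1950)}
mid_twenties_detail = {"Time": (1925, 1926), "Granularity": (0, 2)}
```

Disjointness on any single dimension is enough to separate two contexts, which is what makes dimensions like Time cheap to test.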
Context feature: Time
• The piece of time (the 1920s, the first five years after WWII, the Pleistocene Era) in which a context’s assertions hold.
• Useful because:
  • Facts about very distant time periods are often mutually irrelevant; if stated tersely, they are often inconsistent.
  • Inefficient to temporally qualify each assertion individually.
  • In many reasoning contexts, causes precede effects by a small amount of time.
Context feature: Spatial Location
• The piece of space (Lebanon, my bloodstream, the Southern Hemisphere, Mike Ditka’s backyard) in which a context’s assertions hold.
• Useful because:
  • Facts about very distant locations are often mutually irrelevant; if stated tersely, they are often inconsistent.
  • Inefficient to spatially qualify each assertion individually.
  • In many reasoning contexts, interacting objects and events are usually spatially proximate.
Context feature: Culture
• The cultural point of view assumed by the assertions in a context.
• This dimension has many subdimensions, e.g.:
  • political culture, sexual culture, sexual orientation culture, age culture, generation culture, religious culture, ancestral culture, geo-political culture, regional culture, region-type culture, legal culture, and more
Culture, cont’d
• Useful because:
  • In many reasoning contexts, some cultural perspective (or set of perspectives) is assumed, and other perspectives are not relevant.
  • Applications that reason about agents’ intent and expectations need sensitivity to variations in cultural context to be accurate.
  • There are many good ways to infer the cultural POV of the author.
• This dimension is quite possibly the most difficult of all: very complex, and it is very hard to separate fact from preconception, and very hard to maintain objectivity and not antagonize everyone.
Context feature: Sophistication
• The level of information, education, intelligence, or other capacity for knowledge assumed by the assertions in this context.
• What capacity for knowledge would a person need in order to:
  • understand the assertions well enough to learn them in this form,
  • recognize them as true, once they are hinted at or stated,
  • already be familiar with the assertions, at least theoretically,
  • have already deeply assimilated the content of the assertions?
• Useful for dialogue and collaborative-planning applications.
Context feature: Granularity
• The level of coarseness assumed by assertions in a context.
• This dimension has many potential subdimensions:
  • size of objects, duration of events, parts, subevents, suborganizations, specificity or abstraction of classes (collections), relationships, measurements
  • Ex: Newtonian vs. Relativistic vs. Quantum physics
• Useful because: answers often vary drastically depending on the granularity desired.
Mathematical Factoring of MetaData Dimensions
• UnitedStatesIn1985Context: Ronald Reagan is president. There are at least 900,000 doctors.
• PennsylvaniaIn1985Context: Dick Thornburgh is governor.
• LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president.
• But whether we may conclude “Dick Thornburgh is governor and there are at least 900,000 doctors” depends on the time, space, and respective granularities of the contexts.
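The factoring example can be sketched by letting a context inherit assertions from every context whose time and place contain its own. A minimal Python sketch: the slide says the real inference depends on time, space, and granularity, which this sketch collapses into a hypothetical per-assertion flag saying whether a fact holds throughout every sub-region and sub-period (who is president does; a nationwide head count does not). The place hierarchy and (year, month) encoding are also assumptions.

```python
# Hypothetical containment hierarchy for places.
PARENT = {"LehighCounty": "Pennsylvania", "Pennsylvania": "UnitedStates", "UnitedStates": None}


def place_within(place, region):
    """True if place equals region or lies inside it."""
    while place is not None:
        if place == region:
            return True
        place = PARENT[place]
    return False


def time_within(inner, outer):
    """Intervals are ((year, month), (year, month)), inclusive at both ends."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# facts: (text, holds_throughout_subregions_and_subperiods); the flag is hypothetical.
CONTEXTS = {
    "UnitedStatesIn1985Context": {
        "place": "UnitedStates", "time": ((1985, 1), (1985, 12)),
        "facts": [("Ronald Reagan is president", True),
                  ("There are at least 900,000 doctors", False)]},
    "PennsylvaniaIn1985Context": {
        "place": "Pennsylvania", "time": ((1985, 1), (1985, 12)),
        "facts": [("Dick Thornburgh is governor", True)]},
    "LehighCountyInFebruary1985Context": {
        "place": "LehighCounty", "time": ((1985, 2), (1985, 2)),
        "facts": []},
}


def visible_facts(name):
    """Facts stated in the context, plus inheritable facts of contexts that contain it."""
    ctx = CONTEXTS[name]
    result = set()
    for other_name, other in CONTEXTS.items():
        if place_within(ctx["place"], other["place"]) and time_within(ctx["time"], other["time"]):
            for text, inheritable in other["facts"]:
                if other_name == name or inheritable:
                    result.add(text)
    return result
```

The Lehigh County context picks up both officeholders from its two enclosing contexts without either being restated, while the nationwide doctor count stays where it was asserted.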
Time Indices and Granularities
• Doug is talking.
• Doug is talking, at 1:45 to 2:45, on 5/4/04.
• Therefore: Doug is talking, at 2:05 to 2:40, on 5/4/04.
• But not: Doug is talking, at 2:11:15, on 5/4/04.
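The time-index example suggests a simple test: a fact asserted over an interval at some temporal granularity can be concluded over a sub-interval only if that sub-interval is no shorter than the granularity. A minimal Python sketch; the 5-minute granularity, the afternoon times, and reading 5/4/04 as May 4, 2004 are all assumptions, not values from the slide.

```python
from datetime import datetime, timedelta


def holds_over(fact_start, fact_end, granularity, query_start, query_end):
    """A fact asserted over [fact_start, fact_end] at the given granularity may be
    concluded over a query interval that lies inside it and is no shorter than the
    granularity; finer-grained claims (e.g. about an instant) are not licensed."""
    inside = fact_start <= query_start and query_end <= fact_end
    coarse_enough = (query_end - query_start) >= granularity
    return inside and coarse_enough


def at(hour, minute, second=0):
    # Assumption: 5/4/04 is May 4, 2004, and the talk is in the afternoon.
    return datetime(2004, 5, 4, hour, minute, second)


GRANULARITY = timedelta(minutes=5)   # hypothetical granularity of "Doug is talking"
talk = (at(13, 45), at(14, 45))

print(holds_over(*talk, GRANULARITY, at(14, 5), at(14, 40)))             # True
print(holds_over(*talk, GRANULARITY, at(14, 11, 15), at(14, 11, 15)))    # False
```

The second query fails not because it lies outside the talk but because an instant is finer than the granularity at which the talk was asserted (Doug might be pausing at 2:11:15).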
Calculi for deciding (dimension by dimension) in what context we can assert a logical conclusion: Backward Inference
[Diagram, two panels: Pa holds at t1 and (implies Px Qx) holds at t2; the queries are Qa? at t3 and Qa? at t4.]
• Qa can be inferred at t3, with granularity γ, if t3 subsumes some instance of the granularity of Pa and some instance of the granularity of (implies Px Qx), and γ is at least as big as both of these granularities.
• If t4 subsumes some instance of the granularity of Pa, γ1, and some instance of the granularity of (implies Px Qx), γ2, then Qa is inferred at t4, with each granularity in the set of minimal upper bounds of {γ1, γ2}.
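The t4 case needs the minimal upper bounds of two granularities in a partial order. A minimal Python sketch, assuming a small, hypothetical poset of temporal granularities in which Week and Month are incomparable (neither nests in the other) but both are finer than Year.

```python
# Hypothetical granularity poset: each granularity maps to the ones directly coarser.
COARSER = {
    "Second": {"Minute"},
    "Minute": {"Hour"},
    "Hour": {"Day"},
    "Day": {"Week", "Month"},
    "Week": {"Year"},      # weeks do not nest in months, so Week and Month are incomparable
    "Month": {"Year"},
    "Year": set(),
}


def coarser_or_equal(g):
    """g plus every granularity reachable upward from it (its upper bounds)."""
    seen, stack = set(), [g]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(COARSER[current])
    return seen


def minimal_upper_bounds(g1, g2):
    """Common upper bounds of g1 and g2 that have no strictly finer common upper bound."""
    common = coarser_or_equal(g1) & coarser_or_equal(g2)
    return {u for u in common
            if not any(v != u and u in coarser_or_equal(v) for v in common)}
```

Returning a set rather than a single value matches the slide's wording: in a general poset two granularities can have several incomparable minimal upper bounds, and Qa is then inferred at each of them.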
Inferring Context Subsumption
(genlMt A B): the content of context A subsumes the content of context B.
June1985LebanonDataContext = (MtSpace LebanonDataContext (TimeIndex: June, 1985) (TemporalGranularity: Month) (SpatialGranularity: Governorate))
1985SouthWestAsiaDataContext = (MtSpace SouthWestAsiaDataContext (TimeIndex: 1985) (TemporalGranularity: Day) (SpatialGranularity: SquareMile))
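One plausible reading of the subsumption test: context A's content subsumes context B's if A covers at least B's time and region and records data at least as finely. A minimal Python sketch of that reading; the granularity rankings, the Lebanon-in-South-West-Asia link, and the (year, month) encoding are assumptions, not Cyc's actual MtSpace semantics.

```python
from dataclasses import dataclass

# Hypothetical rankings: smaller number = finer granularity.
TEMPORAL_RANK = {"Day": 0, "Month": 1, "Year": 2}
SPATIAL_RANK = {"SquareMile": 0, "Governorate": 1}
# Hypothetical region containment.
REGION_PARENT = {"Lebanon": "SouthWestAsia", "SouthWestAsia": None}


@dataclass(frozen=True)
class MtSpace:
    region: str
    time: tuple                # ((year, month), (year, month)), inclusive
    temporal_granularity: str
    spatial_granularity: str


def region_within(inner, outer):
    while inner is not None:
        if inner == outer:
            return True
        inner = REGION_PARENT[inner]
    return False


def content_subsumes(a: MtSpace, b: MtSpace) -> bool:
    """a covers b's region and time, at granularities at least as fine as b's."""
    return (region_within(b.region, a.region)
            and a.time[0] <= b.time[0] and b.time[1] <= a.time[1]
            and TEMPORAL_RANK[a.temporal_granularity] <= TEMPORAL_RANK[b.temporal_granularity]
            and SPATIAL_RANK[a.spatial_granularity] <= SPATIAL_RANK[b.spatial_granularity])


june_1985_lebanon = MtSpace("Lebanon", ((1985, 6), (1985, 6)), "Month", "Governorate")
sw_asia_1985 = MtSpace("SouthWestAsia", ((1985, 1), (1985, 12)), "Day", "SquareMile")
```

Under this reading, the pair on the slide would yield (genlMt 1985SouthWestAsiaDataContext June1985LebanonDataContext), since the first argument's content subsumes the second's.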
Getting back to: No need for separate ontologies
• Declarative assertions that map them to Cyc
• And thereby map between them (using Cyc as an interlingua)
• Create a context or Mt for each external ontology O; eventually, there is enough in and about each such Mt that it almost subsumes O.
• “Almost” because O might be optimized in some way, representationally or algorithmically (e.g., a DB)
"(synonymousExternalConcept TERM SOURCE STRING) means that the CycL expression TERM is synonymous with at least one of the interpretations of STRING in the external data source SOURCE." (synonymousExternalConcept InnerEar MeSH-Information1997 "Labyrinth | A9.246.631") (synonymousExternalConcept Temperature CNLPOntology "temp") (synonymousExternalConcept Concerto WordNet-Version2_0 "N06611782") (synonymousExternalConcept PowerGenerationComplex-Nuclear LSCOMObjectAndSituationOntology “power plant (nuclear)” )
"(overlappingExternalConcept TERM SOURCE STRING) means that the CycL expression TERM overlaps semantically with at least one of the interpretations of STRING in the external data source SOURCE." (overlappingExternalConcept TextualPCW CNLPOntology "document") (overlappingExternalConcept defectors HorusPersonOrganizationOntology "splinterFromOrg") (overlappingExternalConcept SpleniusCapitis MeSH-Information1997 "Neck Muscles | A2.633.567.650")
"(codeMapping MAP CODE DENOTATION) specifies one mapping for the reified mapping MAP. When a table uses MAP to interpret some field, the value CODE in that field will be interpreted as DENOTATION." (codeMapping FACC-FeatureType-CMLS "BH100" Moat) (codeMapping NGA-FeatureType-CMLS "PLN" (GroupFn Plain-Topographical)) (codeMapping FACC-FeatureType-CMLS "GB020" AircraftArrestingGear) (ForAll ?x (fieldDecoding USGS-GNIS-LS ?x (TheFieldCalled “population”) (numberOfInhabitants (TheReferentOfTheRow USGS-GNIS) ?x)))
Cyc Knowledge Base
[Figure: topic map of the Cyc Knowledge Base, ranging from the most general terms at the top, through “General Knowledge about Various Domains”, down to “Specific data, facts, and observations”. Topics shown: Thing; Intangible Thing; Individual; Sets; Relations; Spatial Thing; Temporal Thing; Partially Tangible Thing; Space; Time; Paths; Events; Scripts; Spatial Paths; Logic; Math; Agents; Physical Objects; Borders; Geometry; Artifacts; Living Things; Organization; Materials; Parts; Statics; Actors; Actions; Movement; Life Forms; Plans; Goals; State Change; Dynamics; Organizational Actions; Types of Organizations; Ecology; Human Beings; Human Activities; Physical Agents; Natural Geography; Organizational Plans; Human Organizations; Plants; Human Anatomy & Physiology; Nations; Governments; Geo-Politics; Human Artifacts; Political Geography; Agent Organizations; Business & Commerce; Politics; Warfare; Animals; Emotion; Perception; Belief; Human Behavior & Actions; Sports; Recreation; Entertainment; Social Behavior; Products; Devices; Conceptual Works; Purchasing; Shopping; Professions; Occupations; Weather; Law; Vehicles; Buildings; Weapons; Mechanical & Electrical Devices; Software; Literature; Works of Art; Social Relations, Culture; Business, Military Organizations; Earth & Solar System; Social Activities; Transportation & Logistics; Travel; Communication; Everyday Living; Language.]
Cyc contains:
• 15,000 Predicates
• 68,000 Collections
• 300,000 Concepts
• 3,200,000 Assertions
Represented in:
• First Order Logic
• Higher Order Logic
• Context Logic
• Micro-theories
OpenCyc: + 1M taxonomic/mereological axioms + inference engines, interfaces
ResearchCyc: all of that + the whole Cyc KB (Cyc contains: 15,000 predicates, 68,000 collections, 300,000 concepts, 3,200,000 assertions)
Temporal Relations
37 Relations Between Temporal Things, including:
#$temporalBoundsContain, #$temporalBoundsIdentical, #$startsDuring, #$overlapsStart, #$startingPoint, #$simultaneousWith, #$after, #$temporalBoundsIntersect, #$temporallyIntersects, #$startsAfterStartingOf, #$endsAfterEndingOf, #$startingDate, #$temporallyContains, #$temporallyCooriginating
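Several of these relations can be approximated on a numeric time line. A minimal Python sketch of our reading of four of the names, assuming closed intervals with start <= end; Cyc's actual definitions (endpoint handling, dates as temporal things, and so on) are richer, so treat these as illustrations rather than the KB's axioms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float   # assumed start <= end; both endpoints included


def temporally_intersects(a: Interval, b: Interval) -> bool:
    """The two intervals share at least one moment."""
    return a.start <= b.end and b.start <= a.end


def after(later: Interval, earlier: Interval) -> bool:
    """later begins only once earlier is over."""
    return later.start > earlier.end


def starts_during(a: Interval, b: Interval) -> bool:
    """a's starting point falls within b."""
    return b.start <= a.start <= b.end


def temporal_bounds_contain(outer: Interval, inner: Interval) -> bool:
    """inner starts no earlier and ends no later than outer."""
    return outer.start <= inner.start and inner.end <= outer.end
```

Having many fine-grained relations like these, rather than one vague "before/overlaps", is what lets axioms stay terse: each states exactly the ordering constraint it needs.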
Senses of ‘Part’
• #$parts
• #$intangibleParts
• #$subInformation
• #$subEvents
• #$physicalDecompositions
• #$physicalPortions
• #$physicalParts
• #$externalParts
• #$internalParts
• #$anatomicalParts
• #$constituents
• #$functionalPart
Senses of ‘In’
• Can the inner object leave by passing between members of the outer group?
  • Yes: try #$in-Among
Senses of ‘In’
• Does part of the inner object stick out of the container?
  • None of it: try #$in-ContCompletely
  • Yes: try #$in-ContPartially
• If the container were turned around, could the contained object fall out?
  • Yes: try #$in-ContOpen
  • No: try #$in-ContClosed
Senses of ‘In’
• Can it be removed, if enough force is used, without damaging either object?
  • Yes: try #$in-Snugly or #$screwedIn
• Is it attached to the inside of the outer object?
  • Yes: try #$connectedToInside
• Does the inner object stick into the outer object?
  • Yes: try #$sticksInto
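The questions on these slides work like a decision guide for choosing a sense of ‘in’. A minimal Python sketch; the keyword parameter names are hypothetical, and an answer left as None means the question was not asked.

```python
from typing import List, Optional


def suggest_in_predicates(*,
                          leaves_between_members: Optional[bool] = None,
                          part_sticks_out: Optional[bool] = None,
                          falls_out_if_inverted: Optional[bool] = None,
                          removable_without_damage: Optional[bool] = None,
                          attached_to_inside: Optional[bool] = None,
                          sticks_into: Optional[bool] = None) -> List[str]:
    """Map answers to the slides' questions onto candidate senses of 'in'."""
    suggestions = []
    if leaves_between_members:
        suggestions.append("#$in-Among")
    if part_sticks_out is False:
        suggestions.append("#$in-ContCompletely")
    elif part_sticks_out is True:
        suggestions.append("#$in-ContPartially")
    if falls_out_if_inverted is True:
        suggestions.append("#$in-ContOpen")
    elif falls_out_if_inverted is False:
        suggestions.append("#$in-ContClosed")
    if removable_without_damage:
        suggestions += ["#$in-Snugly", "#$screwedIn"]
    if attached_to_inside:
        suggestions.append("#$connectedToInside")
    if sticks_into:
        suggestions.append("#$sticksInto")
    return suggestions
```

For example, a marble inside a closed box (nothing sticks out, and it cannot fall out when the box is inverted) yields `["#$in-ContCompletely", "#$in-ContClosed"]`; the answers are independent, so several senses can apply at once.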
Event Types
#$PhysicalStateChangeEvent, #$TemperatureChangingProcess, #$BiologicalDevelopmentEvent, #$ShapeChangeEvent, #$MovementEvent, #$ChangingDeviceState, #$GivingSomething, #$DiscoveryEvent, #$Cracking, #$Carving, #$Buying, #$Thinking, #$Mixing, #$Singing, #$CuttingNails, #$PumpingFluid, and 11,000 more
Relations Between an Event and its Participants
#$performedBy, #$causes-EventEvent, #$objectPlaced, #$objectOfStateChange, #$outputsCreated, #$inputsDestroyed, #$assistingAgent, #$beneficiary, #$fromLocation, #$toLocation, #$deviceUsed, #$driverActor, #$damages, #$vehicle, #$providerOfMotiveForce, #$transportees, and over 400 more
Propositional Attitudes: Relations Between Agents and Propositions
• #$goals
• #$intends
• #$desires
• #$hopes
• #$expects
• #$beliefs
• #$opinions
• #$knows
• #$rememberedProp
• #$perceivesThat
• #$seesThat
• #$tastesThat
Devices
• Device-specific predicates
  • #$gunCaliber
  • #$speedOf
• Device states (40+)
  • #$DeviceOn
  • #$CockedState
• Over 4000 specializations of #$PhysicalDevice
  • #$ClothesWasher
  • #$NuclearAircraftCarrier
• Vocabulary for describing device functions
  • #$primaryFunction-DeviceType
Lexical Entry Example: Coke
Constant: Coke-TheWord
isa: EnglishWord
Mt: EnglishMt
singular: “coke”
pnSingular: “Coke”
massNumber: “coke”
pnMassNumber: “Coke”
(denotation Coke-TheWord ProperCountNoun 0 (ServingFn CocaCola))
(denotation Coke-TheWord ProperMassNoun 0 CocaCola)
(denotation Coke-TheWord MassNoun 0 Cocaine-Powder)
(denotation Coke-TheWord MassNoun 2 ColaSoftDrink)
(denotation Coke-TheWord SimpleNoun 0 (ServingFn ColaSoftDrink))
<various other denotations of the English word “coke”>
[Slide annotation: three of the denotations are labeled SLANG.]
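Choosing among the denotations of “coke” can start from the surface form. A minimal Python sketch, assuming that a capitalized form selects the proper-noun denotations and a lowercase form selects the common-noun ones (sentence-initial capitalization would break this); the list covers only the denotations shown, not the elided others.

```python
# (part of speech, sense number, denotation) for Coke-TheWord, as listed on the slide;
# the slide elides further denotations, so this list is partial.
DENOTATIONS = [
    ("ProperCountNoun", 0, "(ServingFn CocaCola)"),
    ("ProperMassNoun", 0, "CocaCola"),
    ("MassNoun", 0, "Cocaine-Powder"),
    ("MassNoun", 2, "ColaSoftDrink"),
    ("SimpleNoun", 0, "(ServingFn ColaSoftDrink)"),
]

PROPER = {"ProperCountNoun", "ProperMassNoun"}


def candidate_denotations(surface: str):
    """Assumed heuristic: capitalized form -> proper-noun senses, else common-noun senses."""
    want_proper = surface[:1].isupper()
    return [d for (pos, _, d) in DENOTATIONS if (pos in PROPER) == want_proper]
```

The surface form only narrows the candidates; choosing between, say, Cocaine-Powder and ColaSoftDrink still needs context, which is where the microtheory machinery comes in.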
Lexical Entry Example: Eat
Constant: Eat-TheWord
isa: EnglishWord
Mt: EnglishMt
infinitive: “eat”
pastTense: “ate”
perfect: “eaten”
agentive-Sg: “eater”
(subcatFrame Eat-TheWord Verb 0 TransitiveNPCompFrame)
(verbSemTrans Eat-TheWord 0 TransitiveNPCompFrame
  (and (isa :ACTION EatingEvent)
       (performedBy :ACTION :SUBJECT)
       (inputsDestroyed :ACTION :OBJECT)))
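The verbSemTrans template is instantiated by substituting the parse's referents for :ACTION, :SUBJECT, and :OBJECT. A minimal Python sketch for “Fred ate the apple”, where EatingEvent001, Fred, and Apple001 are hypothetical constants and the template is stored as nested tuples.

```python
# The verbSemTrans template for Eat-TheWord, stored as nested tuples.
TEMPLATE = ("and",
            ("isa", ":ACTION", "EatingEvent"),
            ("performedBy", ":ACTION", ":SUBJECT"),
            ("inputsDestroyed", ":ACTION", ":OBJECT"))


def instantiate(template, bindings):
    """Replace template keywords (:ACTION, :SUBJECT, :OBJECT) by their bindings."""
    if isinstance(template, tuple):
        return tuple(instantiate(part, bindings) for part in template)
    return bindings.get(template, template)


def to_cycl(expr):
    """Render nested tuples as a CycL-style s-expression string."""
    if isinstance(expr, tuple):
        return "(" + " ".join(to_cycl(part) for part in expr) + ")"
    return str(expr)


# "Fred ate the apple": hypothetical constants for the parse's referents.
bindings = {":ACTION": "EatingEvent001", ":SUBJECT": "Fred", ":OBJECT": "Apple001"}
print(to_cycl(instantiate(TEMPLATE, bindings)))
```

Because the template is reified Davidsonian-style around :ACTION, further modifiers from the sentence (time, place, instrument) can be added as extra role assertions on the same event term.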