
Quality Taxonomies

Jim Nisbet, Senior Vice President of Technology, Semio Corporation. Knowledge Technologies 2001, March 5th, 2001.


Presentation Transcript


  1. Quality Taxonomies • Jim Nisbet, Senior Vice President of Technology, Semio Corporation • Knowledge Technologies 2001, March 5th, 2001

  2. Ontology / Taxonomy • Static Discovery • Root Ontology • Taxonomy Generation • Dynamic Discovery

  3. What is Quality? • “Best value for the money” • By this definition, a costly product is expected to deliver high performance; likewise, a low-cost product or service is expected to deliver less. For example, a rough demo delivery is both predictable and acceptable, since its quality profile is low conformance at low cost.

  4. What is Quality? • “Good Quality is Nominal Conformance” • Taxonomy Quality is defined as Taxonomy Conformance to: • Valid requirements; • Explicitly documented development standards; and • Implicit characteristics that are expected of all professionally developed taxonomies, such as good maintainability.

  5. Standards • ISO 2788-1986 • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute) • ISO 5964-1985 • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute) • ANSI/NISO Z39.19-1993 • National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993) • SEMIO Quality Plan v1 2000 • ISO/IEC 13250 Topic Maps • RDF • Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

  6. Project Plan • Kick-off • Requirements Review • Lexicon Review • Taxonomy Review • Tags Review • Final Review

  7. 1. Kick-off • Objectives • Purpose • Scope • Scale • Users • Conditions of receipt • Roles • Supplier • Customer • Admin • KE • Experts • Users • Planning • Training and Transfer

  8. 2. Requirements Review • Sources • Lexicon • Ontology • Install

  9. Sources • Dispersion (Multiplicity, Size, Homogeneity) • Refresh • Access

  10. Typical Patterns • Disparity • Adjust sources • Adjust crawl strategy • Isolate communities / taxonomies

  11. Lexicon • Vocabularies, etc. • Substitutions: Acronyms, Synonyms, etc. • Preferred Keywords: Brand Names, etc. • Banned Keywords
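The lexicon requirements on this slide (substitutions, preferred keywords, banned keywords) can be pictured as a small normalization pass over extracted phrases. The following is only a sketch: every dictionary entry and both function names are hypothetical, since the slide names the categories but lists no actual entries, and it does not represent Semio's implementation.

```python
import re

# Hypothetical lexicon rules (assumptions for illustration only).
SUBSTITUTIONS = {"ira": "individual retirement account", "t-bill": "treasury bill"}
PREFERRED = {"semio"}                  # e.g. brand names
BANNED = {"click here", "home page"}   # phrases that must never become keywords


def normalize_phrase(phrase):
    """Return the canonical form of an extracted phrase, or None if it is banned."""
    key = re.sub(r"\s+", " ", phrase.strip().lower())
    if key in BANNED:
        return None
    # Acronyms and synonyms are replaced by their canonical expansion.
    return SUBSTITUTIONS.get(key, key)


def apply_lexicon(phrases):
    """Normalize, drop banned phrases, deduplicate, and list preferred keywords first."""
    kept = {p for p in (normalize_phrase(x) for x in phrases) if p is not None}
    return sorted(kept, key=lambda p: (p not in PREFERRED, p))
```

Calling `apply_lexicon(["IRA", "Click  here", "Semio", "loan"])` drops the banned phrase, expands the acronym, and puts the preferred keyword first.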

  12. Typical Patterns • Lack of requirements • Use Librarian Resources

  13. Ontology • Thesaurus? • Is the information domain analysis complete, consistent, and accurate? • Is the partitioning of the problem complete?

  14. Typical Patterns • Directory versus Taxonomy • Isolate “directory” branches • Thesaurus versus Taxonomy • Put an ontology on top of the thesaurus • Check as early as possible that the thesaurus generics match the extracted lexicon • Keep the design for top-category requirements at a very high level • Plan to work bottom-up • See also Taxonomy (functions, combinations, etc.)

  15. Install • Implementation / Integration: • Are external and internal interfaces properly defined? • Are all requirements traceable to the system level? • Has prototyping been conducted for the user/customer? • Is performance achievable within the constraints imposed by other system elements? • Are requirements consistent with schedule, resources, and budget?

  16. Typical Patterns • Scale • Security • Missing Documents

  17. 3. Lexicon Review • Coverage • Extracted words / Words • (Extracted Index / Index) • Source benchmarking • Coverage • Extraction quality • Topic distribution • Structure • Most Frequent Phrases • Most Productive Generics • Substitutions • Exceptions
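The "Extracted words / Words" ratio above can be read as a simple coverage measure. The sketch below assumes that reading (distinct extracted words divided by distinct corpus words, case-insensitive); the function name and the tokenization are assumptions, not something the slide specifies.

```python
def lexical_coverage(corpus_tokens, extracted_terms):
    """Share of distinct corpus words that also appear in the extracted lexicon."""
    words = {t.lower() for t in corpus_tokens}
    if not words:
        return 0.0
    extracted = {t.lower() for t in extracted_terms}
    return len(words & extracted) / len(words)
```

A low ratio on the most valuable sources is the pattern the next slide addresses by enlarging the value corpus or by filtering and re-importing the lexicon.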

  18. Typical Patterns • Low level of frequency / quality for the most meaningful content • Increase size of value corpus • Filter and re-import lexicon

  19. 4. Taxonomy Review • Taxonomy Operation • Correctness • Reliability • Usability • Integrity • Efficiency • Taxonomy Revision • Maintainability • Flexibility • Testability • Taxonomy Transition • Portability • Reusability • Interoperability

  20. Folk Taxonomies Design • The Berlin and Kay model: Taxonomy = Nomenclature + Terminology • Ranks: Unique Beginner, Life Form, Generic, Specific, Varietal • Example: Tax, Liability, Loan, Term loan, Short-term loan

  21. Correctness • Accuracy • Completeness • Consistency

  22. Accuracy • Precision • Recall
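Precision and recall here carry their usual information-retrieval meaning. As a minimal sketch, assume that for one category we have the set of documents the taxonomy assigned to it and the set a reviewer judged relevant; both inputs and the function name are assumptions for illustration.

```python
def precision_recall(assigned, relevant):
    """Precision and recall of one category's document assignments.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer says belong in the category
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Precision penalizes documents filed under the wrong category; recall penalizes relevant documents the category missed.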

  23. Completeness • Taxonomy • Maps • Lexicon • Collection

  24. Tagging • Taxonomy • Maps • Lexicon • Document Collection • Concentration Works Against Quality • Tagging Coverage • Ontology Coverage • Hook Coverage • Map Coverage • Lexical Coverage • Collection Coverage

  25. Consistency: Typical Patterns • Objectivization • Hyperonymy • Speciation • Necessity

  26. Objectivization • Example: Employment, Firing, Hiring, Salaries • Avoid functional categories • Don’t mix functions / objects • Exhaust scripts • Match idiomatic phrases

  27. Genericity • Example: Parts, Air Conditioning, Belts and Hoses, Body, Brake System, Chassis, Engine, Exhaust System, Fuel System, Glass, Ignition • Avoid meronymy • Don’t mix meronymy / hyperonymy • Exhaust prototypes

  28. Speciation (WordNet) • Example: Person, Unwelcome person, Unpleasant person, Selfish person, Opportunist, Backscratcher • Avoid “strings” of categories • Avoid (non-idioms) properties for categories

  29. Necessity • Avoid non-productive categories • Avoid combinations of categories

  30. Nomenclature (Design Structure) Quality Index • Depth • Width • Balance
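The slide names depth, width, and balance without defining them. The sketch below assumes one plausible set of definitions: depth is the longest root-to-leaf path (in edges), width is the largest number of categories on any single level, and balance is the shallowest leaf depth divided by the deepest. The function name, these definitions, and the example tree are all assumptions.

```python
def structure_index(children, root):
    """Return (depth, width, balance) for a taxonomy given as parent -> child list.

    Assumes the hierarchy is a tree (no cycles, one parent per category).
    """
    leaf_depths = []
    level_sizes = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        level_sizes[depth] = level_sizes.get(depth, 0) + 1
        kids = children.get(node, [])
        if not kids:
            leaf_depths.append(depth)
        stack.extend((kid, depth + 1) for kid in kids)
    depth = max(leaf_depths)
    width = max(level_sizes.values())
    balance = min(leaf_depths) / depth if depth else 1.0
    return depth, width, balance


# Hypothetical arrangement of the category labels used earlier in the deck.
EXAMPLE = {
    "Tax": ["Liability", "Loan"],
    "Loan": ["Term loan"],
    "Term loan": ["Short-term loan"],
}
```

With these definitions, `structure_index(EXAMPLE, "Tax")` reports a depth of 3, a width of 2, and a balance of one third, flagging a tree whose branches differ considerably in depth.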

  31. Complexity Index • Cyclomatic complexity increases with the number of Cross References within the Taxonomy, giving an indication of complexity and difficulty of testing. • Taxonomy Complexity Index combines: • autonomy • closure • similarity • typicality • commonality • redundancy • stability
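To see why cross references raise this measure, one can borrow McCabe's graph formula V(G) = E - N + 2P, treating categories as nodes and both parent-child links and cross references as edges. Applying the formula to a taxonomy is an analogy assumed here, not a method the slide spells out; the function name is hypothetical.

```python
def cyclomatic_complexity(nodes, hierarchy_edges, cross_references, components=1):
    """McCabe's V(G) = E - N + 2P applied to a taxonomy graph.

    nodes: category names
    hierarchy_edges: (parent, child) pairs
    cross_references: (from_category, to_category) pairs
    components: number of connected components (1 for a single rooted taxonomy)
    """
    edges = len(hierarchy_edges) + len(cross_references)
    return edges - len(nodes) + 2 * components
```

A pure tree with N categories has N - 1 hierarchy edges, so its value is 1; each cross reference adds exactly 1, which matches the slide's claim that complexity grows with the number of cross references.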

  32. Maturity Index • IEEE Std 982.1-1988 suggests a software maturity index; applied to a taxonomy, it provides an indication of the stability of the taxonomy. • The Maturity Index combines: • number of modules in the current ontology / taxonomy • number of modules in the current ontology / taxonomy that have been changed • number of modules added to the current ontology / taxonomy • number of modules deleted from the previous version of the ontology / taxonomy
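The four counts above are usually combined as SMI = (M_T - (F_a + F_c + F_d)) / M_T, the form in which the software maturity index is commonly stated. The sketch below assumes that formula carries over unchanged to taxonomy "modules" (for example, branches or categories); the function and parameter names are illustrative.

```python
def maturity_index(total, changed, added, deleted):
    """Maturity index: (M_T - (F_a + F_c + F_d)) / M_T.

    total:   modules in the current ontology / taxonomy (M_T)
    changed: current modules that have been changed (F_c)
    added:   modules added to the current version (F_a)
    deleted: modules deleted from the previous version (F_d)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (added + changed + deleted)) / total
```

The index approaches 1.0 as a release touches fewer modules, so a value trending toward 1.0 across revisions suggests the taxonomy is stabilizing.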

  33. 5. Tags Review • Document coverage • Concepts coverage
      <tagset>
        <document>
          <docurl>http://www.TaxSource.com</docurl>
          <tag>
            <tagname>Liability</tagname>
            <weight>1.289</weight>
          </tag>
          <tag>
            <tagname>Federal Funds</tagname>
            <weight>0.746</weight>
          </tag>
        </document>
      </tagset>
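Given tag output in the format shown on this slide, the two coverage figures can be computed by parsing the XML. This is a sketch under stated assumptions: document coverage is taken to be the share of collection URLs that received at least one tag, concept coverage the share of taxonomy concepts used at least once, and the function name and its inputs are hypothetical.

```python
import xml.etree.ElementTree as ET

# The tagset from the slide, reproduced as a string.
TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def tag_coverage(xml_text, collection_urls, taxonomy_concepts):
    """Return (document coverage, concept coverage) for one tagset."""
    urls, concepts = set(collection_urls), set(taxonomy_concepts)
    tagged_docs, used_concepts = set(), set()
    for doc in ET.fromstring(xml_text).iter("document"):
        names = [tag.findtext("tagname") for tag in doc.iter("tag")]
        if names:
            tagged_docs.add(doc.findtext("docurl"))
            used_concepts.update(names)
    doc_cov = len(tagged_docs & urls) / len(urls) if urls else 0.0
    concept_cov = len(used_concepts & concepts) / len(concepts) if concepts else 0.0
    return doc_cov, concept_cov
```

The `weight` values are ignored in this sketch; a stricter review might count only tags above a chosen weight threshold.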

  34. 6. Final Review • Receipt • Maintenance

  35. Quality Taxonomies • Jim Nisbet • niz@semio.com • Knowledge Technologies 2001
