
Controlled Vocabularies: What, Why, How?

Vocabulary Workshop, RAL, February 25, 2009. Controlled Vocabularies: What, Why, How? Metadata: love it or hate it, without metadata automated data handling isn’t possible. For automated data handling to be possible across distributed data sources, metadata standards are required.



Presentation Transcript


  1. Vocabulary Workshop, RAL, February 25, 2009 • Controlled Vocabularies: What, Why, How?

  2. Metadata • Love it or hate it, without metadata automated data handling isn’t possible • For automated data handling to be possible across distributed data sources, metadata standards are required • Standardised metadata comprises fields that represent real-world entities such as location, time, phenomena, etc.

  3. Metadata • These fields need to be populated • Plaintext may be used: it makes population easy, but it’s next to useless for automated handling • Some real examples: • A wide variety of chemical and biological parameters • Amplitude de l'echo retrodiffuse • Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota • MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH • Plaintext should be confined to abstracts

  4. Controlled Vocabularies • Much better to use concepts labelled using universally agreed terms that have universally agreed meanings • A collection of concepts designed to populate a given metadata field may be called a controlled vocabulary • Controlled vocabularies • Ensure consistent spellings • Ensure consistent syntax • Well-managed controlled vocabularies • Prevent metadata misunderstandings • Maintain a static relationship between metadata fields and the real world
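As a minimal sketch of what populating a field from a controlled vocabulary can look like in software: the check below accepts only agreed keys and rejects free text. The vocabulary contents, the `PARAMETER_VOCAB` name and the `validate_field` helper are all hypothetical and are not taken from any real vocabulary server.

```python
# Hypothetical controlled vocabulary for a "parameter" metadata field.
# Keys and labels are illustrative only.
PARAMETER_VOCAB = {
    "TEMP": "Temperature of the water column",
    "PSAL": "Salinity of the water column",
    "CPHL": "Chlorophyll concentration in the water column",
}


def validate_field(value, vocabulary):
    """Return the agreed label for a key, or raise if the key is not a known term."""
    key = value.strip().upper()  # tolerate case and padding, nothing else
    if key not in vocabulary:
        raise ValueError(f"{value!r} is not a term in the controlled vocabulary")
    return vocabulary[key]


if __name__ == "__main__":
    print(validate_field("psal", PARAMETER_VOCAB))
    try:
        validate_field("SED BIOCHEMISTRY", PARAMETER_VOCAB)
    except ValueError as err:
        print(err)
```

Storing the key rather than the label in the metadata record is one way to keep spellings consistent, since the label is then always looked up from a single place.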

  5. Thesauri • Concepts within a controlled vocabulary may be semantically connected using simple relationships: • Blue broader colour • Colour narrower blue • Colour related pigmentation • Concepts from different controlled vocabularies describing the same type of thing may be semantically connected using simple mapping relationships: • Bacillariophyceae exactMatch diatoms • IPTS68 temperature closeMatch ITS90 temperature • Nutrients in rivers relatedMatch nitrate in water bodies • Salinity broadMatch physical oceanography • Physical oceanography narrowMatch salinity • The results may be termed thesauri
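The broader/narrower/related relationships on this slide can be sketched as a small in-memory structure. This is an illustration only: the `Thesaurus` class is hypothetical, and the "navy" concept is added here purely to show a two-level hierarchy. Note that following broader links transitively is a choice made by this sketch rather than something the slide states.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding broader/narrower and related links between concepts."""

    def __init__(self):
        self._broader = defaultdict(set)   # concept -> directly broader concepts
        self._narrower = defaultdict(set)  # concept -> directly narrower concepts
        self._related = defaultdict(set)   # symmetric "related" links

    def add_broader(self, concept, broader):
        # Recording "blue broader colour" also records "colour narrower blue".
        self._broader[concept].add(broader)
        self._narrower[broader].add(concept)

    def add_related(self, a, b):
        self._related[a].add(b)
        self._related[b].add(a)

    def narrower(self, concept):
        return sorted(self._narrower.get(concept, ()))

    def related(self, concept):
        return sorted(self._related.get(concept, ()))

    def ancestors(self, concept):
        """All concepts reachable by repeatedly following broader links."""
        seen = set()
        stack = [concept]
        while stack:
            for parent in self._broader.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


thesaurus = Thesaurus()
thesaurus.add_broader("blue", "colour")
thesaurus.add_broader("navy", "blue")  # hypothetical extra concept
thesaurus.add_related("colour", "pigmentation")

if __name__ == "__main__":
    print(thesaurus.narrower("colour"))
    print(sorted(thesaurus.ancestors("navy")))
    print(thesaurus.related("pigmentation"))
```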

  6. Ontologies • But what if the controlled vocabularies describe different types of thing? • We can relate them by increasing the semantic richness of the relationships • For example: • We could have a controlled vocabulary of instruments • We could also have a controlled vocabulary of parameters

  7. Ontologies • We can link these up using relationships such as: • Thermosalinograph measures salinity • Fluorometer measures chlorophyll • Air temperature measuredBy psychrometer • The result may be termed an ontology

  8. Ontologies • Ontology relationships are: • Semantically rich • Potentially abundant • Software agents need some understanding of the relationships to exploit the knowledge encoded in the ontology • This is achieved through rules: relationships that describe other relationships
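To make the idea of a rule concrete, here is a minimal sketch in which one rule states that `measures` and `measuredBy` are inverses of each other, so a software agent can derive the missing direction of each statement from the previous slide. The `inverseOf` label mirrors the OWL property of the same name, but the tuple representation and the `apply_inverse_rules` function are assumptions made for illustration.

```python
# Statements taken from the ontology slide, as (subject, predicate, object).
triples = {
    ("thermosalinograph", "measures", "salinity"),
    ("fluorometer", "measures", "chlorophyll"),
    ("air temperature", "measuredBy", "psychrometer"),
}

# A rule is itself a relationship, but between relationships.
rules = {("measures", "inverseOf", "measuredBy")}


def apply_inverse_rules(triples, rules):
    """Return the input triples plus every triple implied by an inverseOf rule."""
    inverse = {}
    for left, kind, right in rules:
        if kind == "inverseOf":
            inverse[left] = right
            inverse[right] = left

    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverse:
            inferred.add((obj, inverse[predicate], subject))
    return inferred


if __name__ == "__main__":
    for triple in sorted(apply_inverse_rules(triples, rules)):
        print(triple)
```

With the rule in place, a query such as "which instrument measures air temperature?" can be answered even though the source only recorded the `measuredBy` direction.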

  9. Knowledge Representation • Relationships between concepts may be expressed using the Resource Description Framework (RDF) • W3C standard, commonly encoded in XML, with ‘triples’ as its basic building block • Each triple has a subject, a predicate and an object. For example: • Colour related pigmentation • Thermosalinograph measures salinity • Familiar?
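The two example triples can be written out as a short sketch using N-Triples, a line-based RDF serialization that is easier to read than XML. The `http://example.org/vocab/` namespace and the `measures` predicate under it are hypothetical; only the SKOS namespace is a real one.

```python
EX = "http://example.org/vocab/"  # hypothetical namespace for the example concepts
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Each triple is (subject, predicate, object), all identified by URIs.
triples = [
    (EX + "colour", SKOS + "related", EX + "pigmentation"),
    (EX + "thermosalinograph", EX + "measures", EX + "salinity"),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(triples))
```

This sketch handles only URI objects; literal values such as labels need quoting and escaping, which a real RDF library would take care of.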

  10. Knowledge Representation • Controlled vocabularies (concept collections) and thesauri may be represented using the Simple Knowledge Organization System (SKOS) • W3C standard XML schema based on RDF • Jointly developed by STFC and Manchester University Computer Science • 2008 version is the one to use

  11. Knowledge Representation
      <?xml version="1.0" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
          <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
          <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
          <skos:altLabel>CTDTmp90</skos:altLabel>
          <skos:definition>Unavailable</skos:definition>
          <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
          <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
        </skos:Concept>
      </rdf:RDF>
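A document like the one on this slide can be read with nothing more than a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` to pull out the concept URI, labels and mapping; the `read_concepts` helper is hypothetical, and the embedded document is the slide's example with the XML declaration omitted.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_NS = NS["rdf"]

SKOS_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>
"""


def read_concepts(xml_text):
    """Extract URI, labels and broadMatch targets for each skos:Concept element."""
    root = ET.fromstring(xml_text)
    concepts = []
    for node in root.findall("skos:Concept", NS):
        concepts.append({
            # Attributes are addressed as {namespace-uri}local-name.
            "uri": node.get(f"{{{RDF_NS}}}about"),
            "prefLabel": node.findtext("skos:prefLabel", namespaces=NS),
            "altLabel": node.findtext("skos:altLabel", namespaces=NS),
            "broadMatch": [
                match.get(f"{{{RDF_NS}}}resource")
                for match in node.findall("skos:broadMatch", NS)
            ],
        })
    return concepts


if __name__ == "__main__":
    for concept in read_concepts(SKOS_DOC):
        print(concept)
```

Treating RDF/XML as plain XML works for a regular layout like this one, but the same triples can be written in several equivalent XML shapes, so an RDF-aware parser is the safer choice for documents from unknown sources.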

  12. Knowledge Representation • Ontologies may be represented using Web Ontology Language (OWL) • W3C standard XML schema based on RDF • Example OWL document http://mida.ucc.ie/ont/20080124/theme.owl • Alternative simple text encodings are available such as Open Biomedical Ontologies (OBO) • OBO used for NERC-related EnvO ontology
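Because OBO is a plain-text format of `[Term]` stanzas holding `tag: value` lines, a few lines of code are enough to read the basics. The sketch below is a deliberately simplified reader, not a full OBO parser, and the `EX:` identifiers and term names are hypothetical, not taken from EnvO.

```python
# Hypothetical OBO fragment; identifiers and names are illustrative only.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: water body

[Term]
id: EX:0000002
name: river
is_a: EX:0000001 ! water body
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are skipped
        tag, _, value = line.partition(":")
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms


if __name__ == "__main__":
    for term in parse_obo_terms(OBO_TEXT):
        print(term)
```

Values are kept as lists because tags such as `is_a` may legitimately repeat within one stanza.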

  13. Knowledge Management Tools • RDF • Tools abound – see for example http://planetrdf.com/guide/ • Jena is one of the better known • SKOS • See the SKOS Tool Shed http://esw.w3.org/topic/SkosDev/ToolShed • Note this includes a Protégé plugin

  14. Knowledge Management Tools • OWL • Protégé with appropriate plugin is the most widely used • There are commercial alternatives such as TopBraid Composer • MMI (http://marinemetadata.org) has developed a vocabulary to OWL converter (voc2OWL) • OBO • Text so text tools work • OWL and SKOS converters available

  15. Knowledge Management Tools • Mapping • MMI has developed a mapping tool (VINE) to build maps from two OWL files • Visualisation • Concept maps are useful • CmapTools is very good • FreeMind (open source)
