Vocabulary Workshop, RAL, February 25, 2009. Controlled Vocabularies: What, Why, How?
Metadata
• Love it or hate it, automated data handling isn't possible without metadata
• Automated data handling across distributed data sources requires metadata standards
• Standardised metadata comprises fields that represent real-world entities such as location, time and phenomena
Metadata
• These fields need to be populated
• Plaintext may be used. It makes population easy, but the result is next to useless for automated handling.
• Some real examples:
  • A wide variety of chemical and biological parameters
  • Amplitude de l'echo retrodiffuse
  • Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota
  • MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH
• Plaintext should be confined to abstracts
Controlled Vocabularies
• Much better to use concepts labelled using universally agreed terms that have universally agreed meanings
• A collection of concepts designed to populate a given metadata field may be called a controlled vocabulary
• Controlled vocabularies
  • Ensure consistent spellings
  • Ensure consistent syntax
• Well-managed controlled vocabularies
  • Prevent metadata misunderstandings
  • Maintain a static relationship between metadata fields and the real world
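The idea of populating a field only from a controlled vocabulary can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's implementation: the vocabulary keys and labels, and the `validate_field` helper, are hypothetical names invented for the example.

```python
# Hypothetical controlled vocabulary for a "parameter" metadata field:
# each key is an agreed term, each value its agreed label.
PARAMETER_VOCAB = {
    "TEMP": "Temperature of the water column",
    "PSAL": "Salinity of the water column",
    "CPWC": "Chlorophyll pigment concentrations in the water column",
}


def validate_field(value, vocabulary):
    """Return the agreed label for a term, or raise if the term is unknown."""
    key = value.strip().upper()  # assumption: keys are case-insensitive
    if key not in vocabulary:
        raise ValueError(f"{value!r} is not a term in the controlled vocabulary")
    return vocabulary[key]


# A vocabulary term is accepted and resolved to its agreed label.
print(validate_field("temp", PARAMETER_VOCAB))

# Free text such as the plaintext examples is rejected.
try:
    validate_field("A wide variety of chemical and biological parameters",
                   PARAMETER_VOCAB)
except ValueError as err:
    print("rejected:", err)
```

Rejecting anything outside the vocabulary is what gives consistent spelling and syntax: the only way into the field is through an agreed term.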
Thesauri
• Concepts within a controlled vocabulary may be semantically connected using simple relationships:
  • Blue broader colour
  • Colour narrower blue
  • Colour related pigmentation
• Concepts from different controlled vocabularies describing the same type of thing may be semantically connected using simple mapping relationships:
  • Bacillariophyceae exactMatch diatoms
  • IPTS68 temperature closeMatch ITS90 temperature
  • Nutrients in rivers relatedMatch nitrate in water bodies
  • Salinity broadMatch physical oceanography
  • Physical oceanography narrowMatch salinity
• The results may be termed thesauri
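The broader, narrower and related relationships can be sketched as a small in-memory structure. This is an illustrative sketch only: the `Thesaurus` class is hypothetical, the concept "navy" is invented for the example, and following narrower links transitively is a design choice made here rather than something the slides prescribe.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical in-memory thesaurus holding simple concept relationships."""

    def __init__(self):
        self.broader = defaultdict(set)
        self.narrower = defaultdict(set)
        self.related = defaultdict(set)

    def add_broader(self, concept, broader_concept):
        # "blue broader colour" implies "colour narrower blue".
        self.broader[concept].add(broader_concept)
        self.narrower[broader_concept].add(concept)

    def add_related(self, a, b):
        # "related" is stored in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

    def all_narrower(self, concept):
        """Collect every concept reachable by following narrower links."""
        found = set()
        stack = [concept]
        while stack:
            for child in self.narrower.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


thesaurus = Thesaurus()
thesaurus.add_broader("blue", "colour")
thesaurus.add_broader("navy", "blue")  # "navy" is an invented example concept
thesaurus.add_related("colour", "pigmentation")

print(sorted(thesaurus.all_narrower("colour")))  # ['blue', 'navy']
```

A search for "colour" can then be expanded to include data tagged with any narrower concept, which is the practical payoff of recording the hierarchy.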
Ontologies
• But what if the controlled vocabularies describe different types of thing?
• We can relate them by increasing the semantic richness of the relationships
• For example:
  • We could have a controlled vocabulary of instruments
  • We could also have a controlled vocabulary of parameters
Ontologies
• We can link these up using relationships such as:
  • Thermosalinograph measures salinity
  • Fluorometer measures chlorophyll
  • Air temperature measuredBy psychrometer
• The result may be termed an ontology
Ontologies
• Ontology relationships are:
  • Semantically rich
  • Potentially abundant
• Software agents need some understanding of the relationships to exploit the knowledge encoded in the ontology
• This is achieved through rules: relationships that describe relationships
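One simple kind of rule states that two relationships are inverses of each other, which is the idea behind `owl:inverseOf` in OWL. The sketch below applies such a rule to the instrument and parameter statements from the earlier slide. The `INVERSE_OF` table and both helper functions are hypothetical, and a real reasoner supports far more than this single rule.

```python
# Statements from the slides, held as (subject, predicate, object) tuples.
facts = {
    ("thermosalinograph", "measures", "salinity"),
    ("fluorometer", "measures", "chlorophyll"),
    ("air temperature", "measuredBy", "psychrometer"),
}

# A rule is a relationship about relationships: here, "measures" and
# "measuredBy" are declared to be inverses of one another.
INVERSE_OF = {"measures": "measuredBy", "measuredBy": "measures"}


def apply_inverse_rule(known, inverse_of):
    """Return the known facts plus every fact implied by the inverse rule."""
    inferred = set(known)
    for subject, predicate, obj in known:
        if predicate in inverse_of:
            inferred.add((obj, inverse_of[predicate], subject))
    return inferred


def instruments_for(parameter, known):
    """List the instruments recorded as measuring a parameter."""
    return sorted(o for s, p, o in known if s == parameter and p == "measuredBy")


all_facts = apply_inverse_rule(facts, INVERSE_OF)
print(instruments_for("salinity", all_facts))  # ['thermosalinograph']
```

Without the rule, a query for instruments that measure salinity finds nothing, because the only salinity statement is written from the instrument's side.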
Knowledge Representation
• Relationships between concepts may be expressed using Resource Description Framework (RDF)
• W3C standard, usually encoded in XML, with 'triples' as its basic building block
• Each triple has a subject, a predicate and an object. For example:
  • Colour related pigmentation
  • Thermosalinograph measures salinity
• Familiar?
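The subject, predicate, object structure maps directly onto a three-field record. The sketch below is an assumption-laden illustration: the `Triple` type, the `match` helper and the `http://example.org/vocab/` namespace are all hypothetical, and the output follows the general `<subject> <predicate> <object> .` shape of the N-Triples line format rather than the XML encoding mentioned in the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


BASE = "http://example.org/vocab/"  # hypothetical namespace for the example


def to_ntriples(triple, base=BASE):
    """Render a triple as one line, giving each term a URI under `base`."""
    return " ".join(f"<{base}{term}>" for term in triple) + " ."


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


triples = [
    Triple("Colour", "related", "pigmentation"),
    Triple("Thermosalinograph", "measures", "salinity"),
]

for t in triples:
    print(to_ntriples(t))

print(match(triples, predicate="measures"))
```

Treating `None` as a wildcard in `match` gives the basic query pattern that RDF tools build on: fix some positions of the triple and ask for everything that fits.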
Knowledge Representation
• Controlled vocabularies (concept collections) and thesauri may be represented using the Simple Knowledge Organization System (SKOS)
• W3C standard based on RDF, typically serialised as XML
• Jointly developed by STFC and Manchester University Computer Science
• 2008 version is the one to use
Knowledge Representation
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>
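A SKOS document like the one above can be read with nothing more than Python's standard-library XML parser. The sketch below is one possible approach, not a recommended toolchain: `read_concepts` is a hypothetical helper, the document is trimmed to a few of its elements and embedded as a string, and a dedicated RDF library would be the more robust choice for real data.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the example document.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Trimmed copy of the example document, embedded so the sketch is self-contained.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return one dict per skos:Concept with its URI, labels and broad matches."""
    rdf_about = f"{{{NS['rdf']}}}about"
    rdf_resource = f"{{{NS['rdf']}}}resource"
    concepts = []
    for node in ET.fromstring(xml_text).findall("skos:Concept", NS):
        concepts.append({
            "uri": node.get(rdf_about),
            "prefLabel": node.findtext("skos:prefLabel", namespaces=NS),
            "altLabel": node.findtext("skos:altLabel", namespaces=NS),
            "broadMatch": [m.get(rdf_resource)
                           for m in node.findall("skos:broadMatch", NS)],
        })
    return concepts


for concept in read_concepts(SKOS_XML):
    print(concept["altLabel"], "->", concept["broadMatch"])
```

The `broadMatch` URI is what links this concept to a term in a different vocabulary, so following it is how software would move from the detailed parameter to its broader category.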
Knowledge Representation
• Ontologies may be represented using Web Ontology Language (OWL)
• W3C standard based on RDF, typically serialised as XML
• Example OWL document: http://mida.ucc.ie/ont/20080124/theme.owl
• Alternative simple text encodings are available, such as Open Biomedical Ontologies (OBO)
• OBO is used for the NERC-related EnvO ontology
Knowledge Management Tools
• RDF
  • Tools abound: see for example http://planetrdf.com/guide/
  • Jena is one of the better known
• SKOS
  • See the SKOS Tool Shed: http://esw.w3.org/topic/SkosDev/ToolShed
  • Note this includes a Protégé plugin
Knowledge Management Tools
• OWL
  • Protégé with the appropriate plugin is the most widely used
  • There are commercial alternatives such as TopBraid Composer
  • MMI (http://marinemetadata.org) has developed a vocabulary to OWL converter (voc2OWL)
• OBO
  • It is plain text, so ordinary text tools work
  • OWL and SKOS converters are available
Knowledge Management Tools
• Mapping
  • MMI has developed a mapping tool (VINE) to build maps from two OWL files
• Visualisation
  • Concept maps are useful
  • CmapTools is very good
  • FreeMind (open source)