
A generic data import layer for the Berlin Taxonomic Information Model

Anton Güntsch, Andreas Müller & Walter G. Berendsohn, Botanic Garden and Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories.





Presentation Transcript


  1. A generic data import layer for the Berlin Taxonomic Information Model Anton Güntsch, Andreas Müller & Walter G. Berendsohn Botanic Garden and Botanical Museum Berlin-Dahlem Dept. of Biodiversity Informatics and Laboratories

  2. The Berlin Taxonomic Information Model • Concepts as name-reference pairs • Explicit representation of relations between concepts • Mechanisms for calculating factual data
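The first two points of slide 2 can be illustrated with a minimal Python sketch, assuming a concept is nothing more than a (name, reference) pair and that relations are stored as explicit typed triples. The class names and the relation label "congruent" are hypothetical and not taken from the Berlin Model schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A taxon concept: a name as used in one particular reference."""
    name: str
    reference: str


@dataclass
class ConceptGraph:
    """Hypothetical container storing concept relations explicitly."""
    relations: list = field(default_factory=list)  # (source, relation type, target) triples

    def relate(self, source: Concept, rel_type: str, target: Concept) -> None:
        self.relations.append((source, rel_type, target))

    def related(self, source: Concept, rel_type: str) -> list:
        """Return all targets linked from source by the given relation type."""
        return [t for (s, r, t) in self.relations if s == source and r == rel_type]
```

Because a concept is keyed by both name and reference, the same name cited in two references yields two distinct concepts, and the relation between them has to be stated explicitly rather than assumed.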

  3. Berlin Model used by • Euro+Med • Med-Checklist • IOPI Species Plantarum Initiative • Algaterra • Dendroflora of El Salvador • German Standard List of Vascular Plants and Ferns • Reference List of the German Mosses • EDIT WP6

  4. Data imports (1) • Heterogeneous sources (e.g. text files, printer-formatted data, spreadsheets, DBs) • Complex target model → Imports consume a substantial fraction of project costs, and this cost is often seriously underestimated.

  5. Data imports (2)

  6. Data imports (2) • Needs a great deal of human input • Can be automated

  7. Step-by-step transformation of taxonomic information: preparation • Identify patterns • Communicate problems • Export to simple XML

  8. Step-by-step transformation of taxonomic information: preparation

     <Aizoaceae xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
       <AcceptedTaxa>
         <Taxon>
           <ID>7814</ID>
           <Genus>Acrodon</Genus>
           <Epithet>bellidiflorus</Epithet>
           <AllAuthorsString>N.E.Br.</AllAuthorsString>
           <SubSpeciesEpi>v</SubSpeciesEpi>
           <AllAuthorsStringSubSpecies/>
           <SpeciesName>Acrodon bellidiflorus</SpeciesName>
         </Taxon>
         <Taxon>
           <ID>8566</ID>
           <Genus>Acrodon</Genus>
           <Epithet>subulatus</Epithet>
           <AllAuthorsString>(Miller) N.E.Br.</AllAuthorsString>
           <AllAuthorsStringSubSpecies/>
           <SpeciesName>Acrodon subulatus</SpeciesName>
         </Taxon>
       </AcceptedTaxa>
       <SynonymTaxa> […] </SynonymTaxa>
     </Aizoaceae>
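The export step of the preparation phase might look like the following Python sketch, which uses the standard-library `xml.etree.ElementTree` module. It assumes the source has already been read into flat rows (the `ROWS` data and the `export_simple_xml` function are hypothetical) and wraps each row in a `<Taxon>` element without interpreting its content, producing the simple structure shown on slide 8.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows, e.g. as read from a spreadsheet or a legacy database table.
ROWS = [
    {"ID": "7814", "Genus": "Acrodon", "Epithet": "bellidiflorus", "AllAuthorsString": "N.E.Br."},
    {"ID": "8566", "Genus": "Acrodon", "Epithet": "subulatus", "AllAuthorsString": "(Miller) N.E.Br."},
]


def export_simple_xml(family: str, accepted_rows: list) -> ET.Element:
    """Wrap each source row in a <Taxon> element; column names become element names."""
    root = ET.Element(family)
    accepted = ET.SubElement(root, "AcceptedTaxa")
    for row in accepted_rows:
        taxon = ET.SubElement(accepted, "Taxon")
        for column, value in row.items():
            ET.SubElement(taxon, column).text = value
        # Derived convenience field, analogous to <SpeciesName> on the slide.
        ET.SubElement(taxon, "SpeciesName").text = f"{row['Genus']} {row['Epithet']}"
    ET.SubElement(root, "SynonymTaxa")  # synonyms would be exported the same way
    return root


print(ET.tostring(export_simple_xml("Aizoaceae", ROWS), encoding="unicode"))
```

Keeping the export close to the source columns leaves all interpretation to the later, repeatable phases.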

  9. Step-by-step transformation of taxonomic information: phase I • Transform into soft-schema XML • Re-arrange, lump and split elements • Don't check "taxonomic integrity" • Tools: XSLT, Taxonomic Transformation Library (TTL), and others

  10. Step-by-step transformation of taxonomic information: phase I

     <BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/s0.7"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://www.bgbm.org/schemas/BMI/s0.7 P:\XMLSchema\ImportSchicht\BMISoft0.7.xsd">
       <MetaData> […] </MetaData>
       <ConceptReference>
         <RefCategory>database</RefCategory>
         <RefString>Aizoaceae</RefString>
       </ConceptReference>
       <PotentialTaxa>
         <PTaxon>
           <TaxonName>
             <Rank>species</Rank>
             <GenusEpi>Acrodon</GenusEpi>
             <SpeciesEpi>bellidiflorus</SpeciesEpi>
             <AllAuthors>N.E.Br.</AllAuthors>
           </TaxonName>
           <TaxonStatus>Accepted</TaxonStatus>
           <IdInSource>7814</IdInSource>
           <RelatedTaxon ref="21" relType="basionym"/>
         </PTaxon>
         […]
       </PotentialTaxa>
     </BMIDataSource>
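The slides name XSLT and TTL as the phase I tools. As a stand-in, the re-arranging can be sketched in Python with `xml.etree.ElementTree`: elements from the simple XML are renamed and regrouped into the soft-schema layout shown above, and nothing is checked. The function names and the field mapping are assumptions read off the two example documents; only accepted, species-level taxa are handled, since the slides do not show the soft-schema status value for synonyms.

```python
import xml.etree.ElementTree as ET

SOFT_NS = "http://www.bgbm.org/schemas/BMI/s0.7"
# Simple-XML element -> soft-schema element inside <TaxonName> (assumed from the slides).
NAME_FIELDS = {"Genus": "GenusEpi", "Epithet": "SpeciesEpi", "AllAuthorsString": "AllAuthors"}

SAMPLE = """<Aizoaceae><AcceptedTaxa><Taxon><ID>7814</ID><Genus>Acrodon</Genus>
<Epithet>bellidiflorus</Epithet><AllAuthorsString>N.E.Br.</AllAuthorsString>
<AllAuthorsStringSubSpecies/></Taxon></AcceptedTaxa></Aizoaceae>"""


def q(tag: str) -> str:
    """Qualify a tag with the soft-schema namespace."""
    return f"{{{SOFT_NS}}}{tag}"


def to_soft_schema(simple_root: ET.Element, ref_string: str) -> ET.Element:
    out = ET.Element(q("BMIDataSource"))
    ref = ET.SubElement(out, q("ConceptReference"))
    ET.SubElement(ref, q("RefCategory")).text = "database"
    ET.SubElement(ref, q("RefString")).text = ref_string
    taxa = ET.SubElement(out, q("PotentialTaxa"))
    for taxon in simple_root.findall("AcceptedTaxa/Taxon"):
        ptaxon = ET.SubElement(taxa, q("PTaxon"))
        name = ET.SubElement(ptaxon, q("TaxonName"))
        ET.SubElement(name, q("Rank")).text = "species"  # sketch handles species only
        for src, dst in NAME_FIELDS.items():
            value = taxon.findtext(src)
            if value:  # copy non-empty values as-is; integrity is not checked in phase I
                ET.SubElement(name, q(dst)).text = value
        ET.SubElement(ptaxon, q("TaxonStatus")).text = "Accepted"
        ET.SubElement(ptaxon, q("IdInSource")).text = taxon.findtext("ID")
    return out


soft = to_soft_schema(ET.fromstring(SAMPLE), "Aizoaceae")
```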

  11. Step-by-step transformation of taxonomic information: phase II • Transform into strict-schema XML • Check data integrity • Report malformed data • Tool: TTL

  12. Step-by-step transformation of taxonomic information: phase II

     <BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/0.7" […]>
       <MetaData> […] </MetaData>
       <ConceptReference>
         <RefCategoryAbbrev>BK</RefCategoryAbbrev>
         <RefString>refString</RefString>
         <DatabaseID>4</DatabaseID>
       </ConceptReference>
       <PotentialTaxa>
         <PTaxon>
           <TaxonName>
             <SpeciesName>
               <GenusEpi>Acrodon</GenusEpi>
               <SpeciesEpi>bellidiflorus</SpeciesEpi>
               <AuthorTeam>
                 <AuthorTeamCache>N.E.Br.</AuthorTeamCache>
               </AuthorTeam>
             </SpeciesName>
           </TaxonName>
           <TaxonStatusAbbrev>A</TaxonStatusAbbrev>
           <IdInSource>7814</IdInSource>
           <RelatedTaxa> […] </RelatedTaxa>
         </PTaxon>
       </PotentialTaxa>
     </BMIDataSource>
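Phase II differs from phase I in that records are checked and malformed ones are reported instead of being passed on. The Python sketch below, a stand-in for TTL, converts one soft-schema record (given as a dict for brevity) into the strict `<PTaxon>` layout above. The required-field list, the epithet pattern and the `to_strict` function are hypothetical; only the `Accepted` → `A` status code is taken from the slides.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

STRICT_NS = "http://www.bgbm.org/schemas/BMI/0.7"
STATUS_ABBREV = {"Accepted": "A"}  # only this code appears on the slides
REQUIRED = ("GenusEpi", "SpeciesEpi", "IdInSource")
EPITHET = re.compile(r"[a-z]+(?:-[a-z]+)*")  # hypothetical rule: lowercase, optionally hyphenated


def q(tag: str) -> str:
    return f"{{{STRICT_NS}}}{tag}"


def to_strict(record: dict, problems: list) -> Optional[ET.Element]:
    """Build a strict <PTaxon> from a soft-schema record, or log why it is malformed."""
    source_id = record.get("IdInSource", "?")
    missing = [k for k in REQUIRED if not record.get(k)]
    if missing:
        problems.append(f"{source_id}: missing {', '.join(missing)}")
        return None
    status = STATUS_ABBREV.get(record.get("TaxonStatus", ""))
    if status is None:
        problems.append(f"{source_id}: unknown status {record.get('TaxonStatus')!r}")
        return None
    if not EPITHET.fullmatch(record["SpeciesEpi"]):
        problems.append(f"{source_id}: malformed epithet {record['SpeciesEpi']!r}")
        return None
    ptaxon = ET.Element(q("PTaxon"))
    species = ET.SubElement(ET.SubElement(ptaxon, q("TaxonName")), q("SpeciesName"))
    ET.SubElement(species, q("GenusEpi")).text = record["GenusEpi"]
    ET.SubElement(species, q("SpeciesEpi")).text = record["SpeciesEpi"]
    if record.get("AllAuthors"):
        # The flat author string is kept as a cache inside the structured <AuthorTeam>.
        team = ET.SubElement(species, q("AuthorTeam"))
        ET.SubElement(team, q("AuthorTeamCache")).text = record["AllAuthors"]
    ET.SubElement(ptaxon, q("TaxonStatusAbbrev")).text = status
    ET.SubElement(ptaxon, q("IdInSource")).text = record["IdInSource"]
    return ptaxon
```

Collecting problems in a list rather than raising on the first error lets one run produce a complete report to send back to the data provider.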

  13. Step-by-step transformation of taxonomic information: phase III • Import into database • Duplicate detection and resolution • No user interaction required • Tool: Berlin Model Object Layer (BMOL)
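Duplicate detection without user interaction needs a deterministic matching rule. The Python sketch below assumes one such rule (case-folded genus and epithet plus the author string with all whitespace removed) and resolves a duplicate by returning the existing record's key instead of inserting a new row. The rule, `name_key` and `NameTable` are hypothetical illustrations, not the matching logic actually used by BMOL.

```python
import re
from dataclasses import dataclass, field


def name_key(genus: str, epithet: str, authors: str) -> tuple:
    """Hypothetical matching key: case-folded parts, whitespace stripped from authors."""
    return (genus.casefold(), epithet.casefold(), re.sub(r"\s+", "", authors).casefold())


@dataclass
class NameTable:
    """In-memory stand-in for a database name table with internally assigned keys."""
    rows: dict = field(default_factory=dict)   # database id -> (genus, epithet, authors)
    index: dict = field(default_factory=dict)  # matching key -> database id
    next_id: int = 1

    def import_name(self, genus: str, epithet: str, authors: str) -> tuple:
        """Return (database id, inserted?); a duplicate resolves to the existing row."""
        key = name_key(genus, epithet, authors)
        if key in self.index:
            return self.index[key], False
        db_id = self.next_id
        self.next_id += 1
        self.rows[db_id] = (genus, epithet, authors)
        self.index[key] = db_id
        return db_id, True
```

For example, `(Miller) N.E.Br.` and `(Miller)  N.E. Br.` map to the same key, so the second import reuses the first row.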

  14. Berlin Model Object Layer (BMOL) • Hides the database key system • Duplicate detection • Core-Module provides objects corresponding to database entities • Mapper-Module interfaces with database • Persistence-Module manages data flow between core-module and mapper-module
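The three-module split of BMOL can be sketched in Python as follows, assuming that core objects carry no database keys, that a mapper is the only component that knows about storage, and that the persistence module sits between them and checks for duplicates before inserting. All class names, and the in-memory mapper standing in for one that would talk to the Berlin Model database, are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """Core module: a plain domain object with no database key."""
    genus: str
    epithet: str
    authors: str


class NameMapper(ABC):
    """Mapper module: the only layer that knows the storage backend and its keys."""

    @abstractmethod
    def find(self, name: TaxonName) -> Optional[int]: ...

    @abstractmethod
    def insert(self, name: TaxonName) -> int: ...


class InMemoryNameMapper(NameMapper):
    """Stand-in for a mapper that would issue SQL against the database."""

    def __init__(self) -> None:
        self._rows = {}  # key -> TaxonName

    def find(self, name: TaxonName) -> Optional[int]:
        return next((k for k, v in self._rows.items() if v == name), None)

    def insert(self, name: TaxonName) -> int:
        key = len(self._rows) + 1
        self._rows[key] = name
        return key


class PersistenceManager:
    """Persistence module: moves core objects to a mapper, skipping duplicates."""

    def __init__(self, mapper: NameMapper) -> None:
        self._mapper = mapper
        self._keys = {}  # TaxonName -> key; never exposed to callers

    def save(self, name: TaxonName) -> bool:
        """Persist name unless an equal one is already stored; return True if inserted."""
        if name in self._keys:
            return False
        key = self._mapper.find(name)
        inserted = key is None
        if inserted:
            key = self._mapper.insert(name)
        self._keys[name] = key
        return inserted
```

Because the persistence manager only talks to the abstract `NameMapper`, supporting another database would mean writing another mapper subclass while leaving the core objects untouched.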

  15. Outlook • The method has been tested successfully on the imports of Med-Checklist I, II & IV • Further imports are planned for 2006 • Programming of additional mapper modules is desirable

  16. www.bgbm.org/biodivinf/
