
Distributed Databases and metadata

Distributed Databases and metadata. G. Bégni, H. Makhmara - MEDIAS-France July 18, 2004 ENVIROMIS Tomsk. Aims of the presentation. Understanding the principles of metadata and databases. Making the scientific community aware of the efforts expected in terms of data documentation.


Presentation Transcript


  1. Distributed Databases and metadata G. Bégni, H. Makhmara - MEDIAS-France July 18, 2004 ENVIROMIS Tomsk

  2. Aims of the presentation • Understanding the principles of metadata and databases. • Making the scientific community aware of the efforts expected in terms of data documentation. • Highlighting the positive impacts of such efforts. • Demonstrating the need for an easy way to access distributed databases.

  3. Approach • Presentation of the AMMA context and its constraints: status of the problem. • Reflection on a solution. • Abstract description of the elements that make up the solution. • Selection and justification of standards and techniques. • Assessment of the selections.

  4. AMMA context • Scientific level: multi-disciplinary, multi-scale. • Technical level: multi-format, multi-volume, multi-structure, multi-location. • Cultural level: multi-language, multi-usage, multi-possibilities.

  5. Constraints involved • Providing the various communities with the best suited access to data (language, medium, cost, services…) • Guaranteeing the durability of data wherever they are produced. • Ensuring the durability of services as time goes by (technological developments).

  6. Access services • Easy web interface for data search and location (geographical, temporal, thematic, keywords). • Transparent service to access heterogeneous distributed data (possibilities of compiling…). • Homogeneous documentation for heterogeneous data, in order to optimise their exploitation.
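The four search criteria named on this slide (geographical, temporal, thematic, keywords) can be sketched as a filter over metadata records. This is only an illustration: the `Record` fields, the `search` function and the two catalogue entries are hypothetical and are not taken from the AMMA system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    bbox: tuple          # (west, south, east, north) in decimal degrees
    start: date
    end: date
    theme: str
    keywords: frozenset


def bbox_overlaps(a, b):
    """True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox=None, period=None, theme=None, keywords=()):
    """Return the records matching every criterion that was supplied."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for r in records:
        if bbox is not None and not bbox_overlaps(r.bbox, bbox):
            continue  # geographical criterion
        if period is not None and (r.end < period[0] or r.start > period[1]):
            continue  # temporal criterion
        if theme is not None and r.theme.lower() != theme.lower():
            continue  # thematic criterion
        if not wanted <= {k.lower() for k in r.keywords}:
            continue  # every requested keyword must be present
        hits.append(r)
    return hits


# Invented sample entries, for demonstration only.
CATALOGUE = [
    Record("Rain gauge network", (-5.0, 10.0, 5.0, 15.0),
           date(2002, 1, 1), date(2003, 12, 31),
           "hydrology", frozenset({"rainfall", "gauge"})),
    Record("Aerosol optical depth", (-20.0, 0.0, 20.0, 25.0),
           date(2004, 6, 1), date(2004, 9, 30),
           "atmosphere", frozenset({"aerosol", "satellite"})),
]
```

Criteria left at their defaults are ignored, so a user can combine any subset of them in one query.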

  7. Data durability • Multiple and systematic back-up procedure. • Data transparency in relation to technological changes (hardware, software). • Transparent data exploitation as time goes by.

  8. A solution • Fully defined back-up process. • Data storage in standardised formats. • Clear data documentation for future exploitation.

  9. Service durability • Services should not depend on any proprietary or « exotic » software. • The quality of a service should not deteriorate as technology changes.

  10. A solution • Services based on standards. • Services based on open source software.

  11. To sum up: • Standardise storage. • Standardise services. • Standardise exploitation. • However, some data formats cannot be standardised (satellite imaging). • Neither can the related services.

  12. Principles applied • Every item liable to be standardised should be standardised. • There should be a system gateway based on standards only. • Every item that cannot be standardised should be described in a standardised way.

  13. A standard for each element • Data storage: ANSI/ISO SQL, XML. • Data description: FGDC-STD-001-1998 or ISO 19115. • Service description: W3C SOAP. • Catalogue: ISO 23950 (ANSI/NISO Z39.50).

  14. Data description: metadata • Formed from a Greek root (« meta »). • What surpasses, encompasses a subject, a science (Le Robert Dictionary). • Denoting a nature of a higher order or more fundamental kind (Oxford Talking Dictionary). • English: metadata; French: métadonnées. • Literally speaking, metadata are data about data. • To be more precise, they are structured sets of information that describe resources.
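To make "structured sets of information that describe resources" concrete, here is a minimal sketch that builds a metadata record as XML and checks it for completeness. The element names (`metadata`, `extent`, `period`, and so on) are invented for illustration; they are not the element names defined by ISO 19115 or FGDC-STD-001-1998, and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def build_record(title, abstract, west, south, east, north, begin, end, keywords):
    """Build a small XML metadata record (illustrative element names only)."""
    root = ET.Element("metadata")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "abstract").text = abstract
    # Geographical extent as a bounding box in decimal degrees.
    extent = ET.SubElement(root, "extent")
    for name, value in (("west", west), ("south", south),
                        ("east", east), ("north", north)):
        ET.SubElement(extent, name).text = str(value)
    # Temporal coverage stored as attributes (ISO 8601 date strings).
    ET.SubElement(root, "period", begin=begin, end=end)
    keyword_list = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    return root


def missing_fields(root, required=("title", "abstract", "extent", "period")):
    """List required elements that are absent or completely empty."""
    missing = []
    for name in required:
        element = root.find(name)
        is_empty = (
            element is None
            or (not (element.text or "").strip()
                and len(element) == 0
                and not element.attrib)
        )
        if is_empty:
            missing.append(name)
    return missing


record = build_record(
    "Rain gauge network", "Daily rainfall totals (invented example).",
    -5.0, 10.0, 5.0, 15.0, "2002-01-01", "2003-12-31", ["rainfall", "gauge"],
)
print(ET.tostring(record, encoding="unicode"))
```

A completeness check of this kind is one way a description tool could tell a data provider which parts of the documentation are still missing.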

  15. Metadata standards • Metadata have always existed. • A worldwide standardisation effort has been under way for several years. • Several standards exist for georeferenced data: • Content Standard for Digital Geospatial Metadata: FGDC-STD-001-1998. • ISO 19115 since the end of 2002. • FGDC is a de facto standard.

  16. Advantages • Homogeneous presentation. • Pooled developments. • Possibility to automate data processing. • Comparison of examples: • GeoConnections Portal, Canada: http://geodiscover.cgdi.ca • Portal on desertification monitoring (OSS/Medias/SCOT): http://geooss.oss.org.tn/geooss

  17. Efforts asked from data providers • Be aware of standards. • Endeavour to describe data as completely as possible. • Use data exchange formats that are as simple and consistent as possible. • However, data providers do not have to care about the technical or formal aspects of standards. • Database managers will provide them with easy and user-friendly tools to describe their data.

  18. AMMA information system architecture [Diagram. Components: MetaCatalog (portal to the AMMA I.S., user-friendly interface); meta database (ISO 19115 and/or FGDC); data sources DB AMMASAT, DB SOP and DB LOP, each reached through an exchange protocol.] Steps labelled on the diagram: • 1. Search by criteria. • 2. Query metadata. • 3. Retrieve metadata. • 4. Choose datasets; query data. • 5. Locate and query datasets from relevant data sources. • 6. Retrieve datasets.

  19. Technical diagram [Diagram. Labels: other catalogues (GCMD, FGDC Clearinghouse); ZOOM; YAZ; PHP; web forms; XML records; Z39.50; Zebra indexer; Zebra server; ZAP client; import XML; metadata creation and validation; catalogue service (any user); edition service (data provider).]

  20. Characteristics • Management of multi-standard metadata: ISO 19115, FGDC, and DIF (if an XML schema is available). • Transparent to the data provider. • Transparent to the user.

  21. Data access services • Médias-France is developing generic data access services. • These services have to be self-describing, registered, and equipped with well-known interfaces. • For the moment, we focus our efforts on software permitting access to geographically distant databases (distributed databases).

  22. Principle • Each service is registered within a directory server. • Each data source declares what data it serves. • A web portal is used by scientists to locate and request data from different sources. • Data is sent back to the user in a standardised format.
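The registration principle on this slide can be sketched as a small in-memory directory. Everything here is an assumption made for illustration: the `Directory` class, the `request` helper, the endpoint URLs and the dataset names are hypothetical, and the `fetch` callable stands in for whatever exchange protocol a real source would use. Only the source names (DB SOP, DB LOP, DB AMMASAT) come from the architecture slide.

```python
class Directory:
    """Hypothetical directory server: maps service names to what they serve."""

    def __init__(self):
        self._services = {}

    def register(self, name, endpoint, datasets):
        """A data source declares its endpoint and the datasets it serves."""
        self._services[name] = {"endpoint": endpoint, "datasets": set(datasets)}

    def locate(self, dataset):
        """Return the names of all services that declared this dataset."""
        return sorted(
            name for name, service in self._services.items()
            if dataset in service["datasets"]
        )

    def endpoint(self, name):
        return self._services[name]["endpoint"]


def request(directory, dataset, fetch):
    """Portal side: query every relevant source and return one uniform shape."""
    results = []
    for name in directory.locate(dataset):
        values = fetch(directory.endpoint(name), dataset)
        # Every answer is wrapped in the same structure, whatever the source.
        results.append({"source": name, "dataset": dataset, "values": list(values)})
    return results


directory = Directory()
directory.register("DB SOP", "http://example.org/sop", ["rainfall", "wind"])
directory.register("DB LOP", "http://example.org/lop", ["rainfall"])
directory.register("DB AMMASAT", "http://example.org/ammasat", ["cloud_cover"])


def fake_fetch(endpoint, dataset):
    """Stand-in for a real exchange protocol; returns canned values."""
    return [f"{dataset}@{endpoint}"]
```

Because the portal only talks to the directory and to a uniform `fetch` interface, a new data source can be added by registering it, without changing the portal.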

  23. Implementation • Data sources are PostgreSQL databases, flat files or other RDBMS systems. • Each data server is a DODS servlet (Distributed Oceanographic Data System). • The servlet container is Apache Tomcat. • Metadata are stored in XML files.

  24. Prospects • Develop Web services based on the W3C SOAP recommendation. • Implement a directory service for services. • Hope to share development efforts with other organisations, within the framework of international projects (funded by EC, INTAS…).
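As a rough idea of what a SOAP-based data request could look like, the sketch below builds a SOAP 1.2 envelope with Python's standard library. The envelope namespace is the one defined by the W3C SOAP 1.2 recommendation; the service namespace, the `GetDataset` operation and its parameters are hypothetical and do not describe any existing AMMA service.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SOAP 1.2 recommendation.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical namespace for an imagined data access service.
SERVICE_NS = "http://example.org/amma/data-access"


def build_request(dataset, begin, end):
    """Return a SOAP 1.2 envelope (as text) for a hypothetical GetDataset call."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("svc", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation and its parameters live in the service's own namespace.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}GetDataset")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}dataset").text = dataset
    ET.SubElement(operation, f"{{{SERVICE_NS}}}begin").text = begin
    ET.SubElement(operation, f"{{{SERVICE_NS}}}end").text = end
    return ET.tostring(envelope, encoding="unicode")


message = build_request("rainfall", "2004-06-01", "2004-09-30")
print(message)
```

Since the message is plain XML over a standard envelope, any client able to produce it could call such a service, which is the point of relying on standards rather than proprietary software.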
