
MPI WP2/3 Report Metadata Integrated Resource Domain Portal Creation

Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, NL. INTERA WP2 Summary, November 2004.


Presentation Transcript


  1. Intera
     MPI WP2/3 Report
     Metadata
     Integrated Resource Domain
     Portal Creation
     Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, NL
     INTERA WP2 Summary November 2004

  2. What is Metadata?
     Diagram labels: Annotation Resource, Sound Resource, Video Resource
     • Primary Functions of MD
       • visibility of resources
       • searching/browsing
       • organization of corpus
       • management of corpus
       • event documentation
       • etc
     • Metadata Description
       • Language about
       • Researcher
       • Modalities
       • Content Type
       • Informant Name
       • Age
       • Microphone Type
       • Resource Pointers
       • etc
     • Emerging Functions of MD
       • metadata is virtual fingerprint of the resource
       • can be used instead of resource
       • ready for the Semantic Web – virtual resource domains
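The role of a metadata description as a searchable stand-in for the resource can be sketched in a few lines of Python. This is only an illustration: the class name, the field names (paraphrased from the slide's list) and the `search` helper are assumptions, not part of IMDI or of any tool named in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataDescription:
    # Field names paraphrase the slide's list; they are NOT the IMDI schema.
    language: str
    researcher: str
    content_type: str
    modalities: list = field(default_factory=list)
    resource_pointers: list = field(default_factory=list)  # URLs of media/annotation files


def search(catalogue, **criteria):
    """Return the descriptions whose attributes equal every given criterion."""
    return [
        md for md in catalogue
        if all(getattr(md, key) == value for key, value in criteria.items())
    ]


# Hypothetical catalogue entries, invented for the example.
catalogue = [
    MetadataDescription("Dutch", "A. Example", "interview",
                        ["speech"], ["http://example.org/a.wav"]),
    MetadataDescription("Kilivila", "B. Example", "narrative",
                        ["speech", "gesture"], ["http://example.org/b.mpg"]),
]
hits = search(catalogue, language="Dutch")
```

The search runs over the descriptions alone, so the media files behind `resource_pointers` never have to be opened.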

  3. Metadata Process
     • Language Resource (Resource Creation)
       • can be any type of Language Resource (Annotated Media, Lexica, Grammars, etc)
       • the creation process is iterative, mostly very complex and dependent on the resource type
     • Metadata Description (MD Creation)
       • IMDI provides a core description and special extensions for resource types
       • the creation process is comparatively simple; any time the resource is updated some MD information has to be updated as well
     • Large Collection of LR (Content Search)
       • can be grouped to large distributed LR collections
     • Large Catalogue of MD (MD Search)
       • can be grouped to large distributed MD catalogues
       • searching for resources possible

  4. Strategic Goals and Impact
     • strategic goals are about survival after project lifetime
       • stimulate the idea of building a joint metadata domain
       • “critical mass” idea
       • ISO standardization based
     • impact
       • from few subcontractors to over 50 institutions world-wide
       • ISO TC37/SC4 standardization activity (ISO, -> industry)
       • LIRICS – adaptation of relevant tools to ISO DCR
       • DAM-LR – bring the DELAMAN archives into Data-GRID
       • web-based exploration and commentary frameworks
         • MPI, CMU, U Melbourne, etc working on this
     • but
       • metadata creation is hard; it also means organizing, cleaning …
       • needs more evangelization and benefits

  5. DAM-LR/DELAMAN GRID
     Diagram labels: EMELD, ELAR, INL, MPI, Lund, ANLC, AILLA, AMPM, LACITO, PARADISEC

  6. Stabilization and Framework
     • IMDI 3.04 now stable and part of ISO standardization efforts
       • all categories are in ISO DCR (WP3)
       • DCR is key element on the way to Semantic Web
     • IMDI infrastructure now mature and stable (open source, free)
       • professional IMDI Editor (creating correct IMDI XML)
       • CV editor
       • IMDI browser (can operate in linked IMDI XML domains)
       • gateway to OLAC and Dublin Core
       • HTML browsing
       • Google-like and complex searching
       • Access Rights Management
       • portal creation
       • web-based ingestion (not Intera – in progress)
       • web-based exploration (not Intera – in progress)

  7. WP3 Issues
     • Getting Metadata into the Semantic Web Framework
       • ISO TC37/SC4 meeting in Pisa during this whole week
       • IMDI is in the ISO DCR
       • all ISO 11179 and ISO 12620 compliant
       • localization of IMDI in DCR (Se, Gr, D, E, Fr, Nl, It, Sp)
       • ISO DCR is based on XML (not RDF)
       • SYNTAX tool at LORIA is web-accessible
     • next steps:
       • integrate OLAC (DC) and TEI (LIRICS)
       • link tools with SYNTAX via Web-services
         • already done for a lexicon tool
       • still deep discussions (is_a, has_a relation)
       • separate relation repositories (in RDF/OWL of course)
       • different layers of DCRs remain an issue

  8. WP3 DCR

  9. IMDI Editor
     also supports node creation and profiles

  10. Corpus Structure Building

  11. IMDI Browser
      also supports lexica, catalogue metadata and profiles

  12. Structured IMDI Search

  13. HTML Browsing

  14. Unstructured Search

  15. Access Rights Management

  16. MD Infrastructure/Portal
      • Browsing & Searching: IMDI Browser & IE
      • IMDI Domain via INTERNET: corpus structure generation (MPI, BAS)
      • Metadata Editing: IMDI Editor, Excel
      • Corpus exploitation (WP4)
      HRELP Workshop London November 2003; INTERA Review November 2003

  17. INTERA Domain
      Diagram labels: State, INTERA sub-contracts

  18. IMDI Domain
      • Europe
        • ELRA Paris
        • INALF Nancy
        • DFKI Saarbrücken
        • University of Saarland
        • Bavarian Speech Archive Munich
        • Meertens Institute Amsterdam
        • University of Florence
        • ILSP Athens
        • ILC Pisa
        • University of Madrid
        • Max-Planck-Institute Nijmegen
        • University of Kiel
        • University of Bochum
        • Free University of Berlin
        • University of Bonn
        • University of Bielefeld
        • University of Helsinki
        • University of Helsinki
        • Phonogrammarchiv Vienna
        • University of Groningen
        • Kotus Project Helsinki
        • Sweden’s National Dialect Archive Lund
        • European Sign Language Communities (Se, UK, NL, D)
        • University of Utrecht
        • University of Uppsala
        • University of Stavanger
        • University of Lund
        • University of Leipzig
        • University of Erfurt
        • University of Leiden
        • University of Frankfurt
        • …
      • International
        • Federal University of Rio de Janeiro
        • University of Colorado
        • University of Buenos Aires
        • University of Kansas
        • University of Victoria
        • University of Sydney
        • University of Melbourne
        • E Michigan University
        • Wayne State University
        • AILLA Austin
        • …
      Big problem: integration and portal effort

  19. MD Creation Problems
      • Conclusions
        • contracts are difficult – much overhead for little money
        • no broad experience with MD creation
        • much interaction necessary over all aspects
        • no standard contract form – adaptations needed
        • institutes often wanted more money than expected
        • in some cases a rather chaotic situation as the starting point
        • in some cases no familiarity with XML
        • problems with changing student assistants
        • special wishes with respect to MD (IMDI flexible enough)
        • MPI expected stepwise availability – in practice, delivery comes at the end
        • strong support for the ENABLER declaration necessary
        • creating MD remains extra work

  20. Portal Creation – XML Browsing
      • Task: creation of a web-site that offers all options for a selected domain of IMDI resources
      • just get the URLs and create a root node
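As a rough illustration of "get the URLs and create a root node", the sketch below writes a small XML node that links to a list of repository URLs. The element names `CorpusNode` and `CorpusLink`, the function name and the example URLs are placeholders chosen for this sketch; the real IMDI schema is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_root_node(name, child_urls):
    """Return an XML string for a root node that links to each child URL."""
    # Element and attribute names are hypothetical, not the IMDI schema.
    root = ET.Element("CorpusNode", {"Name": name})
    for url in child_urls:
        link = ET.SubElement(root, "CorpusLink")
        link.text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical repository entry points.
portal_xml = build_root_node(
    "Demo portal",
    ["http://example.org/mpi/root.imdi", "http://example.org/bas/root.imdi"],
)
```

A browser that follows such links can then present the separate repositories as one domain.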

  21. Portal Creation – Searching
      Diagram labels: Portal Node, Fast Index, IMDI Repositories
      • harvest all data by traversing links and validate
      • create a fast index file (using Java Library DBMS)
      • just select a button in the browser
      • so: simple, everyone can set up a portal
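The harvest-validate-index sequence can be sketched as follows. This is not the IMDI implementation: it swaps the Java DBMS library mentioned on the slide for a plain in-memory inverted index, checks only XML well-formedness as "validation", and assumes a hypothetical `CorpusLink` element for links. The `fetch` callable stands in for an HTTP download so the example runs without a network.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque


def harvest(root_url, fetch):
    """Traverse links breadth-first from root_url.

    Returns (index, invalid): index maps a lowercase word to the set of
    URLs whose metadata contains it; invalid lists URLs that could not be
    fetched or parsed.
    """
    index = defaultdict(set)
    invalid = []
    seen = {root_url}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        try:
            node = ET.fromstring(fetch(url))  # well-formedness check only
        except (ET.ParseError, KeyError):
            invalid.append(url)
            continue
        for elem in node.iter():
            text = (elem.text or "").strip()
            if not text:
                continue
            if elem.tag == "CorpusLink":      # hypothetical link element
                if text not in seen:
                    seen.add(text)
                    queue.append(text)
            else:
                for token in text.lower().split():
                    index[token].add(url)
    return index, invalid


# Hypothetical in-memory "repositories"; "b" is deliberately malformed.
docs = {
    "root": "<Node><CorpusLink>a</CorpusLink><CorpusLink>b</CorpusLink></Node>",
    "a": "<Node><Title>Dutch dialect recordings</Title></Node>",
    "b": "<Node><Title>broken",
}
index, invalid = harvest("root", docs.__getitem__)
```

The `seen` set guards against cycles, which matters when independently maintained repositories link to each other.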

  22. Portal Creation – HTML Support
      • install Tomcat server and IMDI-Web-Interface
      • software traverses tree to establish database
      • large index file is created behind the scenes
      • provide an HTML entry point (HTTP server)
      Diagram labels: Web Client; Portal Site (TOMCAT Server, IMDI-Web-Interface, Database); IMDI Providers (Web-Server MPI, Web-Server BAS)

  23. Portal Creation – DC/OLAC Gateway
      • the database can be used to fulfill the OAI protocol for metadata harvesting; any record can be served
      Diagram labels: DC Service Provider, OAI-PMH, Servlet, Fast Index, Portal Node, IMDI Repositories
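What "serving a record" through such a gateway might involve is sketched below: one indexed metadata record is converted to the `oai_dc` container that OAI-PMH harvesters expect. The two namespace URIs are the standard OAI-PMH and Dublin Core ones; the input field names and the field-to-element mapping in `FIELD_MAP` are assumptions made for this example, not the gateway's actual mapping.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from metadata fields to Dublin Core element names.
FIELD_MAP = {
    "name": "title",
    "researcher": "creator",
    "language": "language",
    "content_type": "type",
}


def to_oai_dc(record):
    """Serialize a dict of metadata fields as an oai_dc:dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field_name, dc_name in FIELD_MAP.items():
        value = record.get(field_name)
        if value:  # skip fields the record does not provide
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, invented for the example.
dc_xml = to_oai_dc(
    {"name": "Demo session", "researcher": "A. Example", "language": "Dutch"}
)
```

A full OAI-PMH response would wrap this payload in the protocol's `record` and `metadata` elements, which are omitted here.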

  24. Dissemination
      • Dissemination / Events
        • International Metadata Workshop, Nijmegen, November 02
        • Open Forum on Metadata Registries, Santa Fe, January 03
        • Lexicon Workshop, Munich, February 03
        • Workshop on Resource Storage and Access, Göttingen, February 03
        • International Workshop on LR Archiving, London, March 03
        • Sign Language Workshop, Nijmegen, May 03
        • International E-Meld Workshop, Ypsilanti, July 03
        • International Linguistic Congress, Prague, July 03
        • ENABLER Workshop, Paris, August 03
        • DRH Meeting, Cheltenham, September 03
        • International PARADISEC Archiving Workshop, Sydney, October 03
        • HRELP Archiving Workshop, London, November 03
        • etc
      • LREC 2004 – demonstration of infrastructure and MD domain
      • two metadata flyers (MPI, U Lund) distributed at various occasions
      • Web-Site Design
      • several training workshops done

  25. INTERA Portal Screenshots
