
Why should we invest in DWF?

Peter Wittenburg, CLARIN Research Infrastructure (www.clarin.eu), EUDAT Data Infrastructure (www.eudat.eu)



  1. Why should we invest in DWF?
     Peter Wittenburg
     CLARIN Research Infrastructure (www.clarin.eu)
     EUDAT Data Infrastructure (www.eudat.eu)

  2. Things that keep us busy I
     • understanding language roots
       • feature matrix extracted from many cross-disciplinary & cross-country resources
       • phylogenetic algorithms to compute dependency trees
       • can’t easily access required resources
     • understanding language machine
       • so many institutes creating brain image data
       • do we know about them and their recording contexts?
       • can we access them easily?

  3. Things that keep us busy II
     • automatic language processing
       • recognition of speech and body movement (gesture, signing, facial expressions, etc.) is hard
       • no single stochastic recognizer will do
       • there is so much technology out there worldwide, with components from different disciplines
       • do we know about them?
       • can we easily access them?

  4. In CLARIN we are so good
     • developed a flexible component model that allows users to create metadata profiles
     • have established an open Data Category Registry (ISOcat) system based on ISO 12620 (compliant with ISO 11179)
     • got a professional tool set allowing users
       • to create, register and share components and profiles
       • to create metadata descriptions efficiently

  5. In CLARIN we are so good
     • Virtual Language Observatory

  6. In CLARIN we are so good
     • got a distributed SOA domain with many language & speech tools integrated or being integrated
     • use metadata profile matching to find appropriate tools when chaining services
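
     The profile matching mentioned on this slide can be pictured with a minimal sketch. It is an illustration only, not CLARIN's implementation: the `ToolProfile` fields, the tool names and the type labels are all hypothetical, and a real metadata profile carries far more than one input and one output type. The idea shown is that a tool can follow another in a chain when its declared input matches the previous tool's declared output.

```python
from dataclasses import dataclass


# Hypothetical tool description: the field names and values are illustrative
# and are not taken from CLARIN's component metadata.
@dataclass(frozen=True)
class ToolProfile:
    name: str
    consumes: str  # type of input the tool accepts
    produces: str  # type of output the tool emits


def find_chain(tools, source, target):
    """Breadth-first search for a shortest tool chain from source type to target type."""
    frontier = [(source, [])]
    seen = {source}
    while frontier:
        next_frontier = []
        for current, chain in frontier:
            if current == target:
                return [tool.name for tool in chain]
            for tool in tools:
                # A tool matches when its input type equals the current data type.
                if tool.consumes == current and tool.produces not in seen:
                    seen.add(tool.produces)
                    next_frontier.append((tool.produces, chain + [tool]))
        frontier = next_frontier
    return None  # no chain of matching profiles exists


# Made-up registry of tools, for demonstration only.
tools = [
    ToolProfile("tokenizer", "plain-text", "tokens"),
    ToolProfile("tagger", "tokens", "pos-tagged"),
    ToolProfile("parser", "pos-tagged", "syntax-tree"),
    ToolProfile("transcriber", "audio", "plain-text"),
]

print(find_chain(tools, "audio", "pos-tagged"))
```

     With this made-up registry, the search chains the transcriber, the tokenizer and the tagger to get from audio to part-of-speech-tagged text, and returns None when no sequence of matching profiles exists.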

  7. but ...
     • there is so much data (& software) out there that no one yet knows of or is able to access
     • of about 200 linguistic departments creating data, fewer than a handful of centers in the EU have a proper repository, do archiving and curation, give access, allow computation and enrichments, are audited, etc.
     • currently no way to let machines access most of the resources blindly; the common way is to download & squeeze each individual resource/collection
     • proper metadata at high granularity still unpopular
     • only some harmonization at international level
     • cross-discipline discussions happen only incidentally

  8. cross-disciplinary aspect
     network of discipline hubs
     • large number of discipline-specific centers with access services
     • all disciplines similar
     • should we all do LTA, offer capacity computing, run PID, etc.?
     network of large data hubs
     • a network of strong data & compute hubs
     • let them give COMMON services such as LTP, data staging, PID, AAI, etc.

  9. but ...
     • do we know what common services are, and do we accept them?
     • do we understand the data organizations of communities well enough to design services?
     • do we have agreed mechanisms for working on large and complex data sets in a secure way in a federation?
     • do we agree on the same essential building blocks for a common data infrastructure?
     • AND: many communities are organized worldwide
     • Thus: we need a GLOBAL forum to agree on some essentials that will make data-driven research more efficient and foster new insights
