
Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche



Presentation Transcript


  1. Language Technologies for a Multilingual Europe. Joseph Mariani, Director, “Information & Communication Technologies” Department, French Ministry of Research

  2. Support to LT: Techno-langue • Report to the Prime Minister (November 2000) • Techno-langue Action • Language technology survey and evaluation • Coordination with related existing programs • ICT Research & Innovation Technological Networks (RRIT) • Telecommunications, Software Engineering, Audiovisual & Multimedia • Ministry of Research action on Business Intelligence Tools (VSE) Cocosda / WRITE Workshop

  3. Techno-langue structure • Infrastructure program supporting core LT progress, while innovative application projects remain with the RRIT networks (110 M€ / year) • [Diagram labels: TELECOM, SOFT, AMM, VSE]

  4. Techno-langue Call • Language resources (data, tools) • Evaluation (technology, application) • Standards • Technological survey

  5. Techno-langue Call • Launched in 2002, 3-year duration • Funded by 3 ministries (Research, Industry, Culture) • Same approach applied to vision technology (Techno-vision) in 2005 (MoD) • International cooperation • Foreign entities may participate in the projects, with their own funding • All funded projects completed in 2006 • Joint Techno-X workshop (ASTI conference, October 2005) • Paper at LREC 2006 (S. Chaudiron, J. Mariani), plus 16 further papers • Book in preparation • Public presentation of results (Fall 2006) • Feedback to research and industry (RRIT, VSE/Business Intelligence) • Presentation to government agencies (DoD, MAE…) • LT included in the 2006 “Data Masses and Ambient Intelligence” call for proposals • Managed by ANR, 3 M€ funding for LT

  6. Results of the Call • 52 proposals submitted • 21 projects funded • 94 participants • 33 industry • 39 public research • 11 other categories (associations, CEA, French DoD…) • 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…) • Budget: 20 M€ total effort, 7.5 M€ public funding (3 years) • Special attention to the distribution of Language Resources and Evaluation packages

  7. 21 funded projects • 10 on Language Resources (data and tools) • 2 on Standards (spoken / written): support to ISO TC37-SC4 • 1 on Technological survey (portal): http://www.technolangue.net • 8 on Technology Evaluation • Written language processing (5) • EASY: Syntactic parsing • ARCADE 2: Text alignment • CESART: Terminology extraction • EQUER: Information query • CESTA: Machine translation • Spoken language processing (3) • EVASY: Speech synthesis • MEDIA: Spoken dialog • ESTER: Speech transcription / automatic indexing

  8. ESTER • Task: evaluation of “rich” speech transcription and indexing • Broadcast news data in French (radio/TV) • 100 h manually transcribed (1 M words, 350 speakers) + 1,600 h untranscribed • Second-largest such corpus worldwide • 13 participants (3 companies) • Written transcription (real-time / non-real-time) • Segmentation (sound, speaker recognition / diarization) • Named entity recognition (from speech / transcribed text) • Topic detection and tracking for indexing: postponed • Final internal workshop in March 2005 • Distribution of the evaluation package • Development and test data, scoring, results. Data used in EASY. • Workshop for linguists in May 2005 • Data and tools available, results • Open issues requiring basic scientific research

  9. LT for a Multilingual Europe • Language as a specific issue for Europe • Economic, cultural and political challenge with two dimensions: • A) Preserve the cultures of the EU Member States • Preference for the native language (Web sites in German (75%)...) • 50% of European citizens speak only one language • (3% of Japanese people speak a foreign language) • B) Allow communication across Member States • 1,170 translators at the EC; 1.3 M pages translated in 2001 • 30% of the European Parliament budget (300 M€); 500 translators • EU: 25 countries, 20 languages / 380 language pairs • Enormous but mandatory cost for the EU • Need for the assistance of Language Technologies • Huge effort (number of LTs × number of languages), too large for the EC alone • Should be shared with EU Member States (subsidiarity)
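The figure of 380 language pairs is consistent with counting ordered translation directions (source to target) among 20 languages, n(n − 1) = 20 × 19 = 380, rather than the 190 unordered pairs. The slide does not say how the pairs were counted, so this reading is an assumption. A minimal Python sketch (the function names are illustrative only):

```python
from math import comb


def translation_directions(n_languages: int) -> int:
    """Ordered (source -> target) pairs: each language can be translated into every other one.

    Assumption: this is how the 380 figure on the slide was obtained.
    """
    return n_languages * (n_languages - 1)


def unordered_pairs(n_languages: int) -> int:
    """Unordered language pairs (direction ignored), i.e. n choose 2."""
    return comb(n_languages, 2)


if __name__ == "__main__":
    n = 20  # number of EU languages quoted on the slide
    print(translation_directions(n))  # 380
    print(unordered_pairs(n))         # 190
```

Because the count grows quadratically, moving from n to n + 1 languages adds 2n translation directions: a 21st language would add 40 directions to the 380.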

  10. Language Technologies EU Program • European Research Area (ERA) • Coordinate EC (< 15%) and Member State (MS, > 85%) research efforts • ERA-Net initiative in FP6 to coordinate MS national programs • LT is well suited to the ERA approach • Prime responsibility of the EC: • Coordination: management, standards, technology evaluation, communication... • The development cost of generic language technologies: • Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation... • Each Member State would be primarily responsible for ensuring proper coverage of its language(s): • Language Resources (essential): (annotated) corpora (spoken / written), lexicons (including pronunciations), dictionaries… • Language-specific technology development/adaptation

  11. Lang-Net proposal • Build an ERA-Net proposal of an infrastructural nature • Language Resources, LT evaluation, Standards, Survey • Information sharing • Strategic activities and best practice • Implementation of joint activities • Transnational research activities • Identify EU countries or regions having similar programs • 11 countries/regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts) • Extendable to other partners • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…) • AS (Romania, Bulgaria…) • USA, Japan, South Africa, Israel, Canada… (contacts)

  12. Joint LT program proposal • DG Research (ERA-Net program) • Lang-Net proposal submitted in March 2005, not selected • Looking ahead to a thematic ERA-Net+ in FP7 • DG INFSO + Media • “Science & Technology Forum on Multilingualism” • June 2005 and February 2006 in Luxembourg • DG Education, Culture and Multilingualism • “A new framework strategy for multilingualism” (Nov. 2005) • http://europa.eu.int/languages/: website in the 20 EU languages • EC will set up a High Level Group on Multilingualism • An EU ministerial conference will be held • Further communication will be presented by the EC to Parliament and Council • Committee of the Regions (use of regional Spanish languages) • TC-Star report: introduction signed by V. Reding and J. Figel

  13. French support to LT in FP7 • Visit of a French delegation to the EC E Directorate • H. Forster & B. Smith (September 2005) • French Memorandum for a Digital Europe (i2010) • European Digital Library • EU ICT Directors meeting (Vienna, March 2006) • FP7 ICT program (2007-2013) • Technology pillar: simulation, visualization, interaction, mixed realities • “Multilingual and automatic machine translation systems” • Replace / add LT • “Language technology, including multilingual and automatic MT systems” • FP7 budget reduction (from 12 B€ to 9 B€ for ICT) • “language-enabled … interaction & communication”

  14. LT in FP7 • A large Article 169 program on LT in FP7 (several hundred M€, EC + MS + industry)? • Current topics: SMEs, metrology, research in the Baltic Sea… • Joint support to LT in FP7 from Member States
