
ETSI STQ Aurora Distributed Speech Recognition (DSR)

Distributed Speech Recognition. Dieter Kopp, Alcatel Research & Innovation (email: Dieter.Kopp@alcatel.de).


Presentation Transcript


  1. ETSI STQ Aurora Distributed Speech Recognition (DSR)
     Distributed Speech Recognition
     Dieter Kopp, Alcatel Research & Innovation
     email: Dieter.Kopp@alcatel.de

  2. DSR system vision

  3. ETSI STQ Aurora
     • Participants
       • Alcatel, AT&T, British Telecom, Ericsson, France Telecom, Hewlett Packard, Motorola, Nokia, Qualcomm, Siemens, Sony, Texas Instruments, IBM, Conversay, etc.
     • Mel-Cepstrum DSR Front-End & Compression
       • Complete - ETSI standard published in February 2000
     • Advanced Noise Robust DSR Front-End
       • Current activity - standard expected in 2002
     • DSR Application & Protocols
       • Architecture definition, Client/Server protocol specification & contribution to other standardization groups

  4. ETSI STQ Aurora Front-End Standardization

  5. DSR Elements

  6. Performance Enhancement with DSR Telephone Application & DSR

  7. Benefit of DSR for IP transmission
     • Worst performance is obtained when using a speech codec
     • At 50% packet loss, speech recognition over IP using DSR shows only 3% recognition rate degradation, compared to 63% for coded speech transmission (simulation done by BT)
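
The slide does not say how lost packets are handled at the recognizer. As a minimal sketch, assuming the back-end simply fills each gap in the feature stream with a copy of the nearest received frame, concealment could look like the following. The function name and the frame representation (a list of floats, `None` for a lost frame) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

Frame = List[float]


def conceal_lost_frames(frames: List[Optional[Frame]]) -> List[Frame]:
    """Replace lost frames (None) with a copy of the nearest received frame.

    Assumption for illustration only: nearest-frame repetition. When the
    previous and the next received frame are equally close, the earlier
    one is used.
    """
    received = [i for i, frame in enumerate(frames) if frame is not None]
    if not received:
        raise ValueError("no frames received; nothing to conceal from")

    concealed: List[Frame] = []
    for i, frame in enumerate(frames):
        if frame is not None:
            concealed.append(list(frame))
            continue
        # Pick the received index with the smallest distance; ties go to
        # the smaller (earlier) index.
        nearest = min(received, key=lambda j: (abs(j - i), j))
        concealed.append(list(frames[nearest]))
    return concealed
```

For example, a stream `[[1.0], None, None, [4.0], None]` would be filled in from its two received frames, so the recognizer still sees a feature vector for every frame position.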

  8. Advanced Noise Robust DSR Front-End
     • Goals:
       • Standardization of a noise-robust DSR Front-End algorithm under the following conditions:
         • 50% recognition rate improvement compared to the existing DSR Front-End standard
         • Latency below 250 ms
         • Complexity below 17 wMOPS
     • Selection process using:
       • Aurora database, SpeechDatCar (top 2/3 cluster selection)
       • Large vocabulary database (final winner)
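
The three goals can be read as a simple acceptance check on a candidate front-end. The sketch below is illustrative only. The slide does not define how the 50% improvement is measured, so the code assumes it means a relative reduction in word error rate against the existing front-end. The `Candidate` type and all function names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the slide.
MAX_LATENCY_MS = 250.0
MAX_COMPLEXITY_WMOPS = 17.0
# Assumption: "50% improvement" = 50% relative word-error-rate reduction.
MIN_RELATIVE_IMPROVEMENT = 0.50


@dataclass
class Candidate:
    name: str
    latency_ms: float
    complexity_wmops: float
    word_error_rate: float  # fraction between 0 and 1


def relative_improvement(baseline_wer: float, candidate_wer: float) -> float:
    """Relative word-error-rate reduction of the candidate vs. the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline word error rate must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer


def meets_goals(candidate: Candidate, baseline_wer: float) -> bool:
    """True if the candidate satisfies all three goals listed on the slide."""
    return (
        candidate.latency_ms < MAX_LATENCY_MS
        and candidate.complexity_wmops < MAX_COMPLEXITY_WMOPS
        and relative_improvement(baseline_wer, candidate.word_error_rate)
        >= MIN_RELATIVE_IMPROVEMENT
    )
```

Under this reading, a candidate with 200 ms latency, 15 wMOPS and a word error rate of 8% against a 20% baseline would pass, while the same candidate at 300 ms latency would not.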

  9. ETSI STQ Aurora Application & Protocols

  10. Application & Protocols Subgroup
      • Definition of DSR scenarios for applications
        • Information applications
          • Voice portals (flight, weather, news, movies)
          • Location-specific information
          • Voice navigation of maps
        • Transaction-based applications
          • Finance
          • e-commerce (various)
        • Information capture
          • Dictation
          • Form filling

  11. Application & Protocols Subgroup
      • Specification of the Client/Server architecture
      • Specification of the communications elements (voice transport interface, synchronization between Client/Server, etc.)
      • Contribution to other standardization groups
      • Participants: Alcatel, British Telecommunications, Ericsson, HP, IBM, ICSI, Intel Labs, Motorola, Nokia, Qualcomm, SpeechWorks, Temic/Daimler Chrysler, TI, Verbaltek, WaveMakers, Philips, etc.

  12. ETSI/STQ-Aurora Protocol & Application
      [Diagram; labels as extracted: Voice Recognition; URL; Voice page; Graphic I/O; DSR; Speech output; GUI page; Speech output; Mobile Network]
      • Open & establish connection, capability negotiation
      • Connection to DSR Back-End Server
      • Pre-processing data, speech output, contents exchange
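
The sequence named on this slide (open the connection, negotiate capabilities, then exchange pre-processed data) can be pictured as a small state machine. This is a sketch under stated assumptions, not the Aurora protocol: the class `DsrSession`, its methods, and the front-end identifiers are all hypothetical, and negotiation is assumed to pick the first client-preferred front-end that the server also supports.

```python
from enum import Enum, auto
from typing import Iterable, List, Optional


class State(Enum):
    CLOSED = auto()
    CONNECTED = auto()
    NEGOTIATED = auto()


class DsrSession:
    """Hypothetical in-memory stand-in for a client/server DSR session."""

    def __init__(self, server_front_ends: Iterable[str]) -> None:
        self.server_front_ends = set(server_front_ends)
        self.state = State.CLOSED
        self.front_end: Optional[str] = None
        self.received: List[List[float]] = []

    def open(self) -> None:
        """Step 1: open and establish the connection."""
        if self.state is not State.CLOSED:
            raise RuntimeError("session is already open")
        self.state = State.CONNECTED

    def negotiate(self, client_front_ends: List[str]) -> str:
        """Step 2: agree on a front-end, in order of client preference."""
        if self.state is not State.CONNECTED:
            raise RuntimeError("open the connection before negotiating")
        for front_end in client_front_ends:
            if front_end in self.server_front_ends:
                self.front_end = front_end
                self.state = State.NEGOTIATED
                return front_end
        raise ValueError("client and server share no front-end")

    def send_features(self, frame: List[float]) -> None:
        """Step 3: pass one pre-processed feature frame to the back-end."""
        if self.state is not State.NEGOTIATED:
            raise RuntimeError("negotiate capabilities before sending data")
        self.received.append(list(frame))
```

A client would call `open()`, then `negotiate([...])` with its preferred front-ends, and only then `send_features(...)`. Calling them out of order raises an error, which mirrors the ordering shown on the slide.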

  13. Applications for Multi-modal Distributed Speech Recognition
      • Advanced applications towards 3G terminals

  14. Multi-modal User Interaction
      [Diagram; labels as extracted: Output: Speech, Display; Capability; Feedback/Interaction; Application; Presentation Manager; Environment; Service Request; Input: Speech, Key, Pen, etc.; User Profile]
      • Depending on the environment (background noise) and the user preferences, more or less speech I/O can be used
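
The rule on this slide, that the amount of speech I/O depends on background noise and user preferences, could be sketched as a small decision function. Everything concrete here is an assumption for illustration: the names, the `UserProfile` fields, and especially the 70 dB noise limit, which is an arbitrary placeholder and not a value from the presentation.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class UserProfile:
    prefers_speech: bool = True


def choose_output_modalities(
    noise_db: float,
    profile: UserProfile,
    has_display: bool,
    noise_limit_db: float = 70.0,  # arbitrary illustrative threshold
) -> Set[str]:
    """Pick output channels from the environment and the user profile."""
    modalities: Set[str] = set()
    if has_display:
        modalities.add("display")

    # Speech output is only considered when the environment is quiet enough.
    speech_usable = noise_db < noise_limit_db
    if speech_usable and (profile.prefers_speech or not has_display):
        modalities.add("speech")

    # Fallback: with no display, speech is the only channel left.
    if not modalities:
        modalities.add("speech")
    return modalities
```

In a quiet room with a display and a user who likes speech, both channels are used. In a noisy environment the same terminal falls back to the display alone.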

  15. Scenario: Personal Information Manager
      Dialogue steps (numbered as on the slide):
      1. "Tell me today's schedule!"
      2. "You have meetings at 9, 11:30 and 1 p.m. You have two meeting requests. Details: 9 until 10 o'clock, phone conference MAP; 10:30, possible meeting with M. Hauser, Marketing; 11:30 until 12:30, lunch ..."
      3. "Who will be participating in the 9 o'clock phone call?"
      4. 9:00 e-business: O'Neill, Scott; Dumont, Denise
      5. "Invite Jim Mason!"
      Calendar display, Tuesday, 26.6.2001: 8:30; 9:00 MAP TP 4; 9:30 phone conference; 10:00; 10:30 ? M. Hauser; 11:00 ? Marketing; 11:30 Lunch; 12:00; 12:30; 1:00 department conv.
      Phone screen: Mobile `02; "How may I help you?"; Menu; WAP; Select

  16. Multi-modal Architecture
      [Diagram; component labels as extracted: DSR decoder; Audio Codec(s); Conversational Engines; DSR encoder; Audio drivers; Voice Browser; Audio I/O; DOM Wrapper; DOM; GUI Browser; Wrapper; GUI drivers; Content Server; MM Shell; GUI I/O; Network Server; Communication Manager; Voice Transport Interface; Gateway and router with Voice transport and Synchronization Support; Network Transport Layer (twice); Synchronization Interface; Synchronization Protocols; Data Transport Interface; HTTP]

  17. P&A next steps
      • Voice Transport protocol specification and contribution to 3GPP
      • Definition of the Multi-modal Shell function and of how the synchronization could be managed
      • Liaison offer to W3C for the standardization of the DOM interface for VoiceXML
      • Contribution to W3C Multi-modality group with ETSI multi-modal architecture
      • Common interface to all speech recognizers (IBM activity)

  18. Thank You
