
ICSLP Info: Building the Interspeech 2006 Information Line

Rohit Kumar, Arthur Chan, Lingyun Gu. Spring 2006, 11754: Dialog Lab Project.


Presentation Transcript


  1. ICSLP Info: Building the Interspeech 2006 Information Line
     Rohit Kumar, Arthur Chan, Lingyun Gu
     Spring 2006, 11754: Dialog Lab Project

  2. Layout
     • Motivation
     • Use Cases
     • System Architecture
     • Backend
     • Language Components
       • Grammar, NLG Templates, Language Models
     • Dialog Component
     • Telephony Component
     • Conclusions & Future Work
     • Distribution of work among team members

  3. Motivation
     • To provide schedule information to attendees of Interspeech 2006, Pittsburgh
     • Users can ask queries about
       • Time / Location of Papers & Sessions
       • Search for Papers & Sessions by title & author
       • Other information
         • Panel Discussion, Keynotes, Breaks
     • To study the development of a Spoken Dialog System by putting the principles involved into practice

  4. Use Case
     • One typical use case is shown below
       • System: Welcome to Interspeech 2006 Information Line. What information about the conference schedule do you need? I can talk about the various papers and sessions in the conference.
       • User: When is the session on Language modeling on Thursday?
       • System: Looking for Language modeling session.
       • User: Yes
       • System: I found one session matching your query. There is a session on Language Modeling at 12.30pm on Thursday in Siaca Hall on 2nd Floor. Do you need any more information?
       • User: No
       • System: Thank you for using the Interspeech 2006 Information Line. Have a nice day.

  5. Intended Dialog Flow Chart
     [Figure: flow chart. Node labels recoverable from the slide: Welcome Prompt; Natural Language Prompt ("How may I help you?"); Got Something; No understanding ("Sorry. I didn't get that."); max. 3; Menu ("What type of information do you want?": Papers, Sessions); Date; Author and/or Title; Area / Topic; Title / Keywords; Time of Day; Confirmation; Decide if we have enough info to look up a few definitive records; Lookup; Paper Found; No Paper Found; Present Results; Summary Prompt; More Info?]

  6. System Architecture
     • Based on the RavenClaw Dialog System Architecture, with the VeraOut system used for Telephony through Skype
     • Sphinx2 for Telephony Setup
     • Sphinx3 for Desktop Setup

  7. System Components
     • The following components were specifically engineered for this project
       • Backend: Database & Robust Querying
       • DM: Dialog Task Specification
       • Parser: Grammar
       • ASR: Language Model & Vocabulary
       • NLG: Templates
       • Telephony: Skyper

  8. Backend
     • Database:
       • ICSLP 2005 records crawled from the conference website to build the currently used database
       • Information crawled
         • Sessions (Chair, Title, Time, Date, Location)
         • Papers (Title, Session, Type, Time, Date, Location, Chair, Authors)
       • Other information filled into the database manually: keynote address, panel discussions, special sessions, etc.
       • Database structured as a single table with all relevant information available for each record
       • Total: 875 records

  9. Backend
     • Querying System
       • Statistical Matching of Queries
         • Queries are treated as a bag of words and matched against the appropriate fields of all records
         • Records with the highest normalized match confidence are reported back
         • At most 10 records are returned
         • If more than 10 records match, 0 records are returned
       • Query pre-processing: Expanding abbreviations (TTS → text to speech)
       • Robust: unlike fixed expression matching, a query for "Multilingual Speech" can match "Multilingual Text to Speech" records
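A minimal sketch of the bag-of-words matching described on this slide, not the project's actual code. The slides do not give the normalization, so the score here is assumed to be the fraction of query words found in the field. The names `tokenize`, `match_confidence`, `search`, and the `title` field are hypothetical; only the TTS abbreviation and the limit of 10 records come from the slide.

```python
import re

# Only the TTS entry is mentioned in the slides; a real table would be larger.
ABBREVIATIONS = {"tts": "text to speech"}
MAX_RESULTS = 10  # from the slide: more than 10 matches returns 0 records


def tokenize(text):
    """Lowercase the text, split it into words, and expand abbreviations."""
    words = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        words.extend(ABBREVIATIONS.get(word, word).split())
    return words


def match_confidence(query_words, field_text):
    """Assumed normalization: fraction of query words present in the field."""
    if not query_words:
        return 0.0
    field_words = set(tokenize(field_text))
    hits = sum(1 for word in query_words if word in field_words)
    return hits / len(query_words)


def search(query, records, field="title"):
    """Return the records tied for the best score, or none if too many match."""
    query_words = tokenize(query)
    scored = [(match_confidence(query_words, rec[field]), rec) for rec in records]
    best = max((score for score, _ in scored), default=0.0)
    if best == 0.0:
        return []
    matches = [rec for score, rec in scored if score == best]
    return matches if len(matches) <= MAX_RESULTS else []
```

Because word order and extra words in the record are ignored, a query such as "multilingual speech" scores fully against a record titled "Multilingual Text to Speech", which is the robustness the slide contrasts with fixed expression matching.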

  10. Grammar
     • 126 Nets including generic & task specific non-terminals
     • Task Specific Concepts Extracted
       • Session Names (110 with variations)
       • Author Names (92)
       • Query Type (5: Paper, Session, Keynote, Panel, Lunch)
       • Request Type (5: When, Where, Which, How Many, About)
       • Date & Time Specifications (Day of week is the only one used in the dialog)

  11. NLG: Templates
     • Number of Task Specific Templates authored: 70
       • Establish Context: 3
       • Inform: 37
       • Request: 13
       • Implicit Confirmation: 7
       • Explicit Confirmation: 10

  12. Language Models & Vocabulary
     • Training Corpus
       • Created for a strict version of the grammar
       • ~120,000 utterances in the training corpus
       • Created by generating fake sentences from various nets of the grammar and concatenating the individual sets
       • ~100,000 utterances made up of author names, session titles, full and partial valid queries
       • Utterances appearing more than 25 times were capped at 25 copies
       • Utterances appearing between 12 and 24 times were reduced in number
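The frequency capping on this slide can be sketched as follows. The cap of 25 and the 12 to 24 band come from the slide; the slide does not say by how much the 12 to 24 band was reduced, so the rule used here (keep 12 plus half of the excess) is purely an assumption, and `cap_utterances` is a hypothetical name.

```python
from collections import Counter


def cap_utterances(utterances, max_copies=25, reduce_from=12):
    """Limit how often each generated utterance appears in the corpus."""
    counts = Counter(utterances)
    capped = []
    for utterance, n in counts.items():
        if n > max_copies:
            # From the slide: keep only 25 copies of very frequent utterances.
            keep = max_copies
        elif reduce_from <= n < max_copies:
            # ASSUMPTION: the slide says only "reduced in number";
            # here we keep 12 copies plus half of the excess over 12.
            keep = reduce_from + (n - reduce_from) // 2
        else:
            keep = n
        capped.extend([utterance] * keep)
    return capped
```

Capping keeps a handful of very frequent generated sentences from dominating the n-gram counts while still leaving them better represented than rare ones.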

  13. Language Models & Vocabulary
     • Vocabulary size:
       • 497 unique words + <s>, </s>, <UNK> in LM
       • 497 words with pronunciation variations
     • Trained trigram models using CMU-Cambridge LM Toolkit
       • Good-Turing discounting applied
       • Number of Unique Trigrams: 37333
       • Number of Unique Bigrams: 5177
     • WER (on 22 utterances, 8k Sphinx3 models): 23.529%
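For reference, word error rate is conventionally the number of word substitutions, deletions, and insertions divided by the number of reference words. The sketch below computes it with a standard word-level edit distance; the slides do not say which scoring tool was used, and the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def wer(pairs):
    """WER in percent over (reference, hypothesis) transcript pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words to score")
    return 100.0 * errors / words
```

Errors are pooled over all utterances before dividing, so long utterances weigh more than short ones.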

  14. Final Dialog Task Specification
     • 55 Agents in Total

  15. Final Dialog Task Specification

  16. Telephony: Skyper
     • SkypeIn used to receive calls
       • Skype Id: interspeech06
       • Phone: 412 567 2683
     • Additional Setup
       • Automatically Receive call
       • Non Default Audio Devices
       • No ringing
     • Hardware Dependency eliminated
       • Virtual Audio Cable Software
       • Created 2 virtual cables
         • Virtual Cable 1 IN for Sphinx
         • Virtual Cable 1 OUT for Skype
         • Virtual Cable 2 IN for Skype
         • Virtual Cable 2 OUT for TTS
       • No Sound Cards Required at all
       • Better quality due to no Physical coupling between sound cards

  17. Conclusions & Future Work
     • Improving LM for better ASR performance
       • Extended Vocabulary with pronunciation corrections & variations
       • State specific LMs
         • Useful when asking only for day, session name, author name, etc.
     • Incorporation of Interspeech 2006 schedule information when available
     • Bug fixes in Audio Server and Sphinx3 for better functioning with Skyper based telephony
     • Extending dialog functionality to support further questioning on results of a query
     • Session management in Backend

  18. Task Distribution
     • Besides participation in group meetings, everyone's individual contributions to the project are listed below
     • Rohit Kumar
       • Initial Project Proposal
       • Backend: Crawling from ICSLP05 site, Querying System, Galaxy Agent Wrapper
       • Dialog Management: Complete First and Final iteration of Dialog Task Specification & Implementation
       • Natural Language Generation: Authoring of all Final Templates (in proper "British" English)
       • Grammar for the Parser
       • Telephony Integration
       • Building Final (non-class based) LM and Vocabulary and integration with final system after implementation of recommended tuning
       • Project Presentation & other documentation
     • Arthur Chan*
       • Project proposal refinement
       • Initial version management setup
       • Intermediate iteration of Dialog Task implementation
       • First and Intermediate iteration of Language models
       • Intermediate iteration of some of the NLG Templates
       • Several fixes in Sphinx3 and Audio Server for improved performance
     • Lingyun Gu
       • Worked on LM fine tuning as per recommendations
       • Intermediate authoring of 1 NLG Template
       • Intermediate authoring of grammar
     * dropped course
