Contact Center of the Future and Conversational Technologies
Ron Hoory, Manager, Speech Technologies, IBM Haifa Research Lab
Keeping ahead of the curve – IBM Research
IBM Research labs (map): Watson, San Jose, California (1952), Zürich, Beijing, Almaden, Tokyo, Austin, Haifa, Delhi
40 Years of Speech Technologies Research (timeline figure, 1960 to the 2000s). Milestones shown include digit recognition, the start of speech recognition research, 5000-word vocabulary recognition with 95% accuracy, a 20,000-word vocabulary, a commercial dictation system, ViaVoice and Embedded ViaVoice, concatenative TTS, DSR, WebSphere Voice Server, MALACH, TALES, Conversational Biometrics, AVASR, TC-STAR, speech-to-speech translation, speech analytics and the NIST STD evaluation.
IBM’s Contact Center of the Future Initiative
• Contact centers represent a huge business for IBM
  • IBM spends over $3B/year operating its own contact centers.
  • IBM is a close #1 (with Accenture), holding 8% of the $90B CRM-related services market, of which more than 50% is contact centers.
• The CCOF initiative has the following vision:
  • A standardized, open architecture to support all contact center activities
  • A holistic customer experience
  • Multi-channel, multi-modal self-service
  • End-to-end analytics
  • ... to serve the customer, the call center and the enterprise
Sample Speech Projects in IBM’s Contact Center of the Future
• Facilitate de-accentization training in Indian call centers
  • Assess before, during and after training
  • Pitch, duration, energy
• Monitor agent performance with call transcription
  • Transcribe calls, search for meaningful phrases, replace 'play-back'
  • Cover 100% vs. 4% of calls; listen only to the 'bad' ones (115% more found)
• Enhance agent productivity with Call Center Agent Buddies (CABs)
  • CABs listen to the call, transcribe, analyze, match, suggest answers and log the call, increasing productivity, effectiveness and consistency.
• Increase car-rental deal closures with speech analytics
  • Analyze calls and characterize 'effective starts' and 'best practices'
  • Build scripts to turn slow-starting calls into successful ones
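The de-accentization project assesses pitch, duration and energy. The slides do not say how these were measured; the sketch below is one minimal way to compute all three from a mono signal given as a list of floats. Pitch uses a plain autocorrelation peak search, a textbook method chosen here as an assumption, and the function names are hypothetical.

```python
import math

def rms_energy(samples):
    """Root-mean-square energy of a frame (a list of floats)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def duration_seconds(samples, sample_rate):
    """Length of the signal in seconds."""
    return len(samples) / sample_rate

def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) from the lag of the autocorrelation peak within [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone, 0.1 s long, sampled at 8 kHz (telephone band)
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
print(round(estimate_pitch(tone, 8000)), duration_seconds(tone, 8000), round(rms_energy(tone), 3))
```

Comparing these numbers for the same utterance before, during and after training is the kind of assessment the slide describes; real speech would need framing and voicing detection, which this sketch omits.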
Speech Technologies for Contact Centers of the Future
• Self-service human-machine interaction
  • Conversational Biometrics
  • Advanced dialog and natural language interaction
  • Expressive text-to-speech
• Agent-customer interaction
  • Transcription of phone conversations
  • Speech analytics
  • Speech translation
Conversational Biometrics
• Three keys to authentication: what you Know, what you Are, what you Own
• Two voice-based components: voice-dialog-based authentication and an acoustic voice print
Conversational Biometrics: observables (figure). Features range from direct to indirect observables and are modeled either spectrally or non-spectrally (audio-based and text-based). Items shown: vocal tract geometry, gender, emotion, accent, dialect, speaking rate, language, language usage, spoken text, conversation topics, communicated information, channel, background noise and context audio.
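The slides show two evidence streams, the dialog ("what you know") and the voice print ("what you are"), but not how they are combined. A minimal sketch, assuming a simple weighted-sum score fusion; the class, the weights and the threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AuthEvidence:
    answers_correct: int     # knowledge questions answered correctly ("what you know")
    answers_asked: int       # knowledge questions asked during the dialog
    voiceprint_score: float  # speaker-verification score scaled to [0, 1] ("what you are")

def fused_score(ev: AuthEvidence, knowledge_weight: float = 0.5) -> float:
    """Weighted sum of knowledge evidence and voice-print evidence (hypothetical fusion rule)."""
    knowledge = ev.answers_correct / ev.answers_asked if ev.answers_asked else 0.0
    return knowledge_weight * knowledge + (1.0 - knowledge_weight) * ev.voiceprint_score

def authenticate(ev: AuthEvidence, threshold: float = 0.7) -> bool:
    """Accept the caller when the fused score reaches a hypothetical threshold."""
    return fused_score(ev) >= threshold

print(authenticate(AuthEvidence(3, 3, 0.8)))  # strong on both keys
print(authenticate(AuthEvidence(1, 3, 0.6)))  # weak on both keys
```

A weighted sum is only the simplest choice; it lets a strong voice print compensate for a missed question and vice versa.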
Reusable Dialog Components (RDC)
• Mainstreaming speech-enabled Web applications
  • Adopt the Web programming model for voice interaction
  • Build applications from reusable components
• From static to dynamic VoiceXML
• State Chart XML (SCXML), e.g. a flow through login, Get Rate and Mortgage states
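The login, Get Rate and Mortgage labels appear to come from an example state chart. A minimal Python sketch of the idea, assuming each reusable component is a function that updates a shared context and names the next state. This mirrors the state-chart style of SCXML but is neither SCXML nor the RDC API; all names and the rate table are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Each reusable component updates a shared context and returns the next state (None = end).
Component = Callable[[dict], Optional[str]]

def login(ctx: dict) -> Optional[str]:
    ctx["authenticated"] = ctx.get("pin") == ctx.get("expected_pin")
    return "get_rate" if ctx["authenticated"] else None

def get_rate(ctx: dict) -> Optional[str]:
    ctx["rate"] = ctx["rate_table"][ctx["product"]]  # hypothetical back-end lookup
    return "mortgage"

def mortgage(ctx: dict) -> Optional[str]:
    ctx["prompt"] = f"The current rate for {ctx['product']} is {ctx['rate']} percent."
    return None

def run_state_chart(components: Dict[str, Component], start: str, ctx: dict) -> List[str]:
    """Walk the chart from `start`, returning the states visited."""
    visited, state = [], start
    while state is not None:
        visited.append(state)
        state = components[state](ctx)
    return visited

chart = {"login": login, "get_rate": get_rate, "mortgage": mortgage}
# Placeholder values for illustration only, not real rates
ctx = {"pin": "1234", "expected_pin": "1234", "product": "30-year fixed",
       "rate_table": {"30-year fixed": 6.0}}
print(run_state_chart(chart, "login", ctx), ctx["prompt"])
```

Because `login` knows nothing about mortgages, the same component could start any other application chart.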
Characterizing Dialog Systems
• Level of sophistication: Touchtone Replacement, Directed Dialog, Natural Dialog
  • Touchtone replacement: "For savings say 'one', for checking say 'two'"
  • Directed dialog: "Say the account you want to withdraw from"
  • Natural dialog: "I would like to transfer 500 dollars from savings to checking"
• Initiative: System Initiative, Mixed Initiative, User Initiative
  • System initiative: the system controls the flow of the dialog; the user can specify information only in the current context
  • Mixed initiative: the system accepts user initiatives while staying in control
  • User initiative: the user can 'jump' to other places in the dialog 'tree', e.g. "Show me restaurants in this area"
• Complexity: from strict policy to free policy
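To make mixed initiative concrete: the system still drives the dialog by prompting for whatever is missing, but it accepts any information the caller volunteers, in any order. A minimal sketch for the transfer example, assuming a hypothetical toy regex grammar and slot names.

```python
import re

SLOT_ORDER = ("amount", "source", "target")
PROMPTS = {
    "amount": "How much would you like to transfer?",
    "source": "Which account do you want to transfer from?",
    "target": "Which account do you want to transfer to?",
}

def parse(utterance):
    """Toy grammar: pick up any slots the caller volunteers, in any order."""
    text = utterance.lower()
    found = {}
    amount = re.search(r"(\d+)\s*dollars", text)
    source = re.search(r"\bfrom (savings?|checking)\b", text)
    target = re.search(r"\bto (savings?|checking)\b", text)
    if amount:
        found["amount"] = int(amount.group(1))
    if source:
        found["source"] = source.group(1)
    if target:
        found["target"] = target.group(1)
    return found

def mixed_initiative_turn(filled, utterance):
    """Merge volunteered slots, then let the system prompt for the first missing one."""
    filled.update(parse(utterance))
    for slot in SLOT_ORDER:
        if slot not in filled:
            return PROMPTS[slot]
    return None  # all slots filled; ready to confirm the transfer

state = {}
print(mixed_initiative_turn(state, "I want to move 200 dollars"))
print(mixed_initiative_turn(state, "from savings to checking"))
```

A directed dialog would ask the three questions in fixed order; here a single natural-dialog utterance can fill all slots at once.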
Expressive Text-to-Speech
• Benefits:
  • Save time, increase efficiency
  • Contrastive emphasis makes it clear that the morning flight is unavailable
  • Sensitive to the user: avoid inappropriate expressions
  • Preserve the full meaning of the source speaker in translation systems
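The flight example depends on stressing a single word. The slides do not say which markup IBM's expressive TTS used; as an assumption, the W3C SSML 1.0 `<emphasis>` element is one standard way to request this from a synthesizer. The helper name is hypothetical.

```python
from xml.sax.saxutils import escape

def contrastive_ssml(sentence: str, focus: str, level: str = "strong") -> str:
    """Wrap the first occurrence of `focus` in an SSML <emphasis> element."""
    before, sep, after = sentence.partition(focus)
    if not sep:
        raise ValueError(f"{focus!r} not found in sentence")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'{escape(before)}<emphasis level="{level}">{escape(focus)}</emphasis>{escape(after)}'
        "</speak>"
    )

print(contrastive_ssml("Sorry, the morning flight is unavailable.", "morning"))
```

Stressing "morning" signals the contrast with other flights without adding words to the prompt.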
Transcription of Phone Conversations
• Challenges:
  • Noisy environment
  • Spontaneous, conversational speech
  • Different accents
  • Emotional speech
  • Multiple languages
• Flow (figure): agent-customer speech data → Transcription Engine → rich transcript (words, time-stamps, alternatives, phones) → text world: text analytics, keyword search, translation
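A rich transcript keeps more than the 1-best words. A minimal sketch of one possible record layout and a keyword search that also looks at alternatives, so a word the recognizer ranked second can still be found. The field names and the toy call are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TranscribedWord:
    text: str                                              # 1-best word
    start: float                                           # seconds from start of call
    end: float
    alternatives: List[str] = field(default_factory=list)  # competing hypotheses
    phones: List[str] = field(default_factory=list)        # phone string, if kept

def keyword_hits(transcript: List[TranscribedWord], keyword: str) -> List[Tuple[float, float]]:
    """Time spans where the keyword is the 1-best word or one of its alternatives."""
    kw = keyword.lower()
    return [
        (w.start, w.end)
        for w in transcript
        if kw == w.text.lower() or kw in (a.lower() for a in w.alternatives)
    ]

call = [
    TranscribedWord("have", 0.0, 0.2, ["and"]),
    TranscribedWord("glasses", 0.2, 0.6, ["graphics", "graphic"]),
    TranscribedWord("on", 0.6, 0.7),
]
print(keyword_hits(call, "graphics"))
```

The time stamps let an analyst jump straight to the matching audio instead of playing back the whole call.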
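Word Confusion Network (example; each slot lists competing words with their posterior probabilities)
• Slot 1: have 61%, and 39%
• Slot 2: glasses 27%, graphic 22%, impressions 19%, graphics 13%, interested 9%, impresses 7%, grass 3%
• Slot 3: on 100%
• Slot 4: my 100%
• Slot 5: screen 99%, and 1%
• Best path: "… have glasses on my screen …"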
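Word confusion networks are commonly decoded by taking the highest-posterior word in each slot (consensus decoding). The sketch below applies this to the slide's percentages, grouped into slots so that each slot sums to 100%; that grouping is our reading of the figure, and the function names are hypothetical.

```python
from typing import Dict, List

# Slots reconstructed from the slide; each slot's posteriors sum to 1.
wcn: List[Dict[str, float]] = [
    {"have": 0.61, "and": 0.39},
    {"glasses": 0.27, "graphic": 0.22, "impressions": 0.19, "graphics": 0.13,
     "interested": 0.09, "impresses": 0.07, "grass": 0.03},
    {"on": 1.0},
    {"my": 1.0},
    {"screen": 0.99, "and": 0.01},
]

def consensus(network: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-posterior word in every slot."""
    return [max(slot, key=slot.get) for slot in network]

def hypothesis_posterior(network: List[Dict[str, float]], words: List[str]) -> float:
    """Product of per-slot posteriors along a path (slots treated as independent)."""
    p = 1.0
    for slot, word in zip(network, words):
        p *= slot.get(word, 0.0)
    return p

best = consensus(wcn)
print(" ".join(best), round(hypothesis_posterior(wcn, best), 3))
```

The low posterior of "glasses" (27%) is exactly what keyword search over alternatives can exploit.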
The Transcription Process
• Data collection: recorded calls from call center logging
• Manual transcription: transcribers produce manual transcripts
• Model training: the acoustic model, language model and vocabulary are trained on the manual transcripts
• Run-time transcription with adaptation: the speech transcription engine transcribes agent-customer calls, and its automatic transcripts feed back into adaptation
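The slide shows automatic transcripts feeding back into adaptation but not the method. One common approach for language models is linear interpolation of a baseline model with a model estimated on in-domain text. A minimal unigram sketch under that assumption (production systems use n-gram or neural models); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def unigram_model(transcripts: Iterable[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram probabilities from whitespace-tokenized text."""
    counts = Counter(w for t in transcripts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(base: Dict[str, float], adapt: Dict[str, float], lam: float = 0.3) -> Dict[str, float]:
    """P(w) = (1 - lam) * P_base(w) + lam * P_adapt(w), over the union vocabulary."""
    vocab = set(base) | set(adapt)
    return {w: (1 - lam) * base.get(w, 0.0) + lam * adapt.get(w, 0.0) for w in vocab}

base = unigram_model(["the call the agent"])          # stands in for the trained model
adapt = unigram_model(["reset password"])             # stands in for automatic transcripts
mixed = interpolate(base, adapt)
print(sorted(mixed.items()))
```

The weight `lam` controls how far the model moves toward the new call data; it would normally be tuned on held-out transcripts.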
CAB: Call Center Agent Buddy
Without a CAB (timeline 0 to 9 min: problem, research, solution and log phases):
• Customer: "I'm trying to program the navigation system in my Cadillac CTS. How do I set the destination?"
• Agent: "Hold please!" ... "How can I find the solution to this problem? Let me search..." ... "Is this the answer? Maybe... Let me check with another CSR..." ... "Ok. Here's what you can do... To set the destination: A. then B. C...." ... "Now I have to create a log."
With a CAB (timeline 0 to 4 min):
• Same customer question; the CAB supplies the solution ("Ok. Here's what you can do...") and the log. Agent: "Now I have to review the log." "That was easy..."
• CABs present the solution or follow-up question to the agent within seconds
  • Provide a consistent, high-quality customer experience
  • Reduce agent training cost
• CABs shorten the research phase and the log phase
  • Increase agent productivity
Example of a ‘Simple’ CAB (Call Transcript Matching)
The transcript of the incoming customer-agent call is compared to the transcripts of all previous calls. The KM record accessed on the closest-matching previous call is offered to the agent as the best answer(s).
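The slide says only "compare to all previous calls". A minimal sketch, assuming bag-of-words cosine similarity as the comparison; the history format and the KM record IDs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_answer(incoming: str, history: List[Tuple[str, str]]) -> str:
    """history holds (previous call transcript, KM record accessed on that call)."""
    query = vectorize(incoming)
    _transcript, km_record = max(history, key=lambda h: cosine(query, vectorize(h[0])))
    return km_record

history = [
    ("how do I set the destination on my navigation system", "KM-NAV-DEST"),
    ("my instant messaging client shows an error", "KM-IM-ERR"),
]
print(best_answer("I can't set a destination in the navigation system", history))
```

Reusing what a previous agent actually opened avoids building a separate question-answering index.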
Real-time CAB System Example
The system analyzes ongoing calls and provides the agent with a popup or sidebar containing related links. The actual implementation depends on the desktop/KM system being used.
• The system detects that the call is about instant messaging and, based on past experience, recommends documents related to messaging systems.
• As the call progresses, the recommendations are updated based on the added information.
• The caller gives information about a specific error, which narrows down the possible solution documents.
• The recommendations are re-ranked and the display is updated accordingly.
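The re-ranking above can be sketched by scoring documents against everything said so far and re-running the ranking after each new utterance. Word-overlap scoring, the document IDs and their text are hypothetical stand-ins for whatever the real CAB uses.

```python
import re
from typing import Dict, List

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def rank_documents(transcript_so_far: str, docs: Dict[str, str], top_k: int = 3) -> List[str]:
    """Rank KM documents by word overlap with the call so far (ties broken by ID)."""
    heard = tokens(transcript_so_far)
    scores = {doc_id: len(heard & tokens(body)) for doc_id, body in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

docs = {
    "im-basics": "Instant messaging setup and contacts",
    "im-error-42": "Instant messaging error 42 when signing in",
    "email-quota": "Email quota exceeded",
}
t1 = "I have a problem with instant messaging"
t2 = t1 + " It shows error 42 when signing in"
print(rank_documents(t1, docs))  # topic detected: messaging documents rise
print(rank_documents(t2, docs))  # specific error re-ranks the list
```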
CAB Automatically-Generated Log Record
• At the end of the call, the system generates a call log record that can be transferred to the ticket management system for editing and storage.
• "Important" sentences are selected from the cleansed and normalized speech transcript.
  • Skipped sentences are denoted by "…"
  • It's possible to get more or fewer sentences.
• At one customer, automating log generation was the single biggest cost-saving feature of the CABs.
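The slide does not say how "important" sentences are scored. A minimal extractive sketch, assuming keyword-overlap scoring with a hypothetical keyword set; `max_sentences` plays the role of "more or fewer", and gaps are marked with "…" as on the slide.

```python
import re
from typing import List

def generate_log(sentences: List[str], keywords: set, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences in call order and mark skipped ones with '…'."""
    def score(s: str) -> int:
        return len(set(re.findall(r"[a-z0-9']+", s.lower())) & keywords)

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    keep = sorted(ranked[:max_sentences])
    parts, prev = [], -1
    for i in keep:
        if i > prev + 1:
            parts.append("…")  # one or more sentences skipped
        parts.append(sentences[i])
        prev = i
    if prev < len(sentences) - 1:
        parts.append("…")
    return " ".join(parts)

call = [
    "Hello, thanks for calling.",
    "My navigation system will not accept a destination.",
    "Let me check that for you.",
    "Reset the navigation unit and enter the destination again.",
    "Thanks, bye.",
]
print(generate_log(call, {"navigation", "destination", "reset"}))
```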
Speech to Speech Translation
• Problem: language barriers cause high cost for agents, who need both subject-matter expertise and language skills
• Solution: real-time speech-to-speech translation (transcription → spoken language translation → text-to-speech)
• Challenges:
  • Translation has to handle the output of the ASR system
    • Recognition errors
  • Spoken language differs from written language
    • Non-grammatical disfluencies
    • Imperfect syntax
    • Lack of the formal characteristics of text: no punctuation or paragraphing
  • Translated text must be "speakable" for oral communication
    • Translating the content adequately is not enough; the output must be fluent
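Because translation has to handle ASR output, a cascade typically cleans the transcript before translating it. A minimal sketch, assuming a hypothetical filler inventory and plain callables standing in for the ASR, MT and TTS engines; real disfluency detection is far richer (this version would also collapse a legitimate "that that").

```python
from typing import Callable

FILLERS = {"uh", "um", "er", "ah"}  # assumed filler inventory

def clean_asr_output(text: str) -> str:
    """Drop filler words and immediate word repetitions from an ASR hypothesis."""
    words = [w for w in text.lower().split() if w not in FILLERS]
    cleaned = []
    for w in words:
        if not cleaned or cleaned[-1] != w:
            cleaned.append(w)
    return " ".join(cleaned)

def speech_to_speech(audio, asr: Callable, translate: Callable, tts: Callable):
    """Cascade: transcription -> cleanup -> spoken language translation -> text-to-speech."""
    return tts(translate(clean_asr_output(asr(audio))))

# Stub engines for illustration only
result = speech_to_speech(
    b"",
    asr=lambda audio: "um I I need help",
    translate=lambda text: text.upper(),
    tts=lambda text: f"<audio:{text}>",
)
print(result)
```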