Speech Recognition

Introduction • What is Speech Recognition? - Voice Recognition? • Where can it be used? - Dictation - System control/navigation - Commercial/Industrial applications - Hand held digital recorders

Contents: • Continuous/Discrete • How does it work? • Recent improvements • Current software options • Future of SR

Continuous or Discrete? • Continuous speech - dictation • Discrete speech - system controls

How does SR work? • Recognition • Training • Correction • Command/Control

Recognition (1) • Voice Input • Analog to Digital • Acoustic Model • Language Model • Feedback • Display • Speech Engine

Recognition (2) Acoustic Modeling • Spoken words: “I think there are…” • Phonemes: ‘ay th-in-nk-kd dh-eh-r aa-r’ • HMMs (hidden Markov models): 5-state representation • Speech Engine
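
The 5-state HMM representation mentioned above can be illustrated with a small decoding sketch. This is not the method of any particular speech engine: the left-to-right topology, the probabilities, and the acoustic labels "a" to "e" are all assumptions made for illustration. The sketch uses the standard Viterbi algorithm to find the most likely sequence of the five states for a short run of observed labels.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most likely state sequence."""
    n = len(start_p)
    # scores[s]: best log-probability of any path that ends in state s so far
    scores = [_log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
              for s in range(n)]
    backpointers = []
    for symbol in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n):
            # best predecessor state for landing in state s at this step
            prev = max(range(n), key=lambda p: scores[p] + _log(trans_p[p][s]))
            pointers.append(prev)
            new_scores.append(scores[prev] + _log(trans_p[prev][s])
                              + _log(emit_p[s].get(symbol, 0.0)))
        scores = new_scores
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return max(scores), path

# Hypothetical 5-state left-to-right model: each state either repeats or
# advances to the next one; the last state can only repeat.
LABELS = ["a", "b", "c", "d", "e"]  # made-up acoustic labels
N = 5
start_p = [1.0, 0.0, 0.0, 0.0, 0.0]
trans_p = [[0.0] * N for _ in range(N)]
for s in range(N - 1):
    trans_p[s][s] = 0.5
    trans_p[s][s + 1] = 0.5
trans_p[N - 1][N - 1] = 1.0
# Each state mostly emits its own label and occasionally the next one.
emit_p = [{LABELS[s]: 0.9, LABELS[(s + 1) % N]: 0.1} for s in range(N)]

log_prob, path = viterbi(["a", "a", "b", "c", "c", "d", "e"],
                         start_p, trans_p, emit_p)
print(path)  # [0, 0, 1, 2, 2, 3, 4]
```

Working in log-probabilities avoids numeric underflow on long utterances; a real acoustic model would score continuous feature vectors rather than discrete labels.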

Recognition (3) Language Modeling • Word context • Word frequency • Transition possibilities
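
Word frequency and transition possibilities can be sketched with a simple bigram model, which estimates how likely one word is to follow another from counts in sample text. The tiny corpus and the function names below are assumptions for illustration only; a real language model is trained on far more text and smooths its estimates.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count word frequencies and word-to-word transitions."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        unigrams.update(words)
        for prev, cur in zip(words, words[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def transition_prob(bigrams, prev, cur):
    """Estimate P(cur | prev) from raw counts (no smoothing)."""
    following = bigrams.get(prev, Counter())
    total = sum(following.values())
    return following[cur] / total if total else 0.0

def best_candidate(bigrams, prev, candidates):
    """Pick the candidate word most likely to follow prev."""
    return max(candidates, key=lambda w: transition_prob(bigrams, prev, w))

# Hypothetical training text
corpus = [
    "i think there are two options",
    "there are many words",
    "i like their dog",
]
unigrams, bigrams = train_bigram(corpus)

# "there" and "their" sound alike; word context decides between them.
print(best_candidate(bigrams, "think", ["their", "there"]))  # there
```

This is how word context can break ties the acoustic model cannot: both candidates sound the same, but only one has been seen after "think".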

Voice Training (1) Can be done by: • Reading predetermined text segments • Speaking individual words • The engine compares the new acoustic data with the old and combines them • More training = better recognition
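
One simple way to picture "compare new with old and combine" is a running average: each new training sample nudges the stored estimate for a sound toward the speaker's actual voice. The sketch below is an assumption made for illustration, not how any named product trains; the profile layout, the feature vectors, and the function name are hypothetical, and real engines adapt their model parameters with more elaborate methods.

```python
def update_voice_profile(profile, phoneme, new_features):
    """Fold one new feature vector into the stored mean for a phoneme.

    profile maps phoneme -> (sample_count, mean_feature_vector).
    """
    count, mean = profile.get(phoneme, (0, [0.0] * len(new_features)))
    count += 1
    # incremental mean: move the old mean toward the new sample by 1/count
    mean = [m + (x - m) / count for m, x in zip(mean, new_features)]
    profile[phoneme] = (count, mean)
    return profile

profile = {}
update_voice_profile(profile, "ay", [1.0, 2.0])  # first training sample
update_voice_profile(profile, "ay", [3.0, 4.0])  # second training sample
print(profile["ay"])  # (2, [2.0, 3.0])
```

Each additional sample changes the mean less than the one before it, which matches the idea that more training gives a steadier, better-fitting voice file.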

Voice Training (2) User specific Voice file • Voice qualities • Pronunciation • Patterns of word use • Preferred vocabulary

Making Corrections • Move cursor by voice command • Memorize edit commands • List of possible alternatives • Make correction manually
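
The "list of possible alternatives" can be thought of as the recognizer's next-best guesses for a word, ranked by score. The following sketch assumes the engine exposes such scores; the scores, the words, and the function names are hypothetical.

```python
def top_alternatives(scored_words, n=3):
    """Return the n highest-scoring candidate words, best first."""
    ranked = sorted(scored_words.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]

def correct_word(words, index, replacement):
    """Return a copy of the transcript with one word replaced."""
    fixed = list(words)
    fixed[index] = replacement
    return fixed

# Hypothetical recognizer scores for the third word of the transcript
scores = {"their": 0.40, "there": 0.35, "they're": 0.20, "the air": 0.05}
transcript = ["i", "think", "their", "are"]

choices = top_alternatives(scores)
print(choices)  # ['their', 'there', "they're"]

# The user picks the second alternative from the list.
print(correct_word(transcript, 2, choices[1]))  # ['i', 'think', 'there', 'are']
```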

Command/Control • Desktop grid • Program or Link name/number • URL name • Memorized commands
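
A desktop grid can be sketched as follows, assuming it works by dividing the screen into numbered cells and letting the user say a cell number to zoom in, repeating until the pointer target is small enough. The 3x3 layout, the numbering order, and the function name are assumptions; the slide does not specify them.

```python
def grid_target(width, height, spoken_cells, rows=3, cols=3):
    """Return the (x, y) centre of the region selected by a series of cell numbers.

    Cells are numbered 1..rows*cols, left to right, top to bottom.
    Each spoken number subdivides the previously selected cell.
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for cell in spoken_cells:
        if not 1 <= cell <= rows * cols:
            raise ValueError(f"cell {cell} is outside the grid")
        row, col = divmod(cell - 1, cols)
        w /= cols
        h /= rows
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)

# On a 900x900 screen: say "five" (centre cell), then "one" (its top-left cell).
print(grid_target(900, 900, [5, 1]))  # (350.0, 350.0)
```

Each spoken number shrinks the target area by a factor of nine, so a few words are enough to reach any point on the screen.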

Recent Improvements in SR • Faster training (~10 min.) • Better recognition (~95% accuracy) • More compatible software • Better system control/command
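
The slides do not say how the recognition figure is measured. A common convention, assumed here, is word accuracy: one minus the number of word errors (substitutions, deletions, insertions) divided by the number of words actually spoken. The sketch below computes it with a standard edit-distance table; the sentences are made up.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]

def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - errors / number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n

spoken = ("the quick brown fox jumps over the lazy dog and then "
          "runs far away from the big red barn today")
recognized = spoken.replace("barn", "bar")  # one misrecognized word in twenty
print(f"{word_accuracy(spoken, recognized):.0%}")
```

At this rate, one word in twenty still needs correcting, which is why the correction tools matter even with good recognition.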

Current Software Options for PC • Dragon Systems – NaturallySpeaking • Philips – FreeSpeech • IBM – ViaVoice • Lernout & Hauspie – Voice Xpress

How well do they work?

Future of SR • SUI – Speech-based User Interface • Improvements needed: - Greater accuracy - Greater system control/command - More compatible software

Conclusion • SR Uses • How does it work? • Current Software • Problems of SR • More SR coming soon…
