
Voice Recognition

Voice Recognition. Lawrence Pan Syen Hassan Jamme Tan. Overview. History of voice recognition Why voice recognition? Technology behind voice recognition Five major steps Common applications Current leaders Demonstrations Product Evaluation


Presentation Transcript


  1. Voice Recognition Lawrence Pan Syen Hassan Jamme Tan

  2. Overview • History of voice recognition • Why voice recognition? • Technology behind voice recognition • Five major steps • Common applications • Current leaders • Demonstrations • Product Evaluation • Implementation of our own voice recognition system • Grade retrieval system for EE3414 • Future Challenges

  3. History of Voice Recognition • Radio Rex (house trained dog), 1922 • U.S. Department of Defense, 1940s • Speech Understanding Research (SUR) program • Carnegie Mellon University & MIT • Automatic interception & translation of Russian radio transmissions (FAILURE) • Original message: “the spirit is willing but the flesh is weak” • Translated message: “the vodka is strong but the meat is disgusting.”

  4. History Cont’d • First major achievements • Bell Laboratories, 1952 • Successful recognition of numbers 0 to 9, spoken over telephone • MIT, 1959 • Successful recognition of vowels with 93% accuracy • Carnegie Mellon University, 1970s • HARPY system: capable of recognizing complete sentences

  5. History Cont’d • Obstacles • Computing power: over 50 computers needed for HARPY system to perform • Ability to recognize speech from any person • Taking into account different accents, speech tones, etc. • Ability to recognize continuous speech • so…we…do…not…have…to…speak…like…this! • Commercialization of voice recognition systems

  6. History Cont’d • [Figure: computation required vs. computation available in processors over time] • [Figure: accuracy and task complexity progress over time]

  7. Why Voice Recognition? • Convenience • Natural user interface: human speech • Improved services for the disabled • Wider range of users • Future possibilities and improvements • Internet use over phones through voice portals • Advanced applications implementing voice control in all areas

  8. Technology behind Voice Recognition • Five major steps used by a speech recognizer

  9. Five major steps in voice recognition • Capture and Digitization • System interacts with the telephony device to capture voice input at 8000 samples/sec • Spectral Representation • Voice samples converted to a graphical (spectral) representation • Segmentation • Speech signals are broken down into segments • Improves accuracy • Reduces computation: impossible to process entire signal in real time
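The first three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline the presenters used: the capture step is replaced by a synthetic sine tone, the 20 ms frame length, 10 ms hop, and Hamming window are assumed values, and all function names are hypothetical. Only the 8000 samples/sec rate comes from the slide.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples/sec, the telephony rate quoted on the slide
FRAME_LEN = 160     # assumption: 20 ms frames
HOP = 80            # assumption: 10 ms hop between frame starts


def capture_tone(freq_hz, duration_s):
    """Stand-in for capture: synthesize a sine tone sampled at SAMPLE_RATE."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def segment(samples, frame_len=FRAME_LEN, hop=HOP):
    """Segmentation: cut the signal into short overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def magnitude_spectrum(frame):
    """Spectral representation: magnitude of a naive DFT of one windowed frame."""
    n = len(frame)
    # Hamming window (an assumed choice) to reduce leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(frame):
    """Frequency in Hz of the strongest DFT bin in the frame."""
    spec = magnitude_spectrum(frame)
    k = max(range(len(spec)), key=spec.__getitem__)
    return k * SAMPLE_RATE / len(frame)


frames = segment(capture_tone(1000, 0.1))
print(len(frames), dominant_frequency(frames[0]))
```

Processing one short frame at a time is what keeps the computation bounded: each frame costs the same amount of work regardless of how long the utterance is.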

  10. Graphical Representations

  11. Acoustic Model • Phonemes – smallest phonetic unit in a language • Distinguishes one word from another • e.g. b in boy and t in toy • Allophone – different pronunciations of a phoneme/letter • e.g. t in tab, t in stab, tt in stutter • Database (Lexicon) of all words known to the system for a language • Should contain several recordings for certain words • e.g. “the” can be pronounced “duh” or “dee”
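A lexicon with several pronunciations per word might be represented as below. This is a toy sketch: the phoneme symbols are ARPAbet-style labels chosen for illustration, and `LEXICON`, `build_index`, and `is_minimal_pair` are hypothetical names, not part of any system described in the slides.

```python
from collections import defaultdict

# Hypothetical lexicon: each word maps to one or more phoneme sequences.
# "the" gets two entries, mirroring the two pronunciations on the slide.
LEXICON = {
    "boy": [("B", "OY")],
    "toy": [("T", "OY")],
    "the": [("DH", "AH"), ("DH", "IY")],
}


def build_index(lexicon):
    """Reverse index: phoneme sequence -> list of words pronounced that way."""
    index = defaultdict(list)
    for word, pronunciations in lexicon.items():
        for phonemes in pronunciations:
            index[phonemes].append(word)
    return index


def is_minimal_pair(a, b):
    """True if two pronunciations differ in exactly one phoneme (boy vs. toy)."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


INDEX = build_index(LEXICON)
print(INDEX[("DH", "IY")])
print(is_minimal_pair(LEXICON["boy"][0], LEXICON["toy"][0]))
```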

  12. Acoustic Model Cont’d • Trellis • Data structure made up of all possible combinations of allophones • Training of Acoustic models • For single-user systems • Text is read by user and recognized by system • For multi-user systems • Utterances spoken by many users compiled into a database, then fed into a recognizer • Weights are put on certain allophones
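The slides do not say how the trellis is searched. A common way to find the most likely path through such a structure is the Viterbi algorithm, sketched below as an assumption rather than the presenters' method. The three allophone states, the acoustic labels, and every probability are made-up illustrative values; the "weights" from training would take the place of these tables.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_prob) for the most likely path through the trellis."""
    # trellis[t][s] = (best log-probability of being in s at time t, predecessor)
    trellis = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = trellis[-1]
        column = {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: prev[p][0] + math.log(trans_p[p][s]))
            score = (prev[best_prev][0] + math.log(trans_p[best_prev][s])
                     + math.log(emit_p[s][obs]))
            column[s] = (score, best_prev)
        trellis.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: trellis[-1][s][0])
    best_score = trellis[-1][last][0]
    path = [last]
    for t in range(len(trellis) - 1, 0, -1):
        path.append(trellis[t][path[-1]][1])
    path.reverse()
    return path, best_score


# Hypothetical allophones of "t" and hypothetical acoustic labels.
STATES = ["aspirated", "unaspirated", "flap"]
START = {s: 1 / 3 for s in STATES}
TRANS = {a: {b: (0.6 if a == b else 0.2) for b in STATES} for a in STATES}
LABEL = {"aspirated": "burst", "unaspirated": "weak", "flap": "tap"}
EMIT = {s: {o: (0.8 if LABEL[s] == o else 0.1) for o in LABEL.values()}
        for s in STATES}

best_path, best_logp = viterbi(["burst", "tap", "tap"], STATES, START, TRANS, EMIT)
print(best_path)
```

Working in log-probabilities avoids numeric underflow on long utterances; the sketch assumes no table entry is zero.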

  13. Language Model • Languages have structures (i.e. grammar) • Two similar-sounding words can be difficult to tell apart • Can be distinguished using context • E.g. “ours” and “hours” can be distinguished if the previous word is “two”
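One simple way to use context like this is a bigram model, which counts how often each word follows another. The sketch below is an assumed illustration, not the model from the presentation: the four-sentence corpus is invented, and `bigram_counts` and `pick_homophone` are hypothetical helper names.

```python
from collections import Counter

# Tiny invented corpus; a real language model would be trained on far more text.
CORPUS = [
    "we waited two hours",
    "two hours is enough",
    "the house is ours",
    "this win is ours",
]


def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(prev_word, candidates, counts):
    """Choose the candidate seen most often after prev_word (ties: first listed)."""
    return max(candidates, key=lambda w: counts[(prev_word, w)])


COUNTS = bigram_counts(CORPUS)
print(pick_homophone("two", ["ours", "hours"], COUNTS))
print(pick_homophone("is", ["ours", "hours"], COUNTS))
```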

  14. Common Applications • Call Center Automation • Widely used in all industries (consumer interface) • Airline companies: booking flights, general info, etc. • Banking companies: “pay by phone”, account balances, etc. • Delivery Services (FedEx): tracking orders, etc. • All general customer service systems • Computer Integration of voice recognition • Personal Computers • Speech to Text Dictation • Accessibility purposes: voice control of computers

  15. Common Applications cont’d • Integrated into automobiles: • Visteon Voice Technology™ used in Infiniti Q45 • Controls: • Climate • CD player • Navigation system

  16. Competing Standards • VoiceXML (Voice Extensible Markup Language) • Partners: AT&T, IBM, Motorola, Lucent Tech. • Used in implementation of most voice portals • Shifting target toward web developers • SALT (Speech Application Language Tags) • Partners: Microsoft, Intel, Cisco, SpeechWorks • Targeted toward web developers

  17. Future Challenges • Speech Technology • VoiceXML vs. SALT • Voice enabling web content • Real time access to source data • Stock market, traffic, sports, etc. • Clear connection needed for effective use of voice portals • Security Issues involved • Advertising based revenue
