The Speech Speech • casey chesnut • brains-N-brawn.com • Madison .NET • April 2007
Powerpoint • Page Up • Page Down
brains-N-brawn.com • Pervasive Computing • Tablet PC (MVP 03) • Compact Framework (MVP 04) • Advanced Web Services (MVP 05) • Media Center (MVP 06) • Speech • Location Based Services • Artificial Intelligence • 3D
Outline • Speech Overview • Vista Speech Recognition • SAPI 5.3 / System.Speech • Speech Server 2007
Outline : Speech Overview • Voice User Interface • How does it work? • Synthesis (TTS) • Recognition (SR)
Overview • Speech is just another presentation system • Synthesis = Output to user • Recognition = User input • Voice User Interface (VUI)
VUI Modes • Applications • Multi-modal • Voice-only
VUI Tips • Don't replicate the touch-tone-based menu system • Restrict options on the main (opening) menu to 4 or fewer • Make sure your opening greeting is short • Don't design the app solely for the new user • Focus on task completion above all • What can I say? http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx
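A minimal sketch of two of the tips above: keeping the opening menu to 4 or fewer options, and answering "What can I say?" at any point. The function names, prompts, and menu options are hypothetical, and plain strings stand in for real recognition results.

```python
MAX_MAIN_MENU_OPTIONS = 4          # limit taken from the tip above
HELP_PHRASE = "what can i say"     # universal help command


def opening_prompt(greeting, options):
    """Build a short opening prompt, rejecting menus that are too long."""
    if not 1 <= len(options) <= MAX_MAIN_MENU_OPTIONS:
        raise ValueError("main menu needs 1 to 4 options")
    return f"{greeting} Say " + ", ".join(options) + "."


def handle(utterance, options):
    """Map a recognized utterance to help, a task, or a reprompt."""
    said = utterance.strip().lower().rstrip("?")
    if said == HELP_PHRASE:
        return "You can say " + ", ".join(options) + "."
    for option in options:
        if option.lower() == said:
            return f"Starting {option}."
    return "Sorry, I didn't catch that. Say 'What can I say?' for options."
```

Going straight from a recognized option to the task, with no confirmation submenu, is one way to read "focus on task completion above all".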
Speech Synthesis • Text to Speech • Dynamic • Prompt database
How Synthesis Works • Text parsing • Sentences, numbers, symbols, pauses • Natural language processing • Part of speech, tense • Phonemes are looked up or sounded out • Diphones are appended together • Post process audio to add emphasis • Play speech audio
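The front half of that pipeline can be sketched in a few lines of Python. This is a toy illustration, not how any shipping engine works: the three-word lexicon, the letter-to-sound table, and the function names are all assumptions made for the example, and the audio steps (joining recorded diphones, post-processing, playback) are left out.

```python
import re

# Hypothetical mini-lexicon with ARPAbet-style symbols; real engines ship large ones.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "two": ["T", "UW"],
}
# Naive one-sound-per-letter table used to "sound out" unknown words (an assumption).
LETTER_SOUNDS = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    "AE B K D EH F G HH IH JH K L M N AA P K R S T AH V W K Y Z".split(),
))
DIGITS = dict(zip(
    "0123456789",
    "zero one two three four five six seven eight nine".split(),
))


def normalize(text):
    """Text parsing: lowercase, expand digits to words, drop symbols."""
    expanded = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", text.lower())
    return re.findall(r"[a-z]+", expanded)


def to_phonemes(word):
    """Look the word up, or sound it out letter by letter."""
    return LEXICON.get(word) or [LETTER_SOUNDS[c] for c in word]


def to_diphones(phonemes):
    """Pair each phoneme with its successor, padding both ends with silence '_'."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```

For example, `to_diphones(to_phonemes("two"))` yields the units a concatenative synthesizer would fetch from its recorded inventory.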
How Synthesis Works • Demo • /xnaSynth app • Article • http://www.brains-N-brawn.com/ttSpeech/ • http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition • Speech to Text • Dictation • Command and Control
How Recognition Works • Audio signal is processed • Look for signals which might be speech • Phonemes are found in audio signals • Phonemes are mapped to a dictionary of words • Dictation or grammar-based • Apply natural language processing
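The step of looking for signals which might be speech is commonly approximated with a short-time energy threshold. The sketch below assumes that approach; the frame size and threshold are arbitrary illustrative values, and real recognizers use considerably more robust detectors.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each full frame of audio samples."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def speech_segments(samples, frame_size=160, threshold=0.01):
    """Return (start_frame, end_frame) pairs, end exclusive, of loud regions."""
    energies = frame_energies(samples, frame_size)
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # a candidate speech region begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # the region ended at frame i
            start = None
    if start is not None:                  # audio ended while still loud
        segments.append((start, len(energies)))
    return segments
```

Only the frames inside the returned segments would be passed on to phoneme detection.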
How Recognition Works • Demo • /wavReader app • Article • http://www.brains-N-brawn.com/noReco/ • http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline : Vista Speech Recognizer • Built into Vista’s shell • Microphone bar • Language support • Can be trained to improve accuracy • Command-and-control, also Dictation • Automagic application support • Horrible Office integration • UAC problems
Demo • Say what you see • Show numbers • Correct • Spell it • Mouse grid • http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/
Hack • http://news.bbc.co.uk/1/hi/technology/6320865.stm • /micBarExtend – tap and talk
Narrator • Vista’s screen reader
Outline : SAPI 5.3 / System.Speech • Desktop applications • SAPI 5.3 • System.Speech
SAPI 5.3 • COM based • Native applications • Managed apps which need more control
System.Speech • Part of .NET 3.0 WPF • Managed wrapper built on SAPI 5.3 • Simple API • Standards support (SSML, SRGS) • Language support • Vista Speech Recognition integration • Does not work in XBAP
System.Speech.Synthesis • SpeechSynthesizer • SSML • PromptBuilder • Voices
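To show what SSML markup looks like, here is a small sketch that builds an SSML 1.0 document with Python's standard library. It does not call System.Speech; `build_ssml` and its parameters are hypothetical names, and the output is only meant to illustrate the kind of markup a synthesizer accepts.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"   # SSML 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_ssml(before, emphasized, after, pause_ms=500, lang="en-US"):
    """Return SSML: text, an emphasized phrase, a pause, then more text."""
    ET.register_namespace("", SSML_NS)            # serialize SSML as default ns
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    speak.text = before
    emphasis = ET.SubElement(speak, f"{{{SSML_NS}}}emphasis")
    emphasis.text = emphasized
    pause = ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": f"{pause_ms}ms"})
    pause.tail = after                            # text spoken after the pause
    return ET.tostring(speak, encoding="unicode")
```

Calling `build_ssml("Press ", "one", " for sales.")` gives a `speak` element containing an `emphasis` element and a `break` element, which is the same structure a prompt-building API assembles for you.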
System.Speech.Synthesis • Demo • /speechSamples - /speechSynth
System.Speech.Recognition • SpeechRecognizer / SpeechRecognitionEngine • SRGS • GrammarBuilder • Advanced users • Deep-link functionality • Mixed initiative
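A command-and-control grammar in SRGS XML form is small enough to generate by hand. The sketch below builds one with Python's standard library for phrases of the form "prefix + one of several choices"; `build_srgs`, the rule id, and the sample phrases are hypothetical, and nothing here calls System.Speech.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"     # SRGS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_srgs(rule_id, prefix, choices, lang="en-US"):
    """Return an SRGS grammar accepting '<prefix> <one of choices>'."""
    ET.register_namespace("", SRGS_NS)            # serialize SRGS as default ns
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang, "root": rule_id},
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id})
    ET.SubElement(rule, f"{{{SRGS_NS}}}item").text = prefix
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for choice in choices:                        # one <item> per alternative
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = choice
    return ET.tostring(grammar, encoding="unicode")
```

For example, `build_srgs("command", "play", ["music", "video"])` describes the two phrases "play music" and "play video". Restricting the recognizer to a short list like this is what separates command and control from open dictation.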
System.Speech.Recognition • Demo • /speechSamples - /speechReco
System.Speech • Demo • /micBarExtend • /mceSapiMcpl • Article • http://www.brains-N-brawn.com/speechSamples/ • http://www.brains-N-brawn.com/micBarExtend/ • http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)
What about Mobile Devices • OEMs can add VoiceCommand • VoiceCommand is not accessible to developers • WindowsMobile has the SAPI API, but no engines • PlatformBuilder is supposed to have engines • There are 3rd party engines for purchase
Speech Server 2007 • Telephony Applications • Outgoing calls • Speaker Independent
Speech Server 2007 • VOIP • Language support • VoiceXML / SALT • Workflow development model • Reports • Still in beta
Speech Server 2007 • Speech Synthesis • Inline • PromptBuilder • SSML • Prompt databases • Speech Recognition • Inline • Dynamic Grammar • SRGS • Conversational Grammar Builder • DTMF
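DTMF is the touch-tone input path: each key is the sum of one row tone and one column tone. As an illustration only, the sketch below generates a key's tone pair and detects it again with the Goertzel algorithm; Speech Server handles this internally, and the function names, 8 kHz sample rate, and 50 ms duration are assumptions.

```python
import math

ROWS = (697, 770, 852, 941)            # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)        # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")
FREQS = {
    key: (ROWS[r], COLS[c])
    for r, row in enumerate(KEYS)
    for c, key in enumerate(row)
}


def dtmf_tone(key, rate=8000, duration_ms=50):
    """Samples for one key press: the sum of its row and column sinusoids."""
    low, high = FREQS[key]
    count = rate * duration_ms // 1000
    return [
        0.5 * (math.sin(2 * math.pi * low * n / rate)
               + math.sin(2 * math.pi * high * n / rate))
        for n in range(count)
    ]


def goertzel_power(samples, freq, rate):
    """Signal power near `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate=8000):
    """Pick the strongest row and column frequency and map them to a key."""
    low = max(ROWS, key=lambda f: goertzel_power(samples, f, rate))
    high = max(COLS, key=lambda f: goertzel_power(samples, f, rate))
    return KEYS[ROWS.index(low)][COLS.index(high)]
```

Goertzel is a common choice here because only eight known frequencies need checking, so a full FFT is unnecessary.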
VoiceXML • Declarative language • Article • http://www.brains-N-brawn.com/vxml/ • http://www.brains-N-brawn.com/myVoices/ • http://www.brains-N-brawn.com/voiceBio/
SALT • Yet another declarative language • Multimodal support has been dropped • Article • http://www.brains-N-brawn.com/noHands/ • http://www.brains-N-brawn.com/speechMulti/ • http://www.brains-N-brawn.com/tabletWeb/ • http://www.brains-N-brawn.com/mceSalt/
Speech Workflow • Speech Sequence Workflow designer • Speech activities • Statement • QuestionAnswer • Debugging tools
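The sequence model can be mimicked in plain Python to show the idea: activities run in order, a Statement only speaks, and a QuestionAnswer speaks, listens, and reprompts until it hears an allowed answer. The classes below borrow the activity names from the slide but are hypothetical stand-ins, not the Speech Server API; `say` and `listen` are callables supplied by the caller.

```python
class Statement:
    """Speaks a prompt and expects no reply."""

    def __init__(self, prompt):
        self.prompt = prompt

    def run(self, say, listen, answers):
        say(self.prompt)


class QuestionAnswer:
    """Asks a question and reprompts until an allowed choice is heard."""

    def __init__(self, name, prompt, choices, max_attempts=3):
        self.name = name
        self.prompt = prompt
        self.choices = {c.lower() for c in choices}
        self.max_attempts = max_attempts

    def run(self, say, listen, answers):
        for _ in range(self.max_attempts):
            say(self.prompt)
            heard = listen().strip().lower()
            if heard in self.choices:
                answers[self.name] = heard
                return
            say("Sorry, I did not understand.")
        answers[self.name] = None          # gave up after max_attempts


def run_workflow(activities, say, listen):
    """Run activities in sequence and collect the answers by name."""
    answers = {}
    for activity in activities:
        activity.run(say, listen, answers)
    return answers
```

Passing `print` as `say` and `input` as `listen` turns this into a text-mode simulation of a call, which is similar in spirit to exercising a dialog with typed input before wiring up audio.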
Speech Workflow • Demo • /speechTextAdv • /speakerVerify • /mobileRecord • Article • http://www.brains-N-brawn.com/speechTextAdv/ • http://www.brains-N-brawn.com/speakerVerify/
Where • Accessibility • Telephony • Telematics • Home automation • Mobile Devices / Tablets • Gaming • Warehouses • …
Possible Future • Telematics • Service Pack for Office Support • Exchange Server 2007 • Speech Server 2007 release • Rumors that WindowsMobile will get a public API • Dictation has room to improve • Hope that System.Speech will ultimately work in XBAP