
New challenge: telephone




  1. New challenge: telephone
     • Text to speech & audio
     • Speech recognition
     • VoiceXML
     • Homework: sign up on studio.tellme.com

  2. Telephone
     • Caller to system: speech recognition
       • using grammars (limited vocabulary, general audience, no training)
       • optional use of touch tones (numbers)
     • System to caller: recorded audio (wav files) plus TTS (text to speech)
     • Limited bandwidth in comparison to other applications, but a very familiar, ubiquitous medium
     • Examples: 800 long distance, some airline information systems, others?

  3. Problems in context
     • Speech recognition: very difficult if
       • there are no restrictions on speakers
       • the grammar covers all of English, with the aim of 'natural language understanding'
     • Text to speech: a much easier problem, but English is more difficult than more fully phonetic languages like Spanish (I've been told).
     • (More next class)

  4. studio.tellme.com
     • Company that provides the 'engine' for applications
     • Provides a development environment
     • We are doing the tellme version of VoiceXML, but it appears to be standard.
     • Register as a developer:
       • Provide your own id; you are assigned a PIN
     • Scratchpad for quick testing
       • Put VoiceXML in the Scratchpad (no audio files)
       • Call 1-800-555-VXML (8965)
       • SAY your id and then your PIN.
     • Application URL for projects with multiple files
       • To look at someone else's project, you change your Application URL
       • This is called pointing your account to a new source.

  5. VoiceXML
     • XML document (VXML header)
     • VoiceXML has tags for flow of control and calculations.
     • Can also use <script> for JavaScript
     • Grammars come in different varieties. We will use the tellme way.
     • Grammars are enclosed in CDATA sections to prevent XML interpretation.
     • Many grammars are constructed for you:
       • <field name="answer" type="boolean"> … will listen for yes or no.
       • <field name="price" type="currency"> … will listen for currency.
     • <menu> with <choice> … <choice> for a list

  6. VoiceXML basics, continued
     • <form> element can contain
       • <block> elements, which can contain <audio>, <goto>, and others
       • <field>, which can contain
         • <prompt>
         • <grammar> (if not one of the built-in grammars)
         • <filled>
     • <var> tags can be at different levels (for example, document, block, or higher levels)
     • <if> <elseif> <else> tags
     • <script> elements for JavaScript (which can also appear in expressions)

  7. VoiceXML basics: typical case
     • a form element
       • <field>
         • <prompt>, made up of <audio>, with a reference to a recorded wav file and backup text
         • <grammar>, if NOT using a built-in grammar designated by the type attribute of the field. This is a CDATA section.
         • <filled> with (follow-on) code using the field
         • <catch> for nomatch, noinput cases

  8. Caution
     • A form contains various elements, including a field.
     • If a field has a grammar and the grammar is satisfied, control goes to the <filled> tag.

  9. obligatory…
        <?xml version="1.0"?>
        <vxml version="2.0">
          <form>
            <block>
              <audio src="prompt1.wav">Hello, world</audio>
            </block>
          </form>
        </vxml>
     • prompt1.wav: recorded using tellme studio
     • "Hello, world" text: backup using TTS, just in case the src file is missing

  10. Preparation: objects
     • JavaScript (and other languages) use classes and objects
     • Objects (aka object instances) are declared (created, instantiated) as members of a class
     • Objects have
       • properties ('the data')
       • methods (functions that you can use 'on' the objects)
     • Classes can also supply static methods, which are called on the class itself rather than on an object
       • for example, Math.random
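A minimal sketch of the three ideas on this slide (properties, methods, static methods). The `Counter` name and its functions are hypothetical, made up for illustration; they are not part of any tellme library. The constructor-function style is used because it matches the `new tm_date()` usage in the slides.

```javascript
// Hypothetical constructor: creating an object with "new Counter(5)"
// gives it a property called count ('the data').
function Counter(start) {
  this.count = start;
}

// Method: a function you use 'on' an object instance.
Counter.prototype.increment = function () {
  this.count += 1;
  return this.count;
};

// Static method: attached to Counter itself, not to any one object.
Counter.describe = function (counter) {
  return 'count is ' + counter.count;
};

var c = new Counter(5);            // create (instantiate) an object
c.increment();                     // call a method on the object
var text = Counter.describe(c);    // call a static method on the class

// Math.random is called on Math itself; no object is created first.
var roll = Math.floor(Math.random() * 6) + 1;
```

The same split shows up with tm_date: `dt.get_day()` is called on an object, while `tm_date.to_day_of_week_name(...)` is called on the class name.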

  11. Example: tm_date
     • var dt = new tm_date(); creates a date/time object.
     • Use methods to extract/manipulate information held 'in' dt:
          var day = dt.get_day();
     • Use the static methods supplied to do common tasks:
          var dn = tm_date.to_day_of_week_name(day);
       or directly:
          var dn = tm_date.to_day_of_week_name(dt.get_day());
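The tm_date library only runs on the tellme platform, so the sketch below shows the same pattern with the standard JavaScript `Date` object instead. It is an analogy, not the tm_date API: `DAY_NAMES` and `dayOfWeekName` are hypothetical stand-ins for the name-lookup function, and a fixed date is used so the result is repeatable.

```javascript
// Stand-in for a static "day number to name" lookup (hypothetical helper,
// not part of tm_date or of standard JavaScript).
var DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                 'Thursday', 'Friday', 'Saturday'];

function dayOfWeekName(dayNumber) {
  return DAY_NAMES[dayNumber];
}

// Create a date/time object for a fixed moment: 1 January 2004, noon GMT.
var dt = new Date(Date.UTC(2004, 0, 1, 12, 0, 0));

// Method call on the object: extract the day of the week (0 = Sunday).
var day = dt.getUTCDay();

// Two steps, or directly in one expression.
var dn = dayOfWeekName(day);
var dnDirect = dayOfWeekName(dt.getUTCDay());
```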

  12. outline • Header stuff • script with external reference • script (code) encased in CDATA notation • Form/Block, with text to speech using value produced by script • Closing stuff

  13.
        <?xml version="1.0"?>
        <vxml version="2.0">
          <script src="http://resources.tellme.com/lib/code/tm_date.js"/>
     • Will make use of date functions

  14.
        <script>
        <![CDATA[
          var dt = new tm_date();
          var monis = tm_date.to_month_name(dt.get_month());
          var dateis = dt.get_date();
          var dayis = tm_date.to_day_of_week_name(dt.get_day());
          var yearis = tm_date.to_year_name(dt.get_full_year());
          // Brute force correction from GMT; goes negative in the first four hours of the GMT day.
          var houris = dt.get_hours() - 4;
          var minutesis = dt.get_minutes();
          var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
        ]]>
        </script>
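The brute force correction on slide 14 subtracts 4 from the GMT hour, which gives a negative hour shortly after midnight GMT. One possible way to keep the hour in the range 0 to 23 is sketched below. `shiftHour` is a hypothetical helper, not a tm_date function, and it fixes only the hour: the date, day name, and month would still need to move back a day in that window.

```javascript
// Hypothetical helper: shift an hour by a whole number of hours and
// wrap the result into 0..23. The double modulo handles negative values,
// because in JavaScript (-2 % 24) is -2, not 22.
function shiftHour(gmtHour, offsetHours) {
  return ((gmtHour + offsetHours) % 24 + 24) % 24;
}

var afternoon = shiftHour(15, -4);    // an ordinary case
var afterMidnight = shiftHour(2, -4); // the case that plain subtraction gets wrong
```

In the slide's script this would replace the subtraction with something like `shiftHour(dt.get_hours(), -4)`, assuming `get_hours()` returns the GMT hour as the slide's note implies.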

  15.
        <form>
          <block>Hello. <value expr="whole"/> Good bye.</block>
        </form>
        </vxml>
     • Can use block for audio

  16. Example: my family
     • Directed responses to 3 family members:
       • Daniel: question/response on activities
       • Aviva: question/response on number of cranes
       • Esther: response
     • Calculations (arithmetic) done using variables
     • if tags
       • The cond attribute is a condition test.
     • Limited error handling: exit on a no-match (or no-input) event
       • The alternative is to repeat the prompt, generally using the count attribute

  17.
        <vxml version="2.0">
          <form>
            <field name="childid">
              <prompt>
                <audio src="whosthis.wav">Hello. Who is calling?</audio>
              </prompt>

  18.
              <grammar type="application/x-gsl" mode="voice">
              <![CDATA[
                [
                  [dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
                  [aviva (aviva meyer)] {<childid "aviva">}
                  [esther (esther minkin)] {<childid "esther">}
                ]
              ]]>
              </grammar>

  19.
              <catch event="noinput nomatch">
                <audio src="sorry.wav">Sorry. I didn't get that.</audio>
                <exit/>
              </catch>
              <filled>
                <if cond="'daniel'==childid">
                  <goto next="#danfollowup"/>
                <elseif cond="'aviva'==childid"/>
                  <goto next="#avivafollowup"/>
                <elseif cond="'esther'==childid"/>
                  <goto next="#estherfollowup"/>
                <else/>
                  <reprompt/>
                </if>
              </filled>
            </field>
          </form>
     • The <else/> branch never happens: the grammar can only set childid to one of the three values tested.
     • Note the inner, single quote marks (the attribute value itself is in double quotes).
     • Note the double ='s (comparison, not assignment).
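The grammar on slide 18 and the <filled> branching on slide 19 together amount to a lookup followed by a dispatch. The sketch below imitates that in plain JavaScript so the logic can be tried outside the phone system. It is a simulation under simplifying assumptions, not how the GSL recognizer works: `PHRASES`, `recognize`, and `nextForm` are hypothetical names, and exact text matching stands in for speech recognition.

```javascript
// Phrase table mirroring the grammar: each accepted phrase maps to the
// value the grammar would store in the childid field.
var PHRASES = {
  'dan': 'daniel',
  'daniel': 'daniel',
  'daniel meyer': 'daniel',
  'dan meyer': 'daniel',
  'aviva': 'aviva',
  'aviva meyer': 'aviva',
  'esther': 'esther',
  'esther minkin': 'esther'
};

// Hypothetical stand-in for the recognizer: returns the field value,
// or null to represent a nomatch event.
function recognize(utterance) {
  var key = utterance.toLowerCase().trim();
  return PHRASES.hasOwnProperty(key) ? PHRASES[key] : null;
}

// Mirrors the <if>/<elseif>/<else/> chain inside <filled>.
function nextForm(childid) {
  if ('daniel' == childid) {
    return '#danfollowup';
  } else if ('aviva' == childid) {
    return '#avivafollowup';
  } else if ('esther' == childid) {
    return '#estherfollowup';
  } else {
    return null; // the <else/> case; not reached when childid comes from the table
  }
}

var target = nextForm(recognize('Dan Meyer'));
```

Because every value in the table is one of the three names tested, the final branch is unreachable for recognized input, which is why the slide marks it "never happens".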

  20.
        <form id="danfollowup">
          <field name="today">
            <prompt>
              <audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
            </prompt>
            <grammar type="application/x-gsl" mode="voice">
            <![CDATA[
              [
                [aikido (i key dough)] {<today "aikido">}
                [thesis (work)] {<today "thesis">}
                [jo (joe)] {<today "jo">}
                [both (all) (everything) ((i key dough) jo)] {<today "both">}
                [none nothing (sort of)] {<today "nothing">}
              ]
            ]]>
            </grammar>
            <catch event="noinput nomatch">
              <audio>I didn't quite understand. Call or send e-mail.</audio>
              <exit/>
            </catch>

  21.
            <filled>
              <if cond="today=='aikido'">
                <audio>Some aikido is fine.</audio>
              <elseif cond="today=='thesis'"/>
                <audio>Good, but do other things also.</audio>
              <elseif cond="today=='jo'"/>
                <audio>don't get hit in the head.</audio>
              <elseif cond="today=='both'"/>
                <audio>Doing some of everything is best.</audio>
              <elseif cond="today=='nothing'"/>
                <audio>You deserve a break, but remember you want to be done by September.</audio>
              <else/>
                <audio>See you soon.</audio>
              </if>
            </filled>
          </field>
          <block>
            <audio>Good bye</audio>
          </block>
        </form>

  22.
        <form id="avivafollowup">
          <var name="rest" expr="1000"/>
          <field name="bcount" type="number">
            <prompt>
              <audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made?</audio>
            </prompt>
            <grammar type="application/x-gsl" mode="voice">
            <![CDATA[
              NATURAL_NUMBER_THRU_9999
            ]]>
            </grammar>
            <catch event="noinput nomatch">
              <audio src="sorry.wav">Sorry. I didn't get that.</audio>
              <exit/>
            </catch>

  23.
            <filled>
              <assign name="rest" expr="1000-bcount"/>
              <audio> <value expr="rest"/> </audio>
              <audio src="togo.wav">to go.</audio>
              <if cond="rest&lt;200">
                <audio src="homestretch.wav">You're in the home stretch</audio>
              <elseif cond="rest&lt;500"/>
                <audio src="morethanhalf.wav">More than half way</audio>
              <elseif cond="rest&lt;800"/>
                <audio src="goodstart.wav">Off to a good start</audio>
              <else/>
                <audio>Get a move on</audio>
              </if>
              <audio src="goodbye.wav">Good bye.</audio>
            </filled>
          </field>
        </form>
     • Can't use < inside the cond attribute, because this is XML; write &lt; instead.
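The arithmetic and the threshold tests in the cranes form can be checked on their own as ordinary JavaScript. The sketch below restates them in one function; `cranesMessage` is a hypothetical name, and the returned string simply joins the spoken pieces in order. Inside a script the `<` character can be written directly, unlike in the cond attributes, where it must be `&lt;`.

```javascript
// Hypothetical helper restating the <filled> logic of the cranes form:
// compute how many cranes remain out of 1000, then pick a comment.
function cranesMessage(bcount) {
  var rest = 1000 - bcount;
  var comment;
  if (rest < 200) {
    comment = "You're in the home stretch";
  } else if (rest < 500) {
    comment = 'More than half way';
  } else if (rest < 800) {
    comment = 'Off to a good start';
  } else {
    comment = 'Get a move on';
  }
  return rest + ' to go. ' + comment;
}

var reply = cranesMessage(850);
```

The order of the tests matters: each branch is reached only when the earlier, smaller thresholds have failed, so `rest < 500` effectively means "between 200 and 499".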

  24.
        <form id="estherfollowup">
          <block>
            <audio>Hello, Mommy. This is all I can do now.</audio>
          </block>
        </form>
        </vxml>

  25. Application logic
     • VoiceXML elements (for example, <if> and <var>)
       • Note: more powerful than XSLT: <assign> tag
     • JavaScript code in attributes (for example, cond, expr)
     • JavaScript code in <script> </script>
       • Encase in CDATA to avoid problems with certain characters
     • External JavaScript code, cited using <script src="file address"/>

  26. Class work
     • EVERYONE (who hasn't already): sign up at studio.tellme.com tonight
     • Design a simple application (you may work in groups):
       • Ask one question
       • Detect and respond to each of 2 or 3 answers
       • Use the examples here as models
       • All text to speech
     • Pick (at least) one and implement it.
     • (Do this for a short time and then go on to the next lecture. Resume after 9pm, when minutes are free.)

  27. Homework
     • (Majors requirement overdue: there will be a deduction, but better late than never.)
     • Go to studio.tellme.com and sign up as a developer.
       • try the examples (using the scratchpad)
       • record some voice samples
       • do the tellme tutorials
     • ALSO try and report on
       • 800 long distance or some other commercial application
