New challenge: telephone Text To Speech & audio Speech recognition VoiceXML Homework: sign up on studio.tellme.com
Telephone • Caller to system: speech recognition, • using grammars (limited vocabulary, general audience, no training) • optional use of touch tones (numbers) • System to caller: recorded audio (wav files) plus TTS (text to speech) • Limited bandwidth, in comparison to other applications, but very familiar, ubiquitous medium • 800 long distance, some airline information systems, others?
Problems in context • Speech recognition: very difficult if • no restrictions on speakers • grammar for all of English with aim of 'natural language understanding' • Text to speech: much easier problem (but English is more difficult than more fully phonetic languages like Spanish, I've been told) • (More next class)
studio.tellme.com • Company that provides the 'engine' for applications • Provides a development environment • We are doing the tellme version of VoiceXML, but it appears to be standard. • Register as a developer: • Provide your own id; you are assigned a PIN • Scratchpad for quick testing • Put VoiceXML in the Scratchpad (no audio files) • Call 1-800-555-VXML (8965) • SAY id and then PIN. • Application URL for projects with multiple files • To look at someone else's project, you change your Application URL • this is called pointing your account to a new source.
VoiceXML • XML document (VXML header) • VoiceXML has tags for flow-of-control and calculations. • Also can use <script> for JavaScript • Grammars come in different varieties. We will use the tellme way. • Grammars are enclosed in CDATA sections to prevent XML interpretation. • Many grammars are constructed for you. • <field name="answer" type="boolean"> … will listen for yes or no. <field name="price" type="currency"> … will listen for currency. • <menu> with <choice> elements for a list of options
VoiceXML basics, continued • <form> element can contain • <block> elements, which can contain <audio>, <goto>, others • <field> which can contain • <prompt> • <grammar> (if not one of the built-in grammars) • <filled> • <var> tags can be at different levels (for example, document, block, or higher levels) • <if> <elseif> <else> tags • <script> elements for JavaScript (which can also appear in expressions)
VoiceXML basics: typical case • a form element • <field> • <prompt>, made up of <audio>, with reference to recorded wav file and backup text • <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section. • <filled> with (follow-on) code using field • <catch> for nomatch, noinput cases
Caution • A form contains various elements, including a field. • If a field has a grammar and the grammar is satisfied, control goes to the <filled> tag.
obligatory…
<?xml version="1.0"?>
<vxml version="2.0">
<form>
  <block>
    <audio src="prompt1.wav">Hello, world</audio>
  </block>
</form>
</vxml>
Notes: prompt1.wav is recorded using tellme studio. The text inside <audio> is the backup, spoken using TTS, just in case the src file is missing.
Preparation: objects • JavaScript (and other languages) use classes and objects • Objects (aka object instances) are declared (created, instantiated) as members of a class • Objects have • properties ('the data') • methods (functions that you can use 'on' the objects) • static methods • Math.random
Example: tm_date • var dt = new tm_date; creates a date/time object. • Use methods to extract/manipulate information held 'in' dt. var day = dt.get_day(); • Use static methods supplied to do common tasks: var dn=tm_date.to_day_of_week_name(day); or directly: var dn=tm_date.to_day_of_week_name(dt.get_day());
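The difference between a method called on an object and a static method called on the class can be sketched in plain JavaScript. The SimpleDate class below is hypothetical and only mimics the shape of the calls on the slide; it is not the tm_date library. It assumes the built-in Date object and reads UTC values.

```javascript
// Hypothetical stand-in for illustration only; this is NOT tm_date.
class SimpleDate {
  // The property ('the data') is a built-in Date held inside the object.
  constructor(date = new Date()) {
    this.date = date;
  }

  // Instance method: used 'on' one particular object.
  getDay() {
    return this.date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  }

  // Static method: called on the class itself, no object needed.
  static toDayOfWeekName(dayIndex) {
    const names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                   'Thursday', 'Friday', 'Saturday'];
    return names[dayIndex];
  }
}

// 1 January 2024 (UTC) was a Monday.
const sampleDate = new SimpleDate(new Date(Date.UTC(2024, 0, 1)));
const sampleDayName = SimpleDate.toDayOfWeekName(sampleDate.getDay());
console.log(sampleDayName);
```

Math.random works the same way as toDayOfWeekName here: it is called on Math itself, not on an object created from it.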
outline • Header stuff • script with external reference • script (code) encased in CDATA notation • Form/Block, with text to speech using value produced by script • Closing stuff
<?xml version="1.0"?>
<vxml version="2.0">
<script src="http://resources.tellme.com/lib/code/tm_date.js"/>
Will make use of date functions
<script>
<![CDATA[
  var dt = new tm_date();
  var monis = tm_date.to_month_name(dt.get_month());
  var dateis = dt.get_date();
  var dayis = tm_date.to_day_of_week_name(dt.get_day());
  var yearis = tm_date.to_year_name(dt.get_full_year());
  // brute force correction from GMT: 4 hours back, wrapped so the hour is never negative
  var houris = (dt.get_hours() + 20) % 24;
  var minutesis = dt.get_minutes();
  var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
]]>
</script>
Note: brute force correction from GMT. The date and day of week are not adjusted, so they can be a day ahead in the hours just after midnight GMT.
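A fixed-offset correction from GMT needs to wrap around midnight: hour 2 shifted by -4 should give 22, not -2. A minimal sketch in plain JavaScript, where shiftHour and timeSentence are hypothetical helper names and the -4 offset is taken from the example:

```javascript
// Shift an hour (0-23) by a whole-hour offset, wrapping around midnight.
function shiftHour(hour, offsetHours) {
  return (((hour + offsetHours) % 24) + 24) % 24;
}

// Build the spoken time phrase from a GMT hour, minutes, and an offset.
function timeSentence(gmtHour, minutes, offsetHours) {
  return 'The time is ' + shiftHour(gmtHour, offsetHours) + ' ' + minutes;
}

console.log(timeSentence(15, 30, -4));
console.log(timeSentence(2, 30, -4));
```

This only repairs the hour. When the shift crosses midnight, the date and day of week would need the same adjustment, and a fixed offset ignores daylight saving time.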
<form>
  <block>Hello. <value expr="whole"/> Good bye.</block>
</form>
</vxml>
Note: can use block for audio
Example: my family • Directed responses to 3 family members: • Daniel, • question/response on activities • Aviva, • question/response on number of cranes • Esther • response • Calculations (arithmetic) done using variables • if tags • The cond attribute is a condition test. • limited error handling: exit on a noinput or nomatch event • alternative is to repeat the prompt, generally using the count attribute
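The count attribute mentioned in the last bullet lets a handler apply only after an event has happened a given number of times, for example reprompt on the first nomatch and exit on the third. The plain JavaScript sketch below simulates that selection as I understand the VoiceXML 2.0 rule (highest count not exceeding the number of occurrences, earlier handler wins ties). The handlers list and selectHandler are hypothetical names, not part of any tellme API.

```javascript
// Hypothetical handlers for one event, each standing for <catch count="N">.
const handlers = [
  { count: 1, action: 'reprompt' },
  { count: 3, action: 'exit' },
];

// Pick the handler with the highest count that is <= the number of times
// the event has occurred so far; ties go to the earlier handler.
function selectHandler(list, occurrences) {
  let best = null;
  for (const h of list) {
    if (h.count <= occurrences && (best === null || h.count > best.count)) {
      best = h;
    }
  }
  return best;
}

for (let n = 1; n <= 3; n++) {
  console.log(n, selectHandler(handlers, n).action);
}
```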
<vxml version="2.0">
<form>
  <field name="childid">
    <prompt>
      <audio src="whosthis.wav">Hello. Who is calling?</audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
      [
        [dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
        [aviva (aviva meyer)] {<childid "aviva">}
        [esther (esther minkin)] {<childid "esther">}
      ]
      ]]>
    </grammar>
    <catch event="noinput nomatch">
      <audio src="sorry.wav">Sorry. I didn't get that.</audio>
      <exit/>
    </catch>
    <filled>
      <if cond="'daniel'==childid">
        <goto next="#danfollowup"/>
      <elseif cond="'aviva'==childid"/>
        <goto next="#avivafollowup"/>
      <elseif cond="'esther'==childid"/>
        <goto next="#estherfollowup"/>
      <else/>
        <reprompt/>
      </if>
    </filled>
  </field>
</form>
Notes: the <else/> branch never happens, because the grammar returns only the three names. Note the inner, single quote marks inside cond. Note the double ='s (==) for comparison.
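What the grammar and the <filled> block do together can be simulated in plain JavaScript: several spoken phrases collapse to one slot value, and the slot value picks the next form. This is only a sketch of the idea, not how the tellme recognizer works; childGrammar, interpret, and nextForm are hypothetical names, and the phrase lists and form ids are copied from the example.

```javascript
// Phrase lists copied from the grammar; each maps to a value for childid.
const childGrammar = [
  { phrases: ['dan', 'daniel', 'daniel meyer', 'dan meyer'], value: 'daniel' },
  { phrases: ['aviva', 'aviva meyer'], value: 'aviva' },
  { phrases: ['esther', 'esther minkin'], value: 'esther' },
];

// Return the slot value for a recognized utterance, or null for a nomatch.
function interpret(utterance) {
  const heard = utterance.trim().toLowerCase();
  const rule = childGrammar.find((r) => r.phrases.includes(heard));
  return rule ? rule.value : null;
}

// Mirror the <if>/<elseif> chain in <filled>: slot value to next form.
const targets = new Map([
  ['daniel', '#danfollowup'],
  ['aviva', '#avivafollowup'],
  ['esther', '#estherfollowup'],
]);

function nextForm(childid) {
  return targets.has(childid) ? targets.get(childid) : null;
}

console.log(nextForm(interpret('Dan Meyer')));
console.log(interpret('bob'));
```

Because interpret can only return one of the three values or null, and null corresponds to the nomatch handler, there is no input that reaches a fourth branch.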
<form id="danfollowup">
  <field name="today">
    <prompt>
      <audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
      [
        [aikido (i key dough)] {<today "aikido">}
        [thesis (work)] {<today "thesis">}
        [jo (joe)] {<today "jo">}
        [both (all) (everything) ((i key dough) jo)] {<today "both">}
        [none nothing (sort of)] {<today "nothing">}
      ]
      ]]>
    </grammar>
    <catch event="noinput nomatch">
      <audio>I didn't quite understand. Call or send e-mail.</audio>
      <exit/>
    </catch>
    <filled>
      <if cond="today=='aikido'">
        <audio>Some aikido is fine.</audio>
      <elseif cond="today=='thesis'"/>
        <audio>Good, but do other things also.</audio>
      <elseif cond="today=='jo'"/>
        <audio>Don't get hit in the head.</audio>
      <elseif cond="today=='both'"/>
        <audio>Doing some of everything is best.</audio>
      <elseif cond="today=='nothing'"/>
        <audio>You deserve a break, but remember you want to be done by September.</audio>
      <else/>
        <audio>See you soon.</audio>
      </if>
    </filled>
  </field>
  <block>
    <audio>Good bye</audio>
  </block>
</form>
<form id="avivafollowup">
  <var name="rest" expr="1000"/>
  <field name="bcount" type="number">
    <prompt>
      <audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made?</audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[ NATURAL_NUMBER_THRU_9999 ]]>
    </grammar>
    <catch event="noinput nomatch">
      <audio src="sorry.wav">Sorry. I didn't get that.</audio>
      <exit/>
    </catch>
Note: can't use a literal < inside an attribute value; write &lt; instead.
    <filled>
      <assign name="rest" expr="1000-bcount"/>
      <audio><value expr="rest"/></audio>
      <audio src="togo.wav">to go.</audio>
      <if cond="rest &lt; 200">
        <audio src="homestretch.wav">You're in the home stretch</audio>
      <elseif cond="rest &lt; 500"/>
        <audio src="morethanhalf.wav">More than half way</audio>
      <elseif cond="rest &lt; 800"/>
        <audio src="goodstart.wav">Off to a good start</audio>
      <else/>
        <audio>Get a move on</audio>
      </if>
      <audio src="goodbye.wav">Good bye.</audio>
    </filled>
  </field>
</form>
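The arithmetic and the threshold chain in this form can be checked in plain JavaScript before it is wrapped in VoiceXML tags. In this sketch cranesReply and GOAL are hypothetical names; the goal of 1000 and the thresholds 200, 500, and 800 are the ones in the example.

```javascript
const GOAL = 1000; // initial value of the rest variable in the form

// Compute how many cranes remain and pick the matching encouragement.
function cranesReply(bcount) {
  const rest = GOAL - bcount;
  let message;
  if (rest < 200) {
    message = "You're in the home stretch";
  } else if (rest < 500) {
    message = 'More than half way';
  } else if (rest < 800) {
    message = 'Off to a good start';
  } else {
    message = 'Get a move on';
  }
  return { rest, message };
}

console.log(cranesReply(850));
console.log(cranesReply(100));
```

The order of the tests matters: each branch only needs an upper bound because the earlier branches have already ruled out the smaller values.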
<form id="estherfollowup">
  <block>
    <audio>Hello, Mommy. This is all I can do now.</audio>
  </block>
</form>
</vxml>
Application logic • VoiceXML elements (for example, <if> and <var>). • Note: more powerful than XSLT: <assign> tag • JavaScript code in attributes (for example, cond, expr) • JavaScript code in <script> </script> • Encase in CDATA to avoid problems with certain characters • external JavaScript code, cited using <script src=file address />
Class work • EVERYONE (who hasn't already): sign up at studio.tellme.com tonight • Design a simple application (you may work in groups): • Ask one question • Detect and respond to each of 2 or 3 answers • Use the examples here as models • All text to speech • Pick (at least) one and implement. • (Do this for a short time and then go on to the next lecture. Resume after 9pm when minutes are free.)
Homework • (Majors requirement overdue: there will be a deduction but better late than never.) • Go to studio.tellme.com & sign up as a developer. • try examples (using Scratchpad) • record some voice samples • do tellme tutorials • ALSO try and report on • 800 long distance or some other commercial application