Improving User Interaction with Spoken Dialog Systems via Shaping. Stefanie L. Tomko. 15 December 2006. PhD Thesis Defense, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
Thesis committee • Roni Rosenfeld, chair • Alex Rudnicky • Alex Waibel • Candy Sidner, MERL
Question • How can we help users have more efficient interactions with spoken dialog systems? • Shaping: teach users to say what the system understands • When users say non-Speech Graffiti things, help them learn the Speech Graffiti version • Example: “when is that showing at the Manor?” becomes “theater is Manor theater, what is show time?”
Thesis statement • Shaping can be used to induce more efficient user interactions with spoken dialog systems. • The shaping strategy can improve efficiency by increasing the amount of user input that is actually understood by the system, leading to increased task completion rates and higher user satisfaction. • This strategy can also reduce upfront training time, thus accelerating the process of realizing more efficient interaction.
Roadmap • Speech Graffiti basics • Related work, in brief • Proposed strategy • Evaluation: three user studies • Summary, conclusions & discussion
Speech Graffiti (i) • A protocol for structured interaction with simple machines • Why "Speech Graffiti?" • User adapts input style • Result is easier for system to process speech graffiti basics – related work – proposed strategy – evaluation – summary
Speech Graffiti (ii) • Addresses several issues with spoken dialog systems • Clarifies system boundaries • Simplifies development • Allows flexible, direct access to data • Provides universal interaction protocol
[Flowchart] User input → Speech Graffiti-based language model → Speech Graffiti-based speech recognition hypothesis → is input Speech Graffiti? • Yes (e.g., “Theater is Showcase North theater”): give terse confirmation or query result (“Showcase Cinemas Pittsburgh North”) • No (e.g., “Dramas”): give error beep ({error beep})
Baseline Speech Graffiti interaction
• Phrase forms: [slot] is [value]; what is [slot]?
• USER: Theater is Showcase North theater
• SYSTEM: Showcase Cinemas Pittsburgh North.
• USER: Dramas
• SYSTEM: {beep}
• USER: Genre is drama
• SYSTEM: Drama.
• USER: What movies are playing?
• SYSTEM: {beep}
• USER: Where was I?
• SYSTEM: Theater is Showcase Cinemas Pittsburgh North, genre is drama.
• USER: What is the title?
• SYSTEM: 2 titles: Flags of Our Fathers, The Departed.
• USER: Start over
• SYSTEM: Starting over.
• USER: Area is North Hills
• SYSTEM: South Side
• USER: Scratch that
• SYSTEM: Scratched
• USER: Area is North Hills, title is The Illusionist
• SYSTEM: North Hills, The Illusionist
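The phrase forms in the sample dialog can be illustrated with a small parser. This is only a sketch: the regular expressions, the keyword set, and the function names are assumptions chosen to match the utterances shown on the slide, not the grammar actually used by the Speech Graffiti engine.

```python
import re

# Keywords seen in the sample dialog (assumed set, not the full protocol).
KEYWORDS = {"where was i", "start over", "scratch that"}
QUERY = re.compile(r"^what is (?:the )?(\w+)$")       # what is [slot]?
SLOT_VALUE = re.compile(r"^(\w+) is (.+)$")           # [slot] is [value]

def parse_phrase(phrase):
    """Return a tagged tuple for one phrase, or None if no form matches."""
    text = phrase.strip().rstrip("?.").lower()
    if text in KEYWORDS:
        return ("keyword", text)
    match = QUERY.match(text)  # checked first so "what is title" is a query
    if match:
        return ("query", match.group(1))
    match = SLOT_VALUE.match(text)
    if match:
        return ("specify", match.group(1), match.group(2))
    return None

def parse_utterance(utterance):
    """Parse each comma-separated phrase; unparsable phrases become None."""
    return [parse_phrase(p) for p in utterance.split(",")]
```

In this sketch "Dramas" and "What movies are playing?" parse to None, which corresponds to the {beep} turns in the dialog, while "Area is North Hills, title is The Illusionist" yields two specify phrases.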
Earlier Speech Graffiti evaluation • Compared Speech Graffiti with an NL system in the same domain (movies) (N=23) • On average, Speech Graffiti users had • Higher user satisfaction* • Lower error rates* • Lower task completion times • Similar task completion rates • Some users just didn't get it (N=6)
Related work • Restricted / subset / structured languages are a reasonable approach to HCI • Kelly (’77), Black & Moran (’82), Jackson (’83), Sidner & Forlines (’02), etc. • Humans adapt conversation at many levels • Pickering & Garrod (’04), etc. • Zoltan-Ford (’91), Brennan (’96), Bell (’03), etc. • Convergence: “the process of interaction adaptation whereby one partner adopts behavior that is increasingly similar to that of the other partner” (Burgoon, Stern, & Dillman, ’95)
[Flowchart] User input → is input Speech Graffiti? • Yes (e.g., “theater is Showcase North theater”): give terse confirmation or query result (“Showcase Cinemas Pittsburgh North”) • No: is input shapeable? • Yes (e.g., “dramas”): give shaping confirmation ({something to encourage user to say ‘genre is drama’}) • No (e.g., “what”): give error beep
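The branching in this flowchart can be sketched as a three-way decision. The two predicates are passed in as parameters because the slide does not say how grammaticality or shapeability is tested; the toy predicates below are purely hypothetical stand-ins.

```python
from enum import Enum

class Response(Enum):
    TERSE_CONFIRMATION = "terse confirmation or query result"
    SHAPING_CONFIRMATION = "shaping confirmation"
    ERROR_BEEP = "error beep"

def choose_response(hypothesis, is_speech_graffiti, is_shapeable):
    """Follow the flowchart: target language first, then shapeable input."""
    if is_speech_graffiti(hypothesis):
        return Response.TERSE_CONFIRMATION
    if is_shapeable(hypothesis):
        return Response.SHAPING_CONFIRMATION
    return Response.ERROR_BEEP

# Hypothetical toy predicates, only to exercise the three branches.
def toy_is_speech_graffiti(hypothesis):
    return " is " in hypothesis.lower()

def toy_is_shapeable(hypothesis):
    return hypothesis.lower() in {"dramas", "manor theater movie types"}
```

With these toy predicates, "theater is Showcase North theater" gets a terse confirmation, "dramas" gets a shaping confirmation, and "what" gets the error beep, matching the three examples in the flowchart.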
Expanded grammar • Include a grammar that accepts more natural language input than Speech Graffiti does • This is still not a full natural language grammar • Exploit the idea that users who know they are speaking to a restricted-language system limit their own input • 2-pass ASR process
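The slide names a 2-pass ASR process without describing it. One plausible reading, sketched below strictly as an assumption, is a fallback scheme: decode with the Speech Graffiti language model first, and re-decode with the expanded grammar only when the first hypothesis is not grammatical Speech Graffiti. All function names and the decoder callables are hypothetical; no real recognizer API is used.

```python
def two_pass_decode(audio, decode_target, decode_expanded, is_target_grammatical):
    """Fallback decoding (assumed design): try the target grammar, then the expanded one.

    decode_target / decode_expanded: callables mapping audio to a text hypothesis.
    is_target_grammatical: callable deciding whether a hypothesis is in the target language.
    Returns (hypothesis, which_pass).
    """
    first_hypothesis = decode_target(audio)
    if is_target_grammatical(first_hypothesis):
        return first_hypothesis, "target"
    # Second pass: same audio, more permissive grammar.
    return decode_expanded(audio), "expanded"
```

A design like this keeps the restricted grammar's recognition advantages for users who already speak the target language and pays the cost of a second decode only for out-of-grammar input.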
Evaluations • Series of three user studies • Some iterative design changes • MovieLine • Current info about Pittsburgh theaters and movies
User Study I: baseline vs. simple shaping • Is a simple, adaptation-theoretic shaping strategy effective in increasing efficiency? • How well does the 2-pass strategy work? • Generate corpus to inform shaping strategy refinements
Simple shaping confirmation • Adaptation & lexical entrainment are common in H-H and H-C interaction • Maybe this will work here? • Confirm expanded language input with Speech Graffiti slot+value form • USER: Manor theater movie types • SYSTEM: theater is Cinemagic Manor theater, requesting genre. 5 matches: biography, comedy, crime, {…}
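The slot+value confirmation in this example can be sketched as a simple rendering step. The function name and its inputs (already-resolved slot/value pairs plus an optional requested slot) are assumptions for illustration; how the system maps "Manor theater movie types" to those slots is not shown on the slide.

```python
def simple_confirmation(specified, requested=None):
    """Render understood input in Speech Graffiti slot+value form.

    specified: list of (slot, canonical_value) pairs understood from the user.
    requested: name of the slot being queried, if any.
    """
    parts = [f"{slot} is {value}" for slot, value in specified]
    if requested is not None:
        parts.append(f"requesting {requested}")
    return ", ".join(parts) + "."

# Reproduces the confirmation text from the slide's example.
print(simple_confirmation([("theater", "Cinemagic Manor theater")], requested="genre"))
```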
Study I: participants & conditions • 29 native Amer. Eng. speakers, ages 23-54, little to no computer programming exp. • Between-subjects, single domain (MovieLine) • ORIGINAL+tutorial (baseline Speech Graffiti, no shaping) • SIMPLE+tutorial • SIMPLE+no_tutorial
Sample ORIGINAL interaction
• USER: Theater Manor, genre is comedy
• SYSTEM: Cinemagic Manor Theatre, comedy
• USER: What are movies?
• SYSTEM: 3 matches: Friends with Money, Thank You for Smoking, Tsotsi
• USER: Galleria
• SYSTEM: {beep}
• USER: Theater is Galleria
• SYSTEM: Carmike Galleria 6
• USER: Genre is drama, what’s playing?
• SYSTEM: {confsig}, drama
• USER: Where was I?
• SYSTEM: Theater is Carmike Galleria 6, genre is drama
• USER: What
• SYSTEM: {beep}
Sample SIMPLE interaction
• USER: Theater Manor, genre is comedy
• SYSTEM: Theater is Cinemagic Manor Theatre, genre is comedy
• USER: What are movies?
• SYSTEM: Requesting movie. 3 matches: Friends with Money, Thank You for Smoking, Tsotsi
• USER: Galleria
• SYSTEM: Theater is Carmike Galleria 6
• USER: Theater is Galleria
• SYSTEM: Theater is Carmike Galleria 6
• USER: Genre is drama, what’s playing?
• SYSTEM: Genre is drama, requesting movie. Sorry, there are no matches.
• USER: Where was I?
• SYSTEM: Theater is Carmike Galleria 6, genre is drama, what is movie?
• USER: What
• SYSTEM: {beep}
Study I: setup • 15 MovieLine tasks • Example: “You want to see Fantastic Four at the Norwin Hills theater. Find out when it’s showing there.” • Completed in lab, via telephone (20-40 mins.) • SASSI-based user satisfaction survey (Hone & Graham, ’00) • Seven user satisfaction factors • System response accuracy, Likeability, Cognitive demand, Annoyance, Habitability, Speed, + TTS
Study I results: efficiency & user satisfaction • No significant differences between ORIGINAL & SIMPLE • Trend towards greater efficiency for SIMPLE • Higher task completion • Lower median time & turns on task • Trend towards greater satisfaction for SIMPLE • [Chart: ORIGINAL vs. SIMPLE]
Study I results: tutorial • No significant differences between SIMPLE+tutorial and SIMPLE+no_tutorial • Pre-use tutorial is not necessary
Study I results: grammaticality • No significant differences • Intrasession grammaticality is key • Evidence of convergence • Significant within-subj. grammaticality increases for both groups • ORIGINAL group increased significantly more sharply • Stronger correlation between grammaticality and user satisfaction & task success for ORIGINAL • [Chart: ORIGINAL vs. SIMPLE]
User study II: more-explicit shaping • Can more-explicit shaping strategies have an effect on efficiency and Speech Graffiti convergence?
Study II: participants & conditions • 30 native Amer. Eng. speakers, ages 21-54, little to no computer programming exp. • Between-subjects, single domain (MovieLine) • SIMPLE • SUGGESTING • REQUIRING
SUGGESTING prompt • Give a Speech Graffiti example • Encourage user to speak that way next time • Do not encourage immediate repetition • Do not encourage a yes/no response • Leave open the possibility that there was a recognition error • USER: Manor theater movie types • SYSTEM: I think I heard “Manor the movie types.” Next time it would help to use Speech Graffiti, as in “theater is Cinemagic Manor theater, list genres.” Listing 5 genres: biography, comedy, crime, {…}
REQUIRING prompt • Give a Speech Graffiti example • Ask the user to rephrase immediately • Leave open the possibility that there was a recognition error • USER: Manor theater movie types • SYSTEM: Please rephrase that using Speech Graffiti. For example, “theater is Cinemagic Manor theater, list genres.”
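The SUGGESTING and REQUIRING prompts differ mainly in their wording and in whether the query is answered in the same turn. A minimal sketch of the two prompt templates follows; the function names are hypothetical, the wording is copied from the slide examples, and straight quotes replace the slides' curly quotes.

```python
def suggesting_prompt(asr_hypothesis, example):
    """Echo what was heard, then suggest the Speech Graffiti form for next time."""
    return (
        f'I think I heard "{asr_hypothesis}." '
        f'Next time it would help to use Speech Graffiti, as in "{example}."'
    )

def requiring_prompt(example):
    """Ask for an immediate rephrase, giving the Speech Graffiti form as a model."""
    return f'Please rephrase that using Speech Graffiti. For example, "{example}."'

example = "theater is Cinemagic Manor theater, list genres"
print(suggesting_prompt("Manor the movie types", example))
print(requiring_prompt(example))
```

In the slide examples the SUGGESTING system goes on to answer the query ("Listing 5 genres: ..."), whereas the REQUIRING system waits for the rephrased input; that difference would live in the caller, not in these templates.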
Study II results: efficiency & user satisfaction • No sig. differences between 3 conditions • Satisfaction scores for REQUIRING tended to be lowest • Habitability: lowest for SIMPLE • [Chart: SIMPLE vs. SUGGESTING vs. REQUIRING]
Study II results: global grammaticality • No significant differences • Evidence of convergence • Significant within-subject grammaticality increases for all groups • [Chart: SIMPLE vs. SUGGESTING vs. REQUIRING]
Study II: initial grammaticality • Consider how quickly users become proficient in Speech Graffiti • Target grammaticality level for success: 80% • Low initial grammaticality: < 80% in 1st quarter • High initial grammaticality: ≥ 80% in 1st quarter • [Chart: SIMPLE vs. SUGGESTING vs. REQUIRING]
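The low/high split can be sketched as a threshold test on the first quarter of a session. This sketch assumes that grammaticality is the fraction of a user's utterances that are grammatical Speech Graffiti and that "1st quarter" means the first quarter of that user's utterances; the slide does not spell out either definition, and the function names are hypothetical.

```python
def initial_grammaticality(grammatical_flags):
    """Fraction of grammatical utterances in the first quarter of a session.

    grammatical_flags: per-utterance booleans, in session order.
    """
    if not grammatical_flags:
        raise ValueError("need at least one utterance")
    quarter = max(1, len(grammatical_flags) // 4)
    first_quarter = grammatical_flags[:quarter]
    return sum(first_quarter) / len(first_quarter)

def initial_group(grammatical_flags, threshold=0.80):
    """Classify a user as 'high' (>= threshold) or 'low' (< threshold)."""
    return "high" if initial_grammaticality(grammatical_flags) >= threshold else "low"
```

For a 20-utterance session whose first five utterances include four grammatical ones, the initial grammaticality is 0.8, which meets the 80% target and places the user in the high group.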
Study II: initial grammaticality (ii) • REQUIRING condition • Good for low-initial-grammaticality users • Helps them know what to say • Not so good for high-initial-grammaticality users • Not error-robust, so confusing and/or annoying • Suggested a more flexible approach • USER: theater is Cinemagic Manor theater • SPEECH RECOGNITION HYPOTHESIS: cinemagic manor theater • SYSTEM: Please rephrase that using Speech Graffiti. For example, “theater is Cinemagic Manor theater.” • [Chart: SIMPLE vs. SUGGESTING vs. REQUIRING]
User study III: adaptive shaping • How does an adaptive shaping strategy affect interaction efficiency? • How does interaction efficiency change over the course of a user’s experience with a system? • How well do users transfer skills from one Speech Graffiti application to another?
Study III participants & conditions • 27-22 native Amer. Eng. speakers, ages 23-54, little to no computer programming exp. • Two conditions • SUGGESTING (now including ASR hyp. only on every 3rd trigger) • ADAPTIVE
Study III: ADAPTIVE shaping • Users start with REQUIRING version • After proficiency established, shift to SIMPLE • Shift back to REQUIRING if user is excessively ungrammatical • USER: theater is Cinemagic Manor theater • SPEECH RECOGNITION HYPOTHESIS: cinemagic manor theater • SYSTEM: theater is Cinemagic Manor theater.
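The ADAPTIVE strategy reads as a two-state machine that switches between REQUIRING and SIMPLE behavior. The sketch below is an assumption-laden illustration: the sliding window of recent utterances, the window size, and both thresholds are hypothetical, since the slide does not say how "proficiency established" or "excessively ungrammatical" was measured.

```python
from collections import deque

class AdaptiveShaper:
    """Two-mode shaping policy: start strict, relax once the user is proficient."""

    def __init__(self, window=5, promote_at=0.8, demote_below=0.5):
        self.mode = "REQUIRING"              # users start with the REQUIRING version
        self.recent = deque(maxlen=window)   # grammaticality of recent utterances
        self.promote_at = promote_at         # hypothetical proficiency threshold
        self.demote_below = demote_below     # hypothetical "excessively ungrammatical" threshold

    def observe(self, grammatical):
        """Record one utterance and return the mode to use for the next turn."""
        self.recent.append(bool(grammatical))
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if self.mode == "REQUIRING" and rate >= self.promote_at:
                self.mode = "SIMPLE"
            elif self.mode == "SIMPLE" and rate < self.demote_below:
                self.mode = "REQUIRING"
        return self.mode
```

Using two separated thresholds rather than one avoids flipping modes on every utterance when a user hovers near the proficiency boundary.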
Study III setup: longitudinal • 1st session in lab, 8 tasks • Sessions 2-6 independent, ~1 week apart • Four tasks in sessions 2, 3, 4 & 6; six in session 5 • SASSI-based user satisfaction survey • After sessions 1, 4 & 6
Study III setup: cross-domain • Sessions 1 through 4: MovieLine • Sessions 5 & 6: DineLine • Pittsburgh restaurant info • Same # of slots (9) as MovieLine • Non-shaping • Can shaping apps be training apps? • USER: Area is South Side. • SYSTEM: South Side. • USER: Cuisine is American. • SYSTEM: American. • USER: List restaurants. • SYSTEM: Listing 7 restaurants: City Grill, Hot Metal Grille, Mario’s Southside Saloon, and more.
Study III results: efficiency • Session 1, between subjects • Similar task completion • Generally lower median time/turns for ADAPTIVE • Significantly higher time/turns-to-completion for ADAPTIVE • [Charts: ADAPTIVE vs. SUGGESTING]
Study III results: longitudinal & cross-domain • Task completion • Increased from S1→S4 • Decreased to S6, significantly so for SUGGESTING • SUGGESTING users completed sig. fewer DineLine tasks • Time- and turns-to-completion • Decreased S1→S4, increased to S6 • No real difference between initial MovieLine & initial DineLine • Median time- & turns-on-task sig. lower for ADAPTIVE on DineLine (DL)
Study III results: user satisfaction • [Charts: ADAPTIVE vs. SUGGESTING]
Study III results: grammaticality • Significant intrasession increases for whole population • Significant increases S1→S4, S5, S6 • Somewhat stronger S1→S5 for ADAPTIVE (p = 0.13) • 80% threshold: • S1: 6 users (22%) • S5: 16 users (70%)
Evaluation summary • User Study I: • Trend towards increased efficiency & satisfaction for shaping • Successful interactions without tutorial • Two-pass ASR successful • Overall intrasession convergence • User Study II: • Overall intrasession convergence • REQUIRING: • Strong local convergence • Trend towards lower satisfaction, & non-robust to errors • SIMPLE: lower habitability • User Study III: • Overall intra- & intersession convergence • Cross-domain transfer over all participants… • …to a standard Speech Graffiti application with no tutorial • ADAPTIVE: • More time/turns for completed tasks in initial session • Increased task completion across domains & likeability over time, trend towards greater grammaticality change
Thesis statement, revisited • Shaping can be used to induce more efficient user interactions with spoken dialog systems. • The shaping strategy can improve efficiency by increasing the amount of user input that is actually understood by the system, leading to increased task completion rates and higher user satisfaction. • Significantly reduced concept error (Study I), higher task completion, lower median time/turns (p < .25), higher mean satisfaction scores • This strategy can also reduce upfront training time, thus accelerating the process of realizing more efficient interaction. • Pre-use tutorial not necessary (Study I) • Also supports cross-domain skill transfer (significant increase in initial grammaticality with new system)
Contributions (i) • Investigation of specific shaping strategies: • SIMPLE strategy: low habitability, especially for low-initial-grammaticality users • REQUIRING strategy: by far the strongest effect on local convergence, but non-robust to ASR errors • ADAPTIVE strategy: initially lower efficiency, but supports better cross-domain performance • Users generally exhibited intrasession grammaticality increases regardless of the particular shaping strategy, attesting to the power of convergence as a general phenomenon.
Contributions (ii) • More efficient interactions: • Shaping can eliminate need for tutorial, without a corresponding decline in interaction efficiency. • Integration of shaping and the two-pass recognition process allows users to complete tasks while using natural language and learning the Speech Graffiti format. • As successive iterations of shaping strategies have been implemented, mean user satisfaction scores have risen, indicating more effective interactions. • More efficient interactions for more users
Contributions (iii) • Demonstration of a spoken dialog system that • Is fully functional (not Wizard-of-Oz) • Is not directed-dialog • Accesses real-world data • And… leverages users’ propensity for convergence
Future work / extensions • Public system, à la Let’s Go! • Visual aids • From reference cards to multimodal systems • More complex domains (10s-100s of slots) • Integration into natural language system • Shape to acoustically preferred/unambiguous input?
Time for discussion • Thanks!
References • Bell, L. (2003). Linguistic adaptations in spoken human-computer dialogues – Empirical studies of user behavior. PhD thesis, KTH, Stockholm. • Black, J.B. & Moran, T.P. (1982). “Learning and Remembering Command Names.” In Proceedings of the Conference on Human Factors in Computing Systems, pp. 8-11. • Brennan, S.E. (1996). “Lexical entrainment in spontaneous dialog.” In Proceedings of the International Symposium on Spoken Dialogue, pp. 41-44. • Burgoon, J.K., Stern, L.A., & Dillman, L. (1995). Interpersonal adaptation: Dyadic interaction patterns. Cambridge: Cambridge University Press. • Jackson, M.D. (1983). “Constrained Languages Need Not Constrain Person/Computer Interaction.” SIGCHI Bulletin, 15(2-3):18-22. • Kelly, M. (1977). “Limited vocabulary natural language dialogue.” International Journal of Man-Machine Studies, 9:479-501. • Pickering, M.J. & Garrod, S. (2004). “Toward a mechanistic psychology of dialogue.” Behavioral and Brain Sciences, 27:169-226. • Sidner, C. & Forlines, C. (2002). “Subset Languages for Conversing with Collaborative Interface Agents.” In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP), Denver, CO, pp. 281-284. • Zoltan-Ford, E. (1991). “How to get people to say and type what computers can understand.” International Journal of Man-Machine Studies, 34:527-547.
System architecture • [Diagram] Components: Sphinx II ASR; Phoenix parser; Sphinx control; Gentner telephony control box; domain-independent Speech Graffiti engine; application-specific domain manager; database; Multimedia control; .wav files; Festival control; Festival Text-to-Speech • Platforms shown: Linux, Windows NT
Study II setup • 15 MovieLine tasks • Example: “You want to see Fantastic Four at the Norwin Hills theater. Find out when it’s showing there.” • Completed in lab, via telephone (20-40 mins.) • SASSI-based user satisfaction survey • Seven user satisfaction factors • System response accuracy, Likeability, Cognitive demand, Annoyance, Habitability, Speed, + TTS
Study II results: efficiency • Efficiency measures: no significant differences between conditions • [Charts: SIMPLE vs. SUGGESTING vs. REQUIRING]