Trying to Understand Misunderstanding: How Robust Can Spoken Natural Language Dialogue Systems Be? Ronnie W. Smith East Carolina University
Sponsors • National Science Foundation • Duke University • East Carolina University • DARPA • BBN
Collaborators 1987-1994: Dr. Alan Biermann, Dr. Ruth Day, Dr. Robert Rodman, Richard Hipp, Barry Koster, Dania Egedi, Robin Gambill, Curry Guinn 1994-2000: Dr. Steve Gordon, Robert Hoggard, Shannon Pollard, Chris Shaffer, Greg Keim, Jeremy Mauget, Devender Singh, Jason Carrow, Brian Manning
Quick Outline • Initial motivation and results • Careful studies • Not so careful studies • What I’d really like to do!
Initial Research Focus Develop a model of dialog processing that: • Uses problem-solving to achieve target goal • Exploits contextual knowledge from the discourse and user • Allows mixed-initiative interaction (computer or user controlled) • Allows voice interaction • Operates in real time to permit experimentation
Experimental Evaluation Issues: • Is the computational model practical? • Would users’ dialog behavior change as they gained experience and were given control of the dialog? Hypotheses: as users gained experience and were given the initiative, they • Would complete the task faster • Would speak fewer utterances • Would speak longer utterances • Would spend more time thinking before producing a response Side effect: analyzable data
Data Collection: The Circuit Fix-It Shop • Constructed at Duke University • Assisted users in repairing an electronic circuit • 125-word vocabulary, speaker-dependent connected speech • Varying levels of initiative, fixed per interaction • 141 dialogs, 2840 user utterances • 114 (81%) of the dialogs were successfully completed • 50% utterance recognition rate • 81.5% correct interpretation rate
Sample Interaction – Part 1
C: This is the Circuit Fix-It Shop. How may I help you?
U: This RS111 circuit is not working.
C: I am familiar with that circuit. The LED is supposed to be displaying alternately flashing one and seven.
U: The LED is flashing seven.
C: Did you mean to say that the LED is flashing?
U: The LED is displaying a flashing seven.
C: Is the one on the LED displaying?
U: No.
C: Is anything else on the LED displaying other than the NOT flashing seven?
U: The seven is flashing.
C: Is anything else on the LED displaying other than the flashing seven?
U: No.
C: What is the switch at when the LED is displaying only a flashing seven?
U: It is up.
C: What is the knob at when the LED is displaying only a flashing seven?
U: The knob is at ten.
Sample Interaction – Part 2
C: Is there a wire between connector one eight and connector four one?
U: Where is connector four one?
C: Below the white knob with the orange circle is a pink box, called transistors. Now, in the upper right of this pink box, called transistors is connector four one.
U: The wire is not there.
C: Did you mean to say that there IS a wire between connectors one eight and four one?
U: There is no wire.
C: Add a wire between connector one eight and connector four one.
Initial Miscommunication Handling Techniques Error-correcting Parsing (Hipp) • Convert input to the “nearest” grammatical utterance • “Nearest” is determined by a cost matrix for insertions, deletions, and substitutions of words • Costs are not all the same (e.g., inserting “a” is cheap, while inserting “not” is expensive) Tell the user what went wrong • Only tell the user what the computer’s interpretation was • Only when misrecognition caused a contradictory interpretation (but required for only 48% of these)
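A minimal sketch of the “nearest grammatical utterance” idea, cast as a weighted word-level edit distance. Hipp’s actual error-correcting parser folded the costs into the parsing process itself rather than scoring whole candidate sentences; the cost values and candidate strings below are invented for illustration.

```python
def edit_cost(recognized, candidate, ins=1.0, dele=1.0, sub=1.5, special=None):
    """Weighted Levenshtein distance over word tokens; `special` maps
    words whose insertion cost differs from the default."""
    special = special or {}
    m, n = len(recognized), len(candidate)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + special.get(candidate[j - 1], ins)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = recognized[i - 1] == candidate[j - 1]
            d[i][j] = min(
                d[i - 1][j] + dele,                                # drop a word
                d[i][j - 1] + special.get(candidate[j - 1], ins),  # add a word
                d[i - 1][j - 1] + (0.0 if same else sub),          # swap a word
            )
    return d[m][n]

def nearest_grammatical(recognized, candidates):
    """Score each candidate grammatical utterance; return the cheapest."""
    # Inserting "not" would flip the meaning, so it is made expensive;
    # inserting "a" is nearly harmless. Values are illustrative only.
    special = {"not": 4.0, "a": 0.25}
    costs = {c: edit_cost(recognized.split(), c.split(), special=special)
             for c in candidates}
    best = min(costs, key=costs.get)
    return best, costs[best]

# "I want to fix this circuit" was recognized as "power a six a circuit".
best, cost = nearest_grammatical(
    "power a six a circuit",
    ["i want to fix a circuit", "there is no wire on connector one zero four"],
)
print(best, cost)
```

The returned cost is the “parse cost” used later as a measure of how unsure the system should be about its interpretation.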
What to Do Next? Get a better speech recognizer! Well--- • Better is not the same as perfect! • Better => stretch its limits anyway • There will probably always be ungrammatical spoken inputs. • There will always be mismatched speaker/hearer background knowledge.
What to Do Next? Investigate strategies for the prevention, detection, and repair of miscommunication in natural language dialog • Detailed analysis of existing dialogs • Development and evaluation of strategies for handling miscommunication
Effects of Variable Initiative on Linguistic Behavior in Human-Computer Spoken Natural Language Dialog • Smith and Gordon (Computational Linguistics, March 1997) • Based on Circuit Fix-It Shop Data • Based on classifying utterances according to task phase • Introduction: establish task purpose • Assessment: establish current system behavior • Diagnosis: establish cause for errant behavior • Repair: establish completion of correction • Test: establish correctness of behavior
Result 1: Relative Number of Utterances Conclusion: Experienced users tend not to discuss details they can handle themselves.
Result 2: Frequency of User Subdialog Transitions Conclusion: Computer initiates most subdialogs except when experienced users are completing the task.
Result 3: Predictability of Subdialog Transitions
Idealized Transition Model
[Diagram: idealized model of transitions among the subdialog phases I, D, R, A, T, F]
Result 3: Predictability of Subdialog Transitions
Empirical Transition Model
[Diagram: empirical transition frequencies among the phases I, D, R, A, T, F, shown separately for computer-controlled % and user-controlled % dialogs]
• Percentage of “normal” dialogs • Computer-controlled: 64% • User-controlled: 33%
Study Conclusions Computer controlled dialogs--- • Have an orderly pattern of computer-initiated subdialogs • Have terse user responses • Are not amenable to user-correction during miscommunication User controlled dialogs--- • Are less orderly • Contain more user-initiated subdialogs • Indicate user willingness to exploit growing expertise
Analysis of Strategies for Selective Utterance Verification • Smith (ANLP, 1997; IJHCS, 1998) • Motivation---miscommunication due to speech recognition errors
Spoken: I want to fix this circuit
Recognized: power a six a circuit
Spoken: there is no wire on connector one zero four
Recognized: stays no wire I connector one zero four
Verification Subdialogs
Computer: This is the circuit fix-it shop. How may I help you?
Spoken: I want to fix a circuit.
Recognized: power a six a circuit.
Computer: Did you mean to say there is a power circuit?
WHEN TO USE THIS??
Goal: Selective Verification • Initiate a verification subdialog only when it is believed to be needed. • Criteria for need: sufficiently unsure you’ve fully understood AND the need to fully understand is sufficiently great. • Terminology • Under-verification---the system generates an incorrect meaning that is not verified • Over-verification---a correct meaning is verified • Ideal: minimize under-verifications while also keeping over-verifications low
Measurements of Uncertainty • Parse Cost---sum of costs incurred by error-correcting parser in transforming input to a grammatical utterance • Expectation Cost---how expected was the response given the dialog context
Measuring Utterance Importance • Unexplored • Domain-dependent? • Fixed-threshold (depends on risk due to miscommunication)
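One way to see how the pieces fit: a small sketch in which the two uncertainty measures, an importance score, and a fixed threshold yield a yes/no verification decision. The linear combination, the weights, and the threshold value are assumptions for illustration, not the strategies actually evaluated in the papers.

```python
def should_verify(parse_cost, expectation_cost, importance,
                  w_parse=1.0, w_expect=1.0, threshold=3.0):
    """Initiate a verification subdialog only when uncertainty about the
    interpretation, scaled by how costly a misunderstanding would be,
    crosses a fixed threshold."""
    uncertainty = w_parse * parse_cost + w_expect * expectation_cost
    return uncertainty * importance > threshold

# A clean parse of an expected, low-stakes answer passes unverified...
print(should_verify(parse_cost=0.5, expectation_cost=0.2, importance=1.0))  # False
# ...while a heavily repaired parse of an important report is verified.
print(should_verify(parse_cost=4.0, expectation_cost=1.5, importance=2.0))  # True
```

Raising the threshold trades over-verifications for under-verifications, which is why the penalty assigned to under-verifications matters when choosing it.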
Basic Results
[Results chart omitted]
Generalization: this result holds under most threshold values.
Study Conclusions • Key parameters for selecting strategy and threshold value: expectation strength, word recognition rate, and penalty for under-verifications • Results depend on the level of system initiative • Results assume all verification subdialogs succeed • Limits to verification subdialogs => other strategies are needed.
General Problem • “Knowing” when you don’t know. OR • “Suspecting” when you don’t know. BUT • If you don’t know, how do you talk about it? THUS • System must provide some information on its understanding of context
Domain-Level Techniques • Detection of dead ends (suspicious error path checks out OK)
C: The LED is supposed to be displaying alternately flashing one and seven.
U: LED displaying seven (misrecognized and understood as “LED displaying flashing seven”)
C: Is anything else on the LED on?
U: No.
C: Is there a wire between connectors 18 and 41?
U: Yes.
C: Are you sure that the LED is displaying only a flashing seven?
Domain-Level Techniques • Detection of impossibilities (description not possible given “known” conditions)
C: What is the LED displaying?
U: The LED is displaying a one on the left hand side.
C: Is anything else on the LED on?
U: No. (NOTE: misstatement---seven also displaying)
C: Is the one on the LED flashing?
U: Yes.
C: Are you sure that the one on the LED is flashing?
U: Yes.
C: Are you sure that the seven on the LED is off?
U: The LED is displaying alternately flashing one and seven.
Domain-Level Techniques In general, these are detections of logical inconsistencies, but are there general dialogue strategies for discussing them?
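A minimal sketch of the detection side, assuming a toy propositional belief store and one hand-written domain constraint. The real system reasons over a model of the circuit; the fact names and the generated question below are invented for illustration.

```python
# Toy belief store: what the system currently believes about the LED.
beliefs = {"the seven is displaying": True}

def consistent(facts):
    """One hand-written constraint: a report that nothing else is
    displaying contradicts a seven that is 'known' to be displaying."""
    return not (facts.get("nothing else is displaying")
                and facts.get("the seven is displaying"))

def assimilate(observation):
    """Accept an asserted observation, or question it when it would
    make the belief store logically impossible."""
    trial = dict(beliefs, **{observation: True})
    if not consistent(trial):
        # Impossibility detected: re-verify instead of storing it.
        return f"Are you sure that {observation}?"
    beliefs.update(trial)
    return "ok"

print(assimilate("nothing else is displaying"))
# -> Are you sure that nothing else is displaying?
```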
Grounding---establishing common context of understanding • Beyond context-free acknowledgments
U: The LED is displaying a flashing seven. (misunderstood)
C: Is the one on the LED displaying?
U: No.
C: Is anything else on the LED displaying other than the NOT flashing seven?
U: The seven is flashing.
C: Is anything else on the LED displaying other than the flashing seven?
U: No.
• How much grounding is enough?
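A small sketch contrasting two ways of grounding a newly formed belief, echoing the exchange above; the explicit/implicit split and the template text are assumptions for illustration.

```python
def ground(belief, explicit=False):
    """Surface the system's interpretation so the user can catch a
    misunderstanding."""
    if explicit:
        # A dedicated verification turn: safest, but costs an exchange.
        return f"Did you mean to say that {belief}?"
    # Implicit grounding: embed the interpretation in the next task
    # question, exposing a misunderstanding without an extra turn.
    return f"Is anything else on the LED displaying other than the {belief}?"

print(ground("flashing seven"))
print(ground("the LED is flashing", explicit=True))
```

“How much grounding is enough?” then becomes a question of which beliefs are worth surfacing, and in which of these forms.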
The Initiative Factor • Explicit semantic grounding leverages human intelligence • But only if we allow it!
C: Whether there is a wire between connectors 111 and 120 needs to be observed.
U: There is a wire between connectors 111 and 120 (misrecognized).
C: There is not supposed to be a wire between connectors 111 and 1.
U: (repeats and is understood)
The Initiative Factor • Too much computer control?
C: What is the voltage between connectors 121 and 120? (computer suspects a problem in the power circuit)
U: I believe there’s a problem with the LED circuit.
C: What is the voltage between connectors 121 and 120?
Suppose instead the response is---
U: Voltage is eight (misrecognized as control knob at six).
Then the computer’s response is okay.
Final Thoughts/Summary • Design systems to leverage human intelligence • The ability to follow step-by-step instructions? • Varying levels of system initiative • VERY challenging when user expertise evolves. • Menus vs. keyboard shortcuts???? • Explicit semantic grounding • Verification subdialogs, etc. • What’s the right amount? • We still need carefully designed studies with real systems!!!