
Dialogue Structure and Pronoun Resolution



Presentation Transcript


  1. Dialogue Structure and Pronoun Resolution Joel Tetreault and James Allen University of Rochester Department of Computer Science DAARC September 23, 2004

  2. WELCOME TO DAARC!!!

  3. Reference in Spoken Dialogue • Resolving anaphoric expressions correctly is critical in task-oriented domains • Makes conversation easier for humans • The reference resolution module (RRM) provides feedback to other components in the system • e.g. incremental parsing, the interpretation module • Investigate how to improve the RRM: • Discourse structure could be effective in reducing the search space of antecedents and improving accuracy (Grosz and Sidner, 1986) • Paucity of empirical work: Byron and Stent (1998), Eckert and Strube (2001), Byron (2002)

  4. Goal • To evaluate whether shallow approaches to dialogue structure can improve a reference resolution algorithm (LRC used as the baseline model to augment) • Investigated two models: • Eckert & Strube (manual and automatic versions) • “Literal QUD” model (manual)

  5. Outline • Background • Dialogue Act synchronization (Eckert and Strube model) • QUD (Craige Roberts) • Monroe Corpus • Algorithm • Results • 3rd person pronoun evaluation • Dialogue Structure • Summary

  6. Past approaches in structure and reference • Veins: the nuclei of RST trees are the most salient discourse units, so the entities in these units are thus more salient than others • Tetreault (2003): Penn Treebank subset annotated with RST. Used G&S approximations to try to improve on the LRC baseline. • Result: performed the same as the baseline • Veins: decreased performance slightly • Problem: fine-grained approaches (RST) are difficult to annotate reliably and to produce in real time. • Perhaps shallow approaches can work?

  7. Literal QUD • Questions Under Discussion (Craige Roberts, Jonathan Ginzburg) – “what are we talking about?”: topics create discourse segments • Literally: questions or modals can be viewed as creating a discourse segment • Result: questions provide a shallow discourse structuring, and that may be enough to improve performance, especially in a task-oriented domain • Entities in the QUD's main segment can be viewed as the topic • A segment is closed when its question is answered (using acknowledgment sequences and changes in the entities used) • Only entities from the answer and entities in the question remain accessible • Can be used in TRIPS to reduce the search space of entities – sets the context size

  8. QUD Annotation Scheme • Annotate: • Start utterance • End utterance • Type (aside, repeated question, unanswered, open-ended, clarification) • Kappa (compared with reconciled data):

  9. Example - QUD
  utt06 U: Where is it?
  utt07 U: Just a second
  utt08 U: I can't find the Rochester airport
  utt09 S: It's
  --------------------------------------------------------
  utt10 U: I think I have a disability with maps
  utt11 U: Have I ever told you that before
  utt12 S: It's located on brooks avenue
  utt13 U: Oh thank you
  utt14 S: Do you see it?
  utt15 U: Yes
  (QUD-entry :start utt06 :end utt13 :type clarification)
  (QUD-entry :start utt10 :end utt11 :type aside)

  10. Example - QUD (utt10-11 processed)
  utt06 U: Where is it?
  utt07 U: Just a second
  utt08 U: I can't find the Rochester airport
  utt09 S: It's
  [utt10,11 removed]
  --------------------------------------------------------
  utt12 S: It's located on brooks avenue
  utt13 U: Oh thank you
  utt14 S: Do you see it?
  utt15 U: Yes
  (QUD-entry :start utt06 :end utt13 :type clarification)
  (QUD-entry :start utt10 :end utt11 :type aside)

  11. Example - QUD (utt13 processed)
  [utt06-13 collapsed: {the Rochester airport, brooks avenue}]
  --------------------------------------------------------
  utt14 S: Do you see it?
  utt15 U: Yes
  (QUD-entry :start utt06 :end utt13 :type clarification)
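As a rough illustration of the collapsing mechanics in the QUD examples above, here is a minimal sketch in Python. It is not the TRIPS implementation; the class, function, and field names are invented for this example, and the rule for which entities survive is simply the one stated on slide 7 (question entities plus answer entities remain accessible).

    # Minimal sketch of QUD segment collapsing (illustration only, not the
    # actual TRIPS code; all names are invented for this example).

    class QUDSegment:
        def __init__(self, start_utt, qud_type, question_entities):
            self.start_utt = start_utt
            self.qud_type = qud_type                  # e.g. "clarification", "aside"
            self.question_entities = question_entities
            self.body_entities = []                   # everything mentioned inside the segment

    qud_stack = []    # open segments, most recent on top
    history = []      # accessible entity sets for closed material

    def open_qud(start_utt, qud_type, question_entities):
        qud_stack.append(QUDSegment(start_utt, qud_type, question_entities))

    def add_entity(entity):
        if qud_stack:
            qud_stack[-1].body_entities.append(entity)

    def close_qud(answer_entities):
        # Collapse the segment: only entities from the question and from the
        # answer stay accessible to later pronouns (cf. utt06-13 above, which
        # collapses to {the Rochester airport, brooks avenue}).
        seg = qud_stack.pop()
        history.append(seg.question_entities + answer_entities)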

  12. QUD Issues • Issue 1: it is easy to detect questions (using speech-act information), but how do you know when a question has been answered? • Cue words, multiple acknowledgments, and changes in the entities discussed provide strong clues that a question is finishing, but general questions such as “how are we going to do this?” can be ambiguous • Issue 2: what is more salient to a QUD pronoun – the QUD topic or a more recent entity?

  13. Dialogue Act Segmentation • E&S: model to resolve all types of pronouns (3rd person and abstract) in spoken dialogue • Intuition: grounding is very important in spoken dialogue • Utterances that are not acknowledged by the listener may not be in common ground and thus not accessible to pronominal reference

  14. Dialogue Act Segmentation • Each utterance is marked as • (I): contains content (initiation), question • (A): acknowledgment • (C): combination of the above • (N): none of the above • Basic algorithm: utterances that are not acknowledged or not in a string of I's are removed from the discourse before the next sentence is processed • Evaluation showed improvement for pronouns referring to abstract entities, and strong annotator reliability • Pronoun performance? Unclear; there was no comparison against the same measure without the DA model

  15. Example – DA model
  (I) utt06 U: Where is it?
  (N) utt07 U: Just a second
  (I) utt08 U: I can't find the Rochester airport
  (N) utt09 S: It's
  (I) utt10 U: I think I have a disability with maps (removed)
  (I) utt11 U: Have I ever told you that before
  (I) utt12 S: It's located on brooks avenue
  (A) utt13 U: Oh thank you
  (I) utt14 S: Do you see it?
  (A) utt15 U: Yes
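A minimal sketch of the removal step, under one possible reading of the rule on slide 14 and assuming each utterance already carries its DA code (illustrative Python, not the authors' implementation; the exact handling of the C and N codes here is an assumption):

    # Illustrative sketch of E&S-style filtering: an utterance's entities stay
    # in the accessible history only if the utterance carries content and is
    # followed by an acknowledgment or by another initiation (a string of I's).
    # Names and the A/C/N details are assumptions for this example.

    def keep_utterance(da_code, next_da_code):
        if da_code in ("A", "N"):            # pure acks / no content add nothing
            return False
        return next_da_code in ("A", "C", "I")

    def filter_dialogue(utterances):
        """utterances: list of (utt_id, da_code, entities) triples."""
        kept = []
        for i, (utt_id, da, entities) in enumerate(utterances):
            next_da = utterances[i + 1][1] if i + 1 < len(utterances) else "N"
            if keep_utterance(da, next_da):
                kept.append((utt_id, entities))
        return kept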

  16. Parsing the Monroe Domain • Domain: Monroe Corpus of 20 transcriptions (Stent, 2001) of human subjects collaborating on Emergency Rescue 911 tasks • Each dialogue was at least 10 minutes long, and most were over 300 utterances long • The work presented here focuses on 5 of the dialogues (1756 utterances, 278 3rd person pronouns) • Goal: develop a corpus of utterances parsed with rich syntactic, semantic, and discourse information • The 5-dialogue sub-corpus was parsed with 84% accuracy • For more details see the ACL Discourse Annotation workshop ‘04

  17. TRIPS Parser • Broad-coverage, deep parser • Uses bottom-up algorithm with CFG and domain independent ontology combined with a domain model • Flat, unscoped LF with events and labeled semantic roles based on FrameNet • Semantic information for noun phrases based on EuroWordNet

  18. Parser information for Reference • Rich parser output is helpful for discourse annotation and reference resolution: • Referring expressions identified (pronoun, NP, impros) • Verb roles and temporal information (tense, aspect) identified • Noun phrases have semantic information associated with them • Speech act information (question, acknowledgment) • Discourse markers (so, but) • Semi-automatic annotation increases reliability

  19. Semantics Example: “an ambulance”
  (TERM :VAR V213818
        :LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE) :INPUT (AN AMBULANCE))
        :SEM ($ F::PHYS-OBJ
                (SPATIAL-ABSTRACTION SPATIAL-POINT)
                (GROUP -)
                (MOBILITY LAND-MOVABLE)
                (FORM ENCLOSURE)
                (ORIGIN ARTIFACT)
                (OBJECT-FUNCTION VEHICLE)
                (INTENTIONAL -)
                (INFORMATION -)
                (CONTAINER (OR + -))
                (TRAJECTORY -)))

  20. Reference Annotation • Annotated the dialogues for reference with undergraduate researchers (created a Java tool: PronounTool) • Markables determined by LF terms • Identification numbers determined by the :VAR field of the LF term • Used a stand-off file to encode what each pronoun refers to (refers-to) and the relation between pronoun and antecedent (relation) • A post-processing phase assigns a unique identification number to each coreference chain • Also annotated coreference between definite noun phrases
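For illustration, one stand-off entry of the kind described above might look like the record below; the refers-to and relation fields come from the slide, while the other field names and values are invented for this sketch.

    # Hypothetical shape of one stand-off annotation record (illustrative only).
    annotation = {
        "markable":  "V213818",     # :VAR of the pronoun's LF term
        "refers-to": "V213702",     # :VAR of the antecedent's LF term (made-up id)
        "relation":  "IDENTITY",    # MATE-style relation type
        "chain-id":  42,            # assigned in the post-processing phase
    }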

  21. Reference Annotation • Used a slightly modified MATE scheme: pronouns divided into the following types: • IDENTITY (Coreference) (278) • Includes set constructions (6) • FUNCTIONAL (20) • PROPOSITION/D.DEIXIS (41) • ACTION/EVENT (22) • INDEXICAL (417) • EXPLETIVE (97) • DIFFICULT (5)

  22. LRC Algorithm • LRC: a modified centering algorithm (Tetreault ’01) that does not use the Cb or transitions, but keeps a Cf-list (history) for each utterance • While processing an utterance's entities (left to right): push each entity onto Cf-list-new; for a pronoun p, attempt to resolve it: • Search through Cf-list-new (left to right), taking the first candidate that meets gender, agreement, binding, and semantic feature constraints • If none is found, search past utterances' Cf-lists, starting from the previous utterance back to the beginning of the discourse • When p is resolved, push the pronoun, with the semantic features of its antecedent, onto Cf-list-new • For more details see SemDial ‘04
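A compact sketch of the resolution loop just described (illustrative Python, not the authors' code; compatible() stands in for the gender, agreement, binding, and semantic feature checks, and the entity objects are assumed stand-ins for LF terms):

    # Illustrative LRC sketch: resolve a pronoun against the current
    # utterance's partial Cf-list, then against past utterances' Cf-lists.

    def resolve(pronoun, cf_list_new, history, compatible):
        """history: Cf-lists of past utterances, most recent first."""
        for candidate in cf_list_new:                # current utterance, left to right
            if compatible(pronoun, candidate):
                return candidate
        for cf_list in history:                      # walk back through the discourse
            for candidate in cf_list:
                if compatible(pronoun, candidate):
                    return candidate
        return None

    def process_utterance(entities, history, compatible):
        cf_list_new = []
        for item in entities:                        # left-to-right order
            if item.is_pronoun:                      # assumed attribute
                antecedent = resolve(item, cf_list_new, history, compatible)
                if antecedent is not None:
                    item.sem = antecedent.sem        # inherit semantic features
            cf_list_new.append(item)
        history.insert(0, cf_list_new)               # becomes the most recent Cf-list
        return cf_list_new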

  23. LRC Algorithm with Structure Info • Augmented the algorithm with extensions to handle QUD and E&S input • For QUD, at the start and end of processing an utterance, QUDs are started (pushed on a stack) or ended (their entities are collapsed), so the Cf-list history changes • For E&S, each utterance is assigned a DA code and is then removed or kept depending on the next utterance (whether it is an acknowledgment or part of a series of I's)
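One way to picture where those hooks sit, reusing names from the sketches above (the two structuring models were evaluated separately; this combined loop and all of its names are only an illustration, not the actual implementation):

    # Sketch of the augmented control loop: structural bookkeeping happens at
    # the start and end of each utterance, so the Cf-list history that LRC
    # searches is already pruned or collapsed.

    def process_with_structure(utt, history, compatible):
        for qud in utt.qud_starts:                        # QUD mode: open segments
            open_qud(qud.start, qud.type, qud.question_entities)

        cf_list_new = process_utterance(utt.entities, history, compatible)

        for qud in utt.qud_ends:                          # QUD mode: close segments,
            close_qud(qud.answer_entities)                # collapsing their entities

        if not keep_utterance(utt.da, utt.next_da):       # E&S mode: drop utterances
            history.remove(cf_list_new)                   # that are not grounded
        return cf_list_new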

  24. Results

  25. Error Analysis • Though QUD and the +sem baseline performed the same (89 errors each), they each got 3 pronouns right that the other did not • Baseline: • 3 where collapsing removed the correct antecedent • QUD: • 2 right because an aside was blocked off • 1 right because of collapsing (intervening nodes were blocked) • 15 pronouns both got wrong, but with different predictions • For the remaining 71, both made the same error

  26. Issues • Structuring methods are probably more trouble than they are worth with the corpora available right now • They also affect only a few pronouns • Segment ends are the least reliable • What constitutes an end? • The 3 errors show that either boundaries are marked incorrectly (pronouns access entities in a “closed” discourse segment), or the collapsing routine is too harsh • Small corpus size • Hard to draw definite conclusions given only 3 criss-crossed errors • Need more data for statistical evaluations

  27. Issues • The E&S model has the advantage over QUD of being the easier to automate, but it fares worse since it takes into account only a small window of utterances (it is extremely shallow) • The QUD model can be semi-automated (detecting question starts is easy), but detecting segment ends and types is harder • QUD could definitely be improved by taking plan initiations and suggestions into account instead of limiting segments to questions, but the tradeoff is reliability
