ANNOTATING EVENT ANAPHORA: A CASE STUDY
Tommaso Caselli and Irina Prodanof, ILC-CNR, Pisa
tommaso.caselli@ilc.cnr.it, irina.prodanof@ilc.cnr.it
LREC-10, May 19th, La Valletta, Malta
Outline
• Motivations
• Coreference annotation in TimeML
• Annotating event anaphora: a preliminary scheme
• Annotation methodology and results
• Lessons learned and future work
Motivations
• Eventualities are the building blocks of a document's informative content
• Eventualities give rise to relations that create a rich informative network:
  • temporal relations
  • sharing of participants
  • factivity
  • coreferential relations
• Coreferential relations among eventualities play an important role in facilitating access to content and extracting relevant information
Coref. in TimeML
• TimeML and ISO-TimeML are standards for annotating events, temporal expressions and a set of relations between these entities (temporal, subordinating and aspectual relations)
• Main contribution of TimeML: a standard definition of event and a methodology for annotating it
• It-TimeML: the Italian adaptation of TimeML (updated version available on request), part of ISO-TimeML
• It-TimeML is currently used to create the Italian TimeBank (172 news articles from ISST, PAROLE and the Web; 67,140 tokens)
Coref. in TimeML (2)
• TimeML tags involved: EVENT and TLINK (temporal link)
• TimeML has no dedicated link for coreference annotation
• Workaround: a special value of the TLINK relType attribute, “identity”
• “identity” is used to:
  • connect two tokens that are part of a single event instance (e.g. light verb constructions)
  • mark coreferential relations between events, namely set-subset relations
Coref. in TimeML (3) – Use of “identity”
fare la spesa [to do the shopping]
<EVENT id="e1">fare</EVENT> la <EVENT id="e2">spesa</EVENT>
<TLINK lid="l1" eventInstanceID="e1" relatedToEventInstance="e2" relType="IDENTITY"/>
Coref. in TimeML (3) – Use of “identity” (cont.)
La sessione privata servira’ a tre [adempimenti]_j. Innanzitutto, all’ [approvazione]_j della proposta di Abete. (ISST sole006)
The private session will be used for three [tasks]_j. First, the [approval]_j of Abete's proposal.
La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira’</EVENT> a tre <EVENT id="e3">adempimenti</EVENT>. <SIGNAL id="s1">Innanzitutto</SIGNAL>, all’ <EVENT id="e4">approvazione</EVENT> della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
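Because TimeML has no dedicated coreference link, recovering these “identity” relations from an annotated file amounts to filtering TLINKs by their relType. The following is a minimal Python sketch using the standard library's ElementTree. It assumes the fragment above (with its quoting typos fixed) is wrapped in a hypothetical <TimeML> root element so that it parses, and it resolves eventInstanceID directly against EVENT ids as the slide does, whereas full TimeML would go through MAKEINSTANCE instance ids.

```python
import xml.etree.ElementTree as ET

# The It-TimeML fragment from the slide, wrapped in a <TimeML> root
# (an assumption) so that it is a single well-formed XML document.
FRAGMENT = """<TimeML>
La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira’</EVENT>
a tre <EVENT id="e3">adempimenti</EVENT>.
<SIGNAL id="s1">Innanzitutto</SIGNAL>, all’ <EVENT id="e4">approvazione</EVENT>
della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
</TimeML>"""


def identity_links(xml_text):
    """Return (source_text, target_text) pairs for every TLINK with relType IDENTITY."""
    root = ET.fromstring(xml_text)
    # Map each EVENT id to the surface string it annotates.
    events = {ev.get("id"): ev.text for ev in root.iter("EVENT")}
    pairs = []
    for link in root.iter("TLINK"):
        if link.get("relType") == "IDENTITY":
            src = link.get("eventInstanceID")
            tgt = link.get("relatedToEventInstance")
            pairs.append((events[src], events[tgt]))
    return pairs


print(identity_links(FRAGMENT))  # [('approvazione', 'adempimenti')]
```

Run on the light-verb example, the same filter would return ('fare', 'spesa'): the output itself does not tell a single event split over two tokens apart from a set-subset relation.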
Coref. in TimeML (4)
• The use of the value “identity” is not satisfactory, since it is NOT homogeneous: it covers both single event instances split over several tokens and set-subset relations
• During the ongoing annotation effort for the Italian TimeBank, we observed that this value could also be applied to other cases, such as:
  • synonyms
  • hypernyms
  • coreference (strict coreference: same referent in the world)
Event Anaphora
• Previous work: Hasler et al. 2006; Bejan & Harabagiu 2008
• Hasler et al. 2006: coreference between NPs only (strict definition), detailed guidelines, but NO specifications for the annotation
  • which events? ACE event frames (LIFE, CONFLICT, MOVEMENT, JUSTICE, …)
  • TimeML compliant
• Bejan & Harabagiu 2008: event coreference as a side effect of event structure
  • two events are considered coreferent when their predicates are the same, synonyms or hypernyms, and share the same arguments
  • TimeML compliant
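The criterion attributed to Bejan & Harabagiu 2008 on this slide (same, synonymous or hypernymous predicates plus identical arguments) can be made concrete with a toy check. This is only a sketch of the criterion as summarized here, not the authors' actual system; the lexicon, the role/filler argument representation and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would draw on a resource such as WordNet.
SYNONYMS = {"buy": {"purchase"}, "purchase": {"buy"}}
HYPERNYMS = {"buy": {"acquire"}, "purchase": {"acquire"}}


@dataclass(frozen=True)
class EventMention:
    lemma: str
    # Role/filler pairs, e.g. frozenset({("agent", "X"), ("theme", "Y")}).
    arguments: frozenset


def lexically_related(a: str, b: str) -> bool:
    """True if a and b are the same predicate, synonyms, or in a hypernym relation."""
    return (
        a == b
        or b in SYNONYMS.get(a, set())
        or b in HYPERNYMS.get(a, set())
        or a in HYPERNYMS.get(b, set())
    )


def corefer(m1: EventMention, m2: EventMention) -> bool:
    """Toy criterion: lexically related predicates that share the same arguments."""
    return lexically_related(m1.lemma, m2.lemma) and m1.arguments == m2.arguments


args = frozenset({("agent", "X"), ("theme", "Y")})
print(corefer(EventMention("buy", args), EventMention("acquire", args)))  # True
print(corefer(EventMention("buy", args),
              EventMention("purchase", frozenset({("agent", "X")}))))  # False
```

The second call fails only because the argument sets differ, which shows how strongly this kind of criterion depends on argument identification.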
Event Anaphora - Methodology (2)
• Our approach:
  • no event frames or event templates; all event instances annotated in the Italian TimeBank (TimeML compliant)
  • open-domain text/discourse
  • coarse-grained, bottom-up approach to defining the annotation scheme
  • small, limited set of guidelines: active discovery of what is needed through annotation and observation of the data
  • event anaphora: strict coreference + indirect coreference
Event Anaphora - Annotation scheme (3)
<MARKABLE> = <EVENT>, but extended to include pronouns and adverbs
Event Anaphora - Annotation scheme (4)
<EMPTY> = annotates cases of zero anaphora and ellipsis (frequent in Italian)
<TOPIC> = annotates entire portions of text; it provides an anchor for linguistic entities that can refer to a discourse topic
“[Stiamo ancora parlando, come certamente deve essere, e continueremo a consultarci]”_j. James Baker, segretario al Tesoro americano, ha commentato [cosi’]_j i risultati dell’assemblea. (ISST els019)
“[We are still talking, as it certainly should be, and we will keep consulting each other]”_j. James Baker, the American Treasury Secretary, commented [thus]_j on the results of the assembly.
Event Anaphora - Annotation scheme (4, cont.)
<LINK> = marks up an anaphoric relation; the attribute “anaphorType” specifies the type of anaphoric relation, and “src” marks the anchor
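To show how the tags fit together, the Baker example can be encoded with them and checked for links that point nowhere. This is a minimal sketch under stated assumptions: the tag names TOPIC, MARKABLE and LINK and the attributes anaphorType and src come from the slides, while the <text> root, the id, anaphor and kind attributes and the "strict" value are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET


def build_baker_example() -> ET.Element:
    """Encode the Baker example with the preliminary scheme's tags.

    The <text> root, the id/anaphor/kind attributes and the anaphorType
    value are illustrative assumptions, not part of the published scheme.
    """
    root = ET.Element("text")
    # TOPIC anchors an entire portion of text (here, the quoted sentence).
    topic = ET.SubElement(root, "TOPIC", id="t1")
    topic.text = ("Stiamo ancora parlando, come certamente deve essere, "
                  "e continueremo a consultarci")
    # MARKABLE extends EVENT to pronouns and adverbs: here the adverb "così".
    adverb = ET.SubElement(root, "MARKABLE", id="m1", kind="adverb")
    adverb.text = "così"
    # LINK: "src" marks the anchor, "anaphor" the anaphoric expression.
    ET.SubElement(root, "LINK", id="l1", anaphor="m1", src="t1",
                  anaphorType="strict")  # hypothetical value
    return root


def dangling_links(root: ET.Element) -> list:
    """Return ids of LINKs whose anaphor or src matches no element id."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [link.get("id") for link in root.iter("LINK")
            if link.get("anaphor") not in ids or link.get("src") not in ids]


doc = build_baker_example()
print(ET.tostring(doc, encoding="unicode"))
print(dangling_links(doc))  # []
```

A consistency check of this kind is cheap to run after each annotation round, since a link whose anchor was deleted or mistyped silently breaks the anaphoric chain.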
Event Anaphora – Results (5)
• Annotation tool: PALinkA (Orasan, 2003)
• 3 annotators / 1,792 tokens
• no kappa (K) scores
• Low agreement on the identification of anaphors, but relatively good agreement on the anchors
• More specific guidelines and information are needed
• Event anaphora is a widespread phenomenon
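No K scores are reported for this study. One standard way to quantify agreement among more than two annotators on a categorical decision (for instance "anaphor" vs. "not anaphor" per candidate markable) is Fleiss' kappa. The sketch below implements that statistic; the toy counts are hypothetical and are not the study's data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a fixed number of annotators per item.

    table[i][j] = number of annotators who put item i in category j;
    every row must sum to the same number of annotators n (n >= 2).
    """
    N = len(table)
    n = sum(table[0])
    k = len(table[0])
    if any(sum(row) != n for row in table):
        raise ValueError("every item must be rated by the same number of annotators")
    # Share of all N*n assignments that fell into each category.
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    # Observed agreement per item: agreeing annotator pairs / all pairs.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    # Agreement expected by chance, given the category distribution.
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)  # undefined when P_e == 1


# Hypothetical toy data: 4 candidate markables, 3 annotators,
# categories = [anaphor, not anaphor].
toy = [[2, 1], [1, 2], [3, 0], [0, 3]]
print(round(fleiss_kappa(toy), 3))  # 0.333
```

Computing such a score separately for anaphor identification and for anchor selection would make the contrast reported above measurable.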
Lessons Learned and Future Work
• Event anaphora is a widespread phenomenon that must be addressed in separate tasks:
  • relations between full event nouns, verbs, PPs and adjectives
  • no pronominal anaphora
• New annotation scheme:
  • 2 tags: <EVENT> and <AnafLink>
  • different attributes for <EVENT>: FACTIVITY, GENERICITY, POLARITY
  • relations between particular events according to the attributes' values
  • reduced set of anaphor types (two values: direct vs. indirect)
• Tracking of the participants: how?
• Event anaphora annotation as a further link in TimeML, or as a separate task built on top of the TimeML annotation
• New tool: BAT (thanks to Marc Verhagen)
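The revised scheme is small enough to express directly as data: events carrying the three proposed attributes and an <AnafLink> restricted to the two anaphor types. The sketch below is one possible reading only. The attribute names on <AnafLink> (lid, anaphor, antecedent, type), the eid attribute, the attribute values and the choice of "indirect" for the adempimenti/approvazione pair are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The two anaphor types listed on the slide.
ANAPHOR_TYPES = {"direct", "indirect"}


@dataclass
class Event:
    eid: str
    text: str
    factivity: str   # attribute names from the slide; values are hypothetical
    genericity: str
    polarity: str


def event_element(ev: Event) -> ET.Element:
    """Serialize an event with the three attributes proposed for <EVENT>."""
    el = ET.Element("EVENT", eid=ev.eid, FACTIVITY=ev.factivity,
                    GENERICITY=ev.genericity, POLARITY=ev.polarity)
    el.text = ev.text
    return el


def anaf_link(lid: str, anaphor: Event, antecedent: Event,
              anaphor_type: str) -> ET.Element:
    """Build an <AnafLink>; its attribute names are illustrative assumptions."""
    if anaphor_type not in ANAPHOR_TYPES:
        raise ValueError(f"unknown anaphor type: {anaphor_type!r}")
    return ET.Element("AnafLink", lid=lid, anaphor=anaphor.eid,
                      antecedent=antecedent.eid, type=anaphor_type)


# Events e3/e4 from the "adempimenti"/"approvazione" example; the attribute
# values and the "indirect" label are illustrative, not gold annotations.
e3 = Event("e3", "adempimenti", "FACTUAL", "SPECIFIC", "POS")
e4 = Event("e4", "approvazione", "FACTUAL", "SPECIFIC", "POS")
print(ET.tostring(event_element(e4), encoding="unicode"))
print(ET.tostring(anaf_link("al1", e4, e3, "indirect"), encoding="unicode"))
```

Validating the anaphor type at construction time enforces the reduced two-value inventory, so inconsistent labels surface immediately instead of during agreement analysis.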