
CROWDSOURCING (ANAPHORIC) ANNOTATION

CROWDSOURCING (ANAPHORIC) ANNOTATION. Massimo Poesio University of Essex Part 1: Intro, Microtask crowdsourcing. ANNOTATED CORPORA: AN ERA OF PLENTY?.



  1. CROWDSOURCING (ANAPHORIC) ANNOTATION Massimo Poesio University of Essex Part 1: Intro, Microtask crowdsourcing

  2. ANNOTATED CORPORA: AN ERA OF PLENTY? • With the release of the OntoNotes and ANC corpora for English and a number of corpora for other languages (Prague Dependency Treebank especially) we may think we won’t need to do any more annotation for a while • But this is not the case: • Size still an issue • Problems • Novel tasks

  3. THE CASE OF ANAPHORIC ANNOTATION • After many years of scarcity we now live in an era of abundance to study anaphora • The release after 2008 of a number of anaphoric corpora annotated according to more linguistically inspired guidelines has enabled computational linguists to start working on a version of the problem more resembling the way linguists look at it • Also one hears less of coreference and more of anaphora • Recent competitions all use corpora of this type: • SEMEVAL 2010, CONLL 2011 / 2012, EVALITA 2011, …

  4. BUT … • The larger corpora • Not that large: still only 1M words at the most • See problems of overfitting with Penn Treebank • Cover only a few genres (news mostly) • Annotation schemes still pretty basic

  5. ONTONOTES: OUT OF SCOPE There was not a moment to be lost: away went Alice like the wind, and was just in time to hear it say, as it turned a corner, 'Oh my ears and whiskers, how late it's getting!' She was close behind it when she turned the corner, but the Rabbit was no longer to be seen: she found herself in a long, low hall, which was lit up by a row of lamps hanging from the roof. There were doors all round the hall, but they were all locked; and when Alice had been all the way down one side and up the other, trying every door, she walked sadly down the middle, wondering how she was ever to get out again.

  6. ONTONOTES: OUT OF SCOPE There was not a moment to be lost: away went Alice like the wind, and was just in time to hear it say, as it turned a corner, 'Oh my ears and whiskers, how late it's getting!' She was close behind it when she turned the corner, but the Rabbit was no longer to be seen: she found herself in a long, low hall, which was lit up by a row of lamps hanging from the roof. There were doors all round the hall, but they were all locked; and when Alice had been all the way down one side and up the other, trying every door, she walked sadly down the middle, wondering how she was ever to get out again.

  7. ONTONOTES: OUT OF SCOPE There was nothing so VERY remarkable in that; nor did Alice think it so VERY much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually TOOK A WATCH OUT OF ITS WAISTCOAT-POCKET, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.

  8. ALSO … • In many respects the doubts concerning the empirical approach to anaphora raised by Zaenen in her 2006 CL squib (Markup barking up the wrong tree) still apply • “anaphora is not like syntax – we don’t understand the phenomenon quite as well” • Even the simpler types of anaphoric annotation are still problematic • If not for linguists, for computational types • And we only started grappling with the more complex types of anaphora

  9. PLURALS WITH COORDINATED ANTECEDENTS: SPLIT ANTECEDENTS (GNOME, ARRAU, SERENGETI, TUBA/DZ?) • She could hear the rattle of the teacups as [[the March Hare] and [his friends]] shared their never-ending meal • 'In THAT direction,' the Cat said, waving its right paw round, 'lives [a Hatter]: and in THAT direction,' waving the other paw, 'lives [a March Hare]. Visit either you like: they're both mad.' • Alice had no idea what [Latitude] was, or [Longitude] either, but thought they were nice grand words to say.

  10. AMBIGUITY: REFERENT 15.12 M: we’re gonna take the engine E3 15.13 : and shove it over to Corning 15.14 : hook [it] up to [the tanker car] 15.15 : _and_ 15.16 : send it back to Elmira (from the TRAINS-91 dialogues collected at the University of Rochester)

  11. AMBIGUITY: REFERENT About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s. Areas of the factory were particularly dusty where the crocidolite was used. Workers dumped large burlap sacks of the imported material into a huge bin, poured in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters. Workers described "clouds of blue dust" that hung over parts of the factory, even though exhaust fans ventilated the area. www.phrasedetectives.com

  12. AMBIGUITY: EXPLETIVES 'I beg your pardon!' said the Mouse, frowning, but very politely: 'Did you speak?' 'Not I!' said the Lory hastily. 'I thought you did,' said the Mouse. '--I proceed. "Edwin and Morcar, the earls of Mercia and Northumbria, declared for him: and even Stigand, the patriotic archbishop of Canterbury, found it advisable--"' 'Found WHAT?' said the Duck. 'Found IT,' the Mouse replied rather crossly: 'of course you know what "it" means.'

  13. AND ANYWAY … • A great deal, if not most, research in (computational) linguistics requires annotation of new types • A new syntactic phenomenon • Named entities in a new domain • A type of anaphoric annotation not yet annotated (or annotated badly)

  14. OUR CONTENTION • CROWDSOURCING can be (part of) the solution to these problems • Microtask crowdsourcing for small-to-medium scale annotation projects • Including the majority of annotation carried out by linguists / psycholinguists (see e.g. Munro et al. 2010) • Games-with-a-purpose for larger scale projects • Either to gather more evidence about the linguistic phenomena

  15. OUTLINE OF THE LECTURES • Crowdsourcing in AI and NLP (today) • What is crowdsourcing; applications to NLP • Games with a purpose • ESP, Verbosity • Phrase Detectives • Analyzing crowdsourced data

  16. CROWDSOURCING

  17. THE PROBLEM • Many AI problems require knowledge on a massive scale • Commonsense knowledge (700M facts?) • Vision • Previous common wisdom: • Impossible to codify such knowledge by hand (witness CYC) • Need to learn all from scratch • New common wisdom: “Given the advent of the World Wide Web, AI projects now have access to the minds of millions” (Singh 2002)

  18. CROWDSOURCING • Take a task traditionally performed by one or a few agents • Outsource it to the crowd on the web

  19. THE WISDOM OF CROWDS • By rights using the ‘crowd’ should lead to poor quality • But in fact it often turns out that the judgment of the crowd is as good as or better than that of the experts
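
One simple way to turn many non-expert judgments into a single "crowd" judgment is majority voting. The sketch below is only an illustration of that idea, not a method taken from these slides: the function name `majority_label` and the example judgments are hypothetical.

```python
from collections import Counter


def majority_label(judgments):
    """Return the most frequent label and the share of workers who chose it."""
    counts = Counter(judgments)
    # most_common(1) yields the (label, count) pair with the highest count;
    # ties are broken by the order in which labels were first seen.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


# Hypothetical judgments from five workers on one ambiguous pronoun.
judgments = ["engine", "engine", "tanker car", "engine", "tanker car"]
label, share = majority_label(judgments)
print(label, share)
```

The agreement share is worth keeping alongside the winning label: a low share flags items where the crowd is split, which may indicate genuine ambiguity rather than noise.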

  20. WIKIPEDIA

  21. WIKIPEDIA AND CROWDSOURCING • Wikipedia, although not an AI project, is perhaps the best illustration of the power of crowdsourcing – how putting together many minds may result in an output often of incredible quality • It is also a great illustration of what works and doesn’t: • E.g., Editorial control must be exercised by the web collaborators themselves

  22. OTHER WEB COLLABORATION PROJECTS • The OPEN DIRECTORY PROJECT • Crater mapping (results) – Kanefsky • Citizen Science • Cognition and Language Laboratory • Web Experiments • Galaxy Zoo – Oxford University

  23. Galaxy Zoo

  24. Galaxy Zoo • Launched in July 2007 • 1M galaxies imaged • 50M classifications in first year, from 150,000 visitors

  25. COLLECTIVE RESOURCE CREATION FOR AI: OPEN MIND COMMONSENSE

  26. WEB COLLABORATION IN AI: OPEN MIND COMMONSENSE • “Every ordinary person has common sense of the kind we want to give to our machines” (Singh 2002)

  27. OPEN MIND COMMONSENSE

  28. OPEN MIND COMMONSENSE • A project started in 1999 (Chklovski, 1999) to collect commonsense knowledge from NETIZENS • Around 30 ACTIVITIES organized to collect knowledge • About taxonomies (CATS ARE MAMMALS) • About the uses of objects (CHAIRS ARE FOR SITTING ON)

  29. WHAT’S IN OPEN MIND COMMONSENSE: CAR

  30. COLLECTING COMMONSENSE KNOWLEDGE • Originally: • Using TEMPLATES • Asking people to write stories • Now: just templates?

  31. OPEN MIND COMMONSENSE: ADDING KNOWLEDGE

  32. TEMPLATES FOR ADDING KNOWLEDGE

  33. OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE

  34. FROM OPENMIND COMMONSENSE TO CONCEPT NET • ConceptNet (Havasi et al, 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

  35. CONCEPT NET

  36. FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET • A lime is a very sour fruit • isa(lime,fruit) • property_of(lime,very_sour)
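
The mapping from a free-text assertion to relations can be illustrated with a single pattern-matching rule. This is a minimal sketch assuming a hypothetical "A X is a [modifiers] Y" template; it is not ConceptNet's actual extraction code, and the names `PATTERN` and `extract` are made up for the example.

```python
import re

# Hypothetical template: "A/An/The X is a/an [modifier words] Y".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+((?:\w+\s+)*)(\w+)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return relation strings for a sentence matching the template, else []."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    subject = match.group(1).lower()
    modifiers = [word.lower() for word in match.group(2).split()]
    head = match.group(3).lower()
    facts = [f"isa({subject},{head})"]
    if modifiers:
        # Join the modifier words into one property, e.g. "very_sour".
        prop = "_".join(modifiers)
        facts.append(f"property_of({subject},{prop})")
    return facts


print(extract("A lime is a very sour fruit"))
# prints ['isa(lime,fruit)', 'property_of(lime,very_sour)']
```

A rule this shallow only covers sentences that fit the template exactly; anything else returns an empty list rather than a guess.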

  37. OTHER USES OF WEB COLLABORATION IN AI • Learner / Learner2 / 1001 Paraphrases – Chklovski • FACTory – Cycorp • Hot or Not – 8 Days • Semantic Wikis: www.semwiki.org

  38. CROWDSOURCING: INCENTIVES

  39. What motivated thousands or millions of people to collaborate on the web? • Shared intent • Wikipedia • Citizen Science • Financial incentives • Microtask crowdsourcing • Enjoyment • Games-with-a-purpose

  40. MICROTASK CROWDSOURCING

  41. THE FINANCIAL INCENTIVE: MECHANICAL TURK • Wikipedia, OpenMind Commonsense, all rely on the voluntary effort of web users • Mechanical Turk was developed by Amazon to take advantage of the willingness of large numbers of web users to do some work for very little pay

  42. THE MECHANICAL TURK

  43. AMAZON MECHANICAL TURK

  44. HITs • On the Mechanical Turk site, a REQUESTER creates a HUMAN INTELLIGENCE TASK (HIT) and specifies how much they are willing to pay TURKERS to complete it • Typically, the payment is of the order of 1 to 10 cents per task
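
To get a feel for what payments of this order mean for a whole annotation project, the arithmetic can be sketched as below. The project size (10,000 items) and the number of judgments per item (5) are assumptions chosen for illustration, and the estimate covers worker payments only, excluding any platform fees.

```python
def annotation_cost_cents(num_items, judgments_per_item, reward_cents):
    """Total worker payments in cents: one paid task per judgment."""
    return num_items * judgments_per_item * reward_cents


# Hypothetical project: 10,000 items, each judged by 5 different workers,
# priced at the two ends of the 1 to 10 cent range.
for reward in (1, 10):
    cents = annotation_cost_cents(10_000, 5, reward)
    print(f"{reward} cent(s) per task: ${cents / 100:,.2f}")
# prints:
# 1 cent(s) per task: $500.00
# 10 cent(s) per task: $5,000.00
```

Working in integer cents avoids floating-point rounding until the final display step.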

  45. A TYPICAL HIT

  46. CREATING A HIT • Design • Publish • Manage

  47. RESOURCE CENTER

  48. DESIGN
