1 / 102

Information Extraction

Information Extraction. Shallow Processing Techniques for NLP Ling570 December 5, 2011. Roadmap. Information extraction: Definition & Motivation General approach Relation extraction techniques: Fully supervised Lightly supervised. Information Extraction. Motivation:

zarita
Download Presentation

Information Extraction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011

  2. Roadmap • Information extraction: • Definition & Motivation • General approach • Relation extraction techniques: • Fully supervised • Lightly supervised

  3. Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze

  4. Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze • Many tools for manipulating structured data: • Databases: efficient search, agglomeration • Statistical analysis packages • Decision support systems

  5. Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze • Many tools for manipulating structured data: • Databases: efficient search, agglomeration • Statistical analysis packages • Decision support systems • Goal: • Given unstructured information in texts • Convert to structured data • E.g., populate DBs

  6. IE Example Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. • Lead airline: • Amount: • Effective Date: • Follower:

  7. IE Example Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. • Lead airline: United Airlines • Amount: $6 • Effective Date: 2006-10-26 • Follower: American Airlines

  8. Other Applications • Shared tasks: • Message Understanding Conferences (MUC)

  9. Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers:

  10. Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents

  11. Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents • Incident type, Actor, Victim, Date, Location • General:

  12. Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents • Incident type, Actor, Victim, Date, Location • General: • Pricegrabber • Stock market analysis • Apartment finder

  13. Information ExtractionComponents • Incorporates range of techniques, tools

  14. Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling

  15. Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling • Semi-, un-supervised learning: clustering, bootstrapping

  16. Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling • Semi-, un-supervised learning: clustering, bootstrapping • Part-of-Speech tagging • Named Entity Recognition • Chunking • Parsing

  17. Information Extraction Process • Typically pipeline of major components

  18. Information Extraction Process • Typically pipeline of major components • First step: Named Entity Recognition (NER) • Often application-specific • Generic type: Person, Organiztion, Location • Specific: genes to college courses

  19. Information Extraction Process • Typically pipeline of major components • First step: Named Entity Recognition (NER) • Often application-specific • Generic type: Person, Organiztion, Location • Specific: genes to college courses • Identify NE mentions: • specific instances

  20. Example: Pre-NER • Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco.

  21. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  22. Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes

  23. Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes • Includes anaphora resolution • E.g. pronoun resolution (it, he, she…; one; this, that)

  24. Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes • Includes anaphora resolution • E.g. pronoun resolution (it, he, she…; one; this, that) • Also non-anaphoric, general mentions

  25. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  26. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  27. Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text

  28. Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations:

  29. Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial

  30. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  31. Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial • Generic relations: • United is part-of UAL Corp. • American Airlines is part-of AMR • Tim Wagner is an employee of American Airlines

  32. Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial • Generic relations: • United is part-of UAL Corp. • American Airlines is part-of AMR • Tim Wagner is an employee of American Airlines • Domain-specific relations: • United serves Chicago; United serves Denver, etc

  33. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  34. Event Detection &Classification • Event detection and classification: • Find key events

  35. Event Detection &Classification • Event detection and classification: • Find key events • Fare increase by United • Matching fare increase by American • Reporting of fare increases • How cued?

  36. Event Detection &Classification • Event detection and classification: • Find key events • Fare increase by United • Matching fare increase by American • Reporting of fare increases • How cued? • Instances of “said” and ”cite”

  37. Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text

  38. Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].

  39. Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text

  40. Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text • Others: • Date expressions: • Absolute: days of week, holidays • Relative: two days from now, next year • Clock times: • noon, 3:30pm

  41. Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text • Others: • Date expressions: • Absolute: days of week, holidays • Relative: two days from now, next year • Clock times: • noon, 3:30pm • Temporal analysis: • Map expressions to points/spans on timeline • Anchor: e.g. Friday, Thursday: tied to example dateline

  42. Template Filling • Texts describe stereotypical situations in domain • Identify texts evoking template situations • Fill slots in templates based on text

  43. Template Filling • Texts describe stereotypical situations in domain • Identify texts evoking template situations • Fill slots in templates based on text • In example: fare-raise is stereo-typical event

  44. Information Extraction Steps • Named Entity Recognition • Reference Resolution • Relation Detection & Classification • Event Detection & Classification • Temporal Expression Recognition & Analysis • Template-filling

  45. Relation Detection & Classification

  46. Common Relations

  47. Relations and Entitiesfrom Example

  48. Supervised Approaches to Relation Analysis • Two stage process:

  49. Supervised Approaches to Relation Analysis • Two stage process: • Relation detection: • Detect whether a relation holds between two entities

  50. Supervised Approaches to Relation Analysis • Two stage process: • Relation detection: • Detect whether a relation holds between two entities • Positive examples:

More Related