
Presentation Transcript


  1. A Semantic Approach to IE Pattern Induction
  Mark Stevenson and Mark Greenwood
  Natural Language Processing Group, University of Sheffield, UK

  2. The problem
  • IE systems are time-consuming to build using knowledge engineering approaches
  • UMass report that their system took 1,500 person-hours of expert labour to port to MUC-3
  • Learning methods can help reduce the burden
  • Supervised methods often require large amounts of training data
  • Weakly supervised approaches require less annotated data than supervised ones and less expert knowledge than knowledge engineering approaches

  3. Learning Extraction Patterns
  Iterative learning algorithm:
  1. Begin with a set of seed patterns which are known to be good extraction patterns
  2. Compare every other pattern with the ones known to be good
  3. Choose the highest scoring of these and add them to the set of good patterns
  4. Stop if enough patterns have been learned, else go to step 2
  [Diagram: Seeds → Candidates → Rank → Patterns]
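A minimal sketch of this bootstrapping loop, assuming a `similarity` scoring function is supplied (the function name, the number of patterns accepted per iteration and the stopping limit are illustrative choices, not taken from the slides):

```python
def induce_patterns(seeds, candidates, similarity, per_iteration=4, max_patterns=100):
    """Iteratively grow a set of accepted extraction patterns.

    seeds       -- initial patterns known to be good
    candidates  -- every other pattern observed in the corpus
    similarity  -- function scoring one candidate against the accepted set
    """
    accepted = set(seeds)
    remaining = set(candidates) - accepted
    while remaining and len(accepted) < max_patterns:
        # Score every remaining candidate against the patterns accepted so far
        ranked = sorted(remaining, key=lambda c: similarity(c, accepted), reverse=True)
        # Accept the highest-scoring candidates, then repeat the comparison step
        newly_accepted = ranked[:per_iteration]
        accepted.update(newly_accepted)
        remaining.difference_update(newly_accepted)
    return accepted
```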

  4. Semantic Approach
  • Assumption: relevant patterns are ones with similar meanings to those already identified as useful
  • Example:
  "The chairman resigned"
  "The chairman stood down"
  "The chairman quit"
  "Mr. Smith quit the job of chairman"

  5. Pattern Similarity
  • Semantic patterns are SVO-tuples extracted from each clause in the sentence: chairman+resign
  • Tuple fillers can be lexical items or semantic classes (e.g. COMPANY, PERSON)
  • Patterns can be represented as vectors encoding the slot role and filler: chairman_subject, resign_verb
  • Similarity between two patterns is defined using an adapted cosine measure over these vectors (sketched below)
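The formula on the slide does not survive in the transcript. A minimal sketch of the adapted cosine measure the slides describe (slides 6 and 7), assuming the pattern vectors and the similarity matrix W are numpy arrays:

```python
import numpy as np

def pattern_similarity(a, b, W):
    """Adapted cosine between two pattern vectors a and b.

    a, b -- vectors over slot-role/filler features
            (e.g. chairman_subject, resign_verb)
    W    -- matrix of pairwise feature similarities (slide 6):
            0 for features in different slot roles, otherwise a
            WordNet-based similarity between the fillers
    """
    return (a @ W @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```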

  6. Matrix Population
  • Matrix W is populated using a semantic similarity metric based on WordNet
  • Wij = 0 if the two fillers occupy different slot roles; otherwise Wij = sim(wi, wj), using Jiang and Conrath's (1997) similarity measure
  • Semantic classes are manually mapped onto an appropriate WordNet synset
  [Figure: example matrix for the patterns ceo+resigned and ceo+quit]
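As a rough illustration only (not the authors' code), the Jiang-Conrath measure is available through NLTK's WordNet interface; the sense selection below is naive (first verb sense of each word), whereas the slides say semantic classes are mapped onto synsets by hand:

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Jiang & Conrath (1997) needs an information-content file; the Brown-corpus
# counts are part of the standard NLTK data packages (wordnet, wordnet_ic).
brown_ic = wordnet_ic.ic('ic-brown.dat')

# Naive sense selection: take the first verb sense of each filler word.
resign = wn.synsets('resign', pos=wn.VERB)[0]
quit_ = wn.synsets('quit', pos=wn.VERB)[0]

print(resign.jcn_similarity(quit_, brown_ic))
```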

  7. Advantage
  • The adapted cosine metric allows synonymy and near-synonymy to be taken into account
  [Figure: vector space with axes ceo_subject, resign_verb and quit_verb showing the patterns ceo+resigned and ceo+quit; sim(ceo+resigned, ceo+quit) = 0.95]
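The slide's 0.95 value can be reproduced with the adapted cosine sketch above under illustrative assumptions; the feature weights and the 0.9 resign/quit similarity below are made up for the example, not taken from the paper:

```python
import numpy as np

# Feature order: ceo_subject, resign_verb, quit_verb
ceo_resigned = np.array([1.0, 1.0, 0.0])
ceo_quit     = np.array([1.0, 0.0, 1.0])

# Illustrative similarity matrix: identical features score 1.0, the
# near-synonyms resign/quit score 0.9, features in different roles score 0.0.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.9],
              [0.0, 0.9, 1.0]])

sim = (ceo_resigned @ W @ ceo_quit) / (
    np.linalg.norm(ceo_resigned) * np.linalg.norm(ceo_quit))
print(round(sim, 2))  # 0.95, matching the value on the slide
```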

  8. Experimental Setup
  • Text pre-processed using GATE to tokenise, split into sentences and identify semantic classes
  • Parsed using MINIPAR (adapted to deal with semantic classes marked in input)
  • SVO tuples extracted from dependency tree
  Seed patterns:
  COMPANY+appoint+PERSON
  COMPANY+elect+PERSON
  COMPANY+promote+PERSON
  COMPANY+name+PERSON
  PERSON+resign
  PERSON+quit
  PERSON+depart
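A small sketch (illustrative data structures, not the paper's implementation) of how these seed patterns could be held as SVO tuples and expanded into the role-tagged features used for the vector representation on slide 5:

```python
# Seed patterns as (subject, verb, object) tuples; None marks an empty slot.
SEED_PATTERNS = [
    ("COMPANY", "appoint", "PERSON"),
    ("COMPANY", "elect", "PERSON"),
    ("COMPANY", "promote", "PERSON"),
    ("COMPANY", "name", "PERSON"),
    ("PERSON", "resign", None),
    ("PERSON", "quit", None),
    ("PERSON", "depart", None),
]

def pattern_features(pattern):
    """Expand an SVO tuple into role-tagged filler features,
    e.g. ('PERSON', 'resign', None) -> ['PERSON_subject', 'resign_verb']."""
    roles = ("subject", "verb", "object")
    return [f"{filler}_{role}" for filler, role in zip(pattern, roles) if filler]

print(pattern_features(("COMPANY", "appoint", "PERSON")))
```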

  9. Example Learned Patterns
  COMPANY+hire+PERSON
  PERSON+hire+PERSON
  PERSON+succeed+PERSON
  PERSON+appoint+PERSON
  PERSON+name+POST
  PERSON+join+COMPANY
  PERSON+own+COMPANY
  COMPANY+acquire+COMPANY

  10. Evaluation
  • Comparison with an existing approach: the "document centric" method described by Yangarber, Grishman, Tapanainen and Huttunen (2000)
  • Based on the alternative assumption that useful patterns will occur in similar documents to those which have already been identified as relevant
  • Two evaluation regimes using the MUC-6 "management succession" task: document filtering and sentence filtering

  11. Document Filtering Evaluation
  • MUC-6 corpus (590 documents)
  • Task involves identifying documents which contain management succession events
  • Similar to MUC-6 document filtering task
  • Document centric approach benefited from a supplementary corpus: 6,000 newswire stories from the Reuters corpus (3,000 with code "C411" = management succession events)

  12. Document Filtering Results

  13. Sentence Filtering Evaluation
  • Version of the MUC-6 corpus in which sentences containing events were marked (Soderland, 1999)
  • Evaluate how accurately the generated pattern set can distinguish between "relevant" (event-describing) and non-relevant sentences
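A minimal sketch of this filtering decision, assuming sentences have already been reduced to SVO tuples by the pipeline on slide 8 (the rule and the example patterns are illustrative, not the paper's evaluation code):

```python
def sentence_is_relevant(sentence_tuples, accepted_patterns):
    """Mark a sentence as relevant if any of its SVO tuples
    appears in the accepted pattern set."""
    return any(t in accepted_patterns for t in sentence_tuples)

# Hypothetical usage with two of the seed patterns from slide 8.
accepted = {("PERSON", "resign", None), ("COMPANY", "appoint", "PERSON")}
print(sentence_is_relevant([("PERSON", "resign", None)], accepted))      # True
print(sentence_is_relevant([("COMPANY", "sell", "PRODUCT")], accepted))  # False
```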

  14. Sentence Filtering Results

  15. Conclusion
  • Introduced a new approach to pattern acquisition for Information Extraction
  • The approach relies on the semantic similarity of patterns, using information from WordNet
  • Extracts useful patterns using a small set of seed patterns and unannotated text
  • Superior to the existing approach on the fine-grained (sentence filtering) evaluation
  • Document filtering may not be a suitable evaluation regime for this task
