Overview of Machine Learning for NLP Tasks: part II

Presentation Transcript


  1. Overview of Machine Learning for NLP Tasks: part II • Named Entity Tagging: A Phrase-Level NLP Task

  2. Outline • Identify a (hard) problem • Frame the problem ‘appropriately’ • (...so that we can apply our tools, find appropriate labeled data) • Preprocess data • Apply FEX and SNoW • Process output from FEX, SNoW to annotate new text • FEX and SNoW server modes

  3. Named Entity Tagging • Identify e.g. people, locations, organizations
     After receiving his [MISC M.B.A.] from [ORG Harvard Business School], [PER Richard F. America] accepted a faculty position at the [ORG McDonough School of Business] ([ORG Georgetown University]) in [LOC Washington].
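The bracketed annotation in the example sentence can be read back into (type, text) pairs. The following is a minimal sketch, assuming every entity is written as `[TYPE text]` with an upper-case type tag and no nested brackets; the function name `extract_entities` is illustrative and not part of any tool mentioned in the slides.

```python
import re

# Matches "[TYPE some text]" where TYPE is an upper-case tag such as PER, ORG, LOC, MISC.
BRACKET = re.compile(r"\[([A-Z]+) ([^\]]+)\]")

def extract_entities(annotated):
    """Return (type, text) pairs for every bracketed entity in the string."""
    return [(m.group(1), m.group(2)) for m in BRACKET.finditer(annotated)]

sentence = ("After receiving his [MISC M.B.A.] from [ORG Harvard Business School], "
            "[PER Richard F. America] accepted a faculty position at the "
            "[ORG McDonough School of Business] ([ORG Georgetown University]) "
            "in [LOC Washington].")

for ne_type, text in extract_entities(sentence):
    print(ne_type, text)
```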

  4. Framing NE-tagging Problem • Not an easy problem: • We won’t seek stellar results; we just want to show that the tools work, and how to apply them • Where to begin? • Need labeled data • Data must work with FEX

  5. Ways to Approach NE-tagging • BIO/Open-Close chunking: • Word-level classification + inference • The chunks found depend on the labels you train with (e.g. NE labels) • Impose common-sense constraints on open/close labels • Optimize based on classifier confidence • V. Punyakanok and D. Roth, “The Use of Classifiers in Sequential Inference”, NIPS-13, Dec. 2000 • Use a chunker to find phrase boundaries: • Phrase-level predicate: learn labels for phrases • Can use FEX’s phrase mode
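To make the word-level route concrete, here is a minimal sketch of how a sequence of BIO tags could be decoded into labeled spans. It is not the inference procedure from the cited paper; it only shows the bookkeeping. Treating a stray `I-` tag as the start of a new span is an assumption made here for leniency.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # Assumption: a stray "I-" with no open span starts a new span.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans

tags = ["O", "B-ORG", "O", "O", "B-PER", "I-PER"]
print(bio_to_spans(tags))
```

The phrase-level route described in the same slide avoids this decoding step, because each candidate phrase is classified as a whole.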

  6. Framing NE-tagging Problem • We have some labeled Named Entity data • We can identify Noun-phrases with our chunker... • See the Demos page for an example • ...and FEX has a phrase mode... • ...So we can frame this as a (noun) phrase classification problem (assume all NEs are NPs) • avoids working with invalid phrases • avoids inference (as opposed to open-close classifiers)

  7. Review: Machine Learning System
     [Diagram; recoverable box labels: Raw Text, Preprocessing, Formatted Text, Feature Extraction, Feature Vectors, Training Examples, Testing Examples, Machine Learner, Function Parameters, Classifier(s), Labels, Inference, Labels]

  8. Solution Sketch • Use labeled data to develop core classifier • Adapt our labeled data to our model of the problem • Experiment with FEX and SNoW to get good performance using our labeled data • Use the FEX and SNoW resources we develop as the core of our NE Tagger • Write tools to preprocess raw text into appropriate form for input to FEX, SNoW • Write tools to convert SNoW output to labels for preprocessed data • Convert labeled preprocessed data into desired output format • For the training/evaluation data, we’ve done the pre- and post-processing for you…

  9. CONLL03 data • Have some column-format data... any problems?
       O      0  0   B-NP  PRP  He            x  TXT/1  0
       O      0  1   B-VP  VBD  said          x  TXT/1  0
       O      0  2   I-NP  DT   a             x  TXT/1  0
       O      0  3   I-NP  NN   proposal      x  TXT/1  0
       O      0  4   B-NP  JJ   last          x  TXT/1  0
       O      0  5   I-NP  NN   month         x  TXT/1  0
       O      0  6   B-PP  IN   by            x  TXT/1  0
       B-ORG  0  7   B-NP  NNP  EU            x  TXT/1  0
       O      0  8   I-NP  NNP  Farm          x  TXT/1  0
       O      0  9   I-NP  NNP  Commissioner  x  TXT/1
       B-PER  0  10  I-NP  NNP  Franz         x  TXT/1  0
       I-PER  0  11  I-NP  NNP  Fischler      x  TXT/1  0

  10. Design Decisions • NE phrases are a subset of NPs • We can find NPs, so label only NPs • Given chunking, can use FEX phrase mode • CONLL03 data: NPs not labeled as NEs • NE phrases could be embedded • How to resolve embeddings? • Avoid embedding – ‘enlarge’ NE phrases • Data has been preprocessed to reflect our needs

  11. Setting up... • Download NE data from CogComp tools page • ne_tut_processed.tar.gz • Download sample FEX script • link: ‘sample NE FEX script’ • file: NE-simple.scr

  12. Review: What FEX is doing... • Think of FEX as generating a list of boolean variables X1, X2, …, Xn • The lexicon maps each boolean variable Xi to a propositional logic term • e.g. “1204 w[rejects*]” could be written X1204 == BEFORE(X, TARG), where X == “rejects” and TARG ∈ {too, to, two} • In FEX output: • If a boolean variable is present, it is active • If a boolean variable is not present, it is inactive
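The idea of a lexicon that maps feature strings to numbered boolean variables, with each example listing only its active variables, can be sketched as follows. This is an illustration of the concept, not FEX's implementation: the `Lexicon` class, the `encode` function, and the numbering from 1 are all assumptions.

```python
class Lexicon:
    """Maps feature strings to integer ids, assigning a new id on first sight."""

    def __init__(self):
        self.ids = {}

    def lookup(self, feature):
        # Assumption: ids start at 1; the slides do not give FEX's numbering scheme.
        if feature not in self.ids:
            self.ids[feature] = len(self.ids) + 1
        return self.ids[feature]

def encode(active_features, lexicon):
    """Return the sorted ids of the active features; any id not listed is inactive."""
    return sorted({lexicon.lookup(f) for f in active_features})

lex = Lexicon()
ex1 = encode(["w[rejects*]", "w[*the]"], lex)
ex2 = encode(["w[*the]", "w[accepts*]"], lex)
print(ex1, ex2)
```

Because only active ids are written out, each example stays short even when the lexicon grows to many thousands of features.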

  13. FEX advanced modes: Phrase Mode • Why do we need extensions? • The original design of FEX is “word-based” • Each element is a word, and so is the target • Phrase detection/classification problem: The target is a phrase. • E.g. Named Entity tagging, Shallow Parse tagging • Document classification problem: The target is the whole document. • Relations: Target is at some intermediate level of representation. • FEX also has an Entity-Relation mode…

  14. Basic Structure • Two types of elements: phrases & words • FEX’s window semantics are different for phrase mode • Column format input only
      [Diagram labels: Phrase; W1 W2 W3 W4 W5 W6 W7 W8]

  15. Changes to FEX for Phrase Mode • Only accepts COLUMN format input • 1st column is used to store (phrase) labels. • 2nd column is used to store named entity tags. • Both use BIO format. • Columns 2-6 have fixed meanings: • 2 NE; • 3 Index; • 4 Phrase boundary; • 5 POS; • 6 Word

  16. Sample Column Format Data
       O      0  0   I-NP  PRP  He            x  TXT/1  0
       O      0  1   I-VP  VBD  said          x  TXT/1  0
       O      0  2   I-NP  DT   a             x  TXT/1  0
       O      0  3   I-NP  NN   proposal      x  TXT/1  0
       O      0  4   B-NP  JJ   last          x  TXT/1  0
       O      0  5   I-NP  NN   month         x  TXT/1  0
       O      0  6   I-PP  IN   by            x  TXT/1  0
       B-ORG  0  7   I-NP  NNP  EU            x  TXT/1  0
       O      0  8   I-NP  NNP  Farm          x  TXT/1  0
       O      0  9   I-NP  NNP  Commissioner  x  TXT/1
       B-PER  0  10  I-NP  NNP  Franz         x  TXT/1  0
       I-PER  0  11  I-NP  NNP  Fischler      x  TXT/1  0
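A row of the column format above can be split into named fields. This is a minimal sketch that reads only the first six columns, following the column meanings listed in the previous slide; the field names in `Token` are illustrative, and the remaining columns are ignored because the slides do not describe them.

```python
from collections import namedtuple

# Field names follow the fixed column meanings from the slide; the names are illustrative.
Token = namedtuple("Token", "label ne index chunk pos word")

def parse_row(line):
    """Split one whitespace-separated column-format row into its first six fields."""
    cols = line.split()
    return Token(cols[0], cols[1], int(cols[2]), cols[3], cols[4], cols[5])

row = parse_row("B-ORG 0 7 I-NP NNP EU x TXT/1 0")
print(row.word, row.chunk, row.label)
```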

  17. Phrase Mode Option • FEX command line option -P <length> • -P takes an integer argument: the maximum length of the candidate phrases. • For example, “fex -P 4” will generate examples for every phrase of length 1, 2, 3 and 4 from the corpus file. • If the length is equal to 0, then only positive examples will be generated.
       > fex -P 0 ne.scr ne.lex ne.corp ne.out
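To see how many candidates a positive maximum length produces, the enumeration can be sketched as below. This only illustrates "every phrase of length 1 to N"; it is not FEX's code, and it does not cover the length-0 case, which relies on the labeled phrases in the corpus. The function name `candidate_phrases` is hypothetical.

```python
def candidate_phrases(n_tokens, max_len):
    """Yield (start, end) spans, end exclusive, for every phrase of length 1..max_len."""
    for start in range(n_tokens):
        for length in range(1, max_len + 1):
            if start + length <= n_tokens:
                yield (start, start + length)

# A 5-token sentence with a maximum phrase length of 4.
spans = list(candidate_phrases(5, 4))
print(len(spans), spans[:4])
```

The count grows roughly as sentence length times maximum phrase length, and most candidates are negative examples, which is one reason to restrict candidates to phrases already found by a chunker.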

  18. Window Range in Phrase Mode • The meaning of the offsets in the window is different in Phrase mode:
       w1  w2  w3  W4  W5  W6  w7  w8  w9
       -3  -2  -1   0   0   0   1   2   3
      “-1: w[0,0]” returns w[W4], w[W5], w[W6].
      “-1 loc: w[0,0]” returns w[*W4]*, w[*_W5]*, w[*__W6]*. (NOTE: * after [] indicates ‘within phrase’)
      “-1 loc: w[-2,-1]” returns w[w2_*], w[w3*].
      “-1 loc: w[1, 2]” returns w[*w7], w[*_w8].
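The offset numbering in phrase mode, where every word of the target phrase sits at offset 0, can be reproduced with a few lines. This is a sketch of the numbering only; the helper names `phrase_offsets` and `window` are illustrative, and the location markers produced by `loc` are not modeled.

```python
def phrase_offsets(n_tokens, start, end):
    """Offset of each token relative to the target phrase tokens[start:end]."""
    offsets = []
    for i in range(n_tokens):
        if i < start:
            offsets.append(i - start)      # negative: counts back from the phrase
        elif i >= end:
            offsets.append(i - end + 1)    # positive: counts forward from the phrase
        else:
            offsets.append(0)              # every word inside the phrase is at 0
    return offsets

def window(tokens, offsets, lo, hi):
    """Return the tokens whose offset falls in the inclusive range [lo, hi]."""
    return [t for t, o in zip(tokens, offsets) if lo <= o <= hi]

tokens = ["w1", "w2", "w3", "W4", "W5", "W6", "w7", "w8", "w9"]
offsets = phrase_offsets(len(tokens), 3, 6)
print(offsets)
print(window(tokens, offsets, -2, -1), window(tokens, offsets, 0, 0))
```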

  19. Phrase Type Sensors • How to specify patterns within phrase? • Several phrase type sensors can be used. • “-1 phLen[0,0]” returns 3 for the above corpus file, since "W4 W5 W6" contains 3 words. • phNoSmall is active if all words in the target phrase are either capitalized (initial), symbols, or numbers. • phAllWord is active if all the elements in the target phrase are words (a-z,A-Z) • Many other custom sensors – check the FEX source code (Sensor.h)
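The three sensors described above can be approximated in a few lines. These are sketches written from the one-line descriptions in the slide, not from Sensor.h, so edge cases may differ from FEX's behavior. In particular, reading "symbols or numbers" as "tokens containing no letters" is an assumption, and tokens are assumed to be non-empty.

```python
import re

def ph_len(phrase):
    """Number of words in the target phrase."""
    return len(phrase)

def ph_no_small(phrase):
    """True if every token starts with a capital letter or contains no letters at all."""
    return all(tok[0].isupper() or not any(c.isalpha() for c in tok)
               for tok in phrase)

def ph_all_word(phrase):
    """True if every token consists only of the letters a-z and A-Z."""
    return all(re.fullmatch(r"[A-Za-z]+", tok) for tok in phrase)

phrase = ["McDonough", "School", "of", "Business"]
print(ph_len(phrase), ph_no_small(phrase), ph_all_word(phrase))
```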

  20. RGF operator conjunct
       w1  w2  w3  W4  W5  W6  w7  w8  w9
       -3  -2  -1   0   0   0   1   2   3
      • “conjunct(-1:w[-2,-1]; -1:phLen[0,0]; -1:w[1,2])” generates • w[w2]--phLen[3]--w[w7], w[w2]--phLen[3]--w[w8] • w[w3]--phLen[3]--w[w7], w[w3]--phLen[3]--w[w8]
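The conjunct operator behaves like a Cartesian product over the features produced by its arguments. A minimal sketch, assuming each argument yields a list of feature strings and that the right-context features are written `w[w7]` and `w[w8]`:

```python
from itertools import product

def conjunct(*feature_groups):
    """Join one feature from each group with '--', for every combination."""
    return ["--".join(combo) for combo in product(*feature_groups)]

left = ["w[w2]", "w[w3]"]     # from -1:w[-2,-1]
length = ["phLen[3]"]         # from -1:phLen[0,0]
right = ["w[w7]", "w[w8]"]    # from -1:w[1,2]

features = conjunct(left, length, right)
for f in features:
    print(f)
```

The number of conjoined features is the product of the group sizes, so wide windows inside a conjunct can enlarge the lexicon quickly.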

  21. Choose FEX, SNoW parameters • Use FEX phrase mode:
       % ./fex -P 0 ne.scr ne.lex data.in ne-snow.ex
      • Train SNoW with the resulting examples:
       % ./snow -train -I ne-snow.ex -F ne.net -W:0-5
      • Test SNoW with examples from test data:
       % ./snow -test -I ne-snow2.ex -F ne.net -o allpredictions -R ne.res

  22. Improving Classifier Performance • Tune the FEX script: experiment with different sensors • InitialCapitalized, NotInitialCapitalized, AllCapitalized • Tune SNoW using Test data • analyze.pl – a tool to help with tuning • Gives accuracy for each label • Requires SNoW’s ‘-o allpredictions’ mode
       % ./analyze.pl snow.res
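Per-label accuracy is straightforward to compute once gold and predicted labels are paired up. The sketch below is not analyze.pl and does not read SNoW's result file; it assumes the two label lists have already been extracted, and it defines accuracy for a label as the fraction of gold instances of that label that were predicted correctly, which may differ from the tool's definition.

```python
from collections import Counter

def per_label_accuracy(gold, predicted):
    """Fraction of gold instances of each label that were predicted correctly."""
    total, correct = Counter(), Counter()
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}

gold = ["PER", "PER", "ORG", "LOC"]
predicted = ["PER", "ORG", "ORG", "LOC"]
print(per_label_accuracy(gold, predicted))
```

A breakdown like this shows which labels are dragging overall accuracy down, which helps decide which sensors to try next.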

  23. We now have a classifier… • Need a way to apply it to new text… • No formatting or Gold Standard labeling • Need to enrich with POS, SP • Need to track SNoW output and use it to label the data • Sample tools: • link: ‘NE tagging: tools for new data’ • file: tut_ne_postprocess.tar.gz

  24. Classifying New Data First, let’s enrich our input: • POS-tagging – POS tagger • Chunking – Shallow Parser • NOTE: SP output format is not FEX-compatible • Convert to Column format • Tool available from the CCG tools page
       % ./chunk-to-column.pl inputFile > outputFile
      • Run data through the FEX and SNoW servers • One file at a time • Doesn’t reload lexicon/network each time • Can pipe test data through both together

  25. Making life easier... • Starting SNoW server:
       % ./snow -server <port> -F network.net &
      • Starting FEX server:
       % ./fex -s <port> -P 0 <script> <lexicon> &
      • Need client scripts to interact with the servers • See Snow_v3.1/tutorial/example-client.pl for SNoW • See fex/fexClient.pl for FEX • Clean up after use… • ‘ps’ • kill server processes

  26. Post-processing • SNoW ‘-o winners’ mode
       % ./snow -test -I ... -F ... -R text.winners.res -o winners
      • Adding results to original data • SNoW output mode must be ‘winners’
       % ./numbers-to-labels.pl text.winners.res ne.lex > text.lab
       % ./apply-labels.pl text.col text.lab > text.col.lab
      • In my solution, there is a seeming disparity between performance on held-out data and on completely unseen text • WHY? • What is the best way to improve the performance? (i.e., what is likely to give the best return per unit time invested?)
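The last step, writing each phrase's winning label back onto its words, can be sketched as follows. The file formats read by numbers-to-labels.pl and apply-labels.pl are not shown in the slides, so this sketch works on in-memory values instead. The `id_to_label` mapping, the phrase spans, and the choice to emit BIO tags are all hypothetical.

```python
def apply_labels(n_tokens, phrases, winners, id_to_label):
    """Write one BIO tag per token from the winning label id of each phrase."""
    tags = ["O"] * n_tokens
    for (start, end), winner in zip(phrases, winners):
        label = id_to_label[winner]
        if label == "O":
            continue                      # phrase classified as not an entity
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags

# Hypothetical values: three candidate phrases in a 6-token sentence.
id_to_label = {0: "O", 1: "ORG", 2: "PER"}
phrases = [(0, 1), (3, 4), (4, 6)]        # (start, end) spans, end exclusive
winners = [0, 1, 2]                       # one winning label id per phrase
tags = apply_labels(6, phrases, winners, id_to_label)
print(tags)
```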

  27. Summary: SNoW and FEX • SNoW is a supervised learning system • Needs labeled data • Performance constrained by the quality of the features it is given • Works with numerical features – needs preprocessing stage to extract those features • Fast, and good performance • FEX provides a framework for feature engineering • Designed to represent examples in SNoW input format • Does *not* generate features automatically – not a replacement for human expert! • Requires certain input formats • Fairly modular – write new sensors to capture new feature types • Terse, expressive feature descriptors

  28. Summary: solving NLP problems • Need to frame problem appropriately (e.g. NE as noun phrase tagging) • Need appropriate labeled data • If you want an application, will have to write pre- and post-processing • SNoW and FEX work close to the mathematical models underlying machine learning • User has good control over ML algorithms • Be prepared to spend some time on error analysis and feature engineering!
