Natural Language Processing

Natural Language Processing Lecture Jim Martin

Today • Briefly review sequence labeling • HMMs vs. MEMMs • Information extraction (Ch 22) Speech and Language Processing - Jurafsky and Martin

Statistical Sequence Labeling • With POS taggingwe trained systems to tag using annotated training data and HMMs. • Training data • Hand tag a bunch of data with POS tags • We also saw that you can do the same thing for partial parsing (chunking) using a clever scheme for encoding the chunks as tags Speech and Language Processing - Jurafsky and Martin

Encoding • With IOB encoding you can turn a labeled bracketing task into a tagging task. And then proceed exactly as we did with POS Tagging. • We’ll use what’s called IOB labeling to do this • I -> Inside • O -> Outside • B -> Begin Speech and Language Processing - Jurafsky and Martin

IOB encoding • This example shows the encoding for just base-NPs. There are 3 tags in this scheme. • This example shows full coverage. In this scheme there are 2*N+1 tags. Where N is the number of constituents in your set. Speech and Language Processing - Jurafsky and Martin

Methods • Argmax P(Tags|Words) • HMMs • Discriminative Sequence Classification • Using any kind of standard ML-based classifier. Speech and Language Processing - Jurafsky and Martin

HMM Tagging • Same as with POS tagging • Argmax P(T|W) = P(W|T)P(T) • The tags are the hidden states • Works ok, but has one major shortcoming • The typical kinds of things that we might think would be useful in this task aren’t easily squeezed into the HMM model • We’d like to be able to make arbitrary features available for the statistical inference being made. Speech and Language Processing - Jurafsky and Martin

Supervised Classification • Training a system to take an object represented as a set of features and apply a label to that object. • Methods typically include • Naïve Bayes • Decision Trees • Maximum Entropy (logistic regression) • Support Vector Machines • … Speech and Language Processing - Jurafsky and Martin

Sequence Classification • Applying this to tagging… • The object to be tagged is a word in the sequence • The features are • features of the word, • features of its immediate neighbors, • and features derived from the entire sentence. • Sequential tagging means sweeping the classifier across the input assigning tags to words as you proceed. Speech and Language Processing - Jurafsky and Martin

Typical Features • Typical setup involves • A small sliding window around the object being tagged • Features extracted from the window • Current word token • Previous/next N word tokens • Current word POS • Previous/next POS • Previous N chunk labels • Capitalization information • ... Speech and Language Processing - Jurafsky and Martin

Statistical Sequence Labeling Speech and Language Processing - Jurafsky and Martin

Notes... • Viewing complex string tasks as tagging tasks can open up a lot of possibilities • Finite-state methods • Statistical learning methods • The combination of Markov-style sequence processing with standard ML techniques is very productive • We’ll see this combo again and again and again Speech and Language Processing - Jurafsky and Martin

Evaluation • Suppose you employ this scheme. What’s the best way to measure performance. • Probably not the per-tag accuracy we used for POS tagging. • Why? • It’s not measuring what we care about • We need a metric that looks at the chunks not the tags Speech and Language Processing - Jurafsky and Martin

Example • Suppose we were looking for PP chunks for some reason. • If the system simply said O all the time it would do pretty well on a per-label basis since most words reside outside any PP. Speech and Language Processing - Jurafsky and Martin

Precision/Recall/F • Precision: • The fraction of chunks the system returned that were right • “Right” means the boundaries and the label are correct given some labeled test set. • Recall: • The fraction of the chunks that system got from those that it should have gotten. • F1: Harmonic mean of those two numbers. Speech and Language Processing - Jurafsky and Martin

F1 F1 balances the performance of precision and recall. Speech and Language Processing - Jurafsky and Martin

Information Extraction • Turns out that these partial/parsing and chunking methods for syntax, give us the means to address shallow semantic problems as well • Figure out the entities (the players, props, instruments, locations, etc.) in a text • Figure out how they’re related • Figure out what kind of events/activities they’re all up to • And do each of those tasks in a loosely-coupled data-driven manner Speech and Language Processing - Jurafsky and Martin

Information Extraction • Ordinary newswire text is often used in typical examples. • And there are lots of useful applications out there • But the real interest/money is in specialized domains • Bioinformatics • Electronic medical records • Stock market analysis • Intelligence analysis • Social media Speech and Language Processing - Jurafsky and Martin

Example Application Speech and Language Processing - Jurafsky and Martin

Services Calais (Thomson Reuters) Alchemy API (Denver based) Speech and Language Processing - Jurafsky and Martin

Information Extraction CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York Speech and Language Processing - Jurafsky and Martin

Information Extraction CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York. Speech and Language Processing - Jurafsky and Martin

Named Entity Recognition • Find and classify all the named entities in a text. • What’s a named entity? • A reference to an entity via the mention of its name. • Colorado Rockies • This is a subset of the possible mentions... • Rockies, the team, it, they... • Find means identify the exact span of the mention. • Classify means determine the category of the entity being referred to. Speech and Language Processing - Jurafsky and Martin

NER Approaches • As with chunking, there are two basic approaches (and hybrids) • Rule-based (regular expressions) • Lists of names • Patterns to match things that look like names • Patterns to match the environments that classes of names tend to occur in. • ML-based approaches • Get annotated training data • Extract features • Train systems to replicate the annotation Speech and Language Processing - Jurafsky and Martin

ML Approach Speech and Language Processing - Jurafsky and Martin

Encoding for Sequence Labeling • We can use the same IOB encoding here that we used for chunking: • For N classes we have 2*N+1 tags • An I and B for each class and a O for outside any class. • Each token in a text gets a tag. Speech and Language Processing - Jurafsky and Martin

NER Features Speech and Language Processing - Jurafsky and Martin

NER Features • The usefulness of different features varies by domain and by language • But features should be superficial and easily extracted from the text to be analyzed • Can’t solve a problem by using a feature that’s harder to extract than the actual problem! • The “shape” feature turns out to be amazingly useful in many domains Speech and Language Processing - Jurafsky and Martin

NER as Sequence Labeling Speech and Language Processing - Jurafsky and Martin

NER Evaluation • As with chunking, it is a bad idea to evaluation sequence labelers at the tag level. • Most labels are O; so just guessing O gives a learning algorithm a lot of credit. • So we need to evaluate precision, recall and F at the entity level. • But we may not care equally about all kinds of entities • So we might weight them differently in our evaluation. Speech and Language Processing - Jurafsky and Martin

Information Extraction CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York. Speech and Language Processing - Jurafsky and Martin

NER and Entities • Traditionally, NER only refers to entities that are referred to with an explicit mention of a name. • “Jane Smith” vs. “she” • “Twitter” vs. “it” or “they” • “Tesla Model S” vs. “the car” • General entity reference and tracking is a bigger problem. Speech and Language Processing - Jurafsky and Martin

Relations • Once you have captured the entities in a text, you might want to ascertain how they relate to one another. • Here we’re just talking about explicitly stated relations Speech and Language Processing - Jurafsky and Martin

Relation Types • As with named entities, the list of relations is application specific. For generic news texts... Speech and Language Processing - Jurafsky and Martin

Relations • By relation we really mean sets of tuples. • Think about populating a database. Speech and Language Processing - Jurafsky and Martin

Relation Analysis • We can divide relation analysis into two parts • Determining if 2 entities are related • And if they are, classifying the relation • There are 2 reasons to do this • Cutting down on training time for classification by eliminating most pairs • Producing separate feature-sets that are appropriate for each task. Speech and Language Processing - Jurafsky and Martin

Relation Analysis • Let’s just worry about named entities within the same sentence Speech and Language Processing - Jurafsky and Martin

Features • We can group the features (for both tasks) into three categories • Features of the named entities involved • Features derived from the words between and around the named entities • Features derived from the syntactic environment that governs the two entities Speech and Language Processing - Jurafsky and Martin

Features • Features of the entities • Their types • Concatenation of the types • Headwords of the entities • George Washington Bridge • Words in the entities • Features between and around • Particular positions to the left and right of the entities • +/- 1, 2, 3 • Bag of words between Speech and Language Processing - Jurafsky and Martin

Features • Syntactic environment • Constituent path through the tree from one to the other • Base syntactic chunk sequence from one to the other • Dependency path Speech and Language Processing - Jurafsky and Martin

Example • For the following example, we’re interested in the possible relation between American Airlines and Tim Wagner. • American Airlines, a unit AMR, immediately matched the move, spokesman Tim Wagner said. Speech and Language Processing - Jurafsky and Martin

Case Study:Bioinformatic NLP • An example domain • Very important: basic science, clinical practice, billing, etc. • Practitioners care about the technology • They have problems they’re trying to solve • Lots and lots of text available • Lots of interesting problems Speech and Language Processing - Jurafsky and Martin

Lots and Lots of Text Speech and Language Processing - Jurafsky and Martin

Lots and Lots of Text Electronic health records are the next frontier for this kind of research Mandated for health care providers Speech and Language Processing - Jurafsky and Martin

Natural Language Processing