I256: Applied Natural Language Processing

I256: Applied Natural Language Processing Marti Hearst Sept 25, 2006

Shallow Parsing
• Break text up into non-overlapping contiguous subsets of tokens.
• Also called chunking, partial parsing, light parsing.
• What is it useful for?
  • Entity recognition
    • people, locations, organizations
  • Studying linguistic patterns
    • gave NP
    • gave up NP in NP
    • gave NP NP
    • gave NP to NP
  • Can ignore complex structure when not relevant

A Relationship between Segmenting and Labeling • Tokenization segments the text • Tagging labels the text • Shallow parsing does both simultaneously.

Chunking vs. Full Syntactic Parsing • Example: “G.K. Chesterton, author of The Man Who Was Thursday”

Representations for Chunks • IOB tags • Inside, outside, and begin • Why do we need a begin tag?
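One way to see why the begin tag is needed is to decode IOB tags back into chunks. The sketch below is plain Python rather than nltk_lite code; the function name `iob_to_chunks` and the example tokens are assumptions made for illustration. It follows the "gave NP NP" pattern from the earlier slide, where two noun phrases sit side by side.

```python
def iob_to_chunks(tagged):
    """Group (word, iob_tag) pairs into chunks.

    Tags are assumed to look like 'B-NP', 'I-NP', or 'O'.
    """
    chunks, current = [], None
    for word, tag in tagged:
        if tag == "O":
            current = None          # outside any chunk
            continue
        prefix, label = tag.split("-", 1)
        # Start a new chunk on B, after an O, or when the chunk type changes.
        if prefix == "B" or current is None or current[0] != label:
            current = (label, [word])
            chunks.append(current)
        else:
            current[1].append(word)
    return [(label, " ".join(words)) for label, words in chunks]


# "gave NP NP": two adjacent noun phrases.
with_begin = [("gave", "O"), ("the", "B-NP"), ("dog", "I-NP"),
              ("a", "B-NP"), ("bone", "I-NP")]
# The same sentence if only inside/outside tags were available.
without_begin = [("gave", "O"), ("the", "I-NP"), ("dog", "I-NP"),
                 ("a", "I-NP"), ("bone", "I-NP")]

print(iob_to_chunks(with_begin))
print(iob_to_chunks(without_begin))
```

With the begin tag the decoder recovers "the dog" and "a bone" as separate chunks; without it the two adjacent chunks merge into one, because nothing marks the boundary between them.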

Representations for Chunks • Trees • Chunk structure is a two-level tree that spans the entire text, containing both chunks and non-chunks
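The tree and IOB representations carry the same information. As a rough sketch, a two-level chunk tree can be modeled with nested tuples and flattened into IOB triples. The tuple layout and the name `tree_to_iob` are assumptions for this example and are not the nltk_lite Tree class.

```python
def tree_to_iob(tree):
    """Flatten a two-level chunk tree into (word, pos, iob) triples.

    Assumed layout: (root_label, children), where each child is either a
    non-chunk token (word, pos) or a chunk (label, [(word, pos), ...]).
    """
    _root_label, children = tree
    triples = []
    for child in children:
        if isinstance(child[1], list):          # a chunk
            chunk_label, tokens = child
            for k, (word, pos) in enumerate(tokens):
                prefix = "B" if k == 0 else "I"
                triples.append((word, pos, prefix + "-" + chunk_label))
        else:                                   # a token outside any chunk
            word, pos = child
            triples.append((word, pos, "O"))
    return triples


tree = ("S", [
    ("NP", [("the", "DT"), ("cat", "NN")]),
    ("sat", "VBD"),
    ("on", "IN"),
    ("NP", [("the", "DT"), ("mat", "NN")]),
])

for triple in tree_to_iob(tree):
    print(triple)
```

The root spans the whole text, and every token appears exactly once, either inside a chunk or directly under the root.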

CoNLL Collection • From the shared task of the 2000 Conference on Computational Natural Language Learning (CoNLL-2000) • Goal: develop machine learning methods that improve performance on the chunking task

CoNLL Collection
• Data in IOB format from WSJ:
  • Word POS-tag IOB-tag
• Training set: 8936 sentences
• Test set: 2012 sentences
• Tags from the Brill tagger
  • Penn Treebank Tags
• Evaluation measure: F-score
  • F = 2 * precision * recall / (precision + recall)
• Baseline: select the chunk tag most frequently associated with each POS tag, F = 77.07
• Best score in the contest was F = 94.13
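The baseline and the evaluation measure can both be sketched in a few lines of plain Python. The toy training sentences, the function names, and the representation of a chunk as a (start, end, label) triple are assumptions for illustration; the sketch also assumes that a predicted chunk counts as correct only when its boundaries and label both match the gold chunk.

```python
from collections import Counter, defaultdict


def train_baseline(sentences):
    """Map each POS tag to the IOB tag it most often carries in training."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for _word, pos, iob in sentence:
            counts[pos][iob] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def f_score(predicted, gold):
    """F = 2 * precision * recall / (precision + recall) over chunk sets."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy training data in the Word / POS-tag / IOB-tag layout.
train = [
    [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")],
    [("a", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "O")],
    [("time", "NN", "B-NP"), ("flies", "VBZ", "O")],
]
baseline = train_baseline(train)
# Tag an unseen POS sequence; unknown POS tags fall back to "O".
baseline_tags = [baseline.get(pos, "O") for pos in ["DT", "NN", "VBD"]]

# Chunks as (start, end, label) token spans.
gold = {(0, 2, "NP"), (3, 5, "NP"), (6, 7, "NP")}
predicted = {(0, 2, "NP"), (3, 4, "NP")}
score = f_score(predicted, gold)
print(baseline_tags, round(score, 2))
```

In the toy evaluation one of two predicted chunks is correct (precision 1/2) and one of three gold chunks is found (recall 1/3), which gives F = 0.4.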

nltk_lite and CONLL2000 Note that raw hides the IOB format

nltk_lite and CONLL2000

nltk_lite and CONLL2000 pp() stands for “pretty print” and applies to the Tree data structure.

nltk_lite chunks in treebank

nltk_lite parses in treebank

Chunking with Regular Expressions
• This time we write regexes over TAGS rather than words
  • <DT><JJ>?<NN>
  • <NN.*>
  • <JJ|NN>+
• Compile them with parse.ChunkRule()
  • rule = parse.ChunkRule('<DT|NN>+')
  • chunkparser = parse.RegexpChunk([rule], chunk_node='NP')
• Resulting object is of type Tree
  • Top-level node called S
  • Can change this label if you want, in the third argument to RegexpChunk
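To make the idea of a regex over tags concrete, here is a standard-library sketch of how a tag pattern can be applied. It does not use nltk_lite and is not a description of how parse.ChunkRule is implemented; `tag_pattern_to_regex`, `regexp_chunk`, and the example sentence are hypothetical names and data. The trick is to write the sentence's tags as one string such as `<DT><JJ><NN>` and run an ordinary regex over it.

```python
import re


def tag_pattern_to_regex(pattern):
    """Turn a tag pattern like '<DT><JJ>?<NN.*>' into a compiled regex."""
    def convert(match):
        # Inside <...>, '.' may match any character except the delimiters.
        body = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + body + ")>)"
    return re.compile(re.sub(r"<([^<>]+)>", convert, pattern))


def regexp_chunk(tagged, pattern, label="NP"):
    """Wrap every token span whose tags match the pattern in a chunk."""
    regex = tag_pattern_to_regex(pattern)
    tag_string = "".join("<%s>" % tag for _word, tag in tagged)
    result, done = [], 0
    for m in regex.finditer(tag_string):
        # Each '<' opens one tag, so counting them gives token positions.
        begin = tag_string[:m.start()].count("<")
        end = tag_string[:m.end()].count("<")
        if end == begin:
            continue                      # skip empty matches
        result.extend(tagged[done:begin])
        result.append((label, tagged[begin:end]))
        done = end
    result.extend(tagged[done:])
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

for item in regexp_chunk(sentence, "<DT><JJ>?<NN>"):
    print(item)
```

With `<DT><JJ>?<NN>` the sketch groups "the little cat" and "the mat" and leaves "sat" and "on" outside any chunk. Tags that contain regex metacharacters (for example `PRP$`) would need escaping, which this sketch does not attempt.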

Chunking with Regular Expressions

Chunking with Regular Expressions • Rule application is sensitive to order
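A small experiment shows the order effect. The sketch below assumes that a rule may only chunk tokens that no earlier rule has already claimed; that assumption, the function name `chunk_in_order`, and the example tags are illustrative and are not taken from the nltk_lite source. The patterns used here contain no `.` wildcard, so the tag-pattern conversion is kept minimal.

```python
import re


def chunk_in_order(tags, rules):
    """Apply tag-pattern rules in order; return the chunked token spans."""
    claimed = [False] * len(tags)
    spans = []
    for pattern in rules:
        regex = re.compile(re.sub(r"<([^<>]+)>", r"(?:<(?:\1)>)", pattern))
        i = 0
        while i < len(tags):
            if claimed[i]:
                i += 1
                continue
            # Find the maximal run of tokens not yet inside a chunk.
            j = i
            while j < len(tags) and not claimed[j]:
                j += 1
            run = "".join("<%s>" % t for t in tags[i:j])
            for m in regex.finditer(run):
                start = i + run[:m.start()].count("<")
                end = i + run[:m.end()].count("<")
                if end > start:
                    spans.append((start, end))
                    claimed[start:end] = [True] * (end - start)
            i = j
    return sorted(spans)


tags = ["DT", "JJ", "NN"]                 # e.g. "the little cat"
general_first = chunk_in_order(tags, ["<DT><JJ>?<NN>", "<NN>"])
narrow_first = chunk_in_order(tags, ["<NN>", "<DT><JJ>?<NN>"])
print(general_first, narrow_first)
```

When the broader rule runs first it chunks all three tokens. When `<NN>` runs first it claims the noun on its own, and the broader rule can no longer match because the remaining free tokens are only DT and JJ.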

Chinking • Specify what does not go into a chunk. • Similar to defining punctuation as whatever is not alphanumeric or whitespace. • Can be more difficult to think about.
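Chinking can be sketched as the reverse of chunking: start with every token in one chunk, then cut out the tokens that should not belong. The version below is a simplification that tests each tag on its own rather than matching sequences of tags; the function name `chink`, the chink pattern, and the example sentence are assumptions for illustration.

```python
import re


def chink(tagged, chink_pattern, label="NP"):
    """Put everything in one chunk, then remove tokens whose tag matches."""
    chink_re = re.compile(chink_pattern)
    result, current = [], []
    for word, tag in tagged:
        if chink_re.fullmatch(tag):
            # A chinked token closes the chunk built so far.
            if current:
                result.append((label, current))
                current = []
            result.append((word, tag))
        else:
            current.append((word, tag))
    if current:
        result.append((label, current))
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

# Verbs and prepositions are declared to be outside noun-phrase chunks.
chinked = chink(sentence, r"VB.*|IN")
for item in chinked:
    print(item)
```

Removing the verb and the preposition leaves "the little cat" and "the mat" as chunks, the same result a chunk rule would give, but reached by saying what to exclude.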

Practice Regexp Chunking • Write rules to be able to produce this kind of chunking:

Next Time • Evaluating Shallow Parsing • Begin Text Summarization
