An introduction to NLTK, a suite of classes for NLP tasks such as parsing and part-of-speech tagging. It covers installation, text processing utilities, corpora such as Brown and the Penn Treebank, and resources such as WordNet. It also surveys linguistic tasks, from named entity recognition and sentiment analysis to machine translation, summarization, and information extraction, and shows how to identify the parts of speech in a sentence.
Introduction to NLTK
ELN – Natural Language Processing
Giuseppe Attardi
Installing NLTK • Download and Install • http://nltk.org/install.html • Download NLTK data:
>>> import nltk
>>> nltk.download()  # with no arguments, opens the interactive NLTK Downloader
NLTK • Suite of classes for several NLP tasks • Parsing, POS tagging, classifiers… • Several text processing utilities and corpora • Brown and Penn Treebank corpora… • Your data was split into sentences with ‘punkt’, NLTK's Punkt sentence tokenizer
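To see what Punkt does, here is a minimal sketch, assuming NLTK 3.x is installed. It builds an untrained `PunktSentenceTokenizer`, which knows no abbreviations and therefore needs no downloaded model; the convenience function `nltk.sent_tokenize` instead loads a pre-trained English Punkt model that must first be fetched with `nltk.download()` (the package name depends on the NLTK version). The sample text reuses the sentences from the later slides.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# An untrained Punkt tokenizer: empty parameters, so no downloaded model is required.
# Without training it cannot recognize abbreviations such as "Mr." or "e.g."
tokenizer = PunktSentenceTokenizer()

text = "A man walks into a bar. The pitcher issued four walks."
sentences = tokenizer.tokenize(text)  # list of sentence strings
print(sentences)
```

Training on a large raw corpus (passing the text to the constructor) is what lets Punkt learn abbreviations and avoid false sentence breaks.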
NLTK • Text material • Raw text • Annotated Text • Tools • Part of speech taggers • Semantic analysis • Resources • WordNet, Treebanks
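The difference between raw and annotated text material can be sketched with NLTK's corpus readers. This assumes NLTK 3.x; the file `tagged.txt`, its temporary directory, and the upper-case tag names are hypothetical, written in the common `word/TAG` format with one sentence per line, the layout `TaggedCorpusReader` expects by default.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# Hypothetical annotated file: word/TAG tokens separated by spaces, one sentence per line
corpus_dir = tempfile.mkdtemp()
with open(os.path.join(corpus_dir, "tagged.txt"), "w", encoding="utf8") as f:
    f.write("A/DET man/NOUN walks/VERB into/PREP a/DET bar/NOUN ./.\n")

# Read every .txt file in the directory as a tagged corpus
reader = TaggedCorpusReader(corpus_dir, r".*\.txt")

print(reader.raw())                 # the annotated file as a single string
print(list(reader.words()))         # tokens with the annotation stripped
print(list(reader.tagged_words()))  # (word, tag) pairs
```

The same reader API (`raw`, `words`, `tagged_words`, `tagged_sents`) is exposed by the bundled tagged corpora such as Brown once their data has been downloaded.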
Linguistic Tasks • Part of Speech Tagging • Parsing • WordNet • Named Entity Recognition • Information Retrieval • Sentiment Analysis • Document Clustering • Topic Segmentation • Authoring • Machine Translation • Summarization • Information Extraction • Spoken Dialog Systems • Natural Language Generation • Word Sense Disambiguation
Part of Speech Tagging • Task: given a sequence of words, identify the part of speech of each word. • A/Det man/Noun walks/Verb into/Prep a/Det bar/Noun.
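In NLTK a tagged sentence is simply a list of `(word, tag)` tuples. The sketch below, assuming NLTK 3.x, encodes the slide's example with the slide's own tag names (NLTK's pre-trained `nltk.pos_tag` would instead emit Penn Treebank tags and needs a downloaded model) and uses two helpers from `nltk.tag` to move between representations.

```python
from nltk.tag import tuple2str, untag

# The slide's example as NLTK represents tagged text: a list of (word, tag) tuples
tagged = [("A", "Det"), ("man", "Noun"), ("walks", "Verb"),
          ("into", "Prep"), ("a", "Det"), ("bar", "Noun")]

# The tagger's input is just the word sequence
words = untag(tagged)
print(words)

# The conventional word/TAG string form used in many annotated corpora
print(" ".join(tuple2str(pair) for pair in tagged))
```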
POS Tag Usage • Surface-level syntax • Primary operation for: • Parsing • Word Sense Disambiguation • Semantic Role Labeling • Segmentation (discourse, topic, sentence)
How to do it? • Learn from data. • Annotated data: A/Det man/Noun walks/Verb into/Prep a/Det bar/Noun. • Unlabeled data: A man walks home. The pitcher issued four walks.
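Learning from annotated data can be sketched with NLTK's `UnigramTagger`, assuming NLTK 3.x. The training set here is only the slide's single annotated sentence, and the fallback tag `"Noun"` for unseen words is an arbitrary choice for illustration; a real tagger is trained on a large corpus such as Brown or the Penn Treebank.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Annotated data: the slide's one labeled sentence (real taggers need far more)
train = [[("A", "Det"), ("man", "Noun"), ("walks", "Verb"),
          ("into", "Prep"), ("a", "Det"), ("bar", "Noun")]]

# Learn the most frequent tag for each word; unseen words back off to "Noun"
tagger = UnigramTagger(train, backoff=DefaultTagger("Noun"))

# Tag the slide's unlabeled sentences
print(tagger.tag(["A", "man", "walks", "home"]))
print(tagger.tag(["The", "pitcher", "issued", "four", "walks"]))
```

A unigram tagger ignores context, so it tags "walks" as a verb in both sentences, even though in "four walks" it is a noun. This is exactly the ambiguity the unlabeled examples point at; context-sensitive taggers (for example `BigramTagger` with a unigram backoff) are the usual next step.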
‘import nltk’ • You need to import the necessary modules before you can create objects and call their methods • import ≈ include: it brings in objects from pre-built packages • FreqDist and ConditionalFreqDist are in nltk.probability • PlaintextCorpusReader is in nltk.corpus
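A short sketch of these three classes working together, assuming NLTK 3.x. The corpus file `bar.txt` and its temporary directory are hypothetical; only `words()` is called, because it relies on a regular-expression tokenizer and needs no downloaded data.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.util import bigrams

# Write a tiny hypothetical corpus file into a temporary directory
corpus_dir = tempfile.mkdtemp()
with open(os.path.join(corpus_dir, "bar.txt"), "w", encoding="utf8") as f:
    f.write("A man walks into a bar. The pitcher issued four walks.")

# Treat every .txt file in the directory as plain-text corpus material
reader = PlaintextCorpusReader(corpus_dir, r".*\.txt")
words = [w.lower() for w in reader.words()]

# FreqDist: how often each (lowercased) token occurs
fd = FreqDist(words)
print(fd["walks"], fd["a"])

# ConditionalFreqDist: for each word (the condition), which words follow it
cfd = ConditionalFreqDist(bigrams(words))
print(sorted(cfd["a"]))
```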
Exercise 1. • Run the examples from Chapter 1 of the NLTK book: • http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html (the book is now online at https://www.nltk.org/book/ch01.html)
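As a warm-up before the chapter's own texts, which are loaded with `from nltk.book import *` and require the book data to be downloaded first, the same kind of operations can be tried on a hypothetical toy token list. This sketch assumes NLTK 3.x; `lexical_diversity` follows the definition given in Chapter 1.

```python
import nltk

# Hypothetical toy text, pre-split on spaces (punctuation is already separated)
tokens = "A man walks into a bar . The pitcher issued four walks .".split()
text = nltk.Text(tokens)

# Print every occurrence of "walks" with its surrounding context
text.concordance("walks")

# Number of tokens vs. number of distinct tokens
print(len(text), len(set(text)))


def lexical_diversity(text):
    # Ratio of distinct tokens to total tokens
    return len(set(text)) / len(text)


print(round(lexical_diversity(text), 3))

# Frequency distribution over the tokens of the text
fdist = nltk.FreqDist(text)
print(fdist.most_common(2))
```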
Exercise 2. • Run the examples from Chapter 3 of the NLTK book: • http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html (the book is now online at https://www.nltk.org/book/ch03.html)
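Chapter 3 deals with processing raw text. A minimal sketch of that pipeline, assuming NLTK 3.x: tokenize with a regular expression (the pattern is an illustrative choice, not the book's), normalize by lowercasing and Porter stemming, and search with plain `re`. None of these steps needs downloaded data.

```python
import re

import nltk
from nltk.stem import PorterStemmer

raw = "A man walks into a bar. The pitcher issued four walks."

# Regular-expression tokenization: runs of word characters, or single punctuation marks
tokens = nltk.regexp_tokenize(raw, r"\w+|[^\w\s]")

# Normalization: lowercase the alphabetic tokens, then strip suffixes with Porter's stemmer
porter = PorterStemmer()
stems = [porter.stem(t.lower()) for t in tokens if t.isalpha()]
print(stems)

# Plain regular expressions work on the raw string too: words ending in -ed
print(re.findall(r"\b\w+ed\b", raw))
```

Note that stems are not always dictionary words; lemmatization with WordNet is the alternative discussed in the same chapter, but it requires the WordNet data.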