Formal Language Theory
Homework • Read the documentation on Graphviz • http://graphviz.org/ • http://www.graphviz.org/pdf/dotguide.pdf • Use Graphviz to generate figures like these (more or less):
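The example figures did not survive extraction. Assuming they are state diagrams of finite-state automata (an assumption), a minimal sketch of a starting point is to write DOT source from Python and hand it to Graphviz. `fsa_to_dot` is a hypothetical helper, not part of Graphviz; `rankdir`, `shape=point` and `shape=doublecircle` are standard DOT attributes.

```python
def fsa_to_dot(states, accepting, start, transitions):
    """Return Graphviz DOT source for a finite-state automaton (hypothetical helper).

    transitions is a list of (source, label, target) triples.
    """
    lines = ['digraph fsa {', '  rankdir=LR;']
    # An invisible point node gives the start state an incoming arrow.
    lines.append('  __start [shape=point];')
    for s in states:
        # Accepting states are conventionally drawn with a double circle.
        shape = 'doublecircle' if s in accepting else 'circle'
        lines.append(f'  {s} [shape={shape}];')
    lines.append(f'  __start -> {start};')
    for src, label, dst in transitions:
        lines.append(f'  {src} -> {dst} [label="{label}"];')
    lines.append('}')
    return '\n'.join(lines)


# Illustrative automaton for the regular expression ab*
dot = fsa_to_dot(states=['q0', 'q1'], accepting={'q1'}, start='q0',
                 transitions=[('q0', 'a', 'q1'), ('q1', 'b', 'q1')])
print(dot)
```

Saving the printed text as `fsa.dot` and running `dot -Tpng fsa.dot -o fsa.png` renders it; the dotguide linked above covers the remaining node and edge attributes.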
Back to Regular Expressions • 10. A more interesting example
import re
# A raw string (r'...') passes the backslashes to the regex engine unchanged.
myString = "I have red shoes and blue pants and a green shirt. My phone number is 8005551234 and my friend's phone number is (800)-565-7568 and my cell number is 1-800-123-4567. You could also call me at 18005551234 if you'd like."
phoneNumbersRegEx = re.compile(r'1?-?\(?\d{3}\)?-?\d{3}-?\d{4}')
print(phoneNumbersRegEx.findall(myString))
• The answer is here, but let's derive it together
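One way to derive the pattern is to build it from its parts. The sketch below rewrites the same regular expression with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments, so each component can be annotated; the name `phone` is just for illustration.

```python
import re

# The phone-number pattern, one component per line.
phone = re.compile(r"""
    1? -?          # optional leading "1", optionally followed by a dash
    \(? \d{3} \)?  # three-digit area code, optionally in parentheses
    -?             # optional separator
    \d{3}          # three-digit exchange
    -?             # optional separator
    \d{4}          # four-digit line number
""", re.VERBOSE)

text = ("I have red shoes and blue pants and a green shirt. My phone number is "
        "8005551234 and my friend's phone number is (800)-565-7568 and my cell "
        "number is 1-800-123-4567. You could also call me at 18005551234 if "
        "you'd like.")

print(phone.findall(text))
# ['8005551234', '(800)-565-7568', '1-800-123-4567', '18005551234']
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced forms such as `800)555-1234`.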
Formal Definition of Regular Expressions • <expr> → character • <expr> → ( <expr> ) • Concatenation: <expr> → <expr> <expr> • Union: <expr> → <expr> + <expr> • Kleene Star: <expr> → ( <expr> ) * • Characters: • lower case: a-z • upper case: A-Z • digits: 0-9 • special cases: \t \n • octal codes: \000 • any single character: .
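The formal operators map onto Python's `re` syntax with one notable difference: union, written `+` in the formal definition, is `|` in Python, where `+` means "one or more". A minimal sketch using `re.fullmatch` (Python 3.4+) so that the whole string must belong to the language; `matches` is an illustrative helper.

```python
import re


def matches(pattern, s):
    """True if the entire string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None


# Concatenation: "a" followed by "b".
print(matches('ab', 'ab'))                            # True
# Union: formal "a + b" is "a|b"; Python's "a+b" means one or more a's, then b.
print(matches('a|b', 'b'), matches('a+b', 'b'))       # True False
# Kleene star: zero or more repetitions, so the empty string is included.
print(matches('(ab)*', ''), matches('(ab)*', 'abab'), matches('(ab)*', 'aba'))  # True True False
# Any single character: "." (by default it does not match a newline).
print(matches('a.c', 'a-c'), matches('a.c', 'a\nc'))  # True False
```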
An Equivalence Relation (=R) • A Partition of S ≡ a set of subsets of S that are • Mutually exclusive & exhaustive • Equivalence Classes ≡ a partition such that • All the elements in a class are equivalent (with respect to =R) • No element from one class is equivalent to an element from another • Example: partition the integers into evens & odds • Even integers: 2, 4, 6, … • Odd integers: 1, 3, 5, … • x =R y ⇔ x has the same parity as y • Three Properties • Reflexive: a =R a • Symmetric: a =R b ⇒ b =R a • Transitive: a =R b & b =R c ⇒ a =R c
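The parity example can be checked mechanically on a finite sample. A minimal sketch, assuming the sample 1..10 stands in for the integers: it tests the three properties exhaustively and then builds the equivalence classes by grouping each element under its remainder mod 2.

```python
from collections import defaultdict


def same_parity(x, y):
    """x =R y iff x and y have the same parity."""
    return x % 2 == y % 2


S = range(1, 11)  # finite sample standing in for the integers

reflexive = all(same_parity(a, a) for a in S)
symmetric = all(same_parity(b, a) for a in S for b in S if same_parity(a, b))
transitive = all(same_parity(a, c)
                 for a in S for b in S for c in S
                 if same_parity(a, b) and same_parity(b, c))

# Equivalence classes: group every element under its parity.
classes = defaultdict(list)
for x in S:
    classes[x % 2].append(x)

print(reflexive, symmetric, transitive)  # True True True
print(dict(classes))                     # {1: [1, 3, 5, 7, 9], 0: [2, 4, 6, 8, 10]}
```

The classes are mutually exclusive and exhaustive, which is exactly the partition described on the slide.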
WordNet (Ch. 2): An Equivalence Relation
>>> from nltk.corpus import wordnet as wn
>>> for s in wn.synsets('car'):
...     print(s.lemma_names())
...
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
>>> for s in wn.synsets('car'):
...     print(' '.join(s.lemma_names()) + ': ' + s.definition())
...
car auto automobile machine motorcar: a motor vehicle with four wheels; usually propelled by an internal combustion engine
car railcar railway_car railroad_car: a wheeled vehicle adapted to the rails of railroad
car gondola: the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
car elevator_car: where passengers ride up and down
cable_car car: a conveyance for passengers or freight on a cable railway
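Each word sense belongs to exactly one synset, but the word "car" itself appears in all five synsets above. A minimal sketch, using only the lemma lists printed above (no NLTK needed), tests whether the word-level relation "shares a synset" is transitive; `synonymous` is a hypothetical helper.

```python
from itertools import product

# Lemma names of the five synsets of 'car', copied from the output above.
SYNSETS = [
    ['car', 'auto', 'automobile', 'machine', 'motorcar'],
    ['car', 'railcar', 'railway_car', 'railroad_car'],
    ['car', 'gondola'],
    ['car', 'elevator_car'],
    ['cable_car', 'car'],
]


def synonymous(x, y):
    """Word-level synonymy: x and y occur together in at least one synset."""
    return any(x in lemmas and y in lemmas for lemmas in SYNSETS)


words = sorted({w for lemmas in SYNSETS for w in lemmas})

# Triples that violate transitivity: x ~ y and y ~ z, but not x ~ z.
violations = [(x, y, z) for x, y, z in product(words, repeat=3)
              if synonymous(x, y) and synonymous(y, z) and not synonymous(x, z)]

print(synonymous('auto', 'car'), synonymous('car', 'gondola'),
      synonymous('auto', 'gondola'))  # True True False
```

So "shares a synset" between words is reflexive and symmetric but not transitive; the equivalence relation holds between senses, with synsets as its classes.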
A Partial Order (≤R) • Example: Powerset({x, y, z}) • Subsets ordered by inclusion • a ≤R b ⇔ a ⊆ b • Three properties • Reflexive: a ≤ a • Antisymmetric: a ≤ b & b ≤ a ⇒ a = b • Transitivity: a ≤ b & b ≤ c ⇒ a ≤ c
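The powerset example is small enough to check exhaustively. A minimal sketch: Python's `<=` on sets is the subset test, so the three properties can be verified over all 8 subsets of {x, y, z}, and a pair of incomparable subsets shows why the order is only partial.

```python
from itertools import chain, combinations


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def leq(a, b):
    """a <=R b iff a is a subset of b (Python's set <= operator)."""
    return a <= b


P = powerset({'x', 'y', 'z'})

reflexive = all(leq(a, a) for a in P)
antisymmetric = all(a == b for a in P for b in P if leq(a, b) and leq(b, a))
transitive = all(leq(a, c) for a in P for b in P for c in P
                 if leq(a, b) and leq(b, c))

print(len(P), reflexive, antisymmetric, transitive)  # 8 True True True
# Partial, not total: {x} and {y} are incomparable.
print(leq(frozenset('x'), frozenset('y')), leq(frozenset('y'), frozenset('x')))  # False False
```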
WordNet: A Partial Order
>>> for h in wn.synsets('car')[0].hypernym_paths()[0]:
...     print(h.lemma_names())
...
['entity']
['physical_entity']
['object', 'physical_object']
['whole', 'unit']
['artifact', 'artefact']
['instrumentality', 'instrumentation']
['container']
['wheeled_vehicle']
['self-propelled_vehicle']
['motor_vehicle', 'automotive_vehicle']
['car', 'auto', 'automobile', 'machine', 'motorcar']
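The "is-a" order comes from taking the reflexive-transitive closure of the direct hypernym links. A minimal sketch, using only the first lemma of each synset on the path printed above: `PARENT` holds the direct links, and `is_a` follows them upward. Both names are illustrative, and a single path is a simplification, since a synset can have more than one hypernym path.

```python
# First lemma of each synset on car's first hypernym path, root first
# (copied from the output above).
PATH = ['entity', 'physical_entity', 'object', 'whole', 'artifact',
        'instrumentality', 'container', 'wheeled_vehicle',
        'self-propelled_vehicle', 'motor_vehicle', 'car']

# Direct hypernym links: each node's parent is the node before it.
PARENT = {child: parent for parent, child in zip(PATH, PATH[1:])}


def ancestors(node):
    """node itself plus everything reachable by following parent links."""
    result = {node}
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def is_a(x, y):
    """x <=R y iff y is x or one of x's hypernyms."""
    return y in ancestors(x)


print(is_a('car', 'artifact'), is_a('artifact', 'car'))  # True False
```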
Help
>>> s = wn.synsets('car')[0]
>>> s.name()
'car.n.01'
>>> s.pos()
'n'
>>> s.lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> s.examples()
['he needs a car to get to work']
>>> s.definition()
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> s.hyponyms()[0:3]
[Synset('stanley_steamer.n.01'), Synset('hardtop.n.01'), Synset('loaner.n.02')]
>>> s.hypernyms()
[Synset('motor_vehicle.n.01')]
The Chomsky Hierarchy • Type 0 > Type 1 > Type 2 > Type 3 • Recursively Enumerable > CS > CF > Regular • Examples • Type 3: Regular (Finite State): • Grep & Regular Expressions • Right-Branching: A → a A • Left-Branching: B → B b • Type 2: Context-Free (CF): • Center-Embedding: C → … x C y • Parenthesis Grammars: <expr> → ( <expr> ) • w w^R (w followed by its reversal) • Type 1: Context-Sensitive (CS): w w (w followed by a copy of itself) • Type 0: Recursively Enumerable • Beyond Type 0: Halting Problem
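The example languages are easy to test for membership directly, even though recognizing them with the machine of the right type is the interesting part. A minimal sketch: these functions are plain membership checks, not automata or grammars, and the names are illustrative. The parenthesis checker uses an unbounded counter, which is the kind of memory a finite-state machine lacks.

```python
def in_w_wr(s):
    """w w^R: w followed by its reversal (context-free, not regular)."""
    return len(s) % 2 == 0 and s == s[::-1]


def in_w_w(s):
    """w w: w followed by an exact copy (context-sensitive, not context-free)."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]


def balanced(s):
    """Parenthesis language: every ')' closes an earlier unmatched '('."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' with nothing to close
                return False
    return depth == 0


print(in_w_wr('abba'), in_w_wr('abab'))        # True False
print(in_w_w('abab'), in_w_w('abba'))          # True False
print(balanced('(()())'), balanced('())('))    # True False
```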
Syntax & Semantics • Syntax: Symbol pushing / Parsing • Parsing: use a context-free grammar to map string → tree • Semantics: Meaning (making sense of trees) • Is synonymy an equivalence relation? • The dichotomy is important both for • Natural Languages (English, FIGS, CJK, etc.) • FIGS: French, Italian, German & Spanish • CJK: Chinese, Japanese & Korean • and for Artificial Languages • Python, HTML, JavaScript, SQL, C
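To make "string → tree" concrete, here is a minimal recursive-descent sketch for a toy grammar with just two rules borrowed from the regular-expression definition, <expr> → ( <expr> ) and <expr> → character. Trees are plain nested tuples; `parse` and `_expr` are hypothetical names, and real parsers (such as the CF parsers in Chapter 8) handle far richer grammars.

```python
def parse(s):
    """Parse s with <expr> -> ( <expr> ) | character; return a tuple tree."""
    tree, pos = _expr(s, 0)
    if pos != len(s):
        raise ValueError(f'unexpected {s[pos]!r} at position {pos}')
    return tree


def _expr(s, pos):
    """Parse one <expr> starting at pos; return (tree, next position)."""
    if pos >= len(s):
        raise ValueError('unexpected end of input')
    if s[pos] == '(':
        # Rule <expr> -> ( <expr> ): recurse, then require the closing ')'.
        inner, pos = _expr(s, pos + 1)
        if pos >= len(s) or s[pos] != ')':
            raise ValueError(f'expected ")" at position {pos}')
        return ('expr', '(', inner, ')'), pos + 1
    if s[pos] == ')':
        raise ValueError(f'unexpected ")" at position {pos}')
    # Rule <expr> -> character
    return ('expr', s[pos]), pos + 1


print(parse('((a))'))
# ('expr', '(', ('expr', '(', ('expr', 'a'), ')'), ')')
```

Because the grammar nests one `<expr>` inside each pair of parentheses, the recursion depth of the parser mirrors the depth of the tree.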
Summary
• Chapter 1
  • NLTK (Natural Language Toolkit)
  • Unix for Poets without Unix: Unix → Python
  • Object-Oriented
    • Polymorphism: "len" applies to lists, sets, etc.
    • Ditto for: +, help, print, etc.
  • Types & Tokens
    • "to be or not to be": 6 tokens & 4 types
    • FreqDist: sort | uniq -c
  • Concordances
• Chapters 2-8
  • Chapter 3: URLs
  • Chapter 2
    • Equivalence Relations: Parity, Synonymy (?)
    • Partial Orders: WordNet Ontology
  • Chapter 8: CF Parsing
  • Chomsky Hierarchy: CS > CF > Regular
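The Chapter 1 points on types, tokens, polymorphism and `sort | uniq -c` fit in a few lines of standard-library Python. A minimal sketch: `collections.Counter` stands in for NLTK's `FreqDist` here, an assumption made to keep the example dependency-free.

```python
from collections import Counter

tokens = "to be or not to be".split()  # list: every occurrence
types = set(tokens)                    # set: distinct words only

# len is polymorphic: the same function works on a list, a set and a string.
print(len(tokens), len(types), len("to be"))  # 6 4 5

# Rough analogue of `sort | uniq -c`: one count per type, in sorted order.
counts = Counter(tokens)
for word, n in sorted(counts.items()):
    print(n, word)
# 2 be
# 1 not
# 1 or
# 2 to
```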