PATR: A unification-based syntactic parser. Student: Alexandru Iliescu
What is "parsing"?
Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from the Latin pars, meaning part (of speech).
The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a pedagogical exercise, especially in inflected languages such as the Romance languages or Latin, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.
A computer language is usually parsed with two levels of grammar: lexical and syntactic.
The first stage is token generation, or lexical analysis, in which the input character stream is split into meaningful symbols defined by a grammar of regular expressions.
For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression.
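The token split described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any PATR system: the function name `tokenize` and the set of accepted operators are assumptions chosen to match the calculator example.

```python
import re

# One alternative per token class; anything else that is not whitespace is an error.
TOKEN_RE = re.compile(r"(?P<number>[0-9]+)|(?P<op>[-+*/^()])|(?P<bad>\S)")

def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup == "bad":
            raise ValueError(f"unexpected character: {match.group()!r}")
        tokens.append(match.group())
    return tokens

print(tokenize("12*(3+4)^2"))
# ['12', '*', '(', '3', '+', '4', ')', '^', '2']
```

Whitespace matches none of the alternatives, so the scanner skips it without producing a token.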
The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression.
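As a rough sketch of this second stage, the recursive-descent recognizer below checks whether a list of tokens forms an allowable arithmetic expression. The grammar in the comments and the name `is_valid_expression` are assumptions made for the calculator example. The technique is also not the chart parsing used by the PATR systems.

```python
def is_valid_expression(tokens):
    """Return True if the token list forms an allowable arithmetic expression."""
    pos = 0  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():      # expr -> term (('+' | '-') term)*
        term()
        while peek() in ("+", "-"):
            advance()
            term()

    def term():      # term -> power (('*' | '/') power)*
        power()
        while peek() in ("*", "/"):
            advance()
            power()

    def power():     # power -> atom ('^' power)?
        atom()
        if peek() == "^":
            advance()
            power()

    def atom():      # atom -> NUMBER | '(' expr ')'
        token = peek()
        if token is not None and token.isdigit():
            advance()
        elif token == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            raise SyntaxError(f"unexpected token: {token!r}")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)  # every token must be consumed

print(is_valid_expression(["12", "*", "(", "3", "+", "4", ")", "^", "2"]))  # True
print(is_valid_expression(["12", "*", ")"]))                                # False
```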
D-PATR and PC-PATR
D-PATR D-PATR is a development environment for unification-based grammars on Xerox 1100 series workstations. The first version of D-PATR was written at the Scandinavian Summer Workshop for Computational Linguistics in Helsinki, Finland, in 1985.
D-PATR The PATR-II formalism on which it is based is suitable for encoding a wide variety of grammars.
D-PATR D-PATR consists of four basic parts:
• A unification package;
• An interpreter for rules and lexical items;
• Input/output routines for directed graphs;
• An Earley-style chart parser.
D-PATR Parsing and Unification
[Figure: x and y are unified, the result z is copied to z', and x and y are then restored.]
The method entails making only one copy, not two, when the operation succeeds. In the event of failure, D-PATR simply restores the original structures without copying anything.
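The unify, copy, restore cycle can be approximated as follows. This is a sketch under stated assumptions, not D-PATR's implementation: feature structures are modelled as nested Python dictionaries rather than directed graphs, and the names `unify` and `_unify_in_place` are hypothetical. A trail records every change made to x so that it can be undone.

```python
import copy

def _unify_in_place(x, y, trail):
    """Destructively merge y into x, logging each added feature on the trail."""
    for feature, y_value in y.items():
        if feature not in x:
            trail.append((x, feature))  # remember what to undo
            x[feature] = y_value
        elif isinstance(x[feature], dict) and isinstance(y_value, dict):
            if not _unify_in_place(x[feature], y_value, trail):
                return False
        elif x[feature] != y_value:
            return False                # conflicting values
    return True

def unify(x, y):
    """Return a copy of the unification of x and y, or None on failure.

    Only one copy is made, and only on success; x is always restored.
    """
    trail = []
    succeeded = _unify_in_place(x, y, trail)
    result = copy.deepcopy(x) if succeeded else None  # the single copy (z')
    for node, feature in reversed(trail):             # restore the original
        del node[feature]
    return result

x = {"cat": "NP", "agr": {"num": "sg"}}
y = {"agr": {"per": "3"}, "case": "nom"}
print(unify(x, y))  # {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}, 'case': 'nom'}
print(x)            # {'cat': 'NP', 'agr': {'num': 'sg'}}
```

Because this sketch only ever adds features to x and never overwrites one, the trail needs to store just the location of each addition.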
D-PATR Rules A rule in D-PATR is a list of atomic constituent labels that may be followed by specifications.
D-PATR Rules Example of a rule: S -> NP VP. In D-PATR notation this is written as (S NP VP).
D-PATR Rules Before a rule is used by the parser, D-PATR compiles it to a feature set. A feature set can be displayed in different ways, for example as a matrix or as a directed graph.
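To make the idea of compiling a rule to a feature set concrete, here is a small sketch. The layout is an assumption chosen for illustration (constituents numbered from 0, each with a `cat` feature) and is not taken from D-PATR itself. The function names `compile_rule`, `as_matrix` and `as_edges` are likewise hypothetical.

```python
def compile_rule(rule):
    """Turn a rule such as ("S", "NP", "VP") into a nested feature set.

    Assumed layout: constituent 0 is the mother, 1..n are the daughters.
    """
    return {str(i): {"cat": label} for i, label in enumerate(rule)}

def as_matrix(feature_set, indent=0):
    """Render a feature set as indented attribute-value lines."""
    lines = []
    pad = " " * indent
    for attribute, value in feature_set.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{attribute}:")
            lines.extend(as_matrix(value, indent + 2))
        else:
            lines.append(f"{pad}{attribute}: {value}")
    return lines

def as_edges(feature_set, source="root"):
    """Render a feature set as labelled edges (source, attribute, target)."""
    edges = []
    for attribute, value in feature_set.items():
        if isinstance(value, dict):
            target = f"{source}/{attribute}"
            edges.append((source, attribute, target))
            edges.extend(as_edges(value, target))
        else:
            edges.append((source, attribute, value))
    return edges

rule = compile_rule(("S", "NP", "VP"))
print("\n".join(as_matrix(rule)))
for edge in as_edges(rule):
    print(edge)
```

Both views are built from the same nested structure. The matrix view nests values under their attributes, while the graph view lists each attribute as a labelled arc.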
D-PATR Lexical Rules A lexical rule is a special kind of template with two attributes: in and out.
D-PATR Lexical Rules In applying a lexical rule to a graph, the graph is first unified with the value of in. If the operation succeeds, the value of out is passed on as the result.
D-PATR D-PATR is not a commercial product. It is made available to users outside SRI who might wish to develop unification-based grammars.
PC-PATR PC-PATR is an implementation of the PATR-II computational linguistic formalism for personal computers. It is available for MS-DOS, Microsoft Windows, Macintosh and Unix, and is still under development.
PC-PATR PC-PATR has the following parts:
• A chart parser;
• A unification package;
• An interpreter for grammar and lexical rules.
PC-PATR PC-PATR uses a left-corner chart parser with these characteristics:
• bottom-up parse with top-down filtering based on the categories;
• left-to-right order, after each word is added to the chart.
PC-PATR Unification Unification is the basic operation applied to feature structures in PC-PATR. It consists of merging the information from two feature structures. Two feature structures can unify if their common features have the same values, but do not unify if any feature values conflict.
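A minimal sketch of this operation, assuming feature structures are represented as nested Python dictionaries with atomic string values. The representation and the function name `unify` are illustrative and do not come from PC-PATR.

```python
def unify(a, b):
    """Return a new feature structure merging a and b, or None if they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so the inputs stay untouched
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None      # a common feature has conflicting values
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    return a if a == b else None     # atomic values must be identical

# Common feature "agr" is compatible, so the information is merged.
print(unify({"cat": "N", "agr": {"num": "sg"}}, {"agr": {"per": "3"}}))
# {'cat': 'N', 'agr': {'num': 'sg', 'per': '3'}}

# Common feature "num" conflicts, so unification fails.
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))
# None
```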
PC-PATR Grammar rules A PC-PATR grammar rule has these parts, in the order listed:
• the keyword Rule;
• an optional rule identifier enclosed in braces ({});
• the nonterminal symbol to be expanded;
• an arrow (->) or equal sign (=);
• zero or more terminal or nonterminal symbols;
• an optional colon (:);
• zero or more feature constraints;
• an optional period (.).
PC-PATR Grammar rules The optional rule identifier consists of one or more words enclosed in braces.
PC-PATR Grammar rules For example, this rule says that any category in the grammar rules can be replaced by two copies of the same category separated by a CJ.
Rule X -> X_1 CJ X_2
    <X cat> = <X_1 cat>
    <X cat> = <X_2 cat>
    <X arg1> = <X_1 arg1>
    <X arg1> = <X_2 arg1>
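The effect of these four constraints can be imitated in Python. The sketch below rests on simplifying assumptions: feature values are atomic, a missing feature counts as unspecified, and the CJ constituent itself is not checked. The names `unify_atoms` and `coordinate` are hypothetical. Since X must equal both X_1 and X_2 on `cat` and `arg1`, the two daughters are forced to agree on those features.

```python
UNSPECIFIED = None  # hypothetical marker for a feature that has no value yet

def unify_atoms(a, b):
    """Unify two atomic values; returns (ok, value)."""
    if a is UNSPECIFIED:
        return True, b
    if b is UNSPECIFIED or a == b:
        return True, a
    return False, None

def coordinate(x1, x2):
    """Build the mother X of X -> X_1 CJ X_2, or return None if a constraint fails."""
    mother = {}
    for feature in ("cat", "arg1"):
        # <X f> = <X_1 f> and <X f> = <X_2 f> together require X_1 and X_2 to agree.
        ok, value = unify_atoms(x1.get(feature), x2.get(feature))
        if not ok:
            return None
        if value is not UNSPECIFIED:
            mother[feature] = value
    return mother

print(coordinate({"cat": "NP"}, {"cat": "NP"}))  # {'cat': 'NP'}
print(coordinate({"cat": "NP"}, {"cat": "VP"}))  # None
```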
PC-PATR Lexical rules A PC-PATR lexical rule has these parts, in the order listed:
• the keyword Define;
• the name of the lexical rule;
• the keyword as;
• the rule definition;
• an optional period (.).
PC-PATR Several people have contributed to the development of PC-PATR over the past few years. Alan Buseman, Jim Skon, Bob Kasper, and Nathan Miles all contributed to an earlier program named SILPATR that contained the same basic parsing and unification functions.
Bibliography:
• Lauri Karttunen, D-PATR: A Development Environment for Unification-Based Grammars;
• Stephen McConnel, PC-PATR Reference Manual;
• Internet.