JAMOOS: An Object-Oriented Language for Grammars
MODULE A
A X Y Z;
X “JAM”;
Y {“O” … }++
Z “S”;
END
Yuri Tsoglin
Supervised by: Dr. Yossi Gil
A simple example
How do we write a grammar for a Pascal program in YACC?
Program: program Name semicolon Decls Body;
Decls: Decls Decl | /* empty */ ;
…
Conditional: if Exp then Statement OptElse;
OptElse: else Statement | /* empty */ ;
…
Note: even this is not sufficient… We must also define, in a separate lexical file, that semicolon means “;”.
Problems with Yacc
• List and optional elements are defined in an unnatural way.
• Tokens must be defined in a separate file.
• All productions must have semantic features of the same type, which is defined separately.
• Errors are handled using a special error token.
• No support for language libraries.
• No internal handling of symbol tables.
What do we have in JAMOOS?
• Equivalence between programs and grammars.
• Lingual features:
  • Class definitions in EBNF form
  • Default fields
  • Automatic field naming
  • Type OK
  • Error handling and error types
  • Tree computation metaphor
  • Modular definitions and generic modules
  • Dictionaries
• Grammatical features:
  • Extended BNF grammars
  • Three predefined kinds of tokens
  • Generic (parametrized) grammars
  • Language embedding
  • Improved parse error handling
  • Internal handling of symbol tables
Class definitions
• Each class definition also defines a grammar production.
• Extended BNF: structures such as lists, optional components, and choices are represented directly.
• No need for a separate “lexical” file: all tokens are written as they are within the grammar definition.
The Yacc example above can be rewritten this way:
Program program Name “;” {Decls …} Body;
…
Conditional if Exp then Statement [else Statement];
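As a rough illustration of the "class" side of these definitions, here is a hand-written C++17 sketch (not JAMOOS output) of the two rules above: the EBNF list becomes a `std::vector` and the optional `[else Statement]` becomes a `std::optional`. The placeholder types (`Name`, `Decl`, `Exp` and so on, all plain strings) are assumptions made for brevity, and tokens such as `program` and “;” are left out.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical placeholders: nonterminals not being illustrated are plain strings.
using Name = std::string;
using Decl = std::string;
using Body = std::string;
using Exp = std::string;
using Statement = std::string;

// Program program Name ";" {Decls ...} Body;
// The EBNF list becomes a vector (tokens are omitted in this sketch).
struct Program {
    Name name;
    std::vector<Decl> decls;
    Body body;
};

// Conditional if Exp then Statement [else Statement];
// The EBNF optional part becomes a std::optional.
struct Conditional {
    Exp condition;
    Statement then_branch;
    std::optional<Statement> else_branch;

    bool has_else() const { return else_branch.has_value(); }
};
```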
Class, Rule, Procedure
• Every definition in JAMOOS can be read and understood as all of the following:
  • Rule (as in BNF)
  • Class (as in OO)
  • Procedure (as in imperative programming) with local variables and input, output, and input-output arguments.
• Example: A X Y;
  • Rule: symbol A can derive an X and a Y.
  • Class: class A has two components, X and Y.
  • Procedure: procedure A calls procedures X and Y.
• A definition has fields …
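A minimal C++ sketch of the three readings of `A X Y;`, assuming hypothetical terminals `X` and `Y` that each match a single character. Because C++ initializes members in declaration order, constructing an `A` runs the `X` "procedure" before the `Y` one, the same order a recursive-descent parser would follow. The `Cursor` type and the `derives_A` helper are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical input cursor shared by all "procedures".
struct Cursor {
    const std::string& text;
    std::size_t pos = 0;

    // Consume one expected character or fail.
    char expect(char c) {
        if (pos >= text.size() || text[pos] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        return text[pos++];
    }
};

// Hypothetical terminals: X matches 'x' and Y matches 'y'.
struct X {
    char token;
    explicit X(Cursor& in) : token(in.expect('x')) {}
};
struct Y {
    char token;
    explicit Y(Cursor& in) : token(in.expect('y')) {}
};

// A X Y;
//  - class: A has two components, x and y;
//  - procedure: constructing A calls X, then Y (declaration order);
//  - rule: a successful construction is a derivation of A into X Y.
struct A {
    X x;
    Y y;
    explicit A(Cursor& in) : x(in), y(in) {}
};

// True if the whole string derives from A.
bool derives_A(const std::string& s) {
    Cursor in{s};
    try {
        A parsed(in);
        (void)parsed;
        return in.pos == s.size();  // all input must be consumed
    } catch (const std::runtime_error&) {
        return false;
    }
}
```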
Classification of Fields
• Every field represents a value to be computed, a syntactic or semantic element, or a component of a class.
• Properties of a field: _: Name: Type := Initializer
  • Type: almost always present. It can be a primitive type, a class, or a compound type.
  • Name: optional (automatic naming can be used).
  • Initializer: optional.
  • Perishability prefix: optional.
Detailed Example
Addition Expression “+” Expression
FEATURES
  value: INTEGER := [[ return $Expression#1.value + $Expression#2.value; ]]
END
This can be understood in the following three ways:
• Addition is a production of two Expressions with a “+” between them, having an integer semantic feature value whose value is computed by the given C++ code.
• Addition is a class consisting of three fields: two unnamed fields of type Expression and one field of type INTEGER named value. The constructor of this class takes two parameters of type Expression, assigns them to the first two fields, and assigns the result of the C++ computation to the third field.
• Addition is a procedure that takes two IN-OUT parameters of type Expression and assigns a value, computed by the C++ code, to a third OUT parameter.
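The class reading of `Addition` can be approximated in C++ as follows. This is a sketch, not code generated by JAMOOS: `Expression` is reduced to a hypothetical struct holding only its `value` feature, and the initializer becomes a member initializer that runs after the two components are set.

```cpp
#include <cassert>

// Hypothetical minimal Expression: it only carries the INTEGER feature "value".
struct Expression {
    int value;
};

// Addition Expression "+" Expression
//   FEATURES value: INTEGER := [[ return $Expression#1.value + $Expression#2.value; ]]
struct Addition {
    Expression lhs;  // $Expression#1
    Expression rhs;  // $Expression#2
    int value;       // declared last, so it is initialized after lhs and rhs

    Addition(Expression e1, Expression e2)
        : lhs(e1), rhs(e2), value(lhs.value + rhs.value) {}
};
```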
The Special Return Field
• Let us slightly change the above example:
Addition Expression “+” Expression
FEATURES
  value: INTEGER := [[ return $1.value + $2.value; ]]
• Why not write just $1 + $2, as in Yacc?
• A field named return is a default field: when referring to it, its name can be omitted.
Addition Expression “+” Expression
FEATURES
  return: INTEGER := [[ return $1 + $2; ]]   -- assuming Expression also has a return field
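C++ has no direct counterpart of a default field, but an implicit conversion operator gives a similar effect: the object can be used wherever its `return` value is expected, so `$1 + $2` can be written as `e1 + e2`. The field name `ret` (since `return` is a C++ keyword) and the conversion-operator approach are assumptions of this sketch, not how JAMOOS implements it.

```cpp
#include <cassert>

// Analogy only: "ret" plays the role of the JAMOOS return field, and the
// conversion operator lets the object stand in for that value.
struct Expression {
    int ret;
    operator int() const { return ret; }
};

// Addition Expression "+" Expression
//   FEATURES return: INTEGER := [[ return $1 + $2; ]]
struct Addition {
    Expression e1, e2;
    int ret;
    // e1 + e2 converts both operands to int, mirroring "$1 + $2".
    Addition(Expression a, Expression b) : e1(a), e2(b), ret(e1 + e2) {}
    operator int() const { return ret; }
};
```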
Program program _:Name “;” decls:{ Decl … } Body
FEATURES
  num_vars: INTEGER := decls.num_vars;
END
VarsDeclaration var _:vars:{ (var_list:{ Name “,” … }+ “:” Type “;”) … }+
FEATURES
  variables: { (Variable Type) … }+ := [[
    for (int i = 0; i < $@vars; i++)
      for (int j = 0; j < $@vars[i].var_list; j++)
        ADD (vars[i].var_list[j] vars[i].Type);
  ]]
END
Notice how JAMOOS and C++ are mutually embedded.
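The nested loops in the `variables` initializer pair every declared name with the type of its group. Stripped of the JAMOOS notation (`$@` and `ADD`), the same computation might look like this in plain C++; `VarGroup` and `flatten_variables` are hypothetical names, and types are kept as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One "(var_list:{ Name "," ... }+ ":" Type ";")" group, e.g. "a, b: integer;".
struct VarGroup {
    std::vector<std::string> var_list;
    std::string type;
};

// Mirrors the nested loops: each name in each group is paired with the
// group's type (the role ADD plays in the slide).
std::vector<std::pair<std::string, std::string>>
flatten_variables(const std::vector<VarGroup>& vars) {
    std::vector<std::pair<std::string, std::string>> variables;
    for (std::size_t i = 0; i < vars.size(); ++i)
        for (std::size_t j = 0; j < vars[i].var_list.size(); ++j)
            variables.emplace_back(vars[i].var_list[j], vars[i].type);
    return variables;
}
```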
Internal Classes
We have no methods!
• There are no methods as such.
• Internal classes can be used as methods.
• A constructor call for an internal class is like a method call.
• If a method needs local variables, they are fields of the internal class.
• The return field may be used to “return” only the necessary value.
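A sketch of the idea in C++, assuming a hypothetical `IntList` class: the "method" `Sum` is a nested class, constructing it performs the computation, its loop counter is a field rather than a local variable, and the field `ret` stands in for the JAMOOS `return` field.

```cpp
#include <cassert>
#include <vector>

struct IntList {
    std::vector<int> items;

    // Internal class used as a method: the constructor call is the "call".
    struct Sum {
        int i = 0;    // "local variable" kept as a field
        int ret = 0;  // the only value the caller needs
        explicit Sum(const IntList& self) {
            for (i = 0; i < static_cast<int>(self.items.size()); ++i)
                ret += self.items[i];
        }
    };
};
```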
Tree Computation
• An execution of a JAMOOS program is nothing but:
  • a nested chain of constructor calls, or
  • an execution of a bottom-up or top-down parser, or
  • a nested execution of procedures and functions.
• Each constructor call builds an object, which becomes a node in the abstract syntax tree.
• The constructor computes all the attributes by executing their initializers, in the order in which they appear in the definition.
• When parsing, constructor calls are made implicitly by the parser.
• At the start, the constructor of class Main is called.
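The ordering rules above have a close C++ analogue, because C++ also constructs members (children) before the attributes declared after them and initializes members in declaration order. This hypothetical trace records when each node and attribute is computed; `Leaf`, `Sum`, and `trace` are illustrative names only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which values are computed.
std::vector<std::string> trace;

int log_init(const std::string& what, int v) {
    trace.push_back(what);
    return v;
}

struct Leaf {
    int value;
    explicit Leaf(int v) : value(log_init("Leaf", v)) {}
};

struct Sum {
    Leaf left, right;  // children: built first
    int value;         // attribute: computed after both children
    int doubled;       // attribute: may use value, since it is declared later
    Sum(int a, int b)
        : left(a), right(b),
          value(log_init("Sum.value", left.value + right.value)),
          doubled(log_init("Sum.doubled", 2 * value)) {}
};
```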
Four kinds of compound types:
• List (similar to arrays)
• Optional (similar to pointers)
• Choice (as in C’s union or Pascal’s variant records)
• Sequence (as in C’s struct)
More examples:
CompoundStatement begin { Statement “;” … } end;
ForLoop for Var “:=” lower:Exp up OF to | down OF downto upper:Exp do Statement;
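One possible C++17 mapping of the four compound types is `std::vector` for List, `std::optional` for Optional, `std::variant` for Choice, and a `struct` for Sequence. The sketch below applies it to the two examples; it assumes the `up OF to | down OF downto` alternatives group together as a single choice, and simplifies `Exp` to integer bounds and `Statement` to a string.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

using Exp = int;                // simplified: bounds are integer literals
using Statement = std::string;  // simplified placeholder

struct Up {};    // up OF to
struct Down {};  // down OF downto

// ForLoop for Var ":=" lower:Exp (up OF to | down OF downto) upper:Exp do Statement;
struct ForLoop {                       // Sequence
    std::string var;
    Exp lower;
    std::variant<Up, Down> direction;  // Choice
    Exp upper;
    Statement body;
};

// CompoundStatement begin { Statement ";" ... } end;
struct CompoundStatement {
    std::vector<Statement> statements;  // List
};

// How many times the loop body runs, depending on the chosen alternative.
int iterations(const ForLoop& loop) {
    int n = std::holds_alternative<Up>(loop.direction)
                ? loop.upper - loop.lower + 1
                : loop.lower - loop.upper + 1;
    return n > 0 ? n : 0;
}
```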
Tokens
There are three kinds of tokens:
• Keyword: any sequence of letters and digits beginning with a letter, e.g. if, begin, abc345
• String: any quoted sequence of characters, e.g. “(”, “…”, “A^”
• Regular expression, e.g. <[A-Za-z]*>, <abc(de)*>
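To make the three token kinds concrete, here is a small classifier over how a token is written in a grammar. It assumes ASCII double quotes for string tokens (the slides use typographic quotes) and angle brackets around regular expressions, and it uses `std::regex` to match a regular-expression token against input text. It is an illustration, not the JAMOOS lexer.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class TokenKind { Keyword, String, Regex, Invalid };

// Classifies a token as written in the grammar text.
TokenKind classify(const std::string& tok) {
    static const std::regex keyword("[A-Za-z][A-Za-z0-9]*");
    if (std::regex_match(tok, keyword)) return TokenKind::Keyword;
    if (tok.size() >= 2 && tok.front() == '"' && tok.back() == '"') return TokenKind::String;
    if (tok.size() >= 2 && tok.front() == '<' && tok.back() == '>') return TokenKind::Regex;
    return TokenKind::Invalid;
}

// Does input text match a regular-expression token such as <abc(de)*>?
bool regex_token_matches(const std::string& tok, const std::string& text) {
    assert(classify(tok) == TokenKind::Regex);
    std::regex pattern(tok.substr(1, tok.size() - 2));  // strip '<' and '>'
    return std::regex_match(text, pattern);
}
```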
Primitive Types
• Tokens define objects of primitive types; by default, STRING.
• JAMOOS primitive types are: INTEGER, REAL, BOOLEAN, CHARACTER, STRING, OK
Unit Type
• The unit type is called OK.
• It is used primarily to designate imperative code fragments (usually in C++).
• An expression of this type may appear anywhere within a constructor argument list.
Program program _:Name “;” decls:{ Decl … } Body
FEATURES
  print_num_vars: OK := [[ cout << $decls.num_vars; ]]
END
Error Types
• Both syntax and semantic errors are handled using error types.
• An object of an error type can “legally” be in an illegal state.
Procedure Header? Body;   -- Header can be illegal
VariableName Id
FEATURES
  type: Type? := …   -- Type can be illegal
END
• A special case is type OK?, which can be used to define assertions.
• Errors are generated by a special ERROR command.
• Any object can be tested for being in an illegal state.
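An error type can be pictured as a value that carries its own legality flag. The C++ class below is a loose analogue of `T?`: `error` plays the role of the `ERROR` command and `is_legal` the test for an illegal state. The names `Checked` and `type_of`, and the single known identifier `x`, are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Loose analogue of a JAMOOS error type "T?": the object may legally be in
// an illegal state, and any caller can test for it.
template <typename T>
class Checked {
public:
    static Checked ok(T value) { return Checked(std::move(value), true, ""); }
    // Plays the role of ERROR; requires T to be default-constructible.
    static Checked error(std::string message) { return Checked(T{}, false, std::move(message)); }

    bool is_legal() const { return legal_; }
    const T& value() const {
        assert(legal_);  // reading an illegal value is a caller bug
        return value_;
    }
    const std::string& message() const { return message_; }

private:
    Checked(T value, bool legal, std::string message)
        : value_(std::move(value)), legal_(legal), message_(std::move(message)) {}

    T value_;
    bool legal_;
    std::string message_;
};

// The "type" feature of a VariableName, declared as Type? in the slide.
Checked<std::string> type_of(const std::string& id) {
    if (id == "x") return Checked<std::string>::ok("integer");
    return Checked<std::string>::error("undeclared identifier " + id);
}
```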
What about inheritance?
Abstract class: the right-hand side defines all the subclasses.
Statement Assignment | Loop | Conditional | Compound | ProcCall;
Loop ForLoop | WhileLoop | RepeatLoop;
Grammatically, this is just a selection (choice) element of EBNF.
Field Inheritance
• Fields of an abstract class can be inherited or overridden in a subclass.
• When overridden, a field can be made either a component or an attribute.
Loop StepLoop | CondLoop;
CondLoop WhileLoop | RepeatLoop
FEATURES
  cond: Expression;
END
WhileLoop while @cond do Statement;
RepeatLoop repeat { Statement “;” … } until @cond;
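In C++ terms, the abstract class corresponds to an abstract base, and `cond` is declared once in `CondLoop` and inherited by both subclasses. This sketch omits `StepLoop`, represents expressions and statements as strings, and adds a `kind` method only so the classes can be told apart; none of these names beyond the grammar's own come from JAMOOS.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Expression = std::string;  // simplified placeholder
using Statement = std::string;   // simplified placeholder

// Loop StepLoop | CondLoop;  (StepLoop omitted)
struct Loop {
    virtual ~Loop() = default;
    virtual std::string kind() const = 0;
};

// CondLoop WhileLoop | RepeatLoop FEATURES cond: Expression; END
struct CondLoop : Loop {
    Expression cond;  // declared once, inherited by both subclasses
    explicit CondLoop(Expression c) : cond(std::move(c)) {}
};

// WhileLoop while @cond do Statement;
struct WhileLoop : CondLoop {
    Statement body;
    WhileLoop(Expression c, Statement b) : CondLoop(std::move(c)), body(std::move(b)) {}
    std::string kind() const override { return "while"; }
};

// RepeatLoop repeat { Statement ";" ... } until @cond;
struct RepeatLoop : CondLoop {
    std::vector<Statement> body;
    RepeatLoop(std::vector<Statement> b, Expression c)
        : CondLoop(std::move(c)), body(std::move(b)) {}
    std::string kind() const override { return "repeat"; }
};
```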
Dictionaries
• Each field can depend on the fields of its descendants (so-called “generated features”).
• There are also “inherited features”!
• Symbol tables can help in most practical cases. In JAMOOS they are called dictionaries.
Dictionaries (cont.)
• A dictionary is a mapping from strings to some type.
• So, to define a dictionary, we define the type of its elements.
• There is a stack of dictionaries for each dictionary type.
• Three operations on a dictionary:
  • INSERT (a_string, an_element)
  • SEARCH (a_string): searches only the current dictionary
  • FERRET (a_string): searches through the whole stack
• A class can be assigned a dictionary; the dictionary is pushed onto the stack each time an object of that class is accessed.
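The three operations can be sketched as a stack of maps, one stack per element type. This is an assumption-laden illustration: the slides do not say whether `INSERT` may overwrite an existing entry (here it does) or how lookups report failure (here via `std::optional`), and `push`/`pop` stand in for entering and leaving an object that owns a dictionary.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One stack of dictionaries for one element type T.
template <typename T>
class DictionaryStack {
public:
    void push() { stack_.emplace_back(); }
    void pop() { assert(!stack_.empty()); stack_.pop_back(); }

    // INSERT: add to the current (innermost) dictionary.
    void insert(const std::string& key, T value) {
        assert(!stack_.empty());
        stack_.back()[key] = std::move(value);
    }

    // SEARCH: look only in the current dictionary.
    std::optional<T> search(const std::string& key) const {
        if (stack_.empty()) return std::nullopt;
        auto it = stack_.back().find(key);
        if (it == stack_.back().end()) return std::nullopt;
        return it->second;
    }

    // FERRET: look through the whole stack, innermost first.
    std::optional<T> ferret(const std::string& key) const {
        for (auto d = stack_.rbegin(); d != stack_.rend(); ++d) {
            auto it = d->find(key);
            if (it != d->end()) return it->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::map<std::string, T>> stack_;
};
```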
Dictionaries (cont.)
Example:
DICTIONARY Identifiers;
…
ProcDecl procedure Name Identifiers “(” {Param “,”} “)”;
The position of Identifiers within the definition determines when the dictionary is constructed. In this case, the dictionary is constructed after procedure and Name are matched.
Modularity and Genericity
• Definitions can be modularized.
• Each module can have type parameters.
MODULE Expression (Op)
Expression Unary | Binary | Parenthesized;
Unary Op Expression;
Binary Expression Op Expression;
Parenthesized “(” Expression “)”;
END
Similar to templates! Each class in the module is a template class parametrized by Op.
Now any other module can use this module:
PascalExp Expression (Op=PascalOp);
PascalOp Arith | Bool;
Arith plus OF “+” | minus OF “-” | mult OF “*” | div OF “/”;
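The template analogy can be taken literally in C++17: each class of the module becomes a class template over `Op`, and `PascalExp` is one instantiation. The sketch adds an integer leaf to `Expression` (the slide's module has none) so that trees can terminate, models only the `Arith` part of `PascalOp`, and includes a small hypothetical `eval` function to exercise the instantiation.

```cpp
#include <cassert>
#include <memory>
#include <variant>

template <typename Op> struct Expression;

// Unary Op Expression;
template <typename Op> struct Unary {
    Op op;
    std::shared_ptr<Expression<Op>> operand;
};

// Binary Expression Op Expression;
template <typename Op> struct Binary {
    std::shared_ptr<Expression<Op>> lhs;
    Op op;
    std::shared_ptr<Expression<Op>> rhs;
};

// Parenthesized "(" Expression ")";
template <typename Op> struct Parenthesized {
    std::shared_ptr<Expression<Op>> inner;
};

// Expression Unary | Binary | Parenthesized;  (plus an int leaf, not in the slide)
template <typename Op> struct Expression {
    std::variant<int, Unary<Op>, Binary<Op>, Parenthesized<Op>> node;
};

// PascalExp Expression (Op=PascalOp);  only Arith is modelled here.
enum class Arith { plus, minus, mult, div };
using PascalExp = Expression<Arith>;

int eval(const PascalExp& e) {
    if (const auto* leaf = std::get_if<int>(&e.node)) return *leaf;
    if (const auto* p = std::get_if<Parenthesized<Arith>>(&e.node)) return eval(*p->inner);
    if (const auto* u = std::get_if<Unary<Arith>>(&e.node)) {
        int x = eval(*u->operand);
        return u->op == Arith::minus ? -x : x;
    }
    const auto& b = std::get<Binary<Arith>>(e.node);
    int l = eval(*b.lhs), r = eval(*b.rhs);
    switch (b.op) {
        case Arith::plus:  return l + r;
        case Arith::minus: return l - r;
        case Arith::mult:  return l * r;
        case Arith::div:   return l / r;
    }
    return 0;  // unreachable for valid Arith values
}
```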
Calls between grammars
• Sometimes one parser needs to call another parser, for example:
  • a version of Pascal that allows embedded Assembler code.
• The PARSE command is used to call another parser.
EmbeddedAssembler PARSE(“\””, Assembly, “\””);
EmbeddedCPP PARSE(“[[”, CPP, “]]”);
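A rough sketch of what `PARSE(open, Grammar, close)` has to do: check for the opening delimiter, find the closing one, and hand the text in between to another parser. Here a sub-parser is modelled as a hypothetical `std::function` from text to a result string, and `parse_embedded` and `Embedded` are invented names; nested or escaped delimiters are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical model of an embedded grammar: embedded text -> result.
using SubParser = std::function<std::string(const std::string&)>;

struct Embedded {
    std::string result;  // output of the embedded parser
    std::size_t end;     // position just after the closing delimiter
};

// PARSE(open, grammar, close) starting at position pos of input.
// The first occurrence of "close" ends the embedded text.
std::optional<Embedded> parse_embedded(const std::string& input, std::size_t pos,
                                       const std::string& open,
                                       const SubParser& grammar,
                                       const std::string& close) {
    if (pos > input.size() || input.compare(pos, open.size(), open) != 0)
        return std::nullopt;  // opening delimiter not found here
    std::size_t start = pos + open.size();
    std::size_t stop = input.find(close, start);
    if (stop == std::string::npos) return std::nullopt;  // unterminated
    return Embedded{grammar(input.substr(start, stop - start)), stop + close.size()};
}
```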