Introduction to Language Processing Technology Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University
Outline
• Levels of Programming Languages.
• Language Processors.
• Specification of Programming Languages.
Levels of Programming Languages
• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language
A C compiler translates the high-level program into assembly, and an assembler translates the assembly into machine language.
High level (C):
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
Low level (assembly):
    swap: muli $2, $5, 4
          add  $2, $4, $2
          lw   $15, 0($2)
          ...
Machine language:
    ...
    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    ...
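The three assembly instructions shown for swap compute the address of v[k] and load the word stored there. The sketch below mimics those steps on a toy memory model. It assumes that register $4 holds the base address of v and register $5 holds k (the usual MIPS argument convention, which the slide does not state), and that memory can be modelled as a dictionary from byte addresses to words. The names `load_element`, `memory`, and `WORD_SIZE` are illustrative only.

```python
WORD_SIZE = 4  # bytes per int, matching the constant in "muli $2, $5, 4"


def load_element(memory, base, k):
    """Mimic the three instructions that fetch v[k]."""
    offset = k * WORD_SIZE    # muli $2, $5, 4   (assumes $5 holds k)
    address = base + offset   # add  $2, $4, $2  (assumes $4 holds the address of v)
    return memory[address]    # lw   $15, 0($2)


def swap(memory, base, k):
    """Swap v[k] and v[k+1], where v starts at byte address `base`."""
    temp = load_element(memory, base, k)
    memory[base + k * WORD_SIZE] = load_element(memory, base, k + 1)
    memory[base + (k + 1) * WORD_SIZE] = temp


# Toy memory: v = {10, 20, 30} stored from byte address 1000.
memory = {1000: 10, 1004: 20, 1008: 30}
swap(memory, 1000, 1)
print(memory)  # {1000: 10, 1004: 30, 1008: 20}
```

One line of C becomes several machine-level steps; this is the gap between levels that a compiler bridges.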
High-Level Language Characteristics
• Expressions: a = b + (c - d)/2;
• Data types:
  • Integer, character, boolean.
  • Record, array.
• Control structures:
  • Selection (e.g. if).
  • Iteration (e.g. while).
High-Level Language Characteristics
• Declarations:
  • An identifier can denote a constant, variable, procedure, function, or type.
• Abstraction:
  • Object-oriented concept.
  • Concerned only with what is done, not how.
• Encapsulation:
  • Object-oriented concept.
  • Information hiding.
Language Processors
• A program that manipulates programs expressed in some programming language.
• Examples:
  • Editor.
  • Translator / Compiler.
  • Interpreter.
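Translator
• Translates a “source” program into an “equivalent” “object” program.
[Diagram: source program → Translator → object program, with error messages as a second output.]
• Example source languages: C, C++, FORTRAN, Java, VB.
• Example target languages: Assembly, C, Bytecode, p-code.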
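The translator picture (source program in, object program and error messages out) can be sketched for a deliberately tiny source language. The example assumes space-separated arithmetic expressions evaluated left to right with no operator precedence, and an invented stack-machine target whose instruction names (`LOADL`, `LOAD`, `ADD`, ...) are illustrative, not those of any real machine.

```python
def translate(source):
    """Translate an expression such as "2 * m + 1" into stack-machine code.

    Returns (object_program, error_messages). Operators are applied
    left to right, with no precedence.
    """
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    object_program, errors = [], []
    pending_op = None        # operator still waiting for its right operand
    expect_operand = True
    for position, token in enumerate(source.split(), start=1):
        if expect_operand:
            if token.isdigit():
                object_program.append(f"LOADL {token}")   # load a literal
            elif token.isidentifier():
                object_program.append(f"LOAD {token}")    # load a variable
            else:
                errors.append(f"token {position}: operand expected, found '{token}'")
                continue
            if pending_op:
                object_program.append(pending_op)
                pending_op = None
            expect_operand = False
        elif token in opcodes:
            pending_op = opcodes[token]
            expect_operand = True
        else:
            errors.append(f"token {position}: operator expected, found '{token}'")
    if expect_operand:
        errors.append("operand expected at end of input")
    return object_program, errors


print(translate("2 * m + 1"))
# (['LOADL 2', 'LOAD m', 'MUL', 'LOADL 1', 'ADD'], [])
print(translate("2 +"))
# (['LOADL 2'], ['operand expected at end of input'])
```

The object program is "equivalent" in the sense that running it on the stack machine yields the same value as the source expression.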
Tombstone Diagrams
• Ordinary program: program P written in language L.
[Diagram examples: Sort written in Java; Sort written in x86 code; Web Browser written in x86 code.]
Tombstone Diagrams
• Machine: machine M.
[Diagram examples: an x86 machine; a SPARC machine; a Web Browser written in x86 code running on an x86 machine.]
Tombstone Diagrams
• Translator S → T: translates source language S to target language T and is itself written in language L.
[Diagram examples: Java → x86 written in C; Java → x86 written in x86 code; Java → C written in C++.]
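Tombstone Diagrams
• Compilation
[Diagram: Sort, written in C, is fed to a C → x86 compiler running on an x86 machine; the result is Sort in x86 code, which then runs on an x86 machine.]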
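Tombstone Diagrams
• Cross compilation
[Diagram: Sort, written in C, is fed to a C → SPARC compiler running on an x86 machine; the result is Sort in SPARC code, which runs on a SPARC machine.]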
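Tombstone Diagrams
• Two-stage compilation
[Diagram: Sort, written in Java, is translated by a Java → C translator into Sort in C, which a C → x86 compiler then translates into Sort in x86 code; both translators run on an x86 machine.]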
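Tombstone Diagrams
• Compiling a compiler
[Diagram: a Pascal → x86 compiler written in C is fed to a C → x86 compiler running on an x86 machine; the result is a Pascal → x86 compiler in x86 code.]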
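The rules behind these diagrams are simple matching conditions: a translator can only run on a machine whose code it is expressed in, and it can only be applied to something expressed in its source language. The sketch below encodes just those two checks. The class and function names (`Program`, `Translator`, `apply_translator`) are invented for illustration and are not part of the tombstone notation itself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Program:
    name: str       # what the program does, e.g. "Sort"
    language: str   # language the program is expressed in


@dataclass(frozen=True)
class Translator:
    source: str     # S
    target: str     # T
    language: str   # L, the language the translator itself is expressed in


def apply_translator(translator, machine, subject):
    """Run `translator` on `machine` and translate `subject` (a Program or a Translator)."""
    if translator.language != machine:
        raise ValueError(f"{translator.language} code cannot run on a {machine} machine")
    if subject.language != translator.source:
        raise ValueError(f"translator accepts {translator.source}, not {subject.language}")
    # Same program, now expressed in the target language.
    return replace(subject, language=translator.target)


sort_in_c = Program("Sort", "C")

# Compilation: C -> x86 compiler running on x86.
print(apply_translator(Translator("C", "x86", "x86"), "x86", sort_in_c))
# Program(name='Sort', language='x86')

# Cross compilation: C -> SPARC compiler running on x86.
print(apply_translator(Translator("C", "SPARC", "x86"), "x86", sort_in_c))
# Program(name='Sort', language='SPARC')

# Two-stage compilation: Java -> C, then C -> x86.
sort_c = apply_translator(Translator("Java", "C", "x86"), "x86", Program("Sort", "Java"))
print(apply_translator(Translator("C", "x86", "x86"), "x86", sort_c))
# Program(name='Sort', language='x86')

# Compiling a compiler: a Pascal -> x86 compiler written in C.
print(apply_translator(Translator("C", "x86", "x86"), "x86", Translator("Pascal", "x86", "C")))
# Translator(source='Pascal', target='x86', language='x86')
```

Treating a compiler as just another program with a `language` field is what makes the "compiling a compiler" case fall out of the same rule.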
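Tombstone Diagrams
• Interpreter: interprets source language S and is itself written in language L.
[Diagram examples: interpreter tombstones for Basic and SQL, written in x86 or SPARC code; the program Sort, written in Basic, runs on a Basic interpreter that runs on an x86 machine.]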
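Tombstone Diagrams
• Abstract machine = hardware emulator
  • An interpreter for a low-level language.
[Diagram: a 370 emulator written in C is compiled by a C → x86 compiler into x86 code; the program HW1, in 370 code, then runs on that emulator on an x86 machine, which together behave like a 370 machine.]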
Tombstone Diagrams
• Java
  • Portable environment: write-once-run-anywhere.
  • Interpretive compiler.
  • JVM code = bytecode.
[Diagram: a Java → JVM compiler and a JVM interpreter, each running on machine M.]
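Tombstone Diagrams
[Diagram: Sort, written in Java, is compiled by a Java → JVM compiler running on an x86 machine into Sort in JVM code; the same JVM code then runs on a JVM interpreter on an x86 machine and on a JVM interpreter on a SPARC machine.]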
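The portability of an interpretive compiler comes from the interpreter: once a small interpreter for the intermediate code exists on a machine, any compiled program can run there unchanged. Below is a minimal sketch of such an interpreter for an invented stack-machine code. The instruction names are assumptions made for illustration and are not real JVM bytecode.

```python
def interpret(bytecode, variables):
    """Execute instructions for a hypothetical stack machine and return the result."""
    operations = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b,
    }
    stack = []
    for instruction in bytecode:
        opcode, *operand = instruction.split()
        if opcode == "LOADL":                 # push a literal
            stack.append(int(operand[0]))
        elif opcode == "LOAD":                # push the value of a variable
            stack.append(variables[operand[0]])
        elif opcode in operations:            # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(operations[opcode](left, right))
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack.pop()


# Intermediate code for the expression 2 * m * m.
program = ["LOADL 2", "LOAD m", "MUL", "LOAD m", "MUL"]
print(interpret(program, {"m": 7}))  # 98
```

Only `interpret` would need to be ported to a new machine; `program` stays the same, which is the idea behind write-once-run-anywhere.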
Tombstone Diagrams
• Bootstrapping
  • A compiler for language L that is itself written in L.
• Full bootstrap
  • Start from nothing.
• Half bootstrap
  • Start from another machine.
[Diagram: a C → NNP compiler expressed in NNP code.]
Tombstone Diagrams
• Full Bootstrap
[Diagrams: a sequence of tombstones for Csub → NNP and C → NNP compilers, written variously in NNP code, Csub, and C, each compiled on the NNP machine until a C → NNP compiler expressed in NNP code is obtained. Csub appears to denote a subset of C.]
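Tombstone Diagrams
• Half Bootstrap
[Diagram: a C → NNP compiler written in C is compiled by an existing C → x86 compiler on an x86 machine, giving a C → NNP compiler in x86 code; that compiler, still running on x86, then compiles the same C source again to give a C → NNP compiler in NNP code.]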
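The half bootstrap can be traced as two applications of the same rule: compile the new compiler's source once with the existing compiler, then once more with the result. The sketch below is one reading of the diagram, under the assumption that a C → x86 compiler already runs on an x86 machine. The `Compiler` tuple and `compile_with` function are invented for illustration.

```python
from collections import namedtuple

# source -> target, expressed in `language`
Compiler = namedtuple("Compiler", "source target language")


def compile_with(compiler, machine, subject):
    """Run `compiler` on `machine` to translate `subject` (itself a compiler)."""
    if compiler.language != machine:
        raise ValueError(f"{compiler.language} code cannot run on a {machine} machine")
    if subject.language != compiler.source:
        raise ValueError(f"compiler accepts {compiler.source}, not {subject.language}")
    return subject._replace(language=compiler.target)


existing = Compiler("C", "x86", "x86")   # assumed to exist already on x86
new_source = Compiler("C", "NNP", "C")   # the new compiler, written in C

# Step 1, on x86: build a cross compiler (C -> NNP, expressed in x86 code).
cross = compile_with(existing, "x86", new_source)
print(cross)   # Compiler(source='C', target='NNP', language='x86')

# Step 2, still on x86: cross compile the same source to get a native NNP compiler.
native = compile_with(cross, "x86", new_source)
print(native)  # Compiler(source='C', target='NNP', language='NNP')
```

Nothing has to run on the NNP machine until `native` exists, which is why the half bootstrap needs a second machine but no hand-written NNP code.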
Specification of Programming Languages
• Specification
  • Syntax
    • Defines the symbols and structure of the language.
    • Grammar.
  • Contextual constraints
    • Constraints beyond the grammar.
    • Rules of the language: scope rules, type rules, etc.
  • Semantics
    • Meaning of a program: its behavior when run.
    • How to translate a sentence S of language L into machine code for M.
Syntax
• Context-free grammar
  • Terminals.
  • Non-terminals / Variables.
  • Start symbol.
  • Production rules.
• Usually expressed in BNF notation.
BNF Notation
• Backus-Naur Form.
• Given the production rules:
    N → α
    N → β
• They can be written as:
    N ::= α | β
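As a small illustration of this shorthand, the sketch below groups separate production rules that share a left-hand side into single BNF rules. The function name `to_bnf` and the representation of a production as a (non-terminal, right-hand side) pair are assumptions made for the example.

```python
from collections import defaultdict


def to_bnf(productions):
    """Group (non_terminal, right_hand_side) pairs into BNF rules."""
    alternatives = defaultdict(list)
    for non_terminal, right_hand_side in productions:
        alternatives[non_terminal].append(right_hand_side)
    return [f"{n} ::= {' | '.join(rhs)}" for n, rhs in alternatives.items()]


productions = [
    ("Integer-Literal", "Digit"),
    ("Integer-Literal", "Integer-Literal Digit"),
]
print(to_bnf(productions))
# ['Integer-Literal ::= Digit | Integer-Literal Digit']
```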
Example: Mini-Triangle
Program:
    ! This is a comment. It continues to the end-of-line.
    let
      const m ~ 7;
      var n: Integer
    in
      begin
        n := 2 * m * m;
        putint(n)
      end
Terminals:
    begin const do else end if in let then var while
    ; : := ~ ( ) + - * / < > = \
Mini-Triangle Syntax
    Program        ::= Command
    Command        ::= single-Command
                     | Command ; single-Command
    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | if Expression then single-Command else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end
Mini-Triangle Syntax
    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )
    V-name             ::= Identifier
    Declaration        ::= single-Declaration
                         | Declaration ; single-Declaration
    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter
Mini-Triangle Syntax
    Type-denoter    ::= Identifier
    Operator        ::= + | - | * | / | < | > | = | \
    Identifier      ::= Letter | Identifier Letter | Identifier Digit
    Integer-Literal ::= Digit | Integer-Literal Digit
    Comment         ::= ! Graphic* eol
    Letter          ::= a | b | … | z
    Digit           ::= 0 | 1 | 2 | … | 9
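The lexical rules above (Identifier, Integer-Literal, Operator, Comment, and the keyword terminals) are enough to sketch a scanner that splits Mini-Triangle text into tokens. The version below is an illustration, not part of the language definition: the token kind names are invented, and upper-case letters are accepted in identifiers (an assumption, made because the example program uses `Integer`, although the Letter rule lists only a to z).

```python
import re

KEYWORDS = {"begin", "const", "do", "else", "end", "if", "in", "let",
            "then", "var", "while"}

TOKEN_PATTERN = re.compile(r"""
    (?P<skip>\s+|![^\n]*)                    # white space, or a comment up to end-of-line
  | (?P<Identifier>[a-zA-Z][a-zA-Z0-9]*)     # Letter, then Letters or Digits
  | (?P<IntegerLiteral>[0-9]+)               # one or more Digits
  | (?P<Symbol>:=|[;:~()])                   # punctuation; ":=" is tried before ":"
  | (?P<Operator>[+\-*/<>=\\])               # the eight Operator symbols
""", re.VERBOSE)


def scan(source):
    """Return a list of (kind, text) tokens, skipping white space and comments."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[position]!r} at offset {position}")
        position = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"
        tokens.append((kind, text))
    return tokens


print(scan("n := 2 * m; ! a comment"))
# [('Identifier', 'n'), ('Symbol', ':='), ('IntegerLiteral', '2'),
#  ('Operator', '*'), ('Identifier', 'm'), ('Symbol', ';')]
```

Keywords are matched as identifiers first and reclassified afterwards, which avoids treating the prefix of a name such as `input` as the keyword `in`.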
Syntax Tree
• An ordered tree with
  • Internal nodes: non-terminals.
  • Leaf nodes: terminals.
• An N-tree of grammar G is a syntax tree whose root is the non-terminal N.
Mini-Triangle Syntax Tree
    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )
    V-name             ::= Identifier
    …
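To make the idea of an Expression-tree concrete, here is a minimal sketch of a parser for the Expression rules that returns the syntax tree as nested tuples: the first element of each tuple is the non-terminal, and plain strings are terminals. It assumes the input is already a list of token strings, and it replaces the left-recursive Expression rule with a loop that builds the same left-leaning tree. All function names are illustrative.

```python
OPERATORS = set("+-*/<>=\\")


def parse_expression(tokens):
    """Parse a list of token strings into an Expression-tree (nested tuples)."""
    tree, rest = _expression(tokens)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def _expression(tokens):
    # Expression ::= primary-Expression | Expression Operator primary-Expression
    primary, rest = _primary(tokens)
    tree = ("Expression", primary)
    while rest and rest[0] in OPERATORS:
        operator = ("Operator", rest[0])
        primary, rest = _primary(rest[1:])
        tree = ("Expression", tree, operator, primary)
    return tree, rest


def _primary(tokens):
    # primary-Expression ::= Integer-Literal | V-name
    #                      | Operator primary-Expression | ( Expression )
    if not tokens:
        raise SyntaxError("primary-Expression expected at end of input")
    head, rest = tokens[0], tokens[1:]
    if head.isdigit():
        return ("primary-Expression", ("Integer-Literal", head)), rest
    if head.isalnum() and head[0].isalpha():
        return ("primary-Expression", ("V-name", ("Identifier", head))), rest
    if head in OPERATORS:
        operand, rest = _primary(rest)
        return ("primary-Expression", ("Operator", head), operand), rest
    if head == "(":
        inner, rest = _expression(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("')' expected")
        return ("primary-Expression", "(", inner, ")"), rest[1:]
    raise SyntaxError(f"unexpected token {head!r}")


print(parse_expression(["d", "+", "10"]))
# ('Expression',
#  ('Expression', ('primary-Expression', ('V-name', ('Identifier', 'd')))),
#  ('Operator', '+'),
#  ('primary-Expression', ('Integer-Literal', '10')))
```

Because the grammar gives all operators the same standing, `d + 10 * n` parses as `(d + 10) * n`; the tree simply mirrors the left recursion in the Expression rule.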