
Compiler design




  1. Compiler design Textbook: Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman. Topics: • Compiler phases • Lexical analysis • Syntax analysis • Code generation Homework: there will be two major programming assignments. They must be done independently. Examinations: there will be two hourly exams and a final exam. Grading: the total grade will be computed as follows: • 20% for each hourly exam • 15% for the homework • 50% for the final exam.

  2. Overview of Compiler • A compiler is a program (usually written in a high-level language) that translates a source program written in a high-level language into equivalent machine code. • source program → compiler → machine code (object code)

  3. What is a Compiler? • Definition: A compiler is a program that translates one language to another • Usually, the translation takes place between a high-level language and a low-level language • Clearly, our first step is to discuss some terminology…

  4. Terminology • Source language – the language that is being translated • Object language (also called the target language) – the language into which the translation is being done • High-level language – a language that is far removed from the machine; one that is close to the problem area(s) for which the language is designed

  5. Terminology… • Low-level language – a language that is close to the machine (computer) on which the program will run (execute) • Object language – (sometimes called machine code) when the target is a real computer, the native language of that computer. This language usually is not human-readable (and is expressed in bits or hex)

  6. Terminology… • Intermediate language – a language that is used either: • because it is a temporary step in the translation process; or, • because it is neither particularly high nor particularly low, and is the output of a translation • Assembly language – a language that translates almost one-to-one to machine language, but is in human-readable form

  7. What’s a Compiler?... • Today, compilers are written in high-level languages (such as Java, C++, etc.) • The earliest compilers were written in assembly language (e.g., the first FORTRAN compiler, delivered in 1957, and the first COBOL compilers, around 1960) • Sometimes a compiler is written in the very language that it compiles. This is done through bootstrapping.

  8. Why Should I learn Compiler Construction? • How do compilers work? • How do computers work? (instruction set, registers, addressing modes, run time data structures, …) • What machine code is generated for certain language constructs? (efficiency considerations) • Getting "a feeling" for good language design

  9. Why Compilers? A Brief History • The first computers were “hard-wired” • That is, they were collections of physical devices connected to one another in an assemblage designed to calculate particular kinds of results

  10. Why Compilers? A Brief History… • For example, Babbage’s Analytical Engine and his Difference Engine were designed as assemblages of gears that solved numeric problems • A primary driving force was the calculation of numerical tables, such as ballistics tables for artillery • Jacquard’s loom is another example • And Hollerith’s work for the U.S. Census Bureau is another

  11. Why Compilers? A Brief History… • In the mid-1940s John von Neumann “invented” the stored-program computer • The “invention” is the observation that machine instructions can be stored in the memory of a computer just as data can; instructions are simply another kind of data • Then the computer can not only take its instructions from memory…

  12. Why Compilers? A Brief History… • But the computer can modify the instructions in its memory… • And, in fact, can write its own programs, storing them in memory • It quickly became apparent that the simplest way to store information in a computer was in the form of binary numbers

  13. Why Compilers? A Brief History… • So, to program a computer, you only needed to enter a sequence of binary numbers into memory, and then tell the computer at which memory address to start execution • This was programming in machine language • Instructions (and data) were entered from a console, one word (in binary) at a time…

  14. Why Compilers? A Brief History… • This form of coding (note the word!) was quickly replaced by programming in assembly language • A program called an assembler was written (in machine language) to translate assembly language into machine language

  15. Why Compilers? A Brief History… • After the first assembler was written, no one needed to code in machine language any longer • But, coding x = 3; can take many instructions… • So, the thought was – can we create a program that translates something like x = 3; into assembly language or into machine language?

  16. Why Compilers? A Brief History. Formal Languages • About the same time, in the mid-1950s, Noam Chomsky (M.I.T.) began investigating the formal structure of natural languages • His work led to the Chomsky hierarchy of type 0, 1, 2, and 3 languages and their associated grammars

  17. Why Compilers? A Brief History. Formal Languages… • The type 2 (context-free) grammars turned out to be very good at describing computer languages • And efficient ways to recognize the structure of a source program using a type 2 grammar were developed • Such recognition is called parsing

  18. Why Compilers? A Brief History. Formal Languages… • Very closely related to context-free grammars are the type 3 grammars • These are the regular grammars, which are equivalent to finite automata • An entire sub-branch of mathematics studies automata; it’s called automata theory
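
The equivalence between regular grammars and finite automata can be made concrete with a small sketch. The automaton below is an illustration only, not taken from the slides: it assumes C-style identifiers (a letter or underscore followed by letters, digits, or underscores), and the names `step` and `is_identifier` are hypothetical.

```python
import string

# Hypothetical two-state automaton for C-style identifiers.
# The equivalent regular expression is [A-Za-z_][A-Za-z0-9_]*
LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)

START, IN_IDENT, REJECT = "start", "in_ident", "reject"

def step(state, ch):
    """Transition function: next state for the current state and input character."""
    if state == START and ch in LETTERS:
        return IN_IDENT
    if state == IN_IDENT and (ch in LETTERS or ch in DIGITS):
        return IN_IDENT
    return REJECT

def is_identifier(text):
    """Run the automaton over text; accept only if it ends in IN_IDENT."""
    state = START
    for ch in text:
        state = step(state, ch)
        if state == REJECT:
            return False
    return state == IN_IDENT
```

The automaton needs no memory beyond its current state, which is what separates type 3 recognition from the stack-based recognition that type 2 grammars require.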

  19. Why Compilers? A Brief History. Formal Languages… • It turns out that type 3 (regular) grammars are very good at describing the “atoms” used in computer languages • These “atoms” are the reserved words, symbols, and user-defined words that are used in a computer language • Recognizing atoms is called scanning (or lexing)

  20. Why Compilers? A Brief History… • By far the most difficult and complicated problem has been how to generate object code that is concise, and most importantly, executes efficiently • This is called “optimization”

  21. Why Compilers? A Brief History… • Far simpler are the front-end issues of scanning and parsing, that is, recognizing the source code • This is because we’ve developed (semi-) automatic ways to create scanners and parsers… • using scanner generators and parser generators

  22. Programs Related to Compilers… • Interpreters – directly execute the code upon recognition; usually statement by statement • Assemblers – translate assembly language to machine language • Macro Assemblers – ditto, but with (powerful) macro capabilities

  23. Programs Related to Compilers… • Linkers – combine object modules to produce an executable module • Linkage Editors – manage the linking process, and are able to create/maintain object libraries

  24. Programs Related to Compilers… • Loaders – load executable modules into memory, and launch execution • Dynamic Loaders – loaders that stay around during execution to handle the loading of DLLs (dynamic-link libraries)

  25. Programs Related to Compilers… • Preprocessors – usually a separate program whose input is source code and whose output is source code; perform macro expansion, comment deletion, etc. Sometimes the first phase of a compiler

  26. Programs Related to Compilers… • Editors – allow the user to create and update source code • Smart Editors – include syntax coloring, parenthesis balancing, etc. • Debuggers – a program that provides an environment in which code may be debugged; including single stepping, symbol tables, etc.

  27. Programs Related to Compilers… • IDEs – integrated development environments; provide integrated editor-debugger-execution environments • Profilers – collect statistics about where programs spend their time during execution; important for optimizing at the source code level

  28. Programs Related to Compilers… • Project Managers – programs that help software managers deal with hundreds or thousands of modules; build reports, etc. • SCCS – source code control systems; provide for multiple access to shared code in a controlled manner

  29. The Translation Process • The translation process consists of a collection of phases, with the output of one phase feeding the input of the next • The original source code is transformed into a sequence of intermediate representations (IRs) during this process

  30. The Translation Process

  31. Phases of Compiler • Parallel to all other phases are two activities: • Symbol table manipulation. The symbol table is one of the primary data structures that a compiler uses, and it is used by all of the phases. • Error detection and handling
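
A minimal sketch of what a symbol table might look like, assuming block-structured scoping as in C. The class and method names (`SymbolTable`, `declare`, `lookup`, and so on) are hypothetical, and a production compiler would record far more per symbol than a type name.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its recorded attributes."""

    def __init__(self):
        self.scopes = [{}]  # the outermost (global) scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            # Error detection: a second declaration in the same scope.
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = {"type": type_name}

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        # Error detection: use of an undeclared name.
        raise NameError(f"{name!r} is not declared")
```

Both parallel activities show up here: every phase can call `lookup`, and the two `NameError` cases are examples of errors that the table itself detects.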

  32. The Scanner • The scanner reads the source program, as a stream of characters, and it performs lexical analysis – collecting sequences of characters into meaningful units called tokens • The scanner also may create a symbol table and a literal table

  33. The Parser • The parser reads the tokens produced by the scanner and performs syntactic analysis – creating an IR (a parse tree or a syntax tree) showing the structure of the program • Syntax trees (abstract syntax trees) are reduced representations of the parse tree, with many irrelevant nodes eliminated
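
One common way to write a parser by hand is recursive descent, sketched below for a deliberately tiny grammar. The grammar, the tuple-based tree shape, and the names `tokenize`, `Parser`, and `parse` are assumptions made for illustration; the slides do not prescribe a parsing method.

```python
import re

def tokenize(text):
    # Deliberately tiny scanner; characters it does not know are silently skipped.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[=+*()]", text)

def is_name(tok):
    return tok is not None and (tok[0].isalpha() or tok[0] == "_")

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, want=None):
        tok = self.peek()
        if tok is None or (want is not None and tok != want):
            raise SyntaxError(f"expected {want or 'a token'!r}, found {tok!r}")
        self.pos += 1
        return tok

    def assignment(self):  # assignment -> NAME '=' expr
        target = self.take()
        if not is_name(target):
            raise SyntaxError(f"cannot assign to {target!r}")
        self.take("=")
        return ("assign", target, self.expr())

    def expr(self):  # expr -> term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):  # term -> NUMBER | NAME | NAME '(' expr ')' | '(' expr ')'
        tok = self.take()
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.expr()
            self.take(")")
            return node
        if not is_name(tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        if self.peek() == "(":
            self.take("(")
            arg = self.expr()
            self.take(")")
            return ("call", tok, arg)
        return ("var", tok)

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.assignment()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

For the input `b = f(a) + 3`, `parse` returns `("assign", "b", ("+", ("call", "f", ("var", "a")), ("num", 3)))`. The parentheses of the call do not appear as nodes, which is the sense in which a syntax tree is a reduced form of the parse tree.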

  34. The Semantic Analyzer • The semantics of a program are its “meaning” – what it is intended to accomplish • The semantic analyzer creates an intermediate data structure that contains the part of this meaning that can be determined at compile time – these are the static semantics • The dynamic semantics of a program can be determined only by executing the program

  35. The Semantic Analyzer… • An example of the static semantics of a program is the data types of the variables (and expressions) • These static semantics usually are represented in the intermediate representations (IRs) as attributes • The IR usually is a tree, “decorated” with these attributes
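
As a rough illustration of “decorating” a tree with attributes, the sketch below attaches a type to every node of a small expression tree. The tuple-based tree shape, the function name `annotate`, and the rule that `int + float` yields `float` are assumptions chosen for the example, not something the slides specify.

```python
def annotate(node, symbols):
    """Return a copy of the tree with a type attribute appended to every node.

    symbols maps each declared variable name to its type name.
    """
    kind = node[0]
    if kind == "num":
        type_name = "float" if isinstance(node[1], float) else "int"
        return ("num", node[1], type_name)
    if kind == "var":
        if node[1] not in symbols:
            raise NameError(f"undeclared variable {node[1]!r}")
        return ("var", node[1], symbols[node[1]])
    if kind == "+":
        left = annotate(node[1], symbols)
        right = annotate(node[2], symbols)
        # Assumed widening rule: the result is float if either operand is float.
        result = "float" if "float" in (left[-1], right[-1]) else "int"
        return ("+", left, right, result)
    raise ValueError(f"unknown node kind {kind!r}")
```

With `a` declared as `int`, annotating the tree for `a + 3` gives `("+", ("var", "a", "int"), ("num", 3, "int"), "int")`. The last element of each tuple is the attribute, computed without running the program.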

  36. (Source) Code Optimization • Optimization may occur during several phases • Source code optimization rearranges the source (or the IR of the source) in order to produce more efficient code • E.g., x = 7 + 9; can become x = 16; • This is called constant folding
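
Constant folding is easy to sketch on a tree-shaped IR. The version below assumes a hypothetical tuple representation (`("num", 7)`, `("var", "x")`, `("+", left, right)`, `("assign", name, expr)`) and handles only integer `+`, `-`, and `*`.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    kind = node[0]
    if kind in OPS:
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            # Both operands are known at compile time: compute the result now.
            return ("num", OPS[kind](left[1], right[1]))
        return (kind, left, right)
    if kind == "assign":
        return ("assign", node[1], fold(node[2]))
    return node  # "num" and "var" leaves are already as simple as possible
```

Folding the tree for `x = 7 + 9` yields `("assign", "x", ("num", 16))`. Subtrees that contain a variable are left in place, with any constant parts below them folded.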

  37. (Source) Code Optimization… • Duplicated computations can be saved as temporaries and then their values re-used • Recursion can be converted to iteration • Repeated calculations can be moved out of loops • The possibilities are endless…

  38. The Code Generator • The code generator takes the IR and generates code for the target machine • Here the details of how various numeric and non-numeric quantities are represented become important • E.g., word length, hardware stack, hardware calling conventions, memory access, etc.

  39. The Target Code Optimizer • The target code optimizer examines the emitted target code to see if further possibilities for optimization are present and then capitalizes upon them • E.g., reuse of registers, using a shift instruction to replace a multiplication or division, etc.
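
Replacing a multiplication with a shift can be sketched as a pass over the emitted instructions. The instruction format here, `(opcode, destination, source, immediate)`, is invented for the example and does not correspond to any real machine. Only multiplication is handled: replacing division by a shift needs extra care for negative signed values.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def reduce_strength(instrs):
    """Rewrite 'mul dst, src, 2**k' as 'shl dst, src, k'; leave the rest alone."""
    out = []
    for op, dst, src, imm in instrs:
        if op == "mul" and is_power_of_two(imm):
            # For a power of two, bit_length() - 1 is the exponent k.
            out.append(("shl", dst, src, imm.bit_length() - 1))
        else:
            out.append((op, dst, src, imm))
    return out
```

A multiplication by 8 becomes a left shift by 3, while a multiplication by 6 is left unchanged.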

  40. Phases of the compiler • Source Program → Scanner (Lexical Analyzer) → Tokens → Parser (Syntax Analyzer) → Parse Tree → Semantic Analyzer → Abstract Syntax Tree with attributes

  41. Sample Program Compiled • Consider the example: int a, b; { a = 100; b = f(a) + 3; } • Source program → Lexical analyzer → Token stream

  42. Sample Program Compiled • Tokens are the entities of interest, as defined by the compiler writer. A sequence of characters with a collective meaning is grouped to form a token. • Examples of Tokens: • Single-character operators: = + - * > < • Multi-character operators: ++, --, ==, <= • Numeric constants: 1997 45.89 19.9e+7 • Keywords: int, while, for • Identifiers: x, my_name, Your_Name, a • Homework: Identify all token types in C programs.
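
The token classes listed above can be described with regular expressions, and a scanner can then be sketched in a few lines. The class names (`NUMBER`, `IDENT`, `OP`, `PUNCT`, `KEYWORD`) and the exact patterns are assumptions for illustration; they cover only the examples on this slide, not all of C.

```python
import re

# Order matters: earlier alternatives are tried first at each position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"\+\+|--|==|<=|>=|[=+\-*<>]"),
    ("PUNCT",  r"[(){},;]"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"int", "while", "for"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return a list of (token class, text) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords look like identifiers until checked
        tokens.append((kind, text))
    return tokens
```

Scanning `b = f(a) + 3` produces eight tokens: `IDENT b`, `OP =`, `IDENT f`, `PUNCT (`, `IDENT a`, `PUNCT )`, `OP +`, `NUMBER 3`. Listing the two-character operators before the single-character class is what makes `++` one token rather than two.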

  43. Example Program Compiled-Continued What are the tokens in the example?

  44. Example Continued • The parser produces a parse tree: it is a heterogeneous tree (nodes have different data types) • root_node has two children, stmt1 and stmt2 • stmt1: = with children a and 100 • stmt2: = with children b and +; the + node has children f ( a ) and 3

  45. Intermediate-Code Generation • Using temporary location to save values • t1 = 100 • store t1, a • load a, t2 • t3 = f(t2) • t4 = t3 + 3 • store t4, b
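
A sketch of how such code might be produced from a syntax tree follows. The tuple-based tree shape and the `CodeGen` class are assumptions for illustration; the instruction spelling follows the slide, and integer constants used as operands of `+` are emitted as immediates to match it.

```python
class CodeGen:
    def __init__(self):
        self.code = []   # emitted instructions, in order
        self.count = 0   # number of temporaries handed out so far

    def new_temp(self):
        self.count += 1
        return f"t{self.count}"

    def value(self, node):
        """Emit code that leaves the value of node in a fresh temporary."""
        kind = node[0]
        if kind == "num":
            temp = self.new_temp()
            self.code.append(f"{temp} = {node[1]}")
        elif kind == "var":
            temp = self.new_temp()
            self.code.append(f"load {node[1]}, {temp}")
        elif kind == "call":
            arg = self.value(node[2])
            temp = self.new_temp()
            self.code.append(f"{temp} = {node[1]}({arg})")
        elif kind == "+":
            left = self.operand(node[1])
            right = self.operand(node[2])
            temp = self.new_temp()
            self.code.append(f"{temp} = {left} + {right}")
        else:
            raise ValueError(f"unknown node kind {kind!r}")
        return temp

    def operand(self, node):
        # Constants are used directly as operands of '+'.
        return str(node[1]) if node[0] == "num" else self.value(node)

    def statement(self, node):
        _, target, expr = node
        temp = self.value(expr)
        self.code.append(f"store {temp}, {target}")

gen = CodeGen()
gen.statement(("assign", "a", ("num", 100)))
gen.statement(("assign", "b", ("+", ("call", "f", ("var", "a")), ("num", 3))))
```

After these two calls, `gen.code` holds the six instructions listed on the slide, in the same order.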

  46. Intermediate-Code Optimization • Eliminate unnecessary code or statements that won’t be executed • t1 = 100 • store t1, a • t3 = f(t1) • t4 = t3 + 3 • store t4, b
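
The step from the unoptimized sequence to this one removes a load whose value is already sitting in a temporary. The sketch below is one possible way to do that, under stated assumptions: instructions are plain strings in the slide's notation, every temporary is assigned exactly once, temporaries are named `t` followed by digits, and any call might change any variable. The function name `forward_stores` is hypothetical, and the callee is written `f` as in the sample program.

```python
import re

def forward_stores(code):
    """Drop 'load v, t' when another temporary is known to hold v already."""
    known = {}   # variable -> temporary that currently holds its value
    alias = {}   # eliminated temporary -> temporary that replaces it
    out = []
    for line in code:
        load = re.fullmatch(r"load (\w+), (\w+)", line)
        if load:
            var, temp = load.groups()
            if var in known:
                alias[temp] = known[var]  # reuse the earlier temporary
                continue                  # and drop the load itself
            known[var] = temp
            out.append(line)
            continue
        # Rewrite uses of temporaries whose loads were eliminated.
        line = re.sub(r"\bt\d+\b", lambda m: alias.get(m.group(), m.group()), line)
        store = re.fullmatch(r"store (\w+), (\w+)", line)
        if store:
            temp, var = store.groups()
            known[var] = temp
        elif "(" in line:
            known.clear()  # conservative: a call may modify any variable
        out.append(line)
    return out

before = ["t1 = 100", "store t1, a", "load a, t2",
          "t3 = f(t2)", "t4 = t3 + 3", "store t4, b"]
after = forward_stores(before)
```

Here `after` is the five-instruction sequence `t1 = 100`, `store t1, a`, `t3 = f(t1)`, `t4 = t3 + 3`, `store t4, b`. The load is safe to remove because it comes before the call, so nothing can have changed `a` in between.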

  47. Target-code Generation • Machine code generated for some machine • r1 = 100 • store r1, 0x10 • jsr _f • r2 = r0 + 3 • store r2, 0x16

  48. Compiler Architecture: Single pass vs. multi pass architecture • Single pass: all phases interleaved, driven by the parser

  49. Multi pass • Each pass finishes before the next starts • Saves main memory; passes communicate through files • Used if the language is complex or portability is important

  50. Front end & Back end • Front end: the phases, or parts of phases, that depend on the source language • Back end: the phases, or parts of phases, that depend on the target machine
