A short tutorial With contributions from Steven Derrien, Antoine Floc’h, Antoine Morvan, Kevin Martin, Ludovic L’Hours, Maxime Naullet, Amit Kumar
Outline of the presentation • Introduction • First steps with Gecos • Installation guidelines • Execution of a Gecos script • The Gecos basic IR • General background on compiler IRs • An EMF meta-modeled IR • Main IR classes (Blocks, Instructions, etc.) • The Gecos DAG IR • The IGraph framework • The Extended Dataflow IR • The DAG/IGraph adapter
What is Gecos • Gecos is an open-source C compiler framework • Compiler framework = pieces of software that you can assemble to build your own custom compiler • Targeted toward semi-custom hardware synthesis (ASIP/HLS) • Since 2010, it is also a source-to-source (C/C++) compiler • Polyhedral-based loop transformation and analysis • Gecos leverages Eclipse + Java + MDE • Eclipse: very popular IDE in the embedded world • MDE: leading-edge software engineering technologies • We keep telling system designers that they should raise the abstraction level of their design flow. • Why wouldn’t we start by applying those principles to ourselves when we design complex software & design automation tools?
Gecos features • C99 front-end (with support for Mentor templates) • Based on the Eclipse CDT framework (robust C/C++ parser) • Mixed AST/CDFG intermediate representation • Many existing features • Standard and SSA-based IR with standard analyses and optimizations • Polyhedral loop transformation framework • Powerful AST pattern matching/term rewriting engine • Extensible IR based on plugin extensions • You can add new « Concepts » to the IR and extend existing passes without hurting the codebase, thanks to « plugins » • This could be used to manage an IR suited to Scilab/ALMA
Installing Gecos • The super easy way • VirtualBox image with Ubuntu + Gecos ready (ask for my USB key) • The simple way • Full-featured Gecos/Eclipse distro for Linux/MacOS, 32/64-bit • See the Gecos homepage • The regular way • Install Eclipse, then use the Gecos update site to install Gecos • Recommended for updating Gecos, not for installing • The bold way • The truth lies in the source (Gecos SVN) • You’d better ask for help in this case
Demo • On my non virtual machine …
Reporting Bugs https://gforge.inria.fr/tracker/index.php?aid=7570&group_id=510
Gecos scripting language • A compiler = sequence of analyses/optimizations • In a retargetable compiler framework, the user needs to control which passes are used and when. • To avoid recompiling the main driver class every time the flow needs to be reorganized, Gecos uses a script mechanism. • In Gecos, compilation flows are expressed as scripts • A script specifies how passes are executed • A pass = Java class performing the analysis/transformation • Gecos passes are declared as plugin extensions in Eclipse • Allows for seamless/flexible integration within the flow • Avoids the need for a centralized list of available passes
Gecos Script Editor • Syntax highlighting, auto-completion, etc.
Gecos Script Language constructs • Types • String, int and float literals are natively supported • Any Java type (scalar) or collection of Java types • Arrays not directly supported • Built-in functions/variables • stripext, dirname, echo: simple housekeeping primitives • Arguments ($1, …) can be passed to the script execution engine • Control structures • Iterating over a collection • Example: iterating over the procedure objects in a ProcedureSet
Gecos Script Language grammar • Is briefly described in a (partially) outdated PDF file • I can send it upon request • We are working on a “Gecos CookBook” • To gather and update all important information
Running a Gecos Script • Scripts are special files that can be run through the GUI • Select a .cs file, right-click and select run as compiler script
Running a Gecos Script • The execution context can be customized • Use the Run menu, Run Configurations command.
Demo • Script that parses a C file, regenerates the C code
Creating a new Gecos pass • Sooner or later you will need to write your own pass • To merge a combination of existing passes into a single one • To implement your own analyses or transformations • A Gecos pass is a Java class conforming to some rules • It must have (at least) one public constructor • It must implement a public compute() method • This method may return an object of any class. • When a pass is used in a script • Function call parameters are passed as arguments to the constructor to create a new object of that class. • The script engine calls the compute() method on this newly created object.
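The rules above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual Gecos script engine: the names `CountWords` and `MiniEngine` are hypothetical, and the constructor lookup is deliberately simplistic (it matches on the number of arguments only).

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical pass: one public constructor and one public compute() method.
class CountWords {
    private final String text;

    public CountWords(String text) {
        this.text = text;
    }

    // compute() may return an object of any class; here it returns an Integer.
    public Object compute() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Hypothetical stand-in for a script engine (an assumption, not Gecos code).
class MiniEngine {
    // Creates a new pass object from the call arguments, then calls compute() on it.
    static Object run(Class<?> passClass, Object... args) {
        try {
            for (Constructor<?> ctor : passClass.getConstructors()) {
                if (ctor.getParameterCount() == args.length) {
                    Object pass = ctor.newInstance(args);
                    Method compute = passClass.getMethod("compute");
                    return compute.invoke(pass);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("pass execution failed", e);
        }
        throw new IllegalArgumentException(
                "no public constructor with " + args.length + " argument(s)");
    }
}
```

A call such as `MiniEngine.run(CountWords.class, "some text")` then plays the role of a function call in a script: a fresh object per call, arguments forwarded to the constructor, result taken from `compute()`.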
Creating a new compiler pass (1/2) • Use the provided Wizard
Creating a new compiler pass (2/2) • Template Java code • Immediately usable in scripts
Creating a new compiler pass without UI • To use a Java class as a pass you have to ‘register’ it • Tell the Gecos infrastructure that a new pass is available • Registration is done by modifying the plugin.xml file • Each Eclipse plugin project has its own plugin.xml file • The plugin.xml is used by the Eclipse infrastructure to ‘know’ about available plugins. • In the Eclipse framework, Gecos passes are called modules. • Eclipse provides a GUI for editing plugin.xml files • You can also edit the file manually
Hacking the plugin.xml file • Not recommended …
Demo • Create a new Dummy Pass
Part II: The Gecos Intermediate Representation • Background • Why use a meta-model?
Tree based IRs • IR is a tree where nodes are language constructs. • They can be more or less complex (from parse trees to full IRs) • They are well suited for source-to-source compilers • Analyses are generally more difficult to implement: data-flow and control-flow information is not explicit • Example: if (i==1) { i=i+1; j=2; } else { j=1; }
[Figure: syntax tree of the example, with an If node whose cond, then and else children hold the comparison i==1, the assignments i=i+1 and j=2, and the assignment j=1]
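As an illustration of why tree-based IRs suit source-to-source compilation, here is a minimal sketch of a tree IR for the `if` example on this slide. All class names (`TreeNode`, `Leaf`, `BinOp`, `Assign`, `Seq`, `IfNode`) are hypothetical and far simpler than the real Gecos classes; the point is only that regenerating source is a straightforward recursive walk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini tree IR (illustrative names, not Gecos classes).
interface TreeNode {
    String toSource(); // regenerate C-like source text for this subtree
}

// Leaf: a variable name or an integer literal.
class Leaf implements TreeNode {
    final String text;
    Leaf(String text) { this.text = text; }
    public String toSource() { return text; }
}

// Binary expression such as i==1 or i+1.
class BinOp implements TreeNode {
    final String op;
    final TreeNode left, right;
    BinOp(String op, TreeNode left, TreeNode right) {
        this.op = op; this.left = left; this.right = right;
    }
    public String toSource() { return left.toSource() + op + right.toSource(); }
}

// Assignment statement such as j=2;
class Assign implements TreeNode {
    final String target;
    final TreeNode value;
    Assign(String target, TreeNode value) { this.target = target; this.value = value; }
    public String toSource() { return target + "=" + value.toSource() + ";"; }
}

// Sequence of statements enclosed in braces.
class Seq implements TreeNode {
    final List<TreeNode> stmts;
    Seq(TreeNode... stmts) { this.stmts = new ArrayList<>(Arrays.asList(stmts)); }
    public String toSource() {
        StringBuilder sb = new StringBuilder("{ ");
        for (TreeNode s : stmts) sb.append(s.toSource()).append(' ');
        return sb.append('}').toString();
    }
}

// if/then/else construct: control flow is implicit in the tree shape.
class IfNode implements TreeNode {
    final TreeNode cond, thenPart, elsePart;
    IfNode(TreeNode cond, TreeNode thenPart, TreeNode elsePart) {
        this.cond = cond; this.thenPart = thenPart; this.elsePart = elsePart;
    }
    public String toSource() {
        return "if (" + cond.toSource() + ") " + thenPart.toSource()
                + " else " + elsePart.toSource();
    }
}

class TreeDemo {
    // Builds the tree for: if (i==1) { i=i+1; j=2; } else { j=1; }
    static TreeNode example() {
        TreeNode cond = new BinOp("==", new Leaf("i"), new Leaf("1"));
        TreeNode thenPart = new Seq(
                new Assign("i", new BinOp("+", new Leaf("i"), new Leaf("1"))),
                new Assign("j", new Leaf("2")));
        TreeNode elsePart = new Seq(new Assign("j", new Leaf("1")));
        return new IfNode(cond, thenPart, elsePart);
    }

    public static void main(String[] args) {
        System.out.println(example().toSource());
    }
}
```

Note that nothing in this structure says which statement may execute after which; that information has to be derived, which is the difficulty the slide points at.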
Graph based IRs • The program is represented as a graph of BasicBlocks • BasicBlocks are atomic execution units • Connections between blocks represent the control flow • All control flow is expressed using branch/goto instructions • Pros/Cons • Static analysis is easier to implement • Regenerating source code is difficult
[Figure: control-flow graph with four basic blocks. B1: i=i-1; jmp B2 if i=0 else B3. B2: i=i-1; jmp B4. B3: i=i+1; jmp B4. B4: j++; jmp B1.]
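A graph-based IR can be sketched as blocks with explicit successor and predecessor lists. The example below rebuilds a four-block graph like the one in the figure (the assignment of instructions to B2 and B3 is read off the flattened figure and is an assumption) and runs a small reachability analysis. `CfgBlock` and `CfgDemo` are hypothetical names, not Gecos classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical basic block: straight-line instructions plus explicit control-flow links.
class CfgBlock {
    final String name;
    final List<String> insts;
    final List<CfgBlock> succs = new ArrayList<>();
    final List<CfgBlock> preds = new ArrayList<>();

    CfgBlock(String name, String... insts) {
        this.name = name;
        this.insts = Arrays.asList(insts);
    }

    // Records a jump and keeps both directions of the relation in sync.
    void jumpTo(CfgBlock target) {
        succs.add(target);
        target.preds.add(this);
    }
}

class CfgDemo {
    // Builds the four-block loop: B1 branches to B2 or B3, both reach B4, B4 loops to B1.
    static CfgBlock[] build() {
        CfgBlock b1 = new CfgBlock("B1", "i=i-1");
        CfgBlock b2 = new CfgBlock("B2", "i=i-1");
        CfgBlock b3 = new CfgBlock("B3", "i=i+1");
        CfgBlock b4 = new CfgBlock("B4", "j++");
        b1.jumpTo(b2);
        b1.jumpTo(b3);
        b2.jumpTo(b4);
        b3.jumpTo(b4);
        b4.jumpTo(b1);
        return new CfgBlock[] { b1, b2, b3, b4 };
    }

    static List<String> names(List<CfgBlock> blocks) {
        List<String> out = new ArrayList<>();
        for (CfgBlock b : blocks) out.add(b.name);
        return out;
    }

    // Worklist traversal: names of all blocks reachable from start (start included).
    static Set<String> reachableFrom(CfgBlock start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<CfgBlock> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            CfgBlock b = work.pop();
            if (seen.add(b.name)) {
                for (CfgBlock s : b.succs) work.push(s);
            }
        }
        return seen;
    }
}
```

The analysis is a few lines because the control flow is explicit; turning this graph back into structured `if`/`while` source would require recovering the loop and branch structure, which is the hard direction.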
Gecos Intermediate Representation • We merge the two types of IR into a single one • Brings the “best of both worlds” • Downside is that the IR is more complex to maintain • The whole Gecos IR is specified as a metamodel • You can see this as a kind of UML class diagram • File cdfg.ecore in fr.irisa.cairn.model.gecos/model • Can be seen as a formal specification of the IR structure • The metamodel is used to generate the IR API and other tools • Validation API, Viewer/Editor API, XML output/parser.
Main concepts in the IR • The Gecos IR revolves around several constructs • Block to model high-level control constructs (for, while, if) • Instruction to model dataflow and low-level control flow • Symbol, Type and Scope for objects manipulated by the program • Procedure/ProcedureSet to model functions and modules
Core: ProcedureSet, Procedure, Scope, Symbol
Blocks: ForBlock, IfBlock, WhileBlock, CompositeBlock, BasicBlock
Instructions: SetInstruction, SymbolInst, GenericInst, BranchInst
Type: BaseType, PtrType, ArrayType, StructType, FuncType
Gecos CDFG meta-model • Claim: the meta-model is the documentation • Example: an object of class Instruction can belong to only one BasicBlock
A few important notions • A containment reference means that • The container knows its content (obvious) • The contained object knows its unique container • This bidirectional relationship is enforced by the EMF API • An opposite non-containment reference means that • Both sides of the relation know about each other • This opposite relationship is enforced by the EMF API • Benefits of this distinction • We can provide a copy operation with sound semantics • Traversal operations on the containment tree are easy • Avoids many implementation bugs (reference aliasing)
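The “unique container” rule can be emulated in a few lines of plain Java. This is a hand-written sketch of the behavior, not EMF code (with EMF, the generated API does this bookkeeping for you); `OwnedInst` and `OwnedBlock` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contained object: it knows its unique container (or null).
class OwnedInst {
    final String name;
    OwnedBlock container; // maintained by OwnedBlock.add, never set directly

    OwnedInst(String name) { this.name = name; }
}

// Hypothetical container: adding an instruction detaches it from its previous container.
class OwnedBlock {
    final String name;
    final List<OwnedInst> insts = new ArrayList<>();

    OwnedBlock(String name) { this.name = name; }

    // In this emulation, all additions must go through add() to keep both sides consistent.
    void add(OwnedInst inst) {
        if (inst.container == this) return;      // already contained here
        if (inst.container != null) {
            inst.container.insts.remove(inst);   // an object has at most one container
        }
        inst.container = this;
        insts.add(inst);
    }
}
```

With blocks `a` and `b` and an instruction `c` first added to `a`, calling `b.add(c)` leaves `a` empty and makes `b` the container of `c`: the instruction is moved, never aliased in two blocks.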
Design patterns in Gecos • Gecos API is based on EMF, which uses design patterns • Design patterns: empirical good OO design practices • Three patterns you won’t be able to live without in Gecos • The Façade pattern: decoupling spec. and implementation • EMF manipulates interfaces, not object classes • The Visitor pattern: traversing and/or transforming the IR • To enable polymorphic dispatch • The Factory pattern: for creating new objects • The Adapter/Decorator patterns: for extending the framework
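To make the Visitor and Factory patterns concrete, here is a minimal generic sketch. It does not use the Gecos or EMF API: `MiniInst`, `MiniVisitor`, `MiniFactory` and `SymbolCollector` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Client code only sees interfaces, not implementation classes.
interface MiniInst {
    void accept(MiniVisitor v);
}

interface MiniVisitor {
    void visitSymbol(MiniSymbolInst s);
    void visitGeneric(MiniGenericInst g);
}

class MiniSymbolInst implements MiniInst {
    final String symbol;
    MiniSymbolInst(String symbol) { this.symbol = symbol; }
    // Polymorphic dispatch: each node type calls its own visit method.
    public void accept(MiniVisitor v) { v.visitSymbol(this); }
}

class MiniGenericInst implements MiniInst {
    final String name;
    final List<MiniInst> children;
    MiniGenericInst(String name, List<MiniInst> children) {
        this.name = name;
        this.children = children;
    }
    public void accept(MiniVisitor v) { v.visitGeneric(this); }
}

// Factory: the only place where concrete classes are instantiated.
class MiniFactory {
    static MiniInst symbol(String name) {
        return new MiniSymbolInst(name);
    }
    static MiniInst generic(String name, MiniInst... children) {
        return new MiniGenericInst(name, new ArrayList<>(Arrays.asList(children)));
    }
}

// Visitor that collects the symbol names used in an instruction tree, left to right.
class SymbolCollector implements MiniVisitor {
    final List<String> names = new ArrayList<>();

    public void visitSymbol(MiniSymbolInst s) {
        names.add(s.symbol);
    }

    public void visitGeneric(MiniGenericInst g) {
        for (MiniInst child : g.children) child.accept(this);
    }
}
```

A new analysis is then a new visitor class; the instruction classes themselves stay untouched.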
Containment in meta-models • interface BasicBlock { EList<Instruction> insts; … } • interface CtrlEdge { BasicBlock src; BasicBlock dst; }
[Figure: object diagram of a BasicBlock with its insts, in and out references, CtrlEdge objects with src and dst references, and an instruction tree (GenericInst named add with SymbolInstr children) referring to Symbol i held in a Scope. Legend: containment reference; non-containment reference with EOpposite]
Containment in meta-models • b.getInstrs().add(c)
[Figure: object diagram with BasicBlocks a and b and instruction c (GenericInst named add), illustrating the call b.getInstrs().add(c)]
Copy in EMF models • Instruction d = EcoreUtil.copy(c)
[Figure: d is a copy of instruction c. The contained children of the GenericInst named add are duplicated, while Symbol i appears only once and is referenced by both trees]
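The copy semantics shown on this slide (contained children are duplicated, objects reached through non-containment references are shared) can be emulated in plain Java. This is a hand-written sketch of the idea, not a use of `EcoreUtil`; `SharedSymbol` and `CopyNode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical symbol, referenced (not contained) by instructions.
class SharedSymbol {
    final String name;
    SharedSymbol(String name) { this.name = name; }
}

// Hypothetical instruction node with one cross reference and contained children.
class CopyNode {
    final String label;
    final SharedSymbol symbol;                          // non-containment reference, may be null
    final List<CopyNode> children = new ArrayList<>();  // containment reference

    CopyNode(String label, SharedSymbol symbol, CopyNode... kids) {
        this.label = label;
        this.symbol = symbol;
        this.children.addAll(Arrays.asList(kids));
    }

    // Deep-copies the containment tree; cross references still point to the same objects.
    CopyNode copy() {
        CopyNode duplicate = new CopyNode(label, symbol);
        for (CopyNode child : children) {
            duplicate.children.add(child.copy());
        }
        return duplicate;
    }
}
```

Copying an `add` node with a symbol child and a constant child yields a fresh three-node tree whose symbol child still refers to the original `SharedSymbol` object.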
IR viewer and editor • An immediate benefit of using EMF/Ecore is the ability • To serialize the IR to XMI (variant of XML) • To generate a tree-based editor automatically
Demo • Serialize the model and view/edit it.
Part II: The Gecos Intermediate Representation • Gecos IR overview
GecosProject & ProcedureSet classes • They are the top-level objects • GecosProject contains a list of modules • ProcedureSet contains the procedures and global symbol definitions of a module • It corresponds to a compilation unit (i.e., a C file)
The Procedure class • Contains the definition of a sub-program • And blocks (start, end, body), instructions and local symbols • A procedure is also a special type of symbol (ProcedureSymbol)
The Scope class • Contains symbols and types definitions • Nested structure, a local Scope is contained by a parent Scope
The Scope class
[Figure labels: global variables and type definitions; local variables and type definitions]
The Symbol class • Symbols can be variables or functions • Symbols have a name, a type and may have an initial value • They are contained by a Scope object instance
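The nesting of scopes can be sketched as a chain of symbol tables. The example below is an assumption-laden illustration, not the Gecos API: `MiniScope` is a hypothetical class, types are reduced to their names, and the inner-to-outer lookup follows ordinary C scoping rules rather than any documented Gecos method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical scope: holds its own symbols and points to its parent scope.
class MiniScope {
    final MiniScope parent; // null for the root (global) scope
    final Map<String, String> symbols = new LinkedHashMap<>(); // symbol name -> type name

    MiniScope(MiniScope parent) {
        this.parent = parent;
    }

    void declare(String name, String type) {
        symbols.put(name, type);
    }

    // Searches this scope first, then the enclosing scopes; returns null if undeclared.
    String lookup(String name) {
        for (MiniScope s = this; s != null; s = s.parent) {
            String type = s.symbols.get(name);
            if (type != null) return type;
        }
        return null;
    }
}
```

A local declaration of `x` shadows a global `x` when looked up from the local scope, while the global scope still resolves `x` to its own declaration.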
The Type classes • Types are contained in a Scope object (generally the root Scope) • Examples • BaseType: standard primitive types (int, float, void) • ArrayType: statically declared array types • PtrType: pointer types • FunctionType: function prototypes • Type qualifiers
The Block classes • The Block interface defines high-level control structures • Source-level control structures (except BasicBlock) • Containment hierarchy (blocks contain other blocks) • In the following, we will describe 7 of the 9 types of blocks • They form the backbone of the Gecos IR, but others exist.
The BasicBlock class • Contains a list of instructions executed atomically • No control flow inside a BB, except for the last instruction • A BB knows its predecessors & successors (ControlEdge class) • Example: [Figure: example BasicBlock]
The ControlEdge class • Models the control flow between BasicBlocks • And only between BasicBlocks • Example: the next BasicBlock in the control flow for B11 is B12. • The control flow information is built explicitly on user demand, using the passes BuildControlFlow & ClearControlFlow
The CompositeBlock class • A CompositeBlock is used as a container for blocks • It models the sequencing of blocks in the program flow • A CompositeBlock also contains symbol (variable) definitions
The IfBlock class • Models if/then/else language constructs • The predicate is stored as a Block object (condBlock), but it is always a BasicBlock object that contains only one instruction. • Then/else branches are in the thenBlock/elseBlock fields • They can be of any Block type
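The block hierarchy described so far (BasicBlock, CompositeBlock, IfBlock) can be sketched as a small containment tree. The classes below are hypothetical simplifications prefixed with `H` to avoid any confusion with the real Gecos API; instructions are plain strings, and treating a missing else branch as `null` is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical block interface: every block can list the basic blocks it contains.
interface HBlock {
    void collect(List<HBasicBlock> out);
}

// Leaf of the hierarchy: a list of instructions.
class HBasicBlock implements HBlock {
    final List<String> insts;
    HBasicBlock(String... insts) { this.insts = Arrays.asList(insts); }
    public void collect(List<HBasicBlock> out) { out.add(this); }
}

// Sequence of blocks, in program order.
class HCompositeBlock implements HBlock {
    final List<HBlock> children;
    HCompositeBlock(HBlock... children) { this.children = Arrays.asList(children); }
    public void collect(List<HBasicBlock> out) {
        for (HBlock child : children) child.collect(out);
    }
}

// if/then/else: the condition is a basic block holding exactly one instruction.
class HIfBlock implements HBlock {
    final HBasicBlock condBlock;
    final HBlock thenBlock;
    final HBlock elseBlock; // null when there is no else branch (assumption of this sketch)

    HIfBlock(HBasicBlock condBlock, HBlock thenBlock, HBlock elseBlock) {
        if (condBlock.insts.size() != 1) {
            throw new IllegalArgumentException("condBlock must contain exactly one instruction");
        }
        this.condBlock = condBlock;
        this.thenBlock = thenBlock;
        this.elseBlock = elseBlock;
    }

    public void collect(List<HBasicBlock> out) {
        condBlock.collect(out);
        thenBlock.collect(out);
        if (elseBlock != null) elseBlock.collect(out);
    }
}

class HierarchyDemo {
    // Flattens the containment hierarchy into its basic blocks, in traversal order.
    static List<HBasicBlock> basicBlocksOf(HBlock root) {
        List<HBasicBlock> out = new ArrayList<>();
        root.collect(out);
        return out;
    }
}
```

Because then/else branches are typed as the general block interface, a branch can be a single basic block or a whole nested composite, which mirrors the “any type” remark above.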