CPE 471 Assemblers
Assembly Instructions • Four parts to an instruction: • Label(s) • Operation • Operands • Comments • We are using a flexible format • Instruction is still on one line (except label) • Spaces and tabs can appear between “tokens” • Alternative: fixed format • Label in column 1; Op in column 16; Operands in col 24, etc.
Assembly Instructions • Label definition: • Symbolic name for an instruction's address • Clarifies branching to an address and referring to a data address • Often severely restricted in length • Starts anywhere before operation • Contains letters (a-zA-Z), digits, ‘$’, ‘_’, ‘.’
Assembly Instructions • Operation field • Mnemonic for an instruction (ADD, BRA) or • Mnemonic for a pseudo-instruction (DC, EQU) • Operand field • Addresses and registers used by instruction • In other words, what to add, where to branch • Comment field: no effect on assembler • Used for documentation.
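The four fields above can be separated with very little code. The following is a minimal sketch in Python, not the course's implementation. It assumes a `;` comment delimiter (the slides do not name one) and treats a leading token that is not a known mnemonic as a label; `MNEMONICS` and `split_instruction` are hypothetical names introduced for illustration.

```python
# Hypothetical mnemonic set; a real assembler would consult its op tables.
MNEMONICS = {"ADD", "BRA", "LD", "ST", "HLT", "DC", "DS", "EQU", "ORIG", "END"}
COMMENT_CHAR = ";"  # assumed comment delimiter, not taken from the slides


def split_instruction(line):
    """Split one source line into (label, operation, operands, comment)."""
    code, _, comment = line.partition(COMMENT_CHAR)
    tokens = code.split()  # any run of spaces or tabs separates tokens
    label = None
    # Flexible format: a first token that is not a mnemonic must be a label.
    if tokens and tokens[0].upper() not in MNEMONICS:
        label = tokens.pop(0)
    operation = tokens.pop(0).upper() if tokens else None
    # Rejoin the rest so "R1, #A" and "R1,#A" yield the same operands.
    operands = [op for op in "".join(tokens).split(",") if op]
    return label, operation, operands, comment.strip()


print(split_instruction("Begin  LD R1, #A ; load constant"))
```

A fixed-format assembler would slice by column instead of splitting on whitespace, which is simpler to parse but less forgiving to write.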
What's an Assembler • Translation: • Source program translated into target • Source and target define levels • Source is not directly executed (what is that?) • Target or object file is executed or translated later • Assembly language means the source is a symbolic representation of machine language • When the source is higher level the translator is usually called a compiler
Advantages of Assembly • Compared to machine code • Easier to remember (HLT vs 26577) (symbolic) • Similarly for addresses in program (symbolic) • BGT loop1 vs • BGT 6210 • Over high-level language • Access to full capabilities of the machine • testing overflow flag, test & set (atomic), registers • Speed
Advantages of Assembly • Often a holdover from when machines were expensive and people were cheap • Systems programming often done in a language like C • Old myth: "if a program will be used a lot, it should (for efficiency) be written in assembly" • Hard to write (10 lines/day independent of language) • Hard to read: high maintenance, high turnover • Reality: good compilers, fast machines
Modern Approach • Write in high level language • Analyze to find where time spent • Invariably, it's a tiny part of the code • Improve that tiny part (perhaps with assembly) • Problem oriented language allows high level insights • Algorithmic insights save tremendously • Assembly programmer immersed in bit-twiddling (penny wise, pound foolish)
Types of Assemblers • Assemblers can be classified by the number of passes they make over the source file: • One-pass assembler: some restrictions are imposed on forward referencing, such as eliminating forward references to data. • Two-pass assembler: the only restriction on forward referencing is in symbol definition, i.e., all assembler directives (e.g., EQU and ORG) that define symbols can only use symbols that are previously defined. • Multi-pass assembler: restrictions are made on the level of nesting in forward referencing.
Assembler Tasks • Parse assembly instructions • Check for syntax • Tokenize string • Assign addresses to instructions • Maintain location counter • LC = eventual location in memory of this instruction • Generate machine code • Evaluate mnemonic instructions (replace w/opcode) • Evaluate operand sub-field (replace symbols with value)
Assembler Tasks • Concatenate to form instructions • Process pseudo-ops • generate header record • evaluate DC, DS, … etc • Write output object file • Nothing here seems all that hard!
Example: First Attempt • Read each input line and generate machine code • Associate symbol with location counter • Lookup mnemonic and get opcode • Generate instruction • Example:
Example
    Test   ORIG  $0100
    A      EQU   16
    Begin  LD    R0,N
           LD    R1,#A
           ST    R0,ANS
           HLT
    N      DC    13
    ANS    DS    1
           END   Begin
Data Structures • Location counter (LocCounter) • Search a table (OpTable) for mnemonic • Get opcode • Prepare to handle arguments (group like instructions together?) • Translate arguments • Lookup/add symbol names (SymTable) and replace with location • Watch out for relative offsets
Location Counter • Eventual address of instruction • Initialized with 0 • Increment with each instruction (see OpTable). Always two for our machine
2-pass Assembler • Solution: 2-pass assembler • Pass #1: identify and define all labels • As location of each label is determined, save in SymTable • Requires some knowledge of instructions • Pass #2: generate object code • Whenever a label is used, replace with value saved in table
Pass I: • Determine length of machine instructions. • Keep track of Location Counter (LC). • Generate a table of symbols for pass 2. • Process some pseudo operations. • Remember literals.
Pass II • Look up value of symbols. • Generate machine code for instructions. • Generate data for DS, DC, and literals. • Process pseudo operations.
Pass 1 & 2 Communication • Scan text file twice • Save symbol locations first pass, then plug in 2nd • Simpler • Disadvantage: slow • 6,800 instructions medium size file • 52,000 instructions large file
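The two passes can be sketched on the earlier example program. This is an illustrative Python outline, not the lab solution. It assumes every instruction and every DC word advances the location counter by 2, that `DS n` reserves `n` such units, that a label on ORIG takes the origin as its value, and that every line carries an operation. Pass 2 only substitutes symbol values, because the slides do not give the real instruction encodings. All function names are hypothetical.

```python
SOURCE = """\
Test   ORIG  $0100
A      EQU   16
Begin  LD    R0,N
       LD    R1,#A
       ST    R0,ANS
       HLT
N      DC    13
ANS    DS    1
       END   Begin
"""

MNEMONICS = {"ORIG", "EQU", "LD", "ST", "HLT", "DC", "DS", "END"}
NO_CODE = {"ORIG", "EQU", "DS", "END"}  # pseudo-ops that emit nothing here
UNIT = 2  # assumed LC increment per instruction or data word


def number(text):
    """Read a decimal literal or a $-prefixed hex literal."""
    return int(text[1:], 16) if text.startswith("$") else int(text)


def parse(source):
    """Yield (label, operation, operand_string) for each non-blank line."""
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        label = tokens.pop(0) if tokens[0] not in MNEMONICS else None
        yield label, tokens.pop(0), "".join(tokens)


def advance(lc, op, operand):
    """Return the location counter after processing one line."""
    if op == "ORIG":
        return number(operand)
    if op == "DS":
        return lc + UNIT * number(operand)
    return lc if op in NO_CODE else lc + UNIT


def pass1(source):
    """Build the symbol table: EQU gives explicit values, labels take the LC."""
    symtab, lc = {}, 0
    for label, op, operand in parse(source):
        if op == "ORIG":
            lc = number(operand)
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = number(operand) if op == "EQU" else lc
        lc = advance(lc, op, operand)
    return symtab


def resolve(field, symtab):
    """Replace a symbol with its value; keep registers and numbers as-is."""
    prefix = "#" if field.startswith("#") else ""
    name = field[len(prefix):]
    if name in symtab:
        return f"{prefix}${symtab[name]:04X}"
    if name[:1] in "$0123456789" or (name[:1] == "R" and name[1:].isdigit()):
        return field
    raise ValueError(f"undefined symbol: {name}")


def pass2(source, symtab):
    """Walk the source again, emitting (address, op, resolved operands)."""
    listing, lc = [], 0
    for _, op, operand in parse(source):
        if op not in NO_CODE:
            fields = [resolve(f, symtab) for f in operand.split(",") if f]
            listing.append((lc, op, fields))
        lc = advance(lc, op, operand)
    return listing


symbols = pass1(SOURCE)
for address, op, fields in pass2(SOURCE, symbols):
    print(f"{address:04X}  {op:<4}{','.join(fields)}")
```

Note that `LD R0,N` refers to `N` before it is defined; pass 1 has already recorded its address, so pass 2 can substitute it without trouble.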
Big Picture: Labs 1 & 2 • [Diagram: Assembly File, Assembler (Lab 2), Object File, Disassembler (Lab 1), Linker (Lab 3), Executable File]
Table Driven Software • Many times software is very repetitive • Use functions! • Many times the information processed is repetitive • Use loops • Many times the information to write the code is repetitive but static: • Use tables
Table Driven Software • Easier to modify: add new entries, change existing ones, well centralized • Code is easier, eventually, to understand • Works if there are not many exceptions
Machine Op Table • This table is static (unchanging) • For our machine: • All opcodes are 6 bits • Instruction size is one or two words. • Formats do differ
Machine Op Table • Variable length instructions: • "Branch relative": PC <- PC + operand • Near: operand is 9 bits, far operand is 16 bits. • Varying formats • Fixed format makes parsing simple
Machine Op Table • Fields might include: • Name: “add” • Type: ALU • opcode: 000000 • Size: 2 or 4 • One entry for each instruction
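A table with these fields might look as follows in Python. Only the `add` entry comes from the slide; the other entries, their opcodes, and the choice to place the 6-bit opcode in the top bits of a 16-bit word are assumptions made for illustration.

```python
from typing import NamedTuple


class OpEntry(NamedTuple):
    name: str
    kind: str    # the "Type" field, e.g. ALU
    opcode: int  # 6-bit opcode
    size: int    # instruction size, 2 or 4


OP_TABLE = {entry.name: entry for entry in (
    OpEntry("add", "ALU", 0b000000, 2),     # values from the slide
    OpEntry("ld", "MEMORY", 0b000001, 2),   # hypothetical entry
    OpEntry("bra", "BRANCH", 0b000010, 4),  # hypothetical entry
)}


def instruction_size(mnemonic):
    """Pass 1 needs only the size, to advance the location counter."""
    return OP_TABLE[mnemonic.lower()].size


def opcode_bits(mnemonic, word_bits=16):
    """Pass 2: shift the 6-bit opcode into the top of the word (assumed layout)."""
    return OP_TABLE[mnemonic.lower()].opcode << (word_bits - 6)


print(instruction_size("ADD"), f"{opcode_bits('ld'):016b}")
```

Adding an instruction then means adding one table row rather than writing new code, which is the point of the table-driven style.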
Symbol Table • Pass 1: • Each symbol is defined. Every time a new symbol is seen (i.e., in a label), put it in the symbol table • If already there, error
Symbol Table (Pass #1) • Each symbol gives a value • Explicit: A EQU 16 • Easy: just put operand in table • Implicit: N DC 20 • Must know address of instruction • Therefore, keep track of addresses as program is scanned • Use location counter (LC)
Symbol Table (Pass #2) • Symbols in operand replaced with value • Look up symbol in symbol table • If there, replace with value • Else, ERROR
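The two error cases (a duplicate definition in pass 1, an undefined symbol in pass 2) fit naturally into a small wrapper around a dictionary. This is a sketch with hypothetical names (`SymbolTable`, `AssemblyError`); the location counter value used for `N` is made up for the example.

```python
class AssemblyError(Exception):
    """Raised for duplicate or undefined symbols."""


class SymbolTable:
    def __init__(self):
        self._values = {}

    def define(self, name, value):
        """Pass 1: record a symbol; defining it twice is an error."""
        if name in self._values:
            raise AssemblyError(f"duplicate symbol: {name}")
        self._values[name] = value

    def lookup(self, name):
        """Pass 2: return the saved value; an unknown symbol is an error."""
        try:
            return self._values[name]
        except KeyError:
            raise AssemblyError(f"undefined symbol: {name}") from None


table = SymbolTable()
table.define("A", 16)   # explicit: A EQU 16
lc = 0x0108             # assumed location counter when "N DC 20" is scanned
table.define("N", lc)   # implicit: the value is the current address
print(table.lookup("A"), hex(table.lookup("N")))
```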
Literals • Implicit allocation & initialization of memory • Allows us to put the value itself right in the instruction • Prefix with “=” • Example: • LD D3,=#16 • This means: • allocate storage at end of program • initialize this storage with the value #16 • use this address in the instruction
Literal Table • Pass #1 • literals are identified and placed in the table • “name”, “value”, and “size” fields updated • duplicates can be eliminated • To complete pass #1: • literals are added to the end of the program • “address” field can now be calculated • Pass #2: • literals in instructions are replaced with the __________ field from the Literal Table • what if the literal is not in the table?
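One way to realize this bookkeeping is sketched below in Python. The class and method names are hypothetical, the default literal size of 2 is an assumption, and raising an error for a literal that is missing in pass 2 is only one possible answer to the question above.

```python
class LiteralTable:
    def __init__(self):
        self._entries = {}  # name -> {"value", "size", "address"}

    def add(self, name, value, size=2):
        """Pass 1: record a literal; repeats of the same name are ignored."""
        self._entries.setdefault(
            name, {"value": value, "size": size, "address": None})

    def place(self, end_of_program):
        """End of pass 1: append literals after the program, assign addresses."""
        lc = end_of_program
        for entry in self._entries.values():  # insertion order is preserved
            entry["address"] = lc
            lc += entry["size"]
        return lc  # new end of program, literals included

    def address_of(self, name):
        """Pass 2: look up where the literal's storage was placed."""
        entry = self._entries.get(name)
        if entry is None or entry["address"] is None:
            raise KeyError(f"literal not placed: {name}")
        return entry["address"]


literals = LiteralTable()
literals.add("=#16", 16)
literals.add("=#16", 16)  # duplicate, eliminated
literals.add("=#3", 3)
end = literals.place(0x010C)
print(hex(literals.address_of("=#16")), hex(literals.address_of("=#3")), hex(end))
```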
Pseudo Operations • Unlike operations, typically do not have machine instruction (opcode) equivalents • Give information to the assembler itself. Another term: assembler directive • Not intended to implement higher level control structures (if-then, loops) • Uses: segment definition, symbol definition, memory initialization, storage allocation • Low level bookkeeping easily done by machine
Pseudo-op Table • Also a static table • Some lengths are 0, some 1, others?
Object File • Header record: contains information on program length, symbols that other modules may be looking for, and the name of this module. • Format:
    Column 1: H for header
    Columns 2-5: Program length in HEX + a space
    Columns 6-80: List of entry names and locations. Entry name (followed by a space). Entry location in HEX, 4 columns + a space
• The first entry should be the first ORG.
Object File II • Text (type T): contains the hex equivalent of the machine instruction. • Format:
    Column 1: T for text
    Columns 2-5: Address in HEX where this text should be placed in memory.
    Columns 6-80: Instructions and/or data in HEX.
• Each word (byte) should be followed by an allocation character to determine the type: • S (Absolute: does not need modification), R (Relative: needs relocation), and X (external).
Object File III • End (type E): indicates the end of the module and where the execution would begin. • Format:
    Column 1: E for END
    Columns 2-5: Execution start address in HEX
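The three record types can be produced with simple string formatting. The sketch below is one reading of the column layout, not a reference implementation: it assumes 4-digit uppercase hex for lengths, addresses, and words, places the allocation character directly after each word, and rejects records longer than 80 columns. The function names and sample values are hypothetical.

```python
MAX_COLUMNS = 80


def _check(record):
    """Reject records that would run past column 80."""
    if len(record) > MAX_COLUMNS:
        raise ValueError("record exceeds 80 columns")
    return record


def header_record(length, entries):
    """H, program length, a space, then 'name location ' for each entry."""
    fields = "".join(f"{name} {location:04X} " for name, location in entries)
    return _check(f"H{length:04X} {fields}")


def text_record(address, words):
    """T, load address, then each word followed by its flag: S, R, or X."""
    body = "".join(f"{value:04X}{flag}" for value, flag in words)
    return _check(f"T{address:04X}{body}")


def end_record(start_address):
    """E followed by the execution start address."""
    return _check(f"E{start_address:04X}")


print(header_record(0x000C, [("Test", 0x0100)]))
print(text_record(0x0100, [(0x1234, "S"), (0x0108, "R")]))
print(end_record(0x0100))
```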
Information Flow Pass 1/2
Two Pass Assembler:Limitations • Q: Does our 2-pass approach solve all forward-reference problems? • A: no! Something is still broken…
Forward Reference Restriction • To avoid this trouble, impose a restriction: • ------------- • What about DS? Is a forward symbol allowed as the operand? Consider the two orderings: • X EQU Y followed by Y EQU 0 • Y EQU 0 followed by X EQU Y
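The difference between the two orderings shows up as soon as EQU is evaluated during pass 1. The sketch below applies the rule stated earlier for two-pass assemblers (symbol-defining directives may only use previously defined symbols). The function name is hypothetical, and it handles only `NAME EQU operand` lines with a decimal number or a single symbol as operand.

```python
def define_equates(lines):
    """Process 'NAME EQU operand' lines in source order, as pass 1 would."""
    symtab = {}
    for line in lines:
        name, op, operand = line.split()
        if op != "EQU":
            raise ValueError(f"expected EQU, got {op}")
        if operand.isdigit():
            symtab[name] = int(operand)
        elif operand in symtab:
            symtab[name] = symtab[operand]  # backward reference: value known
        else:
            # Forward reference: the value is not known yet in pass 1.
            raise ValueError(f"{name}: operand {operand} is not yet defined")
    return symtab


print(define_equates(["Y EQU 0", "X EQU Y"]))
try:
    define_equates(["X EQU Y", "Y EQU 0"])
except ValueError as error:
    print("rejected:", error)
```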
Containers • What does this mean • We can't just generate a machine instruction from each assembly instruction -- must save info • We need to start using some data structures! • Table means any data structure or container
Absolute Programs • Programmer decides a priori where program will reside • e.g., Prog ORIG $3176 • But memory is shared • concurrent users, concurrent jobs/processes • several programs loaded simultaneously
Absolute Programs: Limitation
Absolute Programs: Limitation II • Would like the loading to be flexible • decide at load time where it goes! (not at ____________ time) • this decision is made by • __________________ • What the programmer wants: “find a free slot in memory that is big enough to fit this program”
Motivating Relocation - Example
    Prog   ORIG  0
    X      DC.W  Y
    Start  LD.L  D1,X
           ST.L  D1,Y
           HLT
    Y      DS.W  1
           END   Start
• In memory, this program appears as:
Motivating Relocation - Example
    * One slight change
    Prog   ORIG  $0100
    X      DC.W  Y
    Start  LD.L  R1,X
           ST.L  R1,Y
           HLT
    Y      DS.L  1
           END   Start
• In memory, this program appears as:
Relocation • The loader must update some (parts of) text records, but not all • after load address has been determined • The assembler does 2 things: • assemble with a load address • tell the loader which parts will need to be updated
Modification Records • One approach: define a new record type that identifies the address to be modified • We could add the following record to the object file: M address e.g. M 0004 • Also need to indicate size of quantity being relocated: • Two sizes: 9 bit pgoffset and 16 bit full word • M0000_16 • M0001_9 • M0002_9 • One disadvantage of this approach: • -
Alternative: Bit Masks • Use 1 bit / memory cell (or byte) • bit value is 0 means no relocation necessary • bit value is 1 means relocation necessary • Size of relocation data independent of number of records needing modification • but more densely packed (1 bit / text record) • Hard to read (debug, grade,…)
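On the loader side, a bit mask makes relocation a single pass over the loaded words. The following Python sketch assumes 16-bit words, a program assembled at address 0, and a mask held as a list of 0/1 values with one entry per word; the function name and the sample values are hypothetical.

```python
def relocate(words, mask, load_address, word_bits=16):
    """Add load_address to every word whose mask bit is 1."""
    if len(words) != len(mask):
        raise ValueError("mask must have one bit per word")
    limit = 1 << word_bits  # wrap results to the word size
    return [
        (word + load_address) % limit if bit else word
        for word, bit in zip(words, mask)
    ]


# Words 0 and 2 hold addresses (relative); word 1 is absolute.
loaded = relocate([0x0008, 0x1234, 0x000A], [1, 0, 1], 0x3176)
print([f"{word:04X}" for word in loaded])
```

Compared with modification records, the mask costs the same space whether one word or every word needs relocation.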
Kinds of Data • Our machine has two flavors of data: • relative (to the load address) • absolute • The first must be modified, the second not • Let’s look at how these kinds arise…
Symbols • Some are relative: • e.g., • Some are absolute: • e.g., • Symbol Table
Searching • We must associate a name with a value. • Example: A symbol table is a collection of <key, value> pairs: • Search is given a key, return the corresponding value. • Very important for assemblers. Every line has an instruction (look up in MOT or POT). Lots of symbols (or literals) used. 50% of time searching tables.
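Because the machine op table never changes, it can be kept sorted and searched with binary search, while the symbol table, which grows during pass 1, suits a hash table. The Python sketch below illustrates both; the opcodes and the sample symbols are hypothetical, and `bisect_left` comes from the standard library.

```python
from bisect import bisect_left

# Static table: sort once, then binary-search on every source line.
MOT = sorted([("ADD", 0b000000), ("LD", 0b000001),
              ("BRA", 0b000010), ("HLT", 0b000011)])  # hypothetical opcodes
MOT_NAMES = [name for name, _ in MOT]


def search_mot(mnemonic):
    """Return the opcode for mnemonic, or None if it is not in the table."""
    index = bisect_left(MOT_NAMES, mnemonic)
    if index < len(MOT_NAMES) and MOT_NAMES[index] == mnemonic:
        return MOT[index][1]
    return None


# Dynamic table: a dict gives average constant-time insert and lookup.
symbol_table = {}
symbol_table["Begin"] = 0x0100
symbol_table["N"] = 0x0108

print(search_mot("LD"), search_mot("XYZ"), hex(symbol_table["N"]))
```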