Day 10 Agenda

gautam

Presentation Transcript


  1. Day 10 Agenda • Exceptions • System Design • Memory Interface • Synchronization • Input / Output

  2. Exception Handling
     • When an exception occurs, the ARM core:
        • Copies CPSR into SPSR_<mode>
        • Sets the appropriate CPSR bits:
           • Changes to ARM state
           • Changes to the exception mode
           • Disables interrupts (if appropriate)
        • Stores the return address in LR_<mode>
        • Sets PC to the vector address
     • To return, the exception handler needs to:
        • Restore CPSR from SPSR_<mode>
        • Restore PC from LR_<mode>
       This can only be done in ARM state.
     • Vector table:
        0x1C  FIQ
        0x18  IRQ
        0x14  (Reserved)
        0x10  Data Abort
        0x0C  Prefetch Abort
        0x08  Software Interrupt
        0x04  Undefined Instruction
        0x00  Reset
     • The vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices.
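The entry and return steps listed on this slide can be sketched as a small software model. This is only an illustration under stated assumptions: the `Core` class, its dictionary-based registers, and the `high_vectors` flag are hypothetical names invented for the sketch, and the return address is passed in by the caller rather than derived from a pipeline.

```python
# Vector offsets as listed on the slide (added to 0xFFFF0000 for high vectors).
VECTORS = {
    "reset": 0x00, "undefined": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
    "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C,
}
# Mode entered by each exception (short mode names are illustrative).
MODES = {
    "reset": "svc", "undefined": "und", "swi": "svc", "prefetch_abort": "abt",
    "data_abort": "abt", "irq": "irq", "fiq": "fiq",
}


class Core:
    """Hypothetical model of the banked registers involved in exception entry."""

    def __init__(self):
        self.pc = 0
        self.cpsr = {"mode": "usr", "thumb": False, "irq_off": False, "fiq_off": False}
        self.spsr = {}  # SPSR_<mode>
        self.lr = {}    # LR_<mode>

    def take_exception(self, kind, return_address, high_vectors=False):
        mode = MODES[kind]
        self.spsr[mode] = dict(self.cpsr)      # copy CPSR into SPSR_<mode>
        self.cpsr["thumb"] = False             # change to ARM state
        self.cpsr["mode"] = mode               # change to exception mode
        self.cpsr["irq_off"] = True            # disable IRQs
        if kind in ("reset", "fiq"):
            self.cpsr["fiq_off"] = True        # FIQs are masked only by reset and FIQ
        self.lr[mode] = return_address         # store return address in LR_<mode>
        base = 0xFFFF0000 if high_vectors else 0x00000000
        self.pc = base + VECTORS[kind]         # set PC to the vector address

    def return_from_exception(self, offset=0):
        mode = self.cpsr["mode"]
        self.pc = self.lr[mode] - offset       # restore PC from LR_<mode>
        self.cpsr = dict(self.spsr[mode])      # restore CPSR from SPSR_<mode>
```

For example, taking an IRQ with `return_address=0x8008` moves the model to `pc == 0x18` in `irq` mode, and `return_from_exception(offset=4)` brings it back to `0x8004` in the original mode.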

  3. PSR Mode Bit Values

  4. Normal and High Vector Address

  5. Reset
     • When the nRESET signal goes LOW, the core abandons the executing instruction.
     • When nRESET goes HIGH again, the core:
        • Overwrites R14_svc and SPSR_svc by copying the current values of the PC and CPSR into them. The values of the saved PC and SPSR are not defined.
        • Forces M[4:0] to 10011 (Supervisor mode), sets the I and F bits in the CPSR, and clears the CPSR's T bit.
        • Forces the PC to fetch the next instruction from address 0x00.
        • Resumes execution in ARM state.

  6. Undefined Exception
     • When the core comes across an instruction that it cannot handle, it takes the undefined instruction trap.
     • This mechanism may be used to extend either the Thumb or ARM instruction set by software emulation.
     • Entering the undefined instruction trap:
        • R14_und = Address of the next instruction after the undefined instruction
        • SPSR_und = CPSR
        • CPSR[4:0] = 0b11011 (mode bits forced to Undefined mode)
        • CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
        • Forces the PC to fetch the next instruction from address 0x04 or 0xFFFF0004
     • After emulating the failed instruction, the trap handler should execute the following, irrespective of the state (ARM or Thumb):
        • CPSR = SPSR_und
        • MOVS PC,R14_und (this restores the CPSR and returns to the instruction following the undefined instruction)

  7. Software Interrupts
     • SWI instruction encoding:
        Bits 31-28  Cond (condition field)
        Bits 27-24  1 1 1 1
        Bits 23-0   SWI number (ignored by processor)
     • The software interrupt instruction (SWI) is used for entering Supervisor mode, usually to request a particular supervisor function.
     • Entering SWI:
        • R14_svc = Address of the next instruction after the SWI instruction
        • SPSR_svc = CPSR
        • CPSR[4:0] = 0b10011
        • CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
        • Forces the PC to fetch the next instruction from address 0x08 or 0xFFFF0008
     • Upon exiting SWI:
        • CPSR = SPSR_svc
        • MOVS PC,R14_svc (this restores the PC and CPSR, and returns to the instruction following the SWI)
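Because the processor ignores the SWI number, a supervisor handler typically has to extract it from the instruction word itself. The following is a minimal sketch of that decoding, based only on the bit layout given on this slide; the function name `decode_swi` is hypothetical.

```python
def decode_swi(word):
    """Return (cond, swi_number) if the 32-bit word is a SWI, otherwise None."""
    if (word >> 24) & 0xF != 0xF:   # bits 27-24 must be 1111
        return None
    cond = (word >> 28) & 0xF       # bits 31-28: condition field
    swi_number = word & 0x00FFFFFF  # bits 23-0: ignored by the processor
    return cond, swi_number
```

For instance, the word `0xEF000011` has condition field `0xE` and SWI number `0x11`.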

  8. Pre-fetch Abort Instruction
     • If a pre-fetch abort occurs, the pre-fetched instruction is marked as invalid, but the exception is not taken until the instruction reaches the head of the pipeline. If the instruction is not executed (for example, because a branch occurs while it is in the pipeline), the abort does not take place.
     • Entering pre-fetch abort:
        • R14_abt = Address of aborted instruction + 4
        • SPSR_abt = CPSR
        • CPSR[4:0] = 0b10111
        • CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
        • Forces the PC to fetch the next instruction from address 0x0C or 0xFFFF000C
     • Upon exiting pre-fetch abort:
        • CPSR = SPSR_abt
        • SUBS PC,R14,#4 (this restores the PC and CPSR, and retries the aborted instruction, since R14_abt - 4 is the address of the aborted instruction)

  9. Data Abort
     • If a data abort occurs, the action taken depends on the instruction type:
        • Single data transfer instructions (LDR, STR) write back modified base registers: the abort handler must be aware of this.
        • The swap instruction (SWP) is aborted as though it had not been executed.
        • Block data transfer instructions (LDM, STM) complete.
           • If write-back is set, the base is updated.
           • If the instruction would have overwritten the base with data (i.e. it has the base in the transfer list), the overwriting is prevented.
           • All register overwriting is prevented after an abort is indicated, which means in particular that R15 (always the last register to be transferred) is preserved in an aborted LDM instruction.
     • The abort mechanism allows the implementation of a demand-paged virtual memory system. In such a system the processor is allowed to generate arbitrary addresses. When the data at an address is unavailable, the Memory Management Unit (MMU) signals an abort.

  10. Data Abort
      • The abort handler must then work out the cause of the abort, make the requested data available, and retry the aborted instruction. The application program needs no knowledge of the amount of memory available to it, nor is its state in any way affected by the abort.
      • Entering data abort:
         • R14_abt = Address of aborted instruction + 8
         • SPSR_abt = CPSR
         • CPSR[4:0] = 0b10111
         • CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
         • Forces the PC to fetch the next instruction from address 0x10 or 0xFFFF0010
      • Upon exiting data abort:
         • CPSR = SPSR_abt
         • SUBS PC,R14,#8 (this restores the PC and CPSR, and re-executes the aborted instruction)
         • SUBS PC,R14,#4 (this restores the PC and CPSR, and returns to the instruction following the aborted instruction)

  11. Interrupt Request (IRQ) Exception
      • The IRQ (Interrupt Request) exception is a normal interrupt caused by a LOW level on the nIRQ input. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR, though this can only be done from a privileged (non-User) mode.
      • Entering IRQ:
         • R14_irq = Address of next instruction to be executed + 4
         • SPSR_irq = CPSR
         • CPSR[4:0] = 0b10010
         • CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
         • Forces the PC to fetch the next instruction from address 0x18 or 0xFFFF0018
      • Exiting IRQ:
         • CPSR = SPSR_irq
         • SUBS PC,R14_irq,#4 (this restores the PC and CPSR, and returns to the instruction that would have executed next had the interrupt not occurred)

  12. Fast Interrupt Request (FIQ) Exception
      • The FIQ (Fast Interrupt Request) exception is designed to support a data transfer or channel process, and in ARM state has sufficient private registers to remove the need for register saving (thus minimizing the overhead of context switching).
      • FIQ is externally generated by taking the nFIQ input LOW. This input can accept either synchronous or asynchronous transitions, depending on the state of the ISYNC input signal. When ISYNC is LOW, nFIQ and nIRQ are considered asynchronous, and a cycle delay for synchronization is incurred before the interrupt can affect the processor flow.
      • Entering FIQ:
         • R14_fiq = Address of next instruction to be executed + 4
         • SPSR_fiq = CPSR
         • CPSR[4:0] = 0b10001
         • CPSR[T,FIQ,IRQ] = 0b011 (ARM state, and disable FIQs and IRQs)
         • Forces the PC to fetch the next instruction from address 0x1C or 0xFFFF001C
      • Exiting FIQ:
         • CPSR = SPSR_fiq
         • SUBS PC,R14_fiq,#4 (this restores the PC and CPSR, and returns to the instruction that would have executed next had the interrupt not occurred)

  13. Return Address Calculation
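The return instructions quoted on the preceding exception slides can be collected into a single lookup. This is a sketch only: the key names and the `resume_address` helper are hypothetical, and the offsets are simply the immediates used in the `MOVS`/`SUBS` return instructions from those slides.

```python
# Amount subtracted from R14_<mode> by the return instruction for each case.
RETURN_OFFSET = {
    "undefined": 0,         # MOVS PC,R14_und
    "swi": 0,               # MOVS PC,R14_svc
    "prefetch_abort": 4,    # SUBS PC,R14,#4
    "data_abort_retry": 8,  # SUBS PC,R14,#8 (re-execute the aborted instruction)
    "data_abort_skip": 4,   # SUBS PC,R14,#4 (continue after the aborted instruction)
    "irq": 4,               # SUBS PC,R14_irq,#4
    "fiq": 4,               # SUBS PC,R14_fiq,#4
}


def resume_address(kind, lr):
    """Address loaded into the PC when returning from the given exception."""
    return lr - RETURN_OFFSET[kind]
```

As a check against the data abort slide: an instruction at `0x1000` that aborts leaves `R14_abt = 0x1008`, so a retry resumes at `0x1000` and a skip resumes at `0x1004`.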

  14. Exception Priorities
      Highest priority:
      • 1. Reset
      • 2. Data abort
      • 3. FIQ
      • 4. IRQ
      • 5. Pre-fetch abort
      Lowest priority:
      • 6. Undefined Instruction and Software interrupt
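When several exceptions are pending at once, the priority order above decides which one is taken first. A minimal sketch of that selection follows; the list and function names are hypothetical, and Undefined Instruction and SWI share one slot as on the slide.

```python
# Priority order from the slide, highest first.
PRIORITY = ["reset", "data_abort", "fiq", "irq", "prefetch_abort", "undefined_or_swi"]


def next_exception(pending):
    """Return the highest-priority exception in the pending set, or None."""
    for kind in PRIORITY:
        if kind in pending:
            return kind
    return None
```

For example, with both an IRQ and a data abort pending, `next_exception({"irq", "data_abort"})` selects the data abort.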

  15. Agenda Exceptions • System Design Memory Interface Synchronization Input / Output

  16. Example ARM-based System ARM Core Peripherals 32 bit RAM 16 bit RAM Interrupt Controller I/O nIRQ nFIQ 8 bit ROM

  17. AMBA Advanced Microcontroller Bus Architecture Open specification framework for System-on-Chip (SoC) Designs AMBA Arbiter Reset ARM TIC Timer Remap/ Pause External ROM External Bus Interface Bus Interface Bridge External RAM Interrupt Controller On-chip RAM Decoder AHB or ASB APB System Bus Peripheral Bus

  18. AHB The widely adopted AHB System Bus connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory and interfaces. APB The AMBA APB (Advanced Peripheral Bus) is a simpler bus protocol designed for ancillary or general purpose peripherals ADK The AMBA Design Kit is a library of components which enables system developers to build AMBA based systems quickly and accurately. ACT The AMBA Compliance Testbench, a comprehensive environment which enables the rapid development of tests to certify the IP as AMBA compliant. PrimeCell ARM’s AMBA compliant peripherals AMBA

  19. Agenda Exceptions System Design • Memory Interface Synchronization Input / Output

  20. Memory Interface • Memory Hierarchy Memory Size and Speed ARM MMU Memory Interfacing

  21. Memory
      • Memories come in many shapes, sizes and types
      • Shape means the package, such as TQFP, TSOP, DIP, or surface mount
      • Size: for example 4Mx8-bit or 16Kx1-bit

  22. Memory Technologies
      • DRAM: Dynamic Random Access Memory
         • upside: very dense (1 transistor per bit) and inexpensive
         • downside: requires refresh and often not the fastest access times
         • often used for main memories
      • SRAM: Static Random Access Memory
         • upside: fast and no refresh required
         • downside: not so dense and not so cheap
         • often used for caches
      • ROM: Read-Only Memory
         • often used for bootstrapping and such

  23. Exploiting Memory Hierarchy
      • Users want large and fast memories! (1997 figures)
         • SRAM access times are 2 - 25 ns at a cost of $100 to $250 per Mbyte.
         • DRAM access times are 60 - 120 ns at a cost of $5 to $10 per Mbyte.
         • Disk access times are 10 to 20 million ns at a cost of $0.10 to $0.20 per Mbyte.
      • Try and give it to them anyway
         • build a memory hierarchy

  24. The Memory Pyramid

  25. Locality
      • A principle that makes having a memory hierarchy a good idea
      • If an item is referenced:
         • temporal locality: it will tend to be referenced again soon
         • spatial locality: nearby items will tend to be referenced soon
        Why does code have locality?
      • Our initial focus: two levels (upper, lower)
         • block: minimum unit of data
         • hit: data requested is in the upper level
         • miss: data requested is not in the upper level
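The benefit of a two-level hierarchy can be estimated with the usual average access time formula, hit time plus miss rate times miss penalty. The sketch below is illustrative only: the function name is hypothetical, and the 10 ns upper-level time, 100 ns lower-level penalty, and 95% hit rate are assumed values, not measurements from the slides.

```python
def average_access_time(hit_time_ns, miss_penalty_ns, hit_rate):
    """Average access time for a two-level hierarchy (upper level checked first)."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed example values: SRAM-like upper level, DRAM-like lower level.
example_ns = average_access_time(hit_time_ns=10.0, miss_penalty_ns=100.0, hit_rate=0.95)
```

Under these assumptions the average comes out at about 15 ns, much closer to the upper level than to the lower one, which is what locality makes possible.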

  26. Cache
      • Two issues:
         • How do we know if a data item is in the cache?
         • If it is, how do we find it?
      • Our first example:
         • block size is one word of data
         • "direct mapped"
      • For each item of data at the lower level, there is exactly one location in the cache where it might be; e.g., lots of items at the lower level share locations in the upper level.
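A direct-mapped cache with one-word blocks answers both questions with simple arithmetic: the low bits of the word address pick the single possible location, and a stored tag says which lower-level item currently occupies it. The class below is a minimal sketch under assumed parameters (4-byte words, a hypothetical `DirectMappedCache` name, tags only, no data storage).

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache with one-word (4-byte) blocks."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # None marks an invalid (empty) line

    def access(self, address):
        """Look up a byte address; return True on a hit, False on a miss."""
        word = address // 4
        index = word % self.num_lines   # the one location this word can occupy
        tag = word // self.num_lines    # identifies which word is stored there
        hit = self.tags[index] == tag
        self.tags[index] = tag          # on a miss, the new word replaces the old one
        return hit
```

With eight lines, byte addresses 0 and 32 map to the same line, so accessing 0, 0, 32, 0 gives miss, hit, miss, miss.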

  27. Direct Mapped Cache
      • Cache Memory
         • 64-way set-associative cache with I-Cache and D-Cache of 16 KB each
         • 8 words per line, with one valid bit and two dirty bits per line
         • Pseudo-random or round-robin replacement algorithm
         • Write-through or write-back cache operation to update the main memory
         • The write buffer can hold 16 words of data and four addresses.
      [Figure: cache organization; labels: 64, Cache Line, Index, CAM, RAM]
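The cache parameters quoted above determine how a 32-bit address divides into tag, set index, and byte offset. The following sketch works that out; the function name is hypothetical, and 4-byte words and 32-bit addresses are assumptions.

```python
def cache_geometry(size_bytes, ways, words_per_line, bytes_per_word=4, address_bits=32):
    """Derive line count, set count, and address field widths for a cache."""
    line_bytes = words_per_line * bytes_per_word
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - offset_bits - index_bits
    return {
        "line_bytes": line_bytes, "lines": lines, "sets": sets,
        "offset_bits": offset_bits, "index_bits": index_bits, "tag_bits": tag_bits,
    }


geometry = cache_geometry(size_bytes=16 * 1024, ways=64, words_per_line=8)
```

For a 16 KB, 64-way cache with 8-word lines this gives 32-byte lines, 512 lines in 8 sets, and an address split of 24 tag bits, 3 index bits, and 5 offset bits.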

  28. Memory Interface Memory Hierarchy • Memory Size and Speed ARM MMU Memory Interfacing

  29. Storage Basics • CPU sees the RAM as one long, thin line of bytes • That doesn't mean that it's actually laid out that way • Real RAM chips don't store whole bytes, but rather they store individual bits in a grid, which you can address one bit at a time

  30. SRAM Memory Timing for Read Accesses
      • Device: 2147H High-Speed 4096x1-bit static RAM (signals: A11-A0, WE, CS, Din, Dout)
      • Address and chip select signals are provided tAA before data is available
      • Outputs reflect new data
      • Timing parameters:
         • tRC = Read cycle time
         • tAA = Address access time
         • tACS = Chip select access time
         • tHZ = Chip deselection to high-Z out
      [Figure: read timing diagram showing Address A11-A0 (old address, new address), CS, WE, and Dout (high impedance, undefined, data valid), annotated with tRC, tAA, tACS and tHZ]

  31. SRAM Memory Timing for Write Accesses
      • Device: 2147H High-Speed 4096x1-bit static RAM (signals: A11-A0, WE, CS, Din)
      • Address and data must be stable tS time-units before the write enable signal falls
      • Timing parameters:
         • tS = Signal setup time
         • tRC = Read cycle time
         • tAA = Address access time
         • tACS = Chip select access time
         • tHZ = Chip deselection to high-Z out
      [Figure: write timing diagram showing Address A11-A0 (old address, new address), CS, WE, and Din (old data, new data), annotated with tWC, tAA, tS, tACS and tHZ]

  32. DRAM Organization and Operations
      • In the traditional DRAM, any storage location can be randomly accessed for read/write by inputting the address of the corresponding storage location.
      • A typical DRAM of bit capacity 2^N * 2^M consists of an array of memory cells arranged in 2^N rows (word-lines) and 2^M columns (bit-lines).
      • Each memory cell has a unique location represented by the intersection of a word-line and a bit-line.
      • A memory cell consists of a transistor and a capacitor. The charge on the capacitor represents 0 or 1 for the memory cell. The support circuitry for the DRAM chip is used to read/write to a memory cell.

  33. DRAM Organization and Operations
      • Address decoders: select a row and a column.
      • Sense amps: detect and amplify the charge in the capacitor of the memory cell.
      • Read/Write logic: reads/stores information in the memory cell.
      • Output Enable logic: controls whether data should appear at the outputs.
      • Refresh counters: keep track of the refresh sequence.

  34. DRAM Memory Access
      • DRAM memory is arranged in an X-Y grid pattern of rows and columns.
      • First, the row address is sent to the memory chip and latched, then the column address is sent in a similar fashion.
      • This row and column addressing scheme (called multiplexing) allows a large memory address to use fewer pins.
      • The charge stored in the chosen memory cell is amplified using the sense amplifier and then routed to the output pin.
      • Read/Write is controlled using the read/write logic.

  35. How DRAM Works

  36. DRAM Memory Access
      A typical DRAM read operation:
      • The row address is placed on the address pins via the address bus.
      • The RAS pin is activated, which places the row address onto the Row Address Latch.
      • The Row Address Decoder selects the proper row to be sent to the sense amps.
      • The Write Enable is deactivated, so the DRAM knows that it is not being written to.
      • The column address is placed on the address pins via the address bus.
      • The CAS pin is activated, which places the column address on the Column Address Latch.
      • The CAS pin also serves as the Output Enable, so once the CAS signal has stabilized, the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.
      • RAS and CAS are both deactivated so that the cycle can begin again.
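The read sequence above can be mimicked in software to show how one flat address is multiplexed into a row part and a column part. This is a behavioral sketch only: the `Dram` class and its attribute names are hypothetical, timing and refresh are ignored, and the write path is a simplification added so the example has data to read back.

```python
class Dram:
    """Behavioral model of a 2^row_bits x 2^col_bits array of one-bit cells."""

    def __init__(self, row_bits, col_bits):
        self.row_bits = row_bits
        self.col_bits = col_bits
        self.cells = [[0] * (1 << col_bits) for _ in range(1 << row_bits)]
        self.row_latch = None
        self.col_latch = None

    def split(self, address):
        """Multiplexing: high bits form the row address, low bits the column."""
        return address >> self.col_bits, address & ((1 << self.col_bits) - 1)

    def read(self, address):
        row, col = self.split(address)
        self.row_latch = row                     # RAS: latch the row address
        sensed_row = self.cells[self.row_latch]  # selected row goes to the sense amps
        self.col_latch = col                     # CAS: latch the column address
        data = sensed_row[self.col_latch]        # data driven onto Data Out
        self.row_latch = self.col_latch = None   # RAS and CAS deactivated
        return data

    def write(self, address, bit):
        row, col = self.split(address)
        self.cells[row][col] = bit & 1
```

With two row bits and two column bits, address `0b1101` splits into row 3 and column 1, so only two address pins are needed for a 16-cell array.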

  37. DRAM Performance Specs
      • Important DRAM performance considerations:
         • Random access time: time required to read any random single cell
         • Fast Page Cycle time: time required for page mode access, i.e. a read/write to a memory location on the most recently accessed page (no need to repeat RAS in this case)
         • Extended Data Out (EDO): allows setup of the next address while the current data access is maintained
         • SDRAM Burst Mode: Synchronous DRAMs use a self-incrementing counter and a mode register to determine the column address sequence after the first memory location accessed on a page; effective for applications that usually require streams of data from one or more pages on the DRAM
         • Required refresh rate: minimum rate of refreshes

  38. Turning Bits Into Bytes (2x This Picture)

  39. Memory Interface Memory Hierarchy Memory Size and Speed • ARM MMU Memory Interfacing

  40. ARM MMU
      • Complex VM and protection mechanisms
      • Presents 4 GB address space (why?)
      • Memory granularity: 3 options supported
         • 1 MB sections
         • Large pages (64 KBytes): access control within a large page on 16 KByte subpages
         • Small pages (4 KBytes): access control within a small page on 1 KByte subpages
      • Puts processor in Abort mode when a virtual address is not mapped or a permission check fails
      • Change the pointer to the page tables (called the translation table base, in ARM jargon) to change the virtual address space
         • useful for context switching of processes
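The three granularities can be compared with a little arithmetic, which also answers the "why 4 GB?" question: 32-bit addresses span 2^32 bytes. The sketch below is illustrative; the dictionary and function names are hypothetical, and it only counts units and subpages rather than modeling real translation table formats.

```python
KB, MB = 2**10, 2**20
ADDRESS_SPACE = 2**32  # 32-bit virtual addresses give 4 GB

# (unit size, access-control subpage size) for each granularity on the slide.
GRANULES = {
    "section": (1 * MB, None),
    "large_page": (64 * KB, 16 * KB),
    "small_page": (4 * KB, 1 * KB),
}


def describe(name):
    """Count how many units cover 4 GB and how many subpages each unit has."""
    size, subpage = GRANULES[name]
    return {
        "units_in_4gb": ADDRESS_SPACE // size,
        "subpages": None if subpage is None else size // subpage,
    }
```

Mapping the whole 4 GB space with sections takes 4096 units, and both page sizes carry four access-control subpages per page.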

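  41. Example: Single-Level Page Table
      • Virtual address: bits 31-12 hold value x (page table index); bits 11-0 hold value y (offset within the page frame)
      • Page table: 2^20 entries of 32 bits each; entry x selects the page frame
      • Page frame: 2^12 entries of 8 bits each; entry y holds the data
      • Size of page table = 2^20 * 32 bits = 4 Mbytes
      • Size of page = 2^12 * 8 bits = 4 Kbytes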

  42. Single-Level Page Table
      • Assumptions
         • 32-bit virtual addresses
         • 4 Kbyte page size = 2^12 bytes
         • 32-bit address space
      • How many virtual page numbers?
         • 2^32 / 2^12 = 2^20 = 1,048,576 virtual page numbers = number of entries in the page table
      • If each page table entry occupies 4 bytes, how much memory is needed to store the page table?
         • 2^20 entries * 4 bytes = 2^22 bytes = 4 Mbytes
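The same arithmetic can be written as a short calculation, which makes it easy to try other page sizes. The function name and its default parameters are assumptions chosen to match the slide.

```python
def single_level_table(va_bits=32, page_bytes=4096, entry_bytes=4):
    """Return (number of page table entries, page table size in bytes)."""
    entries = 2**va_bits // page_bytes
    return entries, entries * entry_bytes


entries, table_bytes = single_level_table()
```

With the slide's assumptions this yields 1,048,576 entries and a 4 Mbyte table for every address space, regardless of how much memory the process actually uses.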

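  43. Example: Two-level Page Table
      • Virtual address: bits 31-22 hold value x (page directory index); bits 21-12 hold value y (page table index); bits 11-0 hold value z (offset within the page frame)
      • Page directory: 2^10 entries of 32 bits each; entry x selects a page table
      • Page table: 2^10 entries of 32 bits each; entry y selects the page frame
      • Page frame: 2^12 entries of 8 bits each; entry z holds the data
      • Size of page directory = 2^10 * 32 bits = 4 Kbytes
      • Size of page table = 2^10 * 32 bits = 4 Kbytes
      • Size of page = 2^12 * 8 bits = 4 Kbytes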

  44. Two-Level Page Table
      • Assumptions
         • 2^10 entries in page directory (= max number of page tables)
         • 2^10 entries in page table
         • 32 bits allocated for each page directory entry
         • 32 bits allocated for each page table entry
      • How much memory is needed?
         • Page table size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
         • Page directory size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes

  45. Two-Level Page Table
      • Small (typical) system
         • One page table might be enough
         • Page directory size + Page table size = 8 Kbytes of memory would suffice for virtual memory management
         • How much physical memory could this one page table handle?
            • Number of page tables * Number of page table entries * Page size = 1 * 2^10 * 2^12 bytes = 4 Mbytes
      • Large system
         • You might need the maximum number of page tables
         • Max number of page tables * Page table size = 2^10 directory entries * 2^12 bytes = 2^22 bytes = 4 Mbytes of memory would be needed for virtual memory management
         • How much physical memory could these 2^10 page tables handle?
            • Number of page tables * Number of page table entries * Page size = 2^10 * 2^10 * 2^12 bytes = 4 Gbytes
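A two-level lookup splits the virtual address into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. The sketch below follows that 10/10/12 split from the example slides; the function names and the use of nested dictionaries as page directory and page tables are illustrative assumptions, not a real MMU table format.

```python
def split_va(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def translate(va, directory):
    """Walk directory -> page table -> frame number; raises KeyError if unmapped."""
    dir_index, table_index, offset = split_va(va)
    page_table = directory[dir_index]
    frame = page_table[table_index]
    return (frame << 12) | offset


def mapped_bytes(num_page_tables, entries_per_table=2**10, page_bytes=2**12):
    """Memory covered by the given number of fully populated page tables."""
    return num_page_tables * entries_per_table * page_bytes
```

For example, virtual address `0x00403004` splits into directory index 1, table index 3, and offset 4; with frame `0x80` stored at that position the result is `0x80004`. One page table covers 4 Mbytes and 2^10 page tables cover 4 Gbytes, matching the slide.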

  46. Memory Interface Memory Hierarchy Memory Size and Speed ARM MMU • Memory Interfacing

  47. Interfacing External Memory
      • Little/Big Endian support
      • Address space: 4 Gbytes (differs between processor implementations)
      • Supports programmable 8/16/32-bit data bus width for each bank
      • External address lines vary for a specific processor implementation
      • Programmable bank start address and bank size for bank 7
      • Eight memory banks:
         • Memory banks for ROM, SRAM or Synchronous DRAM
         • Fully programmable access cycles for all memory banks
      • Supports external wait signals to extend the bus cycle
      • Supports self-refresh mode in SDRAM for power down
      • Supports various types of ROM for booting (NOR/NAND Flash, EEPROM, and others)
      • The write buffer can hold 16 words of data and four addresses.

  48. CPU - Memory Interface
      • The CPU - Memory interface usually consists of:
         • uni-directional address bus
         • bi-directional data bus
         • read control line
         • write control line
         • ready control line
         • size (byte, word) control line
      • Memory access involves a memory bus transaction:
         • read: (1) set address, read and size, (2) copy data when ready is set by memory
         • write: (1) set address, data, write and size, (2) done when ready is set
      [Figure: CPU connected to Memory by the address bus, data bus, and Read, Write, Ready and Size lines]
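The two bus transactions can be modeled as a single call that takes the signals the CPU drives and returns what the memory drives back. This is a behavioral sketch only: the `Memory` class and `transaction` method are hypothetical, ready is asserted immediately (no wait states), and little-endian byte order is an assumption.

```python
class Memory:
    """Toy memory that answers read and write bus transactions."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)

    def transaction(self, address, read, size, data=None):
        """Perform one bus transaction; return (ready, data_out).

        address/read/size (and data for writes) are driven by the CPU;
        ready and data_out are driven by the memory.
        """
        if read:
            value = int.from_bytes(self.data[address:address + size], "little")
            return True, value
        self.data[address:address + size] = data.to_bytes(size, "little")
        return True, None


memory = Memory(64)
memory.transaction(address=8, read=False, size=4, data=0x12345678)  # word write
ready, low_byte = memory.transaction(address=8, read=True, size=1)  # byte read
```

The size line is what lets the same interface serve both the word write and the byte read in this usage example.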

  49. Memory Subsystem Components
      • Memory subsystems generally consist of chips + controller
      • Each chip provides few bits (e.g., 1-4) per access
      • Bits from multiple chips are accessed in parallel to fetch bytes and words
      • Memory controller decodes/translates address and control signals
      • Controller can also be on the memory chip
      • Example: a 16x8-bit memory array containing eight 16x1-bit chips and a very simple controller (a 1-of-16 address decoder)
         Address  D7 D6 D5 D4 D3 D2 D1 D0
         0000      1  0  1  1  0  0  1  0
         0001      1  0  0  0  0  0  0  1
         ...
         1111      0  1  0  1  0  0  1  1
      [Figure: CPU connected to the memory by the address bus, data bus, and Read, Write, Ready and Size lines]
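The eight-chip example can be modeled directly: each 16x1-bit chip stores one bit position for all 16 addresses, and a byte is assembled by reading the same address from every chip in parallel. The class names below are hypothetical, and the mapping of chip i to data line D(i) is an assumption of the sketch.

```python
class BitChip:
    """A 16x1-bit memory chip: one bit per address."""

    def __init__(self):
        self.bits = [0] * 16


class MemoryArray:
    """A 16x8-bit memory built from eight 16x1-bit chips (chips[i] holds bit Di)."""

    def __init__(self):
        self.chips = [BitChip() for _ in range(8)]

    def write(self, address, byte):
        for i, chip in enumerate(self.chips):
            chip.bits[address] = (byte >> i) & 1  # each chip stores one bit of the byte

    def read(self, address):
        # All chips are accessed in parallel and their bits combined into a byte.
        return sum(chip.bits[address] << i for i, chip in enumerate(self.chips))


array = MemoryArray()
array.write(0b0000, 0b10110010)  # first row of the slide's example contents
array.write(0b1111, 0b01010011)  # last row of the slide's example contents
```

After these writes, the chip on D7 holds a 1 at address 0000 and a 0 at address 1111, while reading either address returns the full byte.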

  50. EEPROM Interfacing: Memory Interface with 8-bit ROM
      ARM        Memory
      A0 – A15   A0 – A15
      D0 – D7    DQ0 – DQ7
      WE         WE
      OE         OE
      GCS        CE
