A bit about computer architecture CS 147, Fall Semester 2007 Robert Correll
Overview • RISC microprocessor design • Diagnostic testing • Software development • Microprocessor features • System-on-Chip (SoC)
RISC microprocessor design • 12 members on the team: • Design Manager (1) • ASIC Design Engineers (9) • Diagnostics Manager (1) • Software Engineer (1) • Culture: • High-tech (Verilog) • Very quiet
Embedded 32-bit microprocessor • Earns Editor's Choice Award • Microprocessor Report Names IDT’s RC32364 Best Embedded Processor for Price/Performance • (Volume 12, Number 7, June 1, 1998)
Embedded processor-based applications • Low-end routers and switches • Cellular base stations • Consumer multimedia game systems
Device Overview • MIPS-II RISC architecture with enhancements • Scalar 5-stage pipeline minimizes branch and load delays • DSP engine capable of one multiply-accumulate (MAC) instruction every 2 clock cycles
Device Overview (continued) • Enhanced instruction set architecture • MIPS-IV compatible conditional move instructions • MIPS-IV superset PREF (prefetch) instruction • Fast multiplier with atomic multiply-add, multiply-sub • Count leading zero/one instructions
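As an illustration of what the enhanced instructions compute, here is a minimal Python sketch of their semantics on 32-bit values. The function names are made up for this example, and the 64-bit wrapping accumulator is an assumption modelled on the usual MIPS HI:LO register pair rather than something stated on the slide.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def clz32(x):
    """Count leading zero bits of a 32-bit value (32 when x is 0)."""
    return 32 - (x & MASK32).bit_length()


def clo32(x):
    """Count leading one bits: the leading zeros of the complement."""
    return clz32(~x & MASK32)


def movn(rd, rs, rt):
    """Conditional move: the new rd is rs when rt is non-zero, else rd."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Conditional move: the new rd is rs when rt is zero, else rd."""
    return rs if rt == 0 else rd


def _signed32(v):
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def madd(acc, rs, rt):
    """Signed multiply-add into a 64-bit accumulator that wraps (assumed)."""
    return (acc + _signed32(rs) * _signed32(rt)) & MASK64
```

Conditional moves let a compiler replace a short branch with straight-line code, which avoids disturbing the pipeline.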
Device Overview (continued) • Large, efficient on-chip caches • Separate 8KB Instruction cache and 2KB Data cache • 2-way set associative • Write-back and write-through support on a per-page basis • Optional cache locking, with per-line resolution, to facilitate deterministic response • Simultaneous instruction and data fetch in each clock cycle, achieving over 1 GB/sec bandwidth
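To make "2-way set associative" concrete, the sketch below splits an address into tag, set index, and byte offset. The cache sizes come from the slide. The 16-byte line size is an assumption, since the slides do not state it, and `split_address` is a name invented for this example.

```python
def split_address(addr, cache_bytes=8 * 1024, ways=2, line_bytes=16):
    """Return (tag, set_index, byte_offset) for a set-associative cache.

    line_bytes=16 is an assumed value; the slides do not give the line size.
    All sizes must be powers of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 8KB instruction cache: 256 sets of 2 lines each under the assumed line size.
print(split_address(0x00012345))
# 2KB data cache: 64 sets under the same assumption.
print(split_address(0x00012345, cache_bytes=2 * 1024))
```

Each set holds two candidate lines, so a lookup compares the tag against both ways of the selected set.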
Device Overview (continued) • Flexible MMU with 32-page TLB • Variable page size • Enhanced write algorithm support • Variable number of locked entries • No performance penalty for address translation
Device Overview (continued) • Flexible bus interface allows simple, low-cost designs • Bus interface runs at a fraction of pipeline rate • Programmable port-width interface (8-, 16-, 32-bit memory and I/O regions) • Programmable bus turnaround (BTA) times • Supports single datum or burst transactions • Selectable system byte-ordering
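Two of the bus options above are easy to show in a few lines: how many transfers a 32-bit word needs on a narrower port, and how the two byte orderings lay out the same word. This is only a sketch with invented helper names, and it ignores bus turnaround time and burst behaviour.

```python
def transfers_per_word(port_width_bits, word_bits=32):
    """Number of bus transfers needed to move one word over a given port width."""
    if port_width_bits not in (8, 16, 32):
        raise ValueError("port width must be 8, 16, or 32 bits")
    return -(-word_bits // port_width_bits)  # ceiling division


def word_to_bytes(value, big_endian):
    """Bytes of a 32-bit word in increasing address order for the chosen ordering."""
    return list(value.to_bytes(4, "big" if big_endian else "little"))


for width in (8, 16, 32):
    print(width, "bit port:", transfers_per_word(width), "transfer(s)")

print([hex(b) for b in word_to_bytes(0x11223344, big_endian=True)])
print([hex(b) for b in word_to_bytes(0x11223344, big_endian=False)])
```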
Diagnostic Testing • Began with 300 tests and behavior model • Downloaded 10 to 40 new tests per day • One test per directory • Build each test • Run each test on an RTL model • Debug and track failures • Finished with more than 3,000 tests
Software Development • Test Release System • Automated regression process • Distributed jobs based upon cycle counts • Provided customized history reports • Accumulated load per signal utility • Test vectors • Many other value-added scripts • Diagnostic tests
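The slides do not describe how jobs were distributed based on cycle counts. One plausible approach, shown here purely as an assumption, is a greedy scheduler that hands the longest remaining test to the least-loaded machine. The test names and cycle counts below are invented.

```python
import heapq


def distribute(tests, hosts):
    """Assign tests to hosts, longest first, always to the least-loaded host.

    tests maps a test name to its expected cycle count.
    Returns one list of test names per host.
    """
    heap = [(0, h) for h in range(hosts)]   # (accumulated cycles, host index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(hosts)]
    # Sort by descending cycle count, then by name for a stable result.
    for name, cycles in sorted(tests.items(), key=lambda kv: (-kv[1], kv[0])):
        load, host = heapq.heappop(heap)
        buckets[host].append(name)
        heapq.heappush(heap, (load + cycles, host))
    return buckets


example = {"a": 100, "b": 60, "c": 50, "d": 10}  # hypothetical cycle counts
print(distribute(example, 2))
```

Balancing by cycle count rather than by test count keeps one long simulation from holding up the whole regression run.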
Load Link Store Conditional Opcodes
    li    $9, 1
    sw    $9, 0($6)
    .word 0xc0850000        # opcode for: ll $5, 0($4)
    bne   $5, $0, Fail      # verify sem = 0
    li    $5, 2
    li    $9, 2
    sw    $9, 0($6)
    .word 0xe0850000        # opcode for: sc $5, 0($4)
    bne   $5, $8, Fail      # verify sc indicates success
    li    $8, 2
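The two `.word` values in the test can be checked by hand against the standard MIPS I-type layout: a 6-bit opcode, 5-bit base register, 5-bit target register, and 16-bit offset. The sketch below uses the standard MIPS II opcodes for `ll` (0x30) and `sc` (0x38). The helper names are made up for this example.

```python
LL_OPCODE = 0x30  # load linked
SC_OPCODE = 0x38  # store conditional


def encode_i_type(opcode, base, rt, offset):
    """Pack an I-type instruction: opcode | base | rt | 16-bit offset."""
    return (((opcode & 0x3F) << 26)
            | ((base & 0x1F) << 21)
            | ((rt & 0x1F) << 16)
            | (offset & 0xFFFF))


def decode_i_type(word):
    """Unpack a 32-bit word into (opcode, base, rt, offset)."""
    return ((word >> 26) & 0x3F,
            (word >> 21) & 0x1F,
            (word >> 16) & 0x1F,
            word & 0xFFFF)


# ll $5, 0($4) and sc $5, 0($4)
print(hex(encode_i_type(LL_OPCODE, base=4, rt=5, offset=0)))
print(hex(encode_i_type(SC_OPCODE, base=4, rt=5, offset=0)))
```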
CPU Pipeline Stages • 1I - Instruction Fetch, Phase one • Instruction address translation begins • 2I - Instruction Fetch, Phase two • Instruction cache fetch begins • Instruction address translation continues
CPU Pipeline Stages (continued) • 1R - Register Fetch, Phase one • The instruction cache fetch finishes. • The instruction cache tag is checked against the physical page frame number obtained from the address translation.
CPU Pipeline Stages (continued) • 2R - Register Fetch, Phase two • The instruction decoder decodes the instruction. • Any required operands are fetched from the register file. • Make a decision to either issue or slip (for an interlock condition). • For a branch, the branch address is calculated.
CPU Pipeline Stages (continued) • 1A - Execution, Phase one • Any results from the A or D stages are bypassed. • The arithmetic logic unit (ALU) starts the integer arithmetic, logical or shift operation. • The ALU calculates the data virtual address for load and store instructions. • The ALU determines whether the branch condition is true.
CPU Pipeline Stages (continued) • 2A - Execution, Phase two • The integer arithmetic, logical or shift operation will complete. • A data cache access will start. • Store data is shifted to the specified byte position(s). • The data virtual to physical address translation will start.
CPU Pipeline Stages (continued) • 1D - Data Fetch, Phase one • The data cache access will continue. • The data address translation completes. • 2D - Data Fetch, Phase two • The data cache access will finish and the data is then shifted down and extended. • The data cache tag is checked against the physical address for any data cache access.
CPU Pipeline Stages (continued) • 1W - Write Back, Phase one • The processor uses this phase internally to resolve all exceptions in preparation for the register file write. • 2W - Write Back, Phase two • For register-to-register and load instructions, the result is written back to the register file. • Branch instructions perform no operation during this stage.
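The five stages described above (I, R, A, D, W) overlap so that several instructions are in flight at once. The sketch below prints a timing chart for the ideal case. It assumes one instruction enters the pipeline per cycle with no stalls or slips, and it does not model the two phases within each stage. The function names are invented for this example.

```python
STAGES = ["I", "R", "A", "D", "W"]


def stage_at(instr, cycle):
    """Stage occupied by instruction number `instr` in `cycle`, or None."""
    pos = cycle - instr  # instruction i enters the I stage in cycle i
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def timing_chart(n_instr):
    """One row per instruction, one column per cycle; '.' means not in the pipe."""
    total_cycles = n_instr + len(STAGES) - 1
    return [" ".join(stage_at(i, c) or "." for c in range(total_cycles))
            for i in range(n_instr)]


for row in timing_chart(3):
    print(row)
```

Under these assumptions, n instructions finish in n + 4 cycles instead of 5n.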
Stall Conditions • Detected after the R pipe-stage. • The processor will resolve the condition. • Detect cache miss • Start moving dirty cache line data to write buffer • Get first doubleword into cache and restart pipeline • Load remainder of cache line into cache
Slip Conditions • Slipped instructions are retried on subsequent cycles • Detect cache miss • Get entire cache line into cache • Continue pipeline • Inserted NOP instructions
Memory Management Unit (MMU) • Generates translation lookaside buffer (TLB) exceptions such as: • TLB refill • TLB invalid • TLB modified • Offers the following advantages: • Variable page size • Enhanced Write Algorithm support • Mapping of a larger portion of the virtual address space • Variable number of locked entries
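The three TLB exceptions listed above can be illustrated with a small lookup model. This is a simplified sketch, not the device's actual TLB: each entry maps a single page of arbitrary power-of-two size, address-space IDs and locked entries are left out, and the class and field names are invented. The conditions follow the usual MIPS convention: no matching entry raises a refill, a matching but invalid entry raises an invalid exception, and a store to a page without its write-enable (dirty) bit set raises a modified exception.

```python
from dataclasses import dataclass


class TLBRefill(Exception):
    """No entry matches the virtual address."""


class TLBInvalid(Exception):
    """An entry matches but is marked invalid."""


class TLBModified(Exception):
    """A store hit an entry whose dirty (write-enable) bit is clear."""


@dataclass
class TLBEntry:
    vbase: int          # virtual base address, aligned to page_size
    pbase: int          # physical base address
    page_size: int      # variable page size, a power of two
    valid: bool = True
    dirty: bool = True  # True means the page may be written


def translate(tlb, vaddr, is_store=False):
    """Translate a virtual address or raise the matching TLB exception."""
    for entry in tlb:
        if entry.vbase <= vaddr < entry.vbase + entry.page_size:
            if not entry.valid:
                raise TLBInvalid(hex(vaddr))
            if is_store and not entry.dirty:
                raise TLBModified(hex(vaddr))
            return entry.pbase + (vaddr - entry.vbase)
    raise TLBRefill(hex(vaddr))
```

A larger page size lets a single entry cover more of the virtual address space, which is how a small TLB can map a large region.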
CPU Exception Processing • Begins when the processor receives and detects exceptions such as: • address translation errors • arithmetic overflows • I/O interrupts • system calls • Processor suspends normal instruction sequence and enters Kernel mode
CPU Exception Processing (continued) • Processor then disables interrupts and forces execution of a software handler located at a fixed address. • The handler may save processor context: • program counter contents • current operating mode (User or Kernel mode) • interrupt status (enabled or disabled)
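The exception-entry sequence can be sketched as a small state change on a model CPU. Everything here is illustrative: the `CPU` class is invented, and `HANDLER_ADDRESS` is a placeholder value rather than an address taken from the slides.

```python
from dataclasses import dataclass

HANDLER_ADDRESS = 0x80000180  # placeholder; the real fixed address is not given here


@dataclass
class CPU:
    pc: int
    kernel_mode: bool = False
    interrupts_enabled: bool = True


def take_exception(cpu, cause):
    """Enter exception processing and return the context a handler might save."""
    context = {
        "pc": cpu.pc,                                  # program counter contents
        "kernel_mode": cpu.kernel_mode,                # operating mode before entry
        "interrupts_enabled": cpu.interrupts_enabled,  # interrupt status before entry
        "cause": cause,
    }
    cpu.kernel_mode = True          # processor enters Kernel mode
    cpu.interrupts_enabled = False  # interrupts are disabled
    cpu.pc = HANDLER_ADDRESS        # execution is forced to the handler
    return context


cpu = CPU(pc=0x00400100)
saved = take_exception(cpu, "system call")
print(hex(cpu.pc), cpu.kernel_mode, cpu.interrupts_enabled, hex(saved["pc"]))
```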