
SHARC ‘S’uper ‘H’arvard ‘ARC’hitecture

Nagendra Doddapaneni. Overview: Harvard Architecture, Super Harvard Architecture, TigerSHARC processor. Outline: Background; Harvard Architecture (Why? What?); Modern CPU Chip Design.


Presentation Transcript


  1. SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture • Nagendra Doddapaneni

  2. Overview • Harvard Architecture • Super Harvard Architecture • TigerSHARC processor

  3. Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor

  4. Outline • Background <- • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor

  5. Background • von Neumann Architecture • Single storage for instructions and data • Digital Signal Processors • Specialized microprocessor designed specifically for digital signal processing, generally in real time

  6. Outline • Background • Harvard Architecture • Why? <- • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor

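  7. Why Harvard Architecture? • von Neumann bottleneck (‘memory bound’) • DSP applications • In a von Neumann architecture, the CPU at any moment is • Either reading an instruction • Or reading/writing data from/to memory • But never both at once, because the two share a single memory path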

  8. Harvard Architecture (cont…)

  9. Outline • Background • Harvard Architecture • Why? • What? <- • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor

  10. What is Harvard Architecture? • Physically separate storage and signal pathways for instructions and data • The next instruction can be fetched while the current instruction executes • Program memory can be small and wide • Data memory can be large and narrower

  11. Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design <- • Super Harvard Architecture • TigerSHARC Processor

  12. Modern CPU chip design • Incorporate features from both architectures • ‘On chip’ cache memory – divided into instruction cache and data cache. Harvard architecture used when CPU accesses cache memory. • On a cache miss, ‘off chip’ main memory is accessed using von Neumann architecture. Main memory is not separated into data and instruction sections.

  13. Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture <- • TigerSHARC Processor

  14. Super Harvard Architecture • Cache used to store instructions, leaving both instruction bus and data bus free to fetch operands • Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
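
The difference between the three designs can be sketched with a toy cycle count for one FIR filter tap, which needs one instruction fetch and two operand fetches (a sample and a coefficient). The one-cycle-per-access model, the function name `cycles_per_tap` and the architecture labels are assumptions made for illustration, not figures from the slides.

```python
# Toy model (assumption): every memory access occupies its bus for one cycle,
# and accesses on different buses can overlap.

def cycles_per_tap(architecture: str) -> int:
    """Cycles needed to fetch one instruction and two operands."""
    instruction_fetches = 1
    operand_fetches = 2  # one sample and one coefficient

    if architecture == "von_neumann":
        # One shared path: all three accesses are serialized.
        return instruction_fetches + operand_fetches
    if architecture == "harvard":
        # The instruction travels on the program bus while the data bus
        # carries both operands one after the other.
        return max(instruction_fetches, operand_fetches)
    if architecture == "super_harvard":
        # The instruction comes from the cache (assuming a hit), so the two
        # buses can each fetch one operand in the same cycle.
        return (operand_fetches + 1) // 2
    raise ValueError(f"unknown architecture: {architecture}")


for name in ("von_neumann", "harvard", "super_harvard"):
    print(name, cycles_per_tap(name))
```

Under these assumptions the count drops from three cycles to two to one, which is the motivation the slides give for adding an instruction cache to a Harvard design.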

  15. Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor <-

  16. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  17. TigerSHARC Processor • Processor Architecture <- • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  18. TigerSHARC Processor Architecture • 3 128-bit data buses • 2 IALUs • 2 Computational Blocks, each containing • ALU (Float and Integer) • SHIFTER • MULTIPLIER • CLU

  19. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation <- • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  20. TigerSHARC Instruction Parallelism and SIMD Operation • The core can simultaneously execute one to four 32-bit instructions encoded in a single instruction line (VLIW). • Whether instructions can execute in parallel depends on • The instruction line resources each requires • The source and destination registers used • Supports SIMD operations through the use of both Computational Blocks in parallel. • Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
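
The idea of checking an instruction line for conflicts can be sketched as follows. The rules used here (at most four instructions, no two instructions sharing a functional unit or a destination register) are a deliberate simplification and an assumption; the processor's real restrictions, including those on source registers, are more detailed and are not given in the slides. The `Instr` type and the unit names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Instr(NamedTuple):
    unit: str  # hypothetical resource label, e.g. "X_ALU" or "J_IALU"
    dest: str  # destination register name


def can_issue_together(line: Sequence[Instr]) -> bool:
    """Toy check: may these instructions share one instruction line?"""
    if not 1 <= len(line) <= 4:
        return False  # a line holds one to four instructions
    units = [instr.unit for instr in line]
    dests = [instr.dest for instr in line]
    # Assumed rule 1: each functional unit accepts one instruction per line.
    if len(set(units)) != len(units):
        return False
    # Assumed rule 2: no two instructions write the same register.
    return len(set(dests)) == len(dests)


ok_line = [Instr("X_ALU", "XR0"), Instr("Y_ALU", "YR0"), Instr("J_IALU", "J1")]
bad_line = [Instr("X_ALU", "XR0"), Instr("X_ALU", "XR1")]
print(can_issue_together(ok_line), can_issue_together(bad_line))
```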

  21. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU <- • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  22. TigerSHARC Integer ALU • 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers • Performs integer ALU operations and data addressing • ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP • Status flags: zero (Z), negative (N), overflow (V), carry (C) • Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE • Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP) • Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control.
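
Two of the addressing modes listed above, circular buffers and bit reversal, can be sketched in a few lines. This is a software illustration of the address arithmetic only; the function names and parameters are assumptions, and the hardware's actual register usage is not described in the slides.

```python
def circular_next(addr: int, step: int, base: int, length: int) -> int:
    """Advance an address by `step`, wrapping inside [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (as used for FFT reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# A pointer walking an 8-word buffer that starts at address 0x100.
addr = 0x106
addr = circular_next(addr, 3, base=0x100, length=8)
print(hex(addr))

# Bit-reversed ordering of the indices of an 8-point FFT.
print([bit_reverse(i, 3) for i in range(8)])
```

Circular addressing keeps a delay line in place without copying samples, and bit-reversed addressing reads FFT data in the order the butterfly stages need.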

  23. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File <- • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K Buses • DMA Controller • Applications

  24. TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax • Each block has 32 x 32-bit data registers • Each register can store 4 x 8-bit, 2 x 16-bit or 1 x 32-bit words. • Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40- or 64-bit words.
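
How one 32-bit register can hold four 8-bit, two 16-bit or one 32-bit word can be sketched with plain integer packing. Placing lane 0 in the least significant bits is an assumption made for the sketch; the slides do not state the ordering, and `pack` and `unpack` are hypothetical helper names.

```python
REGISTER_BITS = 32


def pack(words: list[int], width: int) -> int:
    """Pack equal-width words into one 32-bit value, lane 0 in the low bits."""
    lanes = REGISTER_BITS // width
    if width not in (8, 16, 32) or len(words) != lanes:
        raise ValueError("expected 4 x 8-bit, 2 x 16-bit or 1 x 32-bit words")
    mask = (1 << width) - 1
    value = 0
    for lane, word in enumerate(words):
        value |= (word & mask) << (lane * width)
    return value


def unpack(value: int, width: int) -> list[int]:
    """Split a 32-bit value back into words of the given width."""
    mask = (1 << width) - 1
    return [(value >> (lane * width)) & mask for lane in range(REGISTER_BITS // width)]


reg = pack([0x11, 0x22, 0x33, 0x44], 8)
print(hex(reg), [hex(w) for w in unpack(reg, 16)])
```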

  25. TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax

  26. Volatile registers in each block • 24 Volatile Data registers in each block • XR0 – XR23 • YR0 – YR23 • 2 ALU summation registers in each block • XPR0, XPR1, YPR0, YPR1 • 5 MAC accumulate registers in each block • XMR0 – XMR3, YMR0 – YMR3 • XMR4, YMR4 – Overflow registers

  27. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU <- • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  28. TigerSHARC X and Y ALU • 2x64 bit input paths • 2x64 bit output paths • 8, 16, 32, or 64 bit addition/subtraction - Fixed-point • 32 or 64 bit logical operations - fixed-point • 32 or 40 bit floating-point operations

  29. Sample ALU Instruction • Example of 16 bit addition • XYSR1:0 = R31:30 + R25:24 • Performs addition in X and Y Compute Blocks
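
The effect of a 16-bit SIMD addition on a 64-bit register pair can be emulated as below: the pair is treated as four independent 16-bit lanes, and the same operation runs once per compute block. Wrap-around on overflow and the lane ordering are assumptions of this sketch (the slides do not say how overflow is handled), and `simd_add16` is a hypothetical name.

```python
LANE_BITS = 16
LANES = 4  # a 64-bit register pair holds four 16-bit words
LANE_MASK = (1 << LANE_BITS) - 1


def simd_add16(a: int, b: int) -> int:
    """Add two 64-bit values lane by lane; carries never cross lanes."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift  # wrap-around (assumed)
    return result


# The same addition applied in both compute blocks, each on its own data.
operands = {
    "X": (0x0001_0002_0003_0004, 0x0010_0020_0030_0040),
    "Y": (0x0000_0000_0000_FFFF, 0x0000_0000_0000_0001),
}
results = {block: simd_add16(a, b) for block, (a, b) in operands.items()}
print({block: hex(value) for block, value in results.items()})
```

In the Y example the lowest lane overflows and wraps to zero without disturbing the lane above it, which is what separates a SIMD add from an ordinary 64-bit add.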

  30. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier <- • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

  31. TigerSHARC Multiplier • Operates on fixed, floating and complex numbers. • Fixed-Point numbers • 32x32 bit with 32 or 64 bit results • 4 (16x16 bit) with 4x16 or 4x32 bit results • Floating-Point numbers • 32x32 bit with 32 bit result • 40x40 bit with 40 bit result • Complex Numbers • 32x32 bit with 32 bit result • Fixed-point only • Results stored in MR register
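
A complex multiplication breaks down into four real multiplications and two additions, which the following sketch spells out for integer (fixed-point) parts. Representing a complex number as a `(real, imag)` tuple is an assumption for readability; the slides do not describe how the hardware packs the two parts or rounds the result.

```python
def complex_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """(ar + j*ai) * (br + j*bi) using four real multiplies."""
    ar, ai = a
    br, bi = b
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return real, imag


product = complex_mul((1, 2), (3, 4))
print(product)

# Cross-check against Python's built-in complex type.
reference = complex(1, 2) * complex(3, 4)
print(reference)
```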

  32. TigerSHARC Multiplier • XR0 = R1*R2;; • XR1:0 = R3*R5;; • XMR1:0 = R3*R5;; // uses XMR4 overflow • XR2 = MR3:2, XMR3:2 = R3*R5;; • XR3:2 = MR1:0, XMR1:0 = R3*R5;; • XFR0 = R1*R2;; • XFR1:0 = R3:2*R5:4;; // 40 bit multiply // 32 bit mantissa

  33. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter <- • CLU • Program Sequencer • I J and K data buses • DMA Controller • Applications

  34. TigerSHARC Shifter • Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands • Shifts and rotates bits • Bit manipulation operations, such as bit set, clear, toggle and test • Bit FIFO operations to support bit streams
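
The bit manipulation and rotate operations named above correspond to the following standard integer idioms, shown here for a single 32-bit operand. The function names are illustrative and are not the processor's mnemonics.

```python
WORD_MASK = 0xFFFF_FFFF  # operate on 32-bit words


def bit_set(value: int, n: int) -> int:
    return (value | (1 << n)) & WORD_MASK


def bit_clear(value: int, n: int) -> int:
    return value & ~(1 << n) & WORD_MASK


def bit_toggle(value: int, n: int) -> int:
    return (value ^ (1 << n)) & WORD_MASK


def bit_test(value: int, n: int) -> bool:
    return bool((value >> n) & 1)


def rotate_left(value: int, n: int) -> int:
    """Rotate a 32-bit word left; bits leaving the top re-enter at the bottom."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & WORD_MASK


word = bit_set(0, 3)
print(hex(word), bit_test(word, 3), hex(rotate_left(0x8000_0001, 1)))
```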

  35. TigerSHARC Processor • Processor Architecture • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU <- • Program Sequencer • J and K data buses • I bus – data bus

  36. TigerSHARC CLU • CLU instructions are designed to support algorithms used in communications applications • Algorithms supported are • Viterbi Decoding (minimal distance decoding algorithm) • Turbo-code Decoding (variant of Viterbi decoding) • De-spreading for Code Division Multiple Access (CDMA) systems (used to recover a signal that has been spread over a wide bandwidth by a pseudo-noise sequence)
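
De-spreading can be sketched as a correlation between the received chips and the pseudo-noise code: each data bit is multiplied by the whole code at the transmitter, and the receiver sums chip-by-code products over one code length and takes the sign. The 8-chip code, the +1/-1 symbol convention and the function names are assumptions chosen for a small example; this shows the arithmetic, not the CLU instructions themselves.

```python
def spread(bits: list[int], code: list[int]) -> list[int]:
    """Multiply every data bit (+1 or -1) by each chip of the code."""
    return [bit * chip for bit in bits for chip in code]


def despread(chips: list[int], code: list[int]) -> list[int]:
    """Correlate each code-length block with the code and take the sign."""
    n = len(code)
    decoded = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(x * c for x, c in zip(block, code))
        decoded.append(1 if correlation >= 0 else -1)
    return decoded


code = [1, -1, 1, 1, -1, -1, 1, -1]  # hypothetical 8-chip PN code
bits = [1, -1, -1, 1]

received = spread(bits, code)
received[3] = -received[3]  # corrupt one chip to mimic channel noise
print(despread(received, code))
```

Because each bit is spread over eight chips, a single flipped chip only lowers the correlation from 8 to 6, so the bit is still decoded correctly.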

  37. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer <- • I J and K buses • DMA Controller • Applications

  38. TigerSHARC Program Sequencer • Supplies instruction addresses to memory • The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute • Extracts an instruction line from the IAB and distributes it to the appropriate core components for execution • Determines flow control for instructions such as JMP and CALL • Reduces branch delays using branch prediction and a branch target buffer (BTB)
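
The role of a branch target buffer can be sketched with a dictionary that remembers where taken branches went, so the next fetch can start from the predicted target instead of waiting for the branch to resolve. This is a generic toy model; the size, organization and prediction policy of the TigerSHARC BTB are not given in the slides, and the class `ToyBTB` is hypothetical.

```python
from typing import Optional


class ToyBTB:
    """Maps the address of a branch to the target it last jumped to."""

    def __init__(self) -> None:
        self.targets: dict[int, int] = {}

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None to fall through."""
        return self.targets.get(pc)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the real outcome once the branch has resolved."""
        if taken:
            self.targets[pc] = target
        else:
            self.targets.pop(pc, None)


btb = ToyBTB()
first_guess = btb.predict(0x40)             # nothing known yet
btb.update(0x40, taken=True, target=0x10)   # loop branch jumps back
second_guess = btb.predict(0x40)            # now predicted as taken
print(first_guess, hex(second_guess))
```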

  39. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses <- • DMA Controller • Applications

  40. TigerSHARC architecture at a glance

  41. TigerSHARC Buses • DRAM divided into 6 blocks of 4Mbits • 6 blocks connect to four 128-bit wide internal buses through a crossbar connection • Internal bus architecture provides a total memory bandwidth of 32Gbytes/sec • Core and I/O can access • twelve 32-bit data words • four 32-bit instructions per cycle
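
The per-cycle figures on this slide can be checked with a little arithmetic. The implied clock rate below is derived from the quoted numbers, assuming 1 Gbyte = 10^9 bytes and that all four buses transfer a full 128 bits every cycle; it is not a value stated in the slides.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
WORD_BITS = 32
QUOTED_BANDWIDTH_BYTES_PER_S = 32e9  # "32 Gbytes/sec" from the slide

# Four 128-bit buses move sixteen 32-bit words per cycle,
# matching twelve data words plus four instructions.
words_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // WORD_BITS
bytes_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // 8

# Clock rate implied by the quoted bandwidth (a derived value, see above).
implied_clock_hz = QUOTED_BANDWIDTH_BYTES_PER_S / bytes_per_cycle

print(words_per_cycle, bytes_per_cycle, implied_clock_hz / 1e6, "MHz")
```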

  42. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller <- • Applications

  43. TigerSHARC DMA Controller • On-chip, with 14 DMA channels • Provides zero-overhead data transfers • Operates independently of, and invisibly to, the DSP core

  44. TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications <-

  45. TigerSHARC Applications

  46. References • ANALOG DEVICES • http://www.analog.com/processors/processors/tigersharc/index.html • http://www.analog.com/processors/processors/sharc/index.html • http://www.analog.com/processors/resources/teachingResources.html • ECE-ADI-PROJECT HOME PAGE • http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/index.html • http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/otherschoolsFrame.htm

  47. Summary • What is Harvard Architecture? • What is Super Harvard Architecture? • TigerSHARC processor architecture • How is TigerSHARC ‘faster’ for targeted DSP applications?

  48. Questions? Thank You.
