
On-Chip Communication (Architecture and Design)

On-Chip Communication (Architecture and Design). Sungjoo Yoo ISRC, SNU. Contents. Part 1 Introduction to on-chip communication On-chip communication architecture Software architecture Hardware architecture On-chip communication networks Part 2


Presentation Transcript


  1. On-Chip Communication(Architecture and Design) Sungjoo Yoo ISRC, SNU

  2. Contents • Part 1 • Introduction to on-chip communication • On-chip communication architecture • Software architecture • Hardware architecture • On-chip communication networks • Part 2 • Analysis and optimization of on-chip communication network • On-chip communication design on unreliable interconnect • Open issues and summary

  3. Part 1

  4. Introduction • On-chip communication design [Figure: a high-level functional specification (modules M1, M2, M3) is refined into an SoC implementation of the on-chip communication architecture, with modules mapped onto a µP and an IP and connected through SW and HW wrappers to the physical communication network]

  5. Designer’s Objectives and Problems • High-performance • What is the maximum bandwidth of wire? • What is the best suited OCA? • Low power consumption • What is the minimum energy required to send the given amount of data? • How to achieve the minimum energy? • Small HW/SW overhead • Interconnection and transceiver • Conflicting objectives • Trade-offs

  6. Incremental Refinement of On-Chip Communication

  7. Specification of On-Chip Communication • Abstraction levels of on-chip communication • Client/server level • Message level • Transaction level • Implementation level

  8. Client/Server Level • Concept • Service request/provide relation • A client component demands a service from server(s). • Service provider component may not be fixed and can be determined dynamically • Object request broker (ORB) is needed. • Real example • Modem service • PDA device: baseband modem ↔ vocoder • Modem service can be Bluetooth, IEEE802.11, CDMA2000, GPS, etc. depending on the location of PDA device. • Indoor: Bluetooth or IEEE802.11 • Outdoor: IEEE802.11 (short range) or CDMA2000
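
The dynamic service resolution described on this slide can be illustrated with a small sketch. It is not an actual ORB implementation: the `ServiceBroker` class, its method names, and the preference order of the providers are all assumptions made for illustration, loosely following the indoor/outdoor example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ServiceBroker:
    """Toy object request broker: maps a service name to candidate providers."""
    # service name -> ordered list of (provider name, availability predicate)
    providers: Dict[str, List[Tuple[str, Callable[[str], bool]]]] = field(
        default_factory=dict)

    def register(self, service: str, provider: str,
                 available: Callable[[str], bool]) -> None:
        self.providers.setdefault(service, []).append((provider, available))

    def resolve(self, service: str, location: str) -> str:
        """Return the first registered provider usable at this location."""
        for provider, available in self.providers.get(service, []):
            if available(location):
                return provider
        raise LookupError(f"no provider for {service!r} at {location!r}")


broker = ServiceBroker()
# Registration order and availability rules are illustrative assumptions.
broker.register("modem", "Bluetooth", lambda loc: loc == "indoor")
broker.register("modem", "IEEE802.11", lambda loc: loc in ("indoor", "outdoor"))
broker.register("modem", "CDMA2000", lambda loc: loc == "outdoor")

indoor_modem = broker.resolve("modem", "indoor")
outdoor_modem = broker.resolve("modem", "outdoor")
```

The client (here, the vocoder side) only names the service it needs; which modem actually serves the request is decided at resolution time.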

  9. Message Level • Concept • Components communicate with each other via messages. • Message sender/receiver are fixed. • A message can have any type of data. • Real example • PDA: In the CDMA2000 mode, the vocoder sends messages to the CDMA2000 modem. • A message has a frame of voice data and control info.

  10. Transaction Level • Concept • Components are mapped on real processors. • Communication is mapped on abstract communication networks. • Communication protocols are fixed. • Transaction can be read, write, burst_read, burst_write, etc. • For each candidate of real communication networks, the transaction performance can be analyzed. • Real example • PDA: vocoder on a DSP, modem on an IP, candidate communication networks (AMBA, Sonics, IBM, ...) • Determine bus priorities, packet priorities, TDMA slot assignment, etc.
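
As a rough sketch of how transaction performance might be compared across candidate networks, the model below charges a fixed setup cost per transaction plus a per-word cost. The `AbstractBus` class and every cycle count are hypothetical; they do not describe AMBA, Sonics, or any other real bus.

```python
from dataclasses import dataclass


@dataclass
class AbstractBus:
    """Transaction-level bus model; the cycle costs are hypothetical parameters."""
    name: str
    setup_cycles: int      # arbitration + address phase per transaction
    cycles_per_word: int   # data phase cost per word
    total_cycles: int = 0

    def __post_init__(self):
        self.memory = {}   # address -> word, stands in for the target

    def _charge(self, words: int) -> None:
        self.total_cycles += self.setup_cycles + words * self.cycles_per_word

    def write(self, addr: int, value: int) -> None:
        self._charge(1)
        self.memory[addr] = value

    def read(self, addr: int) -> int:
        self._charge(1)
        return self.memory.get(addr, 0)

    def burst_write(self, addr: int, values: list) -> None:
        self._charge(len(values))
        for offset, value in enumerate(values):
            self.memory[addr + offset] = value

    def burst_read(self, addr: int, length: int) -> list:
        self._charge(length)
        return [self.memory.get(addr + i, 0) for i in range(length)]


def run_workload(bus: AbstractBus) -> int:
    """Write one 8-word frame as a burst and as single writes; return cycles."""
    frame = list(range(8))
    bus.burst_write(0x100, frame)
    for i, word in enumerate(frame):
        bus.write(0x200 + i, word)
    assert bus.burst_read(0x100, 8) == frame
    return bus.total_cycles


candidate_a = AbstractBus("candidate A", setup_cycles=2, cycles_per_word=1)
candidate_b = AbstractBus("candidate B", setup_cycles=4, cycles_per_word=1)
cycles_a = run_workload(candidate_a)   # 44 cycles under these assumptions
cycles_b = run_workload(candidate_b)   # 64 cycles under these assumptions
```

The same workload runs unchanged on each candidate, which is the point of fixing the protocol at this level while leaving the physical network open.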

  11. Implementation Level • On-chip communication architecture is implemented. • Software and hardware architecture [Figure: SW architecture (application SW, middleware, OS, device drivers) on top of the HW architecture (µP/DSP with local memory and I/D caches, HW IP, DMA, memory, processor local bus, adapters, and a communication network built from OCBs w/ bridges, Sonics, packet/circuit switch, etc.)]

  12. On-Chip Communication Architecture • Software • Middleware, OS, device driver and ISR, memory instructions • Hardware • DMA, (bus) adapter, communication network (OCBs and bridges, packet network, etc.), memory

  13. Software On-Chip Communication Architecture • Middleware: CORBA, COM+, Java, BREW • Service resolution • ORB implementation • Dynamic reconfiguration of services needs to be supported. • 802.11 baseband modem in HW --> Bluetooth in SW • Operating system • Communication services • pipe, shared memory, semaphore, mutex, etc. • Supported as OS system calls

  14. Software On-Chip Communication Architecture • Device driver and ISR • The device driver depends on OS and the processor • OS • Preemptive or not, interrupt or not, synchronization services (semaphore, lock var, …) • Processor • Bus width, register set, exception behavior, etc. • Memory instructions • Load/store, load multiple/store multiple instructions • Cache/virtual memory instructions in ARM v6 architecture

  15. Hardware On-Chip Communication Architecture • DMA (Direct Memory Access) • Block size • Adapter • Basic functionality: protocol conversion • E.g. VCI -- AMBA • Local communication architecture • Distributed bus arbitration/network routing: e.g. Sonics, packet switch network [Figure: µPs running an OS and IPs hosting modules M1, M3, M4, each attached through an IP (µP) adapter and channel adapters to AMBA and CoreConnect]

  16. Hardware On-Chip Communication Architecture • Communication network • On-chip bus • AMBA, CoreConnect, PI, etc. • Sonics µNetwork • On-chip communication network • Circuit switch • Philips • Packet switch • W. Dally (DAC01), Guerrier (DATE00)

  17. Hardware On-Chip Communication Architecture • On-chip memory • Shared memory • E.g. external SDRAM in multimedia chips • Distributed memory w/ caches: e.g. Daytona architecture • Four 64-bit processing elements (PEs) • Each PE • - 32-bit RISC with DSP enhancements • - 64-bit vector co-processor (four MACs) • Split-transaction bus • - Shared memory based on L1 cache snooping • - Caches reduce bus traffic. • Embedded RTOS dynamically schedules tasks. • 120 mm², 0.35 µm, 100 MHz

  18. Hardware On-Chip Communication Architecture • On-chip memory (cont’d) • On-chip implementation of linked list • Philips, DATE01 • Data transfer and storage exploration (DTSE) • IMEC • Focus on low power consumption and area of memory

  19. On-Chip Communication Networks • Routing • Sonics µNetwork SiliconBackplane • Philips, Circuit Switch Network • Packet Switch Networks, Guerrier, DATE00 • Network topologies • Mesh, W. Dally, DAC2001 • Octagon, ST Microelectronics, DAC2001

  20. Sonics µNetwork SiliconBackplane • On-chip bus • Time-division multiple access (TDMA)

  21. Pre-characterized on-chip bus agent

  22. Two-step Arbitration • Originally assigned module → TDMA • If no bus access → priority-based
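
A minimal sketch of the two-step scheme, assuming one grant per time slot: the slot's assigned module wins if it is requesting, and otherwise the slot falls back to a fixed priority order. The function name, slot table, and priority list are hypothetical and do not reflect the actual Sonics implementation.

```python
from typing import List, Optional, Set


def arbitrate(slot_owner: Optional[str], requests: Set[str],
              priority: List[str]) -> Optional[str]:
    """Grant the bus for one slot.

    Step 1 (TDMA): the module that owns the slot wins if it is requesting.
    Step 2 (fallback): otherwise the highest-priority requester wins.
    """
    if slot_owner is not None and slot_owner in requests:
        return slot_owner
    for module in priority:          # priority[0] is the highest priority
        if module in requests:
            return module
    return None                      # nobody is requesting: idle slot


# Hypothetical slot table and priority order, for illustration only.
slot_table = ["dsp", "cpu", "dsp", "io"]
priority = ["cpu", "dsp", "io"]
requests_per_slot = [{"dsp", "cpu"}, {"io"}, {"cpu", "io"}, set()]

grants = [arbitrate(slot_table[t % len(slot_table)], reqs, priority)
          for t, reqs in enumerate(requests_per_slot)]
# grants == ["dsp", "io", "cpu", None]
```

In slot 0 the owner `dsp` wins even though `cpu` has higher priority; in slots 1 and 2 the owner is silent, so the slot is reused by the highest-priority requester instead of being wasted.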

  23. Pipelined TDMA Bus Arbitration • Pipeline depth • Based on memory target latency at the desired clock frequency

  24. Design Example: Carrier-Class VoIP Processing Card • DSP + CPU banks + I/O + DRAM • DSP: ~16 processors • Voice and modem protocols, LEC • CPU: ~4 processors • Packet protocols • Control (call setup) • High-bandwidth SDRAM

  25. Communication Bandwidth Requirements: Basic I/O • I/O traffic is low bandwidth • Data I/O rate = 1000 channels × 64 kb/s × 3, full duplex = 48 MB/s (worst case) • Data are buffered to SDRAM
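
The 48 MB/s figure can be checked with a few lines of arithmetic. The sketch assumes that "full duplex" contributes a factor of two and that MB means 10^6 bytes; the slide does not say what the factor of 3 represents, so it is carried through unchanged.

```python
channels = 1000
rate_bits_per_s = 64_000        # 64 kb/s per channel
slide_factor = 3                # the "x 3" factor from the slide, meaning not stated
directions = 2                  # assumption: full duplex doubles the traffic

total_bits_per_s = channels * rate_bits_per_s * slide_factor * directions
total_mbytes_per_s = total_bits_per_s / 8 / 1_000_000   # decimal megabytes

print(total_mbytes_per_s)       # 48.0
```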

  26. Communication Bandwidth Requirements: Cache Updates • CPU cache swap • assuming 1.6MIPS/channel • Total BW requirements: • 48 + 600 + 320 = 968 (MB/s)

  27. µNetwork Implementation

  28. Derivative Design Example • Full G.168 LEC uses a specialized core • LEC has local 4MB memory • # of channels: 1000 → 2000 • Increased traffic • Bus width: 64 → 128 (bits)

  29. Circuit Switch Network: Philips PROPHID Architecture • Focus on high-throughput signal processing for multimedia applications • Requirements • High computation capacity and high communication bandwidth • Performance and programmability • PROPHID • Heterogeneous multi-processor architecture consisting of general and application specific processors • General purpose processor • Control and low-medium signal processing • Application specific processors • High performance signal processing

  30. Philips Multi-window TV application

  31. PNX8500 PROPHID architecture

  32. PROPHID: An Architecture Template • For high throughput: ~10 Gbit/s and reconfigurable connections (switch matrix, 20 processors, 64 MHz) • Programmability and control applications: control-oriented bus • ~10 GOPS • Autonomous tasks based on data-driven execution

  33. PROPHID: Autonomous Execution of ADS Processors - Autonomous task execution on Application Domain Specific (ADS) processors - Stream-based execution - Data availability determines the execution of tasks. - Master (CPU)-slave synchronization can be avoided.
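
Data-driven execution can be sketched as tasks that fire whenever their input FIFO holds a token. This is a toy model, not the PROPHID hardware: the `Task` class and stage names are hypothetical, the FIFOs are unbounded, and the loop only stands in for processors that would run concurrently.

```python
from collections import deque


class Task:
    """A task that may fire whenever its input FIFO holds a token."""

    def __init__(self, name, func, inp, out):
        self.name, self.func, self.inp, self.out = name, func, inp, out

    def ready(self):
        return len(self.inp) > 0

    def fire(self):
        # Consume one token, produce one token.
        self.out.append(self.func(self.inp.popleft()))


# FIFOs connecting a two-stage stream pipeline.
source, middle, sink = deque([1, 2, 3]), deque(), deque()
tasks = [
    Task("scale", lambda x: x * 2, source, middle),
    Task("offset", lambda x: x + 1, middle, sink),
]

firing_order = []
# Each firing decision depends only on data availability in the task's
# own input FIFO; no task is told by a master when to run.
while any(t.ready() for t in tasks):
    for t in tasks:
        if t.ready():
            t.fire()
            firing_order.append(t.name)

result = list(sink)   # [3, 5, 7]
```

Because each task checks only its own input FIFO, adding or removing a stage needs no change to any central control code, which is the property the slide attributes to avoiding master-slave synchronization.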

  34. Kahn Process Network Model of Multi-window Application

  35. Communication Infrastructure

  36. Processor Model and Surrounding Shell

  37. Circuit Switch Network • Guaranteeing the throughput of streams with hard-real-time constraints in the PROPHID architecture. • Requirements of task execution on ADS processors • Time-interleaved task execution • Each task requires input/output FIFO’s.

  38. Circuit Switch Network

  39. Network Topology

  40. Time-Space-Time Routing

  41. High-Performance Communication Network in PROPHID Architecture [Figure: time, space, and time switching stages]

  42. Chip Photo and Metrics

  43. A Generic Architecture for On-Chip Packet-Switched Interconnections, DATE 2000. • A scalable system-level interconnection template is presented.

  44. A Generic Architecture for On-Chip Packet-Switched Interconnections • Bus-based architecture will not meet the bandwidth requirements, since • it is inherently non-scalable in terms of bandwidth • Bandwidth is shared by connected comp’s. • Multiple on-chip bus approaches like VSIA • case-specific grouping of IP’s • Not a truly scalable and reusable interconnection. • In this paper, a generic interconnection template is presented.

  45. A Generic Architecture for On-Chip Packet-Switched Interconnections • Switching networks • Circuit switching • like PROPHID communication network • High performance • Drawbacks • Lack of reactivity to rapidly changing communication • E.g. data bursts in MPEG (the worst case must be assumed), random traffic between CPU master and slaves. • Packet switching • Packets are transferred by routers, as on the Internet. • Because routing decisions are distributed over the routers, the network can remain very reactive.

  46. Packet Routing • Wormhole routing

  47. Network Topology: Fat-tree Network • Ex. 16 terminals: 8 --> 8 communication • The terminals can be processors, DSPs, memory, etc. • - Routers are free to use any of the available paths • - Packet: a sequence of 32 bit words • - Packet payload may be of any size

  48. Scalability of Fat-Tree Network

  49. Scaling and Protocol Stack

  50. Real Implementation
