“I think there is a world market for maybe five computers.”

“I think there is a world market for maybe five computers.” Thomas Watson Senior, Chairman of IBM, 1943. Architecture Classification: SISD (Single Instruction Single Data), SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction Multiple Data), MISD (Multiple Instruction Single Data).

reegan

Presentation Transcript


  1. “I think there is a world market for maybe five computers.” Thomas Watson Senior, Chairman of IBM, 1943 ICSS531 - Parallel Architecture

  2. Architecture Classification • SISD: Single Instruction, Single Data • SIMD: Single Instruction, Multiple Data • MIMD: Multiple Instruction, Multiple Data • MISD: Multiple Instruction, Single Data
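
The four classes are the combinations of one or many instruction streams with one or many data streams. A minimal sketch of that mapping follows; the function name `flynn_class` and its parameters are hypothetical, chosen only for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the class name for the given number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams.
    instruction_part = "S" if instruction_streams == 1 else "M"
    data_part = "S" if data_streams == 1 else "M"
    return f"{instruction_part}I{data_part}D"


# One instruction stream driving 64 data elements at once is SIMD.
print(flynn_class(1, 64))
```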

  3. Vector Processors • The earliest parallel computers • Pipeline design (MISD) • Typically viewed as SIMD • Important machines include • Cray-1, etc. • CDC Cyber 205 • IBM 3090 Vector

  4. Seymour Cray (1925-1996) • Packaging, including heat removal • High level bit plumbing… getting the bits from I/O, into memory through a processor and back to memory and to I/O • Parallelism • Programming: O/S and compiler • Problems being solved

  5. Cray’s Contributions • Creative and productive during his entire career, 1951-1996 • Creator and undisputed designer of supers from 1960 • Circuits, packaging, and cooling… • “the mini” as a peripheral computer • Established the template for vector supercomputer architecture

  6. Cray’s Attitudes • Didn’t go with paging & segmentation because it slowed computation • In general, would cut his losses and move on when an approach didn’t work… • Ignored CMOS and microprocessors until SRC Company design • Went against conventional wisdom

  7. Computers • CDC 6600 (6xxx Series) • Employed “peripheral processors” • Influenced architecture probably more than any other computer • Cray 1 (1/M, 1/S, XMP, YMP, C90, T90) • Cray 2 GaAs… and Cray 3, Cray 4

  8. Cray XMP/4

  9. Cray 2

  10. Vector Processing • Vector processors have high-level operations that work on linear arrays of numbers: vectors

  11. Styles of Vector Architectures • Memory-memory vector processors • All vector operations are memory to memory • Vector-register processors • All vector operations are between vector registers • Vector equivalent of load-store architecture • Includes all vector machines since the late 1980s • Cray, Convex, Fujitsu, Hitachi, NEC
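
One way to see the difference between the two styles is to count memory traffic for a two-step computation such as Y = a * X + Y. The sketch below is a simplified model, not a description of any particular machine: it assumes a memory-memory design writes the intermediate product to a temporary vector in memory, while a vector-register design keeps it in a register. The function names are hypothetical.

```python
def memory_memory_traffic(n: int) -> int:
    """Element accesses to memory when every vector operand lives in memory."""
    multiply = 2 * n  # read X, write temporary T = a * X
    add = 3 * n       # read T, read Y, write Y
    return multiply + add


def vector_register_traffic(n: int) -> int:
    """Element accesses to memory when only loads and stores touch memory."""
    loads = 2 * n     # load X and Y into vector registers
    stores = n        # store the result back to Y
    return loads + stores


n = 64
print(memory_memory_traffic(n), vector_register_traffic(n))
```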

  12. Components of Vector Processor • Vector Register • Fixed length bank holding a single vector • Has at least 2 read and 1 write ports • Typically 8-32 vector registers, each holding 64-128 64-bit elements • Vector Functional Units • Fully pipelined, start new operation every clock • Typically 4-8 FUs: FP add, FP mult, FP reciprocal, integer add, logical, shift • Scalar Registers • Single element for FP scalar or address
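
The register counts and lengths quoted above imply a fairly small vector register file. The arithmetic can be sketched as follows; the helper name `register_file_bytes` is hypothetical, and the two configurations are simply the low and high ends of the ranges on the slide.

```python
def register_file_bytes(registers: int, elements: int, element_bits: int = 64) -> int:
    """Total storage of a vector register file, in bytes."""
    return registers * elements * element_bits // 8


smallest = register_file_bytes(registers=8, elements=64)    # low end of both ranges
largest = register_file_bytes(registers=32, elements=128)   # high end of both ranges
print(smallest, largest)
```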

  13. Vector-Register Architecture

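  14. Y = a * X + Y

        Scalar version:

              ld    f0,a          ; load scalar a
              addi  r4,rx,#512    ; last address to load
        loop: ld    f2,0(rx)      ; load X(i)
              multd f2,f0,f2      ; a * X(i)
              ld    f4,0(ry)      ; load Y(i)
              addd  f4,f2,f4      ; a * X(i) + Y(i)
              sd    0(ry),f4      ; store into Y(i)
              addi  rx,rx,#8      ; advance to next X element
              addi  ry,ry,#8      ; advance to next Y element
              sub   r20,r4,rx     ; compute bound
              bnez  r20,loop      ; repeat until done

        Vector version:

              ld    f0,a          ; load scalar a
              lv    v1,rx         ; load vector X
              multv v2,f0,v1      ; vector-scalar multiply
              lv    v3,ry         ; load vector Y
              addv  v4,v2,v3      ; vector add
              sv    ry,v4         ; store the result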

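  16. Y = a * X + Y

        Scalar version:

              ld    f0,a          ; load scalar a
              addi  r4,rx,#512    ; last address to load
        loop: ld    f2,0(rx)      ; load X(i)
              multd f2,f0,f2      ; a * X(i)
              ld    f4,0(ry)      ; load Y(i)
              addd  f4,f2,f4      ; a * X(i) + Y(i)
              sd    0(ry),f4      ; store into Y(i)
              addi  rx,rx,#8      ; advance to next X element
              addi  ry,ry,#8      ; advance to next Y element
              sub   r20,r4,rx     ; compute bound
              bnez  r20,loop      ; repeat until done

        Vector version, with both loads issued before the multiply:

              ld    f0,a          ; load scalar a
              lv    v1,rx         ; load vector X
              lv    v3,ry         ; load vector Y
              multv v2,f0,v1      ; vector-scalar multiply
              addv  v4,v2,v3      ; vector add
              sv    ry,v4         ; store the result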
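
The difference in dynamic instruction count between the two listings can be sketched in Python. This is an illustrative model only: it assumes 64 elements (512 bytes at 8 bytes per double, as implied by `addi r4,rx,#512`), counts the nine instructions of the scalar loop body once per element, and uses hypothetical function names.

```python
N = 64  # 512 bytes / 8 bytes per double-precision element


def daxpy_scalar(a, x, y):
    """Mirror the scalar loop; return (result, instructions executed)."""
    y = list(y)
    executed = 2  # ld f0,a and addi r4,rx,#512 run once before the loop
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
        executed += 9  # ld, multd, ld, add, sd, addi, addi, sub, bnez
    return y, executed


def daxpy_vector(a, x, y):
    """Mirror the vector sequence; return (result, instructions executed)."""
    v1 = list(x)                          # lv v1,rx
    v2 = [a * e for e in v1]              # multv v2,f0,v1
    v3 = list(y)                          # lv v3,ry
    v4 = [p + q for p, q in zip(v2, v3)]  # addv v4,v2,v3
    return v4, 6                          # plus ld f0,a and the final sv


x = [float(i) for i in range(N)]
y = [1.0] * N
scalar_result, scalar_count = daxpy_scalar(2.0, x, y)
vector_result, vector_count = daxpy_vector(2.0, x, y)
print(scalar_count, vector_count)
```

Both versions compute the same values; only the number of instructions fetched and decoded differs.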

  17. CM2

  18. Basic Organization • Host sends commands & data to microcontroller • Microcontroller broadcasts control signals, data to array • Microcontroller collects data from processor array [Diagram labels: Host Computer, Microcontroller, CM Processors and Memories]

  19. CM Processors and Memories • Processors and memories are 1 bit wide, memory is bit-addressable • Operation is bit-serial • Fields may be any number of bits, start anywhere • Context bit (flag) of processor determines whether processor is active
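
The bit-serial, context-controlled operation described above can be sketched as a small simulation. It is a simplified model rather than the CM instruction set: the least-significant-bit-first field layout, the 32-bit memory size, and all function names are assumptions made for illustration.

```python
def write_field(memory, start, width, value):
    """Store value in a width-bit field, least significant bit first (assumed layout)."""
    for i in range(width):
        memory[start + i] = (value >> i) & 1


def read_field(memory, start, width):
    """Read a width-bit field back as an integer."""
    return sum(memory[start + i] << i for i in range(width))


def bit_serial_add(memory, a_start, b_start, dst_start, width):
    """Add two fields one bit per step, carrying between steps."""
    carry = 0
    for i in range(width):
        a = memory[a_start + i]
        b = memory[b_start + i]
        memory[dst_start + i] = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))


def broadcast_add(processors, a_start, b_start, dst_start, width):
    """Send the same operation to every processor; only active ones execute it."""
    for proc in processors:
        if proc["context"]:
            bit_serial_add(proc["memory"], a_start, b_start, dst_start, width)


def make_processor(a, b, context):
    memory = [0] * 32            # bit-addressable memory, one bit per entry
    write_field(memory, 3, 5, a)  # 5-bit field starting at bit 3
    write_field(memory, 8, 5, b)  # 5-bit field starting at bit 8
    return {"memory": memory, "context": context}


processors = [make_processor(5, 6, True), make_processor(9, 3, False)]
broadcast_add(processors, a_start=3, b_start=8, dst_start=13, width=5)
results = [read_field(p["memory"], 13, 5) for p in processors]
print(results)
```

The second processor has its context flag cleared, so its destination field is left unchanged.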

  20. Programming Languages • PARIS - PArallel Instruction Set, similar to assembly language • *LISP - Common Lisp extension with explicit parallel operations • C* - C extension with explicit parallel data, implicit parallel operations • CM-Fortran - Fortran 90 variant implemented on CM

  21. CM2 • The heart of the CM2 is the parallel processing unit • Consists of up to 64K processors • Each processor has up to 128KB RAM • Processors are bit serial!! • An interprocessor communications network • One or more sequencers • An interface to one or more front-end computers • Zero or more I/O controllers and/or framebuffers

  22. CM2 System Organization [Diagram labels: Front End, Nexus, Sequencers 0-3, four blocks of Connection Machine Processors]

  23. Interprocessor Network • Each node of the network is a cluster (“chip”) • 16 data processors on the chip • Memory • One router node • The nodes are connected using a 12D hypercube • 4096 nodes, each directly connected to 12 other nodes • Thus the maximum size of a CM is 16 times 4096 or 64K processors
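
In a 12-dimensional hypercube with 4096 nodes, each node has a 12-bit address and is linked to the nodes whose addresses differ from its own in exactly one bit. A minimal sketch of that addressing follows; the function names are hypothetical, and the hop count shown is the shortest-path distance in an ideal hypercube, not a statement about the actual router's behavior.

```python
DIMENSIONS = 12           # hypercube dimension from the slide
PROCESSORS_PER_NODE = 16  # data processors per chip


def neighbors(node, dimensions=DIMENSIONS):
    """Nodes reachable in one hop: flip each address bit in turn."""
    return [node ^ (1 << d) for d in range(dimensions)]


def hops(src, dst):
    """Shortest path length: the number of address bits that differ."""
    return bin(src ^ dst).count("1")


node_count = 2 ** DIMENSIONS
total_processors = node_count * PROCESSORS_PER_NODE
print(node_count, len(neighbors(0)), total_processors)
print(hops(0, node_count - 1))  # opposite corners of the cube
```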

  24. Arith.cs

        /* Simple arithmetic demonstration - file arith.cs */
        #include <stdio.h>
        #include <stdlib.h>   /* rand() */

        #define NPROCS 1048576

        shape [NPROCS]A;      /* one position per (virtual) processor */
        float:A s, x, y;      /* parallel variables of shape A */

        void main()
        {
            int k, i;

            with ( A ) {
                x = (rand()/1.0e7) - 60.0;
                y = (rand()/1.0e7) - 60.0;
                for ( i = 0; i < 3; i++ ) {
                    CM_start_timer(1);
                    with ( A )
                        for ( k = 0; k < 200; k++ )
                            s = x * y;   /* elementwise multiply over the whole shape */
                    CM_stop_timer(1);
                    CM_reset_timer();
                }
            }
        }

  25. CM5
