
Impact of Large-Scale Computer Systems Today

Explore the impact of large-scale computer systems in various fields such as low-energy defibrillation, genome sequencing, public content generation, and online gaming.


Presentation Transcript


  1. Advanced computer systems (Chapter 12) http://www.pds.ewi.tudelft.nl/~iosup/Courses/2011_ti1400_12.ppt

  2. Large-Scale Computer Systems Today • Low-energy defibrillation • Saves lives • Affects >2M people/year • Studies involving both laboratory experiments and computational simulation Source: TeraGrid science highlights 2010, https://www.teragrid.org/c/document_library/get_file?uuid=e950f0a1-abb6-4de5-a509-46e535040ecf&groupId=14002

  3. Large-Scale Computer Systems Today • Genome sequencing • May save lives • The $1,000 barrier • Large-scale molecular dynamics simulations • Tectonic plate movement • May save lives • Adaptive fine mesh simulations • Using 200,000 processors Source: TeraGrid science highlights 2010, https://www.teragrid.org/c/document_library/get_file?uuid=e950f0a1-abb6-4de5-a509-46e535040ecf&groupId=14002

  4. Large-Scale Computer Systems Today • Public Content Generation • Wikipedia • Affects how we think about collaborations • “The distribution of effort has increasingly become more uneven, unequal” (Sorin Adam Matei, Purdue University) Source: TeraGrid science highlights 2010, https://www.teragrid.org/c/document_library/get_file?uuid=e950f0a1-abb6-4de5-a509-46e535040ecf&groupId=14002

  5. Large-Scale Computer Systems Today • Online Gaming • World of Warcraft, Zynga • Affects >250M people • “As an organization, World of Warcraft utilizes 20,000 computer systems, 1.3 petabytes of storage, and more than 4600 people.” • 75,000 cores • Upkeep: >$135,000/day (?) Sources: http://www.gamasutra.com/php-bin/news_index.php?story=25307 and http://spectrum.ieee.org/consumer-electronics/gaming/engineering-everquest/0 and http://35yards.wordpress.com/2011/03/01/world-of-warcraft-by-the-numbers/

  6. Why parallelism (1/4) • Fundamental laws of nature: • example: channel widths are becoming so small that quantum properties are going to determine device behaviour • signal propagation time increases when channel widths shrink

  7. Why parallelism (2/4) • Engineering constraints: • Phase transition time of a component is a good measure for the maximum obtainable computing speed • example: optical or superconducting devices can switch in 10^-12 seconds • optimistic suggestion: 1 TIPS (Tera Instructions Per Second, 10^12) is possible • However, we must calculate something • assume we need 10 phase transitions per instruction: 0.1 TIPS (worked out below)
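
A back-of-the-envelope check of the slide's numbers (a sketch, assuming exactly 10 phase transitions are needed per instruction):

    t_{\text{switch}} \approx 10^{-12}\,\mathrm{s}
    \;\Rightarrow\;
    \text{raw switching rate} \approx \frac{1}{10^{-12}\,\mathrm{s}} = 10^{12}\ \text{transitions/s}

    \text{instruction rate} \approx \frac{10^{12}}{10} = 10^{11}\ \text{IPS} = 0.1\ \text{TIPS}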

  8. Why parallelism (3/4) • But what about memory? • It takes light approximately 16 picoseconds to cross 0.5 cm, yielding a possible execution rate of 60 GIPS • However, in silicon, signal speed is about 10 times slower, resulting in 6 GIPS
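
Written out (a sketch assuming one 0.5 cm signal traversal per memory access and signals at the speed of light in vacuum):

    t = \frac{d}{c} = \frac{0.5 \times 10^{-2}\,\mathrm{m}}{3 \times 10^{8}\,\mathrm{m/s}}
      \approx 1.7 \times 10^{-11}\,\mathrm{s} \approx 17\,\mathrm{ps}

    \text{rate} \approx \frac{1}{t} \approx 6 \times 10^{10}\ \text{IPS} = 60\ \text{GIPS};
    \qquad \text{in silicon } (\sim 10\times \text{ slower}):\ \approx 6\ \text{GIPS}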

  9. Why parallelism (4/4) • Speed of sequential computers is limited to a few GIPS • Improvements by using parallelism: • multiple functional units (instruction-level parallelism) • multiple CPUs (parallel processing)

  10. Quantum Computing? • “Qubits are quantum bits that can be in an “on”, “off”, or “both” state due to fuzzy physics at the atomic level.” • Does surrounding noise matter? (Wim van Dam, Nature Physics 2007) • May 25, 2011: Lockheed Martin buys a D-Wave One, 128 qubits ($10M) Source: http://www.engadget.com/2011/05/29/d-wave-sells-first-commercial-quantum-computer-to-lockheed-marti/

  11. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers • A Programmer’s View • Performance Considerations

  12. Classification of computers (Flynn Taxonomy) • Single Instruction, Single Data (SISD) • conventional system • Single Instruction, Multiple Data (SIMD) • one instruction on multiple data objects • Multiple Instruction, Multiple Data (MIMD) • multiple instruction streams on multiple data streams • Multiple Instruction, Single Data (MISD) • ?????

  13. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers • A Programmer’s View • Performance Considerations

  14. SIMD (Array) Processors • An instruction issuing unit broadcasts each instruction to many processing elements (PE = Processing Element), which execute it in lockstep on different data • Examples: Connection Machine CM-2 (’87; peak 28 GFLOPS, sustainable 5-10%) and CM-5 (’91) Sources: http://cs.adelaide.edu.au/~sacpc/hardware.html#cm5 and http://www.paulos.net/other/cm2.html and http://boards.straightdope.com/sdmb/archive/index.php/t-515675.html (about the blinking leds)

  15. MIMD: Uniform Memory Access (UMA) architecture • Any processor can directly access any memory • Processors P1 ... Pm and memory modules M1 ... Mk are connected through an interconnection network; every memory location is equally far from every processor • Uniform Memory Access (UMA) computer

  16. MIMD: NUMA architecture • Any processor can directly access any memory • Each processor Pi has a memory module Mi close by; accesses to other modules go through the interconnection network and take longer • Non-Uniform Memory Access (NUMA) computer • Realization in hardware or in software (distributed shared memory)

  17. MIMD: Distributed memory architecture • Any processor can access any memory, but sometimes through another processor (via messages) • Each processor Pi has its own private memory Mi; processors communicate over the interconnection network

  18. Example 1: Graphics Processing Units (GPUs): CPU versus GPU • CPU: much cache and control logic • GPU: much compute logic

  19. GPU Architecture SIMD architecture • Multiple SIMD units • SIMD pipelining • Simple processors • High branch penalty • Efficient operation on • parallel data • regular streaming

  20. Example 2: Cell B.E. • Distributed memory architecture • One PowerPC core plus 8 identical SPE cores

  21. Example 3: Intel Quad-core Shared Memory MIMD

  22. Example 4: Large MIMD Clusters BlueGene/L

  23. Supercomputers Over Time Source: http://www.top500.org

  24. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks (I/O) • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers • A Programmer’s View • Performance Considerations

  25. Interconnection networks (I/O between processors) • Difficulty in building systems with many processors: the interconnections • Important parameters: • Diameter: • Maximal distance between any two processors • Degree: • Maximal number of connections per processor • Total number of connections (Cost) • Bisection width: • Minimum number of links that must be cut to split the network into two equal halves; it bounds the number of simultaneous messages between the two halves

  26. Multiple bus • (Multiple) bus structures • Figure: processors and memories attached to two shared buses (Bus 1, Bus 2)

  27. Cross bar • Cross-bar interconnection network: connecting N processors to N memories requires N^2 switches • Example: Sun E10000 Source: http://www.cray-cyber.org/systems/E10k_detail.php

  28. Multi-stage networks (1/4) • Three switch stages (stage 1, stage 2, stage 3) connect 8 modules, each identified by a 3-bit id (P0 ... P7) • Figure: the path from P5 to P3 through the three stages

  29. Multi-stage networks (2/4) • Shuffle network • “Shuffle”: split the inputs into two half decks and interleave them, like riffle-shuffling a deck of cards • Figure: connections P4-P0 and P5-P3 both need the same link, so they cannot be realized at the same time

  30. Multi-stage network (3/4) • Multistage networks: multiple steps • Example: Shuffle or Omega network • Every processor is identified by a three-bit number (in general, an n-bit number) • A message from one processor to another contains the identifier of the destination • Routing algorithm: in every stage, • inspect one bit of the destination • if 0: use upper output • if 1: use lower output • (a small routing sketch follows below)
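
A minimal sketch of that routing rule in C, assuming an omega network with N = 2^n inputs whose switches inspect the destination bits from most significant to least significant; the function name omega_route and the 0 = upper / 1 = lower encoding are illustrative, not from the slides.

    #include <stdio.h>

    /* Route a message through an omega (shuffle) network with N = 2^n inputs.
     * At stage i we inspect the i-th most significant bit of the destination:
     * 0 -> take the upper switch output, 1 -> take the lower one. */
    static void omega_route(unsigned dest, unsigned n_stages)
    {
        for (unsigned stage = 0; stage < n_stages; stage++) {
            unsigned bit = (dest >> (n_stages - 1 - stage)) & 1u;
            printf("stage %u: destination bit = %u -> take %s output\n",
                   stage + 1, bit, bit ? "lower" : "upper");
        }
    }

    int main(void)
    {
        omega_route(3u, 3u);   /* slide example: route to P3 (binary 011) */
        return 0;
    }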

  31. Multi-stage network (4/4) • Properties: • Let N = 2^n be the number of processing elements • Number of stages: n = log2 N • Number of switches per stage: N/2 • Total number of (2x2) switches: N (log2 N) / 2 • Not every pair of connections can be simultaneously realized • Blocking
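
For the 8-processor network of the previous slides, these formulas give:

    N = 8,\ n = \log_2 8 = 3:\qquad
    \text{stages} = 3,\quad
    \text{switches per stage} = \frac{N}{2} = 4,\quad
    \text{total } 2\times 2 \text{ switches} = \frac{N \log_2 N}{2} = 12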

  32. Hypercubes (1/3) • Non-uniform delay, so suited to NUMA architectures • An n-dimensional hypercube has 2^n nodes, n·2^(n-1) connections, and a maximum distance of n hops • Connected PEs differ by 1 bit of their label • Routing (example: 000 -> 111): • scan bits from right to left • if a bit differs, send to the neighbor whose label differs in exactly that bit • repeat until the destination is reached • Figures: n = 2 (nodes 00, 01, 10, 11) and n = 3 (nodes 000 ... 111)
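
The same bit-correcting rule in C, as a sketch (the function name hypercube_route and the decision to correct bits from least significant upward are illustrative choices consistent with the slide):

    #include <stdio.h>

    /* Route in an n-dimensional hypercube: compare the current node's label
     * with the destination, bit by bit from the right, and correct each
     * differing bit by hopping to the neighbour that differs only in that bit. */
    static void hypercube_route(unsigned src, unsigned dest, unsigned n)
    {
        unsigned node = src;
        for (unsigned bit = 0; bit < n; bit++) {
            if (((node ^ dest) >> bit) & 1u) {
                node ^= 1u << bit;            /* hop along dimension 'bit' */
                printf("hop to node %u (labels differ in bit %u)\n", node, bit);
            }
        }
    }

    int main(void)
    {
        hypercube_route(0u, 7u, 3u);          /* slide example: 000 -> 111 */
        return 0;
    }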

  33. Hypercubes (2/3) • Question: what is the average distance between two nodes in a hypercube?
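
The slide leaves the question open; one way to work it out (a sketch, not from the slides): the distance between two nodes is the number of bit positions in which their n-bit labels differ, and each of the n bits differs in half of all node pairs, so

    \bar d \;=\; \sum_{i=1}^{n} \Pr[x_i \neq y_i] \;=\; \frac{n}{2}

If the two nodes are required to be distinct, the average becomes n·2^(n-1) / (2^n - 1), only slightly larger than n/2.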

  34. Mesh Constant number of connections per node

  35. Torus mesh with wrap-around connections

  36. Tree

  37. Fat tree … Nodes have multiple parents

  38. Local networks • Ethernet • based on collision detection • upon collision, back off and randomly try later (a toy backoff sketch follows below) • speeds up to 100 Gb/s (Terabit Ethernet?) • Token ring • based on token circulation on a ring of PCs • possession of the token allows putting a message on the ring
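
A toy illustration of the "back off and randomly try later" rule, assuming the classic truncated binary exponential backoff of CSMA/CD Ethernet (wait a random number of slot times in 0 .. 2^min(c,10) - 1 after the c-th collision); this is a simplified sketch, not a model of any real network adapter:

    #include <stdio.h>
    #include <stdlib.h>

    /* Choose how many slot times to wait after the given number of collisions. */
    static unsigned backoff_slots(unsigned collisions)
    {
        unsigned k = collisions < 10 ? collisions : 10;   /* cap the window */
        unsigned window = 1u << k;                        /* 2^k slot times */
        return (unsigned)(rand() % window);
    }

    int main(void)
    {
        for (unsigned c = 1; c <= 5; c++)
            printf("after collision %u: wait %u slot times\n", c, backoff_slots(c));
        return 0;
    }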

  39. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers • A Programmer’s View • Performance Considerations

  40. Memory organization (1/2) • UMA architectures • Figure: each node contains a processor with a secondary cache and a network interface; all memory is reached through the network

  41. Memory organization (2/2) • NUMA architectures • Figure: each node contains a processor with a secondary cache, a local memory, and a network interface; remote memories are reached through the network

  42. Cache coherence • Problem: caches in multiprocessors may have copies of the same variable • Copies must be kept identical • Cache coherence: all copies of a shared variable have the same value • Solutions: • write through to shared memory and all caches • invalidate cache entries in all other caches • Snoopy caches: • Processing elements sense (snoop) writes on the bus and either update their cached copy or invalidate it (a minimal sketch follows below)
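
A minimal sketch of the write-invalidate snooping idea, assuming a toy cache structure (struct cache, snoop_write, and the fixed number of lines are illustrative, not the protocol of any particular machine): every cache watches the shared bus, and when another processor writes an address it holds, it marks its own copy invalid so the next read fetches the fresh value.

    #include <stdbool.h>
    #include <stdio.h>

    #define LINES 4

    struct cache {
        unsigned addr[LINES];
        bool     valid[LINES];
    };

    /* Called by a snooping cache for every write it observes on the bus. */
    static void snoop_write(struct cache *c, unsigned written_addr)
    {
        for (int i = 0; i < LINES; i++)
            if (c->valid[i] && c->addr[i] == written_addr)
                c->valid[i] = false;          /* invalidate the stale copy */
    }

    int main(void)
    {
        struct cache c1 = { .addr = {0x10, 0x20}, .valid = {true, true} };
        snoop_write(&c1, 0x20);               /* another processor writes 0x20 */
        printf("copy of 0x20 still valid? %s\n", c1.valid[1] ? "yes" : "no");
        return 0;
    }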

  43. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers • A Programmer’s View • Performance Considerations

  44. Parallelism • Language construct: PARBEGIN ... PAREND • PARBEGIN task_1; task_2; ...; task_n; PAREND • The tasks listed between PARBEGIN and PAREND may execute in parallel; execution continues past PAREND only when all of them have finished (a thread-based sketch follows below)
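
PARBEGIN/PAREND is a language construct, not a library call; a minimal sketch of the same behaviour with POSIX threads (task_1 and task_2 are illustrative placeholders):

    #include <pthread.h>
    #include <stdio.h>

    static void *task_1(void *arg) { (void)arg; puts("task 1"); return NULL; }
    static void *task_2(void *arg) { (void)arg; puts("task 2"); return NULL; }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, task_1, NULL);   /* PARBEGIN: start the tasks */
        pthread_create(&t2, NULL, task_2, NULL);
        pthread_join(t1, NULL);                    /* PAREND: wait for all tasks */
        pthread_join(t2, NULL);
        return 0;
    }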

  45. Shared variables (1/4) • Two tasks, Task_1 and Task_2, each execute ... STW R2, SUM(0) ... : both store into the same variable SUM in shared memory

  46. Shared variables (2/4) • Suppose both processors 1 and 2 execute: LW A,R0 /* A is a variable in main memory */ ADD R1,R0 STW R0,A • Initially: • A = 100 • R1 in processor 1 is 20 • R1 in processor 2 is 40 • What is the final value of A? 120, 140, 160? Now consider that the final value of A is your bank account balance. (A runnable version of this race is sketched below.)
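
The same lost-update problem in C with two threads (a sketch; the deposit function mirrors the slide's load/add/store sequence, and a real run may or may not expose the race):

    #include <pthread.h>
    #include <stdio.h>

    static int A = 100;

    static void *deposit(void *amount)
    {
        int r0 = A;               /* LW  A,R0  */
        r0 += *(int *)amount;     /* ADD R1,R0 */
        A = r0;                   /* STW R0,A  */
        return NULL;
    }

    int main(void)
    {
        int d1 = 20, d2 = 40;
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, &d1);
        pthread_create(&t2, NULL, deposit, &d2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("final A = %d\n", A);   /* 160 if no race, 120 or 140 if the updates collide */
        return 0;
    }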

  47. Shared variables (3/4) • So there is a need for mutual exclusion: • different components of the same program need exclusive access to a data structure to ensure consistent values • Occurs in many situations: • access to shared variables • access to a printer • A solution: a single instruction (Test&Set) that • tests whether somebody else accesses the variable • if so, continue testing (busy waiting) • if not, marks the variable as being accessed, so others will wait (a lock built on this idea is sketched below)
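
One realization of the Test&Set idea in standard C, using the C11 atomic_flag type whose atomic_flag_test_and_set operation atomically sets the flag and returns its previous value (the acquire/release names are illustrative; the slides use pseudocode instructions instead):

    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    static void acquire(void)
    {
        while (atomic_flag_test_and_set(&lock))   /* T&S: spin while already set */
            ;                                      /* busy waiting */
    }

    static void release(void)
    {
        atomic_flag_clear(&lock);                  /* CLR LOCK */
    }

    int main(void)
    {
        acquire();
        /* critical section, e.g. the STW R2, SUM(0) update on the next slide */
        release();
        return 0;
    }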

  48. Shared variables (4/4) • Both tasks now guard the update of SUM with a lock variable LOCK, also in shared memory: crit: T&S LOCK,crit /* busy-wait until the lock is free */ ...... STW R2, SUM(0) ..... CLR LOCK /* release the lock */

  49. Agenda • Introduction • The Flynn Classification of Computers • Types of Multi-Processors • Interconnection Networks • Memory Organization in Multi-Processors • Program Parallelism and Shared Variables • Multi-Computers [earlier, see Token Ring et al.] • A Programmer’s View • Performance Considerations

  50. Example program • Compute the dot product of two vectors with • a sequential program • two tasks with shared memory • two tasks with distributed memory using messages • Primitives in parallel programs: • create_thread() (create a (sub)process) • mypid() (who am I?) • (a shared-memory sketch follows below)
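
A sketch of the shared-memory variant with two tasks, mapped onto pthreads (create_thread() and mypid() on the slide are course pseudocode; here pthread_create plays create_thread() and the id argument plays mypid(), and the vectors a, b and array partial are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    #define N 8

    static double a[N] = {1,2,3,4,5,6,7,8};
    static double b[N] = {8,7,6,5,4,3,2,1};
    static double partial[2];               /* one slot per task: no lock needed */

    /* Each task sums its half of the element-wise products. */
    static void *dot_task(void *arg)
    {
        int id = *(int *)arg;               /* plays the role of mypid() */
        double s = 0.0;
        for (int i = id * N / 2; i < (id + 1) * N / 2; i++)
            s += a[i] * b[i];
        partial[id] = s;
        return NULL;
    }

    int main(void)
    {
        int id0 = 0, id1 = 1;
        pthread_t t0, t1;
        pthread_create(&t0, NULL, dot_task, &id0);   /* plays create_thread() */
        pthread_create(&t1, NULL, dot_task, &id1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("dot product = %g\n", partial[0] + partial[1]);
        return 0;
    }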
