This comprehensive review covers topics such as Amdahl’s Law, balance point heuristic, bus contention, and strategies to improve shared-memory multiprocessor performance. It includes practical tasks and simulations. Exam and homework details provided for a high-performance computing course.
CS8625-June-22-2006: Homework & Midterm Review
CS8625 High Performance and Parallel Computing, Dr. Ken Hoganson
• Class will start momentarily…
Balance Point
• The basis for the argument against "putting all your (speedup) eggs in one basket": Amdahl's Law.
• Note the balance point in the denominator, where the serial and parallel terms are equal (see the formulas below).
• Increasing N (the number of processors) beyond this point can at best halve the denominator, and so at best double the speedup.
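The formula the slide refers to is not reproduced in this transcript; what follows is the standard form of Amdahl's Law for a workload with parallel fraction α running on N processors, from which the balance-point forms on the next slide follow:

```latex
% Amdahl's Law: speedup of a workload with parallel fraction \alpha on N processors
\[
  S(N) \;=\; \frac{1}{(1-\alpha) + \dfrac{\alpha}{N}}
\]
% Balance point: the serial and parallel terms in the denominator are equal
\[
  1-\alpha \;=\; \frac{\alpha}{N}
  \quad\Longrightarrow\quad
  N = \frac{\alpha}{1-\alpha},
  \qquad
  \alpha = \frac{N}{N+1}
\]
```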
Balance Point Heuristic
• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
• Solved for N: N = α / (1 - α)
• Solved for α: α = N / (N + 1)
Balance Point: Example
• Parallel fraction α = 90% (10% serial).
• Solved for N: N = α / (1 - α) = 0.90 / 0.10 = 9
• Speedup at the balance point: S = 1 / (0.10 + 0.90/9) = 1 / 0.20 = 5
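A minimal Python sketch (not from the original slides; the function names and layout are my own) that reproduces this example using the formulas above:

```python
def balance_point_n(alpha):
    """Number of processors at the Amdahl's-Law balance point for parallel fraction alpha."""
    return alpha / (1.0 - alpha)

def amdahl_speedup(alpha, n):
    """Speedup of a workload with parallel fraction alpha on n processors (Amdahl's Law)."""
    return 1.0 / ((1.0 - alpha) + alpha / n)

alpha = 0.90
n = balance_point_n(alpha)          # approximately 9
print(n, amdahl_speedup(alpha, n))  # approximately 9 and 5
```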
Example
• A workload has an average parallel fraction α of 94%. How many processors can reasonably be applied to speed up this workload?
• Solved for N: N = α / (1 - α)
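For reference (the slide poses this as a question), plugging α = 0.94 into the formula above gives:

```latex
\[
  N = \frac{\alpha}{1-\alpha} = \frac{0.94}{0.06} \approx 15.7
\]
```

so roughly 15 to 16 processors can be applied before the serial fraction dominates.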
Example
• An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
• Solved for α: α = N / (N + 1)
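Again for reference, with N = 32 the formula above gives:

```latex
\[
  \alpha = \frac{N}{N+1} = \frac{32}{33} \approx 0.970
\]
```

so about 97% of the workload must be parallelizable to use all 32 processors reasonably efficiently.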
Multi-Bus Multiprocessors
• Shared-memory multiprocessors are very fast: low latency to memory on the bus, and low communication overhead through shared memory.
• Scalability problems: the length of the bus slows signals (they propagate at roughly 0.75 the speed of light), and contention for the bus reduces performance.
• Cache is required to reduce contention.
[Diagram: several CPUs and a memory module attached to a single shared bus]
Bus Contention
• Multiple devices (processors, etc.) compete for access to a bus.
• Only one device can use the bus at a time, limiting performance and scalability.
• P(2 or more simultaneous requests) = 1 - P(zero requests) - P(exactly one request); two or more simultaneous requests mean at least one request is blocked.
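A sketch of the underlying model (the slide does not state it explicitly; the usual assumption is that each of the N processors independently requests the bus in a given cycle with probability p):

```latex
\[
  P(\text{0 requests}) = (1-p)^{N}, \qquad
  P(\text{exactly 1 request}) = N\,p\,(1-p)^{N-1}
\]
\[
  P(\text{2 or more requests}) = 1 - (1-p)^{N} - N\,p\,(1-p)^{N-1}
\]
```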
• Performance degrades as requests are blocked.
• Resubmitted blocked requests degrade performance even further than shown above.
• Clearly, the probability that a processor's access to a shared bus is denied increases with both:
• The number of processors sharing the bus
• The probability that a processor will need access to the bus
• What can be done? What is the "universal band-aid" for performance problems?
• If cache greatly reduces accesses to memory, then the blocking rate on the bus is much lower (see the sketch below).
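A minimal Python sketch (my own, not from the slides) illustrating this under the independent-request model above: a cache hit rate of h scales each processor's bus-request probability p down to (1 - h) * p. The values of n and p here are illustrative only.

```python
def p_blocked(n, p):
    """Probability that two or more of n processors request the bus in the same
    cycle (so at least one request is blocked), assuming independent requests
    with per-processor request probability p."""
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)

n = 8            # processors sharing the bus (illustrative value)
p = 0.5          # per-cycle probability a processor needs memory (illustrative value)
hit_rate = 0.90  # cache hit rate from the slide

print(p_blocked(n, p))                     # no cache: almost every cycle has a blocked request
print(p_blocked(n, (1.0 - hit_rate) * p))  # with cache: only misses reach the bus
```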
Two approaches to improving the performance of a shared-memory, bus-based machine:
• Invest in large amounts of cache, at multiple levels, plus a connection network that lets the caches synchronize their contents.
• Invest in multiple buses and independently accessible blocks of memory.
• Combining both may be the best strategy.
Homework
• Your project is to explore the effect of interconnection network contention on the performance of a shared-memory, bus-based multiprocessor.
• You will do some calculations, use the HPPAS simulator, and write a report of a couple of pages to turn in.
Task 1
• For a machine whose processors include on-chip cache yielding a 90% cache hit rate, determine the maximum number of processors that can share a single bus while still maintaining at least 98% acceptance of requests.
• Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt.
• Your results should "bracket" the maximum.
Task 1 (continued)
• Use the formula in the table to find the maximum number of processors that still meets the 98% acceptance target (one way to sweep the values is sketched below).
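One possible way to bracket the answer, as a Python sketch. This assumes (my reading, not a statement from the slides) that a 90% hit rate leaves an effective bus-request probability of p = 0.10 per cycle, and that "98% acceptance" means the probability of any blocked request in a cycle stays at or below 2%; check these assumptions against the lecture's own calculation before using them.

```python
def p_blocked(n, p):
    # Probability of two or more simultaneous requests (at least one blocked),
    # assuming independent per-processor request probability p.
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)

p = 0.10  # effective bus-request probability with a 90% cache hit rate (assumption)
for n in range(2, 12):
    acceptance = 1.0 - p_blocked(n, p)
    print(f"N = {n:2d}   acceptance = {acceptance:.4f}")
```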
Task 2
• Use the maximum number of processors from Task 1 and Amdahl's Law at the balance point to find the workload parallel fraction that balances the denominator.
• Determine the theoretical speedup that will be obtained (a simplified form is given below).
• Solved for α: α = N / (N + 1)
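For reference, at the balance point the two denominator terms are equal, so the theoretical speedup from Amdahl's Law simplifies; this follows directly from the formulas earlier in the slides:

```latex
\[
  S = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}
    = \frac{1}{2\,(1-\alpha)}
    = \frac{N+1}{2}
  \qquad \text{when } \alpha = \frac{N}{N+1}.
\]
```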
Task 3
• Use the data values developed so far to run the HPPAS simulation system. Record the speedup obtained from the simulator.
• If it differs markedly from the theoretical value, check all the settings, rerun the simulation, and explain any variation from the theoretical expected value.
• Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.
Dates
• The current plan:
• Make the midterm available on Friday, June 23.
• Due date will be July 10 (after the conference and after the July 4th weekend).
• Conference week:
• Complete homework: due on July 3 by email.
• Work on the midterm exam.
• No class lecture on June 27 and 29.
• No class on July 4.
• Next live class is Thursday, July 6.
Topic Overview Overview of topics for the exam: • Five parallel levels • Problems to be solved for parallelism • Limitations to parallel speedup • Amdahl’s Law: theory, implications • Limiting factors in realizing parallel performance • Pipelines and their performance issues • Flynn’s classification • SIMD architectures • SIMD algorithms • Elementary analysis of algorithms • MIMD: Multiprocessors and Multicomputers • Balance point and heuristic (from Amdahl’s Law) • Bus contention and analysis of a single shared bus • Use of the online HPPAS tool • Specific multiprocessor clustered architectures: • Compaq • DASH • Dell Blade Cluster
End of Today's Lecture.