
Assisting technologies for program parallelization



  1. Assisting technologies for program parallelization Chikayama/Taura Lab. Masakazu HAYATSU hayatsu@logos.t.u-tokyo.ac.jp

  2. Agenda • Introduction • Difficulty of Program Parallelization • Assistant Tools for Program Parallelization • SUIF Explorer • S-Check • Ursa Minor • Conclusion

  3. Introduction • Popularization of parallel computers • Commercial machines with very large numbers of processors • Low-end PCs with 2-4 processors • Performance • Speedup of uni-processors is slowing down ⇒ The importance of parallel programs keeps increasing

  4. Difficulty of Program Parallelization • Dependency • deadlock • data race • These problems must be avoided (slide diagram: threads A and B both write X; depending on timing, the result may be 100 or 1)

  5. Automatic Parallelization • Low performance • Parallelization techniques are fragile • Knowledge outside the code is often required:

for(i=0; i<N; i++){
  a[f(i)] = 0; // A
  a[g(i)] = 1; // B
}

Whether A and B may run in parallel depends on the index functions f and g.
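The point above can be made concrete: a compiler (or user) may parallelize this loop only after showing that the index sets written by f and g never overlap. A minimal sketch of such a disjointness check; the functions passed in are hypothetical examples, not from the slide:

```python
def indices_disjoint(f, g, n):
    """Return True if {f(i)} and {g(i)} over i in [0, n) never touch
    the same array slot, i.e. statements A and B carry no dependence."""
    fs = {f(i) for i in range(n)}
    gs = {g(i) for i in range(n)}
    # A and B are independent only if they never write the same slot.
    return fs.isdisjoint(gs)

# f(i) = 2i (even slots), g(i) = 2i+1 (odd slots): safely parallel
print(indices_disjoint(lambda i: 2 * i, lambda i: 2 * i + 1, 100))  # True
# f(i) = i, g(i) = i: every iteration conflicts
print(indices_disjoint(lambda i: i, lambda i: i, 100))              # False
```

When f and g are opaque (e.g. computed at run time), no static tool can decide this, which is exactly why outside knowledge is needed.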

  6. Development Process
• Design & improve the model
• Manually optimize the program
• Run
• Validity check (data races, deadlocks, …): if it fails (×), go back to finding problems
• Speedup evaluation: if satisfactory (○), done; otherwise find problems and improve the model

  7. Problem of Manual Parallelization • The user must fully understand many lines of code • It is prone to error

Example shown on the slide (a ray-tracing routine, excerpt):

(define (RayTracing ViewPoint Vscan nref energy rgb)
  (if (<= nref 4)
      (let ((crashed? (tracer ViewPoint Vscan))) ; crashed?
        (if (and (not crashed?) (!= nref 0))
            (let* ((hl0 (fcsyn (f+ (f* (vector-ref Vscan 0) (vector-ref Light 0))
                                   (f* (vector-ref Vscan 1) (vector-ref Light 1))
                                   (f* (vector-ref Vscan 2) (vector-ref Light 2)))))
                   (hl (if (f< hl0 0.0) 0.0 hl0))
                   (ihl (f* hl hl hl energy (car beam))))
              (begin
                (vector-set! rgb 0 (f+ (vector-ref rgb 0) ihl))
                (vector-set! rgb 1 (f+ (vector-ref rgb 1) ihl))
                (vector-set! rgb 2 (f+ (vector-ref rgb 2) ihl)))))
        (if crashed?
            (let* ((P (cdr crashed?))   ; intersection point
                   (m (car crashed?))   ; crashed object
                   (NV (Get-NVector m Vscan P)))
              (let* ((br (fcsyn (f+ (f* (vector-ref NV 0) (vector-ref Light 0))
                                    (f* (vector-ref NV 1) (vector-ref Light 1))
                                    (f* (vector-ref NV 2) (vector-ref Light 2)))))
                     (br1 (if (f< br 0.0) 0.0 br))
                     (bright (if (and (car sh)
                                      (Shadow-Check-One-Or-Matrix (car or-Net) P))
                                 0.0
                                 (f* (f+ br1 0.2) energy (vector-ref m 11)))))
                (begin
                  (utexture m P)
                  (vector-set! rgb 0 (f+ (vector-ref rgb 0) (f* bright (vector-ref m 13))))
                  (vector-set! rgb 1 (f+ (vector-ref rgb 1) (f* bright (vector-ref m 14))))
                  (vector-set! rgb 2 (f+ (vector-ref rgb 2) (f* bright (vector-ref m 15))))
                  ;; (excerpt truncated on the slide)

  8. Important factors for an assistant tool • Assist program parallelization • Combine the benefits of automatic and manual approaches • automatic: can extract information mechanically, in bulk • manual: can use high-level knowledge • Extract information, and highlight what is important

  9. Extraction of parallelism • Candidates for parallelization: (0R-05-01, 0R-05-02, 0R-05-03) (0R-0e-01, 0R-0e-02) (0R-0t-02, 0R-0t-03) (0R-0w-01, 0R-0w-02)

;; quick : v — array to be sorted, left/right — range to sort
(define (quick v left right)
  (if (>= left right)
      v
      (let ((new-left left)
            (new-right right)
            (pivot (vector-ref v (floor (/ (+ left right) 2)))))
        (do () ((> new-left new-right))
          (do () ((>= (vector-ref v new-left) pivot))
            (set! new-left (+ new-left 1)))
          (do () ((<= (vector-ref v new-right) pivot))
            (set! new-right (- new-right 1)))
          (if (<= new-left new-right)
              (begin
                (swap v new-left new-right)
                (set! new-left (+ new-left 1))
                (set! new-right (- new-right 1)))))
        (begin
          (quick v left new-right)      ; these two recursive calls
          (quick v new-left right)))))  ; can run in parallel

(quick #(4 5 3 1 4 0 5 6) 0 7)

  10. Notice • Different approaches • Our work: based on dependence analysis • Today's survey: based on profile data • Why profile data? • Isn't knowing the execution time enough?

  11. Difficulty in Tuning a Parallel Program (1/2) • Coverage • Percentage of total execution time spent in the parallel regions • Amdahl's law • Granularity • Average amount of computation between synchronizations • Overhead of communication and synchronization (slide diagram: a program of 100 time units whose parallel region covers only 10%)
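Amdahl's law makes the coverage point quantitative: if only a fraction p of the execution is parallelizable, speedup on n processors is bounded by 1 / ((1 - p) + p / n). A small illustration:

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work
    runs perfectly in parallel on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# With 10% coverage, even a huge machine barely helps:
print(round(amdahl_speedup(0.10, 1000), 2))  # 1.11
# With 90% coverage, 16 processors already give a large win:
print(round(amdahl_speedup(0.90, 16), 2))    # 6.4
```

This is why a tool that reports coverage first (as SUIF Explorer's Guru does) steers the user away from loops that cannot pay off.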

  12. Difficulty in Tuning a Parallel Program (2/2) • Critical path • The code segment that consumes the most resources is not necessarily the best tuning target: simple consumption of resources does not imply a corresponding potential for improvement

  13. Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of code changes on overall performance • Ursa Minor • Knowledge of experienced programmers

  14. Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of code changes on overall performance • Ursa Minor • Knowledge of experienced programmers

  15. SUIF Explorer [Liao et al., 1999] • Objective • Identify the important loops • Rules of thumb • Most of a program's execution time is spent in a small percentage of the code • Most of a program's execution time is spent in loops

  16. The SUIF Explorer System
• Parallelizing Compiler: 1. automatically parallelizes the sequential program
• Execution Analyzers: 2. collect profile data & dynamic dependences
• Parallelization Guru: 3. guides the user toward improving program performance
• Rivet Visualizer: presents the results to the user

  17. The Parallelization Guru (1/2) • Parallelization guidance • Computes the coverage and granularity • Updates the information as new loops are parallelized • A list of loops to parallelize • Sorted in order of execution time • Restricted to loops that have no I/O and are not nested under already-parallel loops • Dependence information on each loop
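The Guru's candidate list can be sketched as a filter-and-sort over loop profile records. The record fields below are assumptions for illustration, not the tool's actual data format:

```python
def candidate_loops(loops):
    """Return loops worth parallelizing, most expensive first.
    Each loop is a dict with (assumed) keys:
    name, time, has_io, inside_parallel_loop."""
    eligible = [l for l in loops
                if not l["has_io"] and not l["inside_parallel_loop"]]
    return sorted(eligible, key=lambda l: l["time"], reverse=True)

loops = [
    {"name": "main_loop",  "time": 80.0, "has_io": False, "inside_parallel_loop": False},
    {"name": "inner_loop", "time": 60.0, "has_io": False, "inside_parallel_loop": True},
    {"name": "io_loop",    "time": 40.0, "has_io": True,  "inside_parallel_loop": False},
    {"name": "init_loop",  "time": 5.0,  "has_io": False, "inside_parallel_loop": False},
]
print([l["name"] for l in candidate_loops(loops)])  # ['main_loop', 'init_loop']
```

Re-running this after each parallelization step mirrors how the Guru keeps the list up to date.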

  18. The Parallelization Guru (2/2) • User interaction • Starts with the loop at the top of the list • If the loop has many dependences, the user may choose not to attempt it • Otherwise the user determines • whether the static dependences can be ignored • whether an array can be privatized, etc. • using program slices

  19. Program slice • The set of statements that contribute to the value of a variable at a given point
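A backward slice can be sketched as a reachability computation over def-use edges. The toy program representation below (a list of assignments with the variables each one reads, straight-line code only, no branches or loops) is an assumption for illustration:

```python
def backward_slice(stmts, target):
    """stmts: list of (lineno, defined_var, used_vars) in program order.
    Return the line numbers that contribute to the final value of target."""
    needed = {target}
    slice_lines = set()
    for lineno, var, uses in reversed(stmts):
        if var in needed:
            slice_lines.add(lineno)
            needed.discard(var)   # this definition satisfies the demand...
            needed.update(uses)   # ...but its inputs are now needed
    return sorted(slice_lines)

prog = [
    (1, "a", set()),    # a = 1
    (2, "b", {"a"}),    # b = a + 1
    (3, "c", set()),    # c = 7      (irrelevant to b)
    (4, "b", {"b"}),    # b = b * 2
]
print(backward_slice(prog, "b"))  # [1, 2, 4]
```

Showing the user only lines 1, 2, and 4 is exactly the kind of focus SUIF Explorer uses to decide whether a reported dependence can be ignored.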

  20. The Parallelization Guru • Comment • Performance data and dependence information are closely related ⇒ combining them cuts development cost • It is applicable only to loops

  21. Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of code changes on overall performance • Ursa Minor • Knowledge of experienced programmers

  22. S-Check [Snelick, 1997] • Objective • Identify the parts of the program where changes will significantly improve overall performance • Effect prediction • Determine the effect of changes in the code without actually making them

  23. Sensitivity Checker • Insert "delay" calls into segments of a parallel program and calculate the sensitivity to the perturbation • Assumption • If a code segment is highly sensitive to slight perturbations ⇒ comparable improvements to that segment will boost performance correspondingly

  24. Program Model • Code = transfer function • Taylor expansion (roughly): t(δ1, …, δk) ≈ β0 + Σj βj·δj + Σi&lt;j βi,j·δi·δj • βj := how sensitive execution time is to a delay δj in segment j • βi,j := interaction between segments i and j

  25. S-Check procedure
1. Mark possible bottlenecks (A, B, C) in the original parallel program and insert delay() calls:

while(x>y){   // A
  delay(a);
}
delay(b);
send(...);    // B
...
do_computation{ delay(c); ... };  // C

2. Generate & run numerous versions of the program, with each delay switched ON (1) or OFF (0) in a factorial pattern
3. Analyze the results and solve for the effects:

Effect   Source
0.44     A
4.54     B
0.07     AB
1.21     C
0.02     BC
0.34     AC
0.00     ABC

Here B has by far the largest effect, so B is the likely bottleneck.
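The on/off patterns above form a 2^k factorial design, and the effects column falls out of comparing mean runtimes. A minimal sketch (main effects only, interaction terms omitted for brevity) with a synthetic "program" whose runtime is a known function of the injected delays; all numbers are invented for illustration:

```python
from itertools import product

def solve_effects(run, segments):
    """Run the program under every on/off delay pattern and estimate
    each segment's main effect as mean(runtime | delay on) minus
    mean(runtime | delay off), as in a 2^k factorial design."""
    patterns = list(product([0, 1], repeat=len(segments)))
    times = {p: run(p) for p in patterns}
    effects = {}
    for j, seg in enumerate(segments):
        on = [times[p] for p in patterns if p[j] == 1]
        off = [times[p] for p in patterns if p[j] == 0]
        effects[seg] = sum(on) / len(on) - sum(off) / len(off)
    return effects

# Synthetic runtime: delays in B sit on the critical path (weight 4.5),
# delays in A and C are mostly hidden by overlapping computation.
def run(p):
    dA, dB, dC = p
    return 10.0 + 0.4 * dA + 4.5 * dB + 1.2 * dC

print(solve_effects(run, ["A", "B", "C"]))  # B's effect dominates
```

A real S-Check run replaces `run` with actual timed executions of the instrumented program; this is only the analysis skeleton.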

  26. User Interaction (1/3) • Test code locations are selected manually or automatically • Information provided by a profiler • programming constructs (e.g. while, for) • certain library function calls (e.g. barrier(), send())

  27. User Interaction (2/3) • Set the parameters • delay perturbation patterns • delay values • Trade-off (information gained vs. number of runs)

  28. User Interaction (3/3) • Code with a higher effect is more likely to be a bottleneck • Dependences are not dealt with

  29. S-Check • Comment • Identifies the program segments linked directly to performance • Knowledge about the program is required in order to mark possible bottlenecks • As the code grows, the sensitivity tests take longer • Dependence information is not available

  30. Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of code changes on overall performance • Ursa Minor • Knowledge of experienced programmers

  31. Ursa Minor [Kim et al., 2000] • Objective • Not merely point to problematic code (×), but present possible causes and solutions (〇) • Transfer knowledge from experienced programmers to novice programmers

  32. Ursa Minor System
• Import/Export: data files from Polaris or other tools
• Inputs: static data and dynamic data of the parallel program
• Database & Database Manager: store analyzed data, MAP file, etc.
• Merlin Performance Adviser: analyzes problems, suggests solutions
• GUI Manager: table view / structure view presented to the user

  33. Merlin Performance Advisor • Knowledge database • knowledge on diagnoses and solutions • Transfers programming experience from experts to new users (via a "MAP" file) • Performance model • Architecture, etc.

  34. Merlin Symptom ⇒ Diagnostic Suggestions

  35. Advisor Map (1/2) • Advisor Map • Problem Domain • General performance problems from the viewpoint of programmers • Diagnostics Domain • Possible causes of these problems • Solution Group • Possible remedies
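The three-level MAP structure above (problem domain → diagnostics domain → solution group) can be sketched as a nested lookup. The entries here are invented examples for illustration, not from an actual Merlin MAP file:

```python
# Hypothetical miniature advisor map: problem -> possible causes -> remedies.
ADVISOR_MAP = {
    "low speedup": {
        "small parallel coverage": ["parallelize more loops"],
        "fine granularity": ["fuse adjacent parallel loops",
                             "parallelize an outer loop instead"],
    },
    "wrong results": {
        "data race": ["privatize the array", "add synchronization"],
    },
}

def advise(problem):
    """Return (diagnosis, remedies) pairs for an observed problem."""
    return sorted(ADVISOR_MAP.get(problem, {}).items())

for diag, fixes in advise("low speedup"):
    print(diag, "->", fixes)
```

Writing the MAP is where the expert's experience is encoded; the lookup itself is trivial, which is exactly the point of the design.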

  36. Advisor Map (2/2)

  37. Expression Evaluator • Basic Spreadsheet Operations • Numeric Functions: NEG, ADD, SPDUP, PERCO, ARVG, etc. • Relational Functions: EQ, NE, etc. • Query Functions: PARALLEL, HASIO, HASCALL, HASDEP, etc. • Logical Functions: AND, OR, etc.
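The spreadsheet-style evaluator can be sketched as a table of named functions applied to performance data. The semantics below (e.g. SPDUP as a serial/parallel time ratio) are assumptions about what the names mean, not Ursa Minor's actual definitions:

```python
# Hypothetical semantics for a few Ursa-Minor-style functions.
FUNCS = {
    "NEG":   lambda x: -x,
    "ADD":   lambda x, y: x + y,
    "SPDUP": lambda serial, parallel: serial / parallel,  # speedup ratio
    "EQ":    lambda x, y: x == y,
    "AND":   lambda x, y: bool(x) and bool(y),
}

def evaluate(name, *args):
    """Apply one evaluator function to its arguments."""
    return FUNCS[name](*args)

print(evaluate("SPDUP", 120.0, 30.0))               # 4.0
print(evaluate("AND", evaluate("EQ", 1, 1), True))  # True
```

Query functions such as PARALLEL or HASDEP would consult the tool's database rather than plain arguments, so they are omitted here.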

  38. Merlin • Comment • An idea that goes a step beyond merely indicating a bottleneck • Who writes the "MAP"? • The effectiveness of this technology depends on the quality of the MAP

  39. Comparison • SUIF Explorer vs. S-Check • SUIF Explorer needs no configuration and provides dependence information • S-Check: efficiency? (many runs) • Those two vs. Ursa Minor • Practical, but not kind to beginners

  40. Conclusion • Several approaches to guide the user with smart information • Future work • Integration • of profilers and dependence analyzers • Portability • across different architectures, OSes, and performance characteristics
