Enhancing the Role of Inlining in Effective Interprocedural Parallelization. Jichi Guo, Mike Stiles, Qing Yi, Kleanthis Psarris
Problem • Inter-procedural parallelization • Parallelization after inlining • Gain of more parallelizable loops • Loss of parallelized loops • Inlining obscures caller / callee code • Missed parallel opportunities • Inlining increases code complexity
Goal • Keep the gained parallelizable loops • Prevent the loss of parallelism • Discover the missed opportunities
Solution • Summarize the code using annotation • Express the underlying information • Inline the annotation before parallelization • Pass the summarized information to the compiler • Reverse-inline after parallelization • Revert inlining side effects • Maintain equivalence
Outline • Innovations • Problems of parallel + inline strategy • Annotation language • Annotation-based inlining technique • Experiments • Summary
Problems of parallel + inlining • Parallel + inlining • Conventional inlining with heuristics and pre-transformations • Heuristics: code size • Transformations: linearization, forward substitution • Intra-procedural loop parallelization • Fortran do-all loop • Goal • Gain loops in caller • Problems • Lost loops in caller / callee • Missed loops in caller
Problems of parallel + inlining • Loss of parallelizable loops in caller/callee • Transformations that cause the loss • Forward substitution • Linearization • Forward substitution of non-linear subscripts • Creates indirect array references • Linearization of array dimensions • Obscures array shapes
Problems of parallel + inlining • Forward substitution of non-linear subscripts • Creates indirect array references: X2(I) ⇒ T(IX(7) + I), Y2(I) ⇒ T(IX(8) + I), Z2(I) ⇒ T(IX(9) + I)
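The effect of this substitution can be illustrated with a minimal Python sketch (not the Fortran of the benchmarks). It assumes the three sections each span `n` elements of `T`; the contents of `IX`, the value of `n`, and the helper name `sections_disjoint` are hypothetical. The point is that, after substitution, whether the three references overlap depends on values stored in `IX`, which a compiler generally cannot see at compile time.

```python
def sections_disjoint(starts, length):
    """True if the ranges [s, s + length) for each s in starts do not overlap."""
    ordered = sorted(starts)
    return all(b - a >= length for a, b in zip(ordered, ordered[1:]))

n = 100  # assumed section length (hypothetical)

# Two hypothetical contents of IX(7), IX(8), IX(9).
ix_safe = {7: 0, 8: 100, 9: 200}   # sections do not overlap
ix_alias = {7: 0, 8: 50, 9: 200}   # the first two sections overlap

print(sections_disjoint([ix_safe[k] for k in (7, 8, 9)], n))   # True
print(sections_disjoint([ix_alias[k] for k in (7, 8, 9)], n))  # False
```

Before inlining, X2, Y2, and Z2 are distinct formal array parameters inside the callee; after substitution that separation is no longer visible in the source.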
Problems of parallel + inlining • Linearization of array dimensions • Obscures array shapes: PP(i, j, k) ⇒ PP(i + j*4 + k*16)
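A small Python sketch of the mapping on this slide, assuming 0-based subscripts and extents of 4 for the first two dimensions (which the strides 4 and 16 suggest). The function names are illustrative only. It shows that the linearized subscript is only unambiguous when the original bounds are known, which is the shape information that linearization hides from the dependence analysis.

```python
def linearize(i, j, k, n1=4, n2=4):
    """Map a 3-D subscript to the 1-D offset i + j*n1 + k*n1*n2."""
    return i + j * n1 + k * n1 * n2

def delinearize(offset, n1=4, n2=4):
    """Recover (i, j, k) from an offset, valid only if 0 <= i < n1 and 0 <= j < n2."""
    k, rem = divmod(offset, n1 * n2)
    j, i = divmod(rem, n1)
    return i, j, k

print(linearize(1, 2, 3))   # 57
print(delinearize(57))      # (1, 2, 3)

# Without the bound i < 4, two different subscripts map to the same offset:
print(linearize(4, 0, 0) == linearize(0, 1, 0))  # True
```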
Problems of parallel + inlining • Missed parallelizable loops in caller • Coding styles that cause the misses • Opaque compositional subroutines • A calls B, B calls C, C calls D, … • Array access • When it is difficult to determine which part is killed • Debugging and error checking • Statement that breaks the dependency is never executed • I/O statements • Indirect array references • ID=IDX(I), X = A(ID)
Problems of parallel + inlining • Opaque compositional subroutines • A calls B, B calls C, C calls D, …
Problems of parallel + inlining • Array access • Difficult to determine which part is killed (CTR computed at runtime)
Problems of parallel + inlining • Debugging and Error Checking • Statement that breaks the dependency is never executed • I/O statements
Problems of parallel + inlining • Indirect array references: IN ⇒ NODE, NODE ⇒ IREL, IREL ⇒ RHSB
Outline • Innovations • Problems of parallel + inline strategy • Annotation language • Annotation-based inlining technique • Experiments • Summary
The annotation language • Goal • Summarize information • Avoid ambiguity
The annotation language • Restricted grammar • Special operators • Writing annotations
The annotation language • Restricted grammar • Do-all loop only • No goto
The annotation language • Special operators y = operator(x1, x2, …, xn) Purpose: abstract relation • Unknown operator • Relation is unknown • Generic functions • Unique operator • Relation is one-to-one, from X to Y
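Why a one-to-one relation matters for a do-all loop can be sketched in Python. This is a runtime check written purely for illustration, not the static analysis a parallelizing compiler performs; the helper name and the two sample subscript functions are hypothetical. If the subscript of a written array is one-to-one in the loop index, no two iterations write the same element, so the writes carry no dependence between iterations.

```python
def writes_are_independent(subscript, n):
    """True if iterations 0..n-1 of `A[subscript(i)] = ...` touch distinct elements."""
    seen = set()
    for i in range(n):
        s = subscript(i)
        if s in seen:
            return False  # two iterations write the same element
        seen.add(s)
    return True

# A one-to-one relation (here a linear expression) keeps iterations independent.
print(writes_are_independent(lambda i: 2 * i + 1, 100))  # True

# A relation that is not one-to-one lets iterations collide.
print(writes_are_independent(lambda i: i % 3, 10))       # False
```

With an unknown relation, nothing can be assumed about collisions, so the conservative answer is that the loop is not parallelizable.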
The annotation language • Writing annotations • Eliminating adverse side effects • Preserve caller and callee if inlining breaks the dependency • Summarize opaque subroutines • Eliminate nested function calls • Array access • Specify the exact range that is read/modified • Debugging and error handling • Aggressive strategy: ignore checking statements • Indirect array references • Discover unique relation
The annotation language • Summarize opaque subroutines • Eliminate nested function calls
The annotation language • Array access • Specify the exact range that is read/modified
The annotation language • Debugging and error handling • Aggressive strategy: ignore checking statements
The annotation language • Indirect array references • Discover unique relation
Outline • Innovations • Problems of parallel + inline strategy • Annotation language • Annotation-based inlining technique • Experiments • Summary
Annotation-based inlining • Goal • Pass annotated information to the compiler • Eliminate inlining side effects • Flow • Inline before parallelization • Reverse-inline after parallelization • Verify and evaluate at the end • Implementation • POLARIS compiler for parallelization • ROSE compiler for parsing • POET transformer • PERFECT benchmark
Annotation-based inlining • Workflow • Annotation inlining ⇒ Parallelization ⇒ Reverse-inlining
Annotation-based inlining • Inlining annotation • Steps • Annotation ⇒ source language • Translating special operators • Inlining the generated source language • Avoiding linearization • Translating special operators • Unknown: using uninitialized global arrays • Unique: using linear expression • Avoiding linearization
Annotation-based inlining • Inlining annotation
Annotation-based inlining • Parallelize do-all loops
Annotation-based inlining • Reverse inlining
Annotation-based inlining • Reverse inlining is indispensable • Inlined code is restored to a function call • Avoids loss of parallelism in caller / callee • Enables abstraction operators (unknown, unique)
Annotation-based inlining • Verification and evaluation • Correctness, Efficiency, and Generality
Outline • Innovations • Problems of parallel + inline strategy • Annotation language • Annotation-based inlining technique • Experiments • Summary
Experiment • Purpose • What does conventional inlining bring to parallelization • Gain? • Loss? • Missed? • How well does annotation-based inlining avoid the above issues • Design • PERFECT benchmarks (except SPEC77) • Two machines • 8-core Intel Mac • 4-core AMD Opteron • End compiler • GFortran 4.2.1 • IFort 11.1 • Result • Count of loops • Performance
Experiment • Result: Loops • Conventional inlining • Incurs loss • Annotation-based inlining • No loss, more gain
Experiment • Result: Performance • Average speedup is limited • Annotation-based inlining is always better
Summary • Inter-procedural parallelization • Summarize effects of conventional inlining • Gain • Loss • Missed • Propose annotation-based inlining • Annotation summary • Enhanced inlining strategy • Reverse inlining
Thanks! Questions?