
Executive Briefing: Multicore-Enabling SaaS Applications



Presentation Transcript


  1. Cilk++, Cilk, Cilkscreen, and Cilk Arts are trademarks of Cilk Arts, Inc. Executive Briefing: Multicore-Enabling SaaS Applications. September 3, 2008. www.cilk.com

  2. Agenda • Emergence of multicore processors • Key challenges facing developers • When can multicore help? • Data races: a new type of bug • Questions to ask when going multicore • Programming tools & techniques

  3. About CILK ARTS Mission: To provide the easiest, quickest, and most reliable way to optimize application performance on multicore processors. • Launched in March 2007. • Headquartered in Burlington, MA. • Funded by Stata Venture Partners, software industry executives, founders, and grants from the NSF and DARPA. • First product is Cilk++, based on 15 years of research at MIT.

  4. Emergence of Multicore and Impact on SaaS

  5. Moore’s Law • Transistor count is still rising, … but clock speed is bounded at ~5GHz. [Figure: Intel CPU Introductions] Source: Herb Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Dr. Dobb's Journal, 30(3), March 2005.

  6. Power Density Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.

  7. Vendor Solution • To scale performance, put many processor cores on a chip. • Intel predicts 80+ cores by 2011! [Figure: Intel 45nm quad-core processor]

  8. SaaS Opportunity • Increase throughput • Quantitative finance: increase volume of portfolios analyzed overnight • Reduce response time • Engineering simulation: accelerate structural analysis of assembly • Improve user experience • Multiplayer games: increased galaxy size • Reduce data center power consumption

  9. Multicore and SaaS • Application response time? • Processor utilization? [Figure: timeline of User Work, Computer Operation 1, and Computer Operation 2 against processors P1 through P8]

  10. Multicore and SaaS • For CPU-constrained applications, multi-threading improves response time and boosts utilization. [Figure: timeline of User Work, Computer Operation #1, and Computer Operation #2 against processors P1 through P8]

  11. Key Challenges Facing Developers

  12. Multicore Challenges Application Performance • How can you minimize response time? • Will your solution scale as the number of processor cores increases? • Can you identify performance bottlenecks? Development Time • How will you get your product out in time? • Where will you find enough parallel-programming talent? • Will you be forced to redesign your application? Software Reliability • Can you debug your parallel application? • How will you test it effectively before release?

  13. Can a Multicore CPU Help My App?

  14. Work & Span • Work: total amount of time spent in all the instructions • Span: Critical path • Parallelism: ratio of work to span

  15. Work & Span • Work: total amount of time spent in all the instructions • Span: Critical path • Parallelism: ratio of work to span • In this example: • Work = 18 • Span = 9 • Parallelism = 2 • i.e., little gain beyond 2 processors [Figure: graph of 18 numbered instructions]
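The "little gain beyond 2 processors" conclusion follows from two standard lower bounds on the running time on P processors. Writing T_1 for work, T_∞ for span, and T_P for the P-processor running time (the T_P symbol is introduced here for illustration and does not appear on the slide), the arithmetic can be sketched as:

```latex
\begin{align*}
T_1 &= 18, \qquad T_\infty = 9, \qquad \frac{T_1}{T_\infty} = 2 \\
T_P &\ge \max\!\left(\frac{T_1}{P},\; T_\infty\right) \\
\frac{T_1}{T_P} &\le \min\!\left(P,\; \frac{T_1}{T_\infty}\right) = \min(P,\, 2)
\end{align*}
```

So no matter how many cores are added, the speedup for this example cannot exceed 2.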

  16. Can Multicore Help? • The more parallelism is available in an application, the more a multicore processor can help. • Work: T1 = 58 • Span: T∞ = 9 (same as previous example) • Parallelism: T1/T∞ = 6.44
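As a rough illustration of how work, span, and parallelism could be computed for a dependency graph, here is a minimal C++ sketch. The `Strand` and `Metrics` types, the `analyze` function, and the four-node diamond graph are all hypothetical names and data chosen for this example; they are not taken from the slides or from any Cilk Arts tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One node of the computation graph (hypothetical type for illustration).
struct Strand {
    double cost;                     // time to execute this node
    std::vector<std::size_t> preds;  // indices of nodes that must finish first
};

struct Metrics {
    double work;         // T1: sum of all node costs
    double span;         // T-infinity: cost of the longest dependency chain
    double parallelism;  // T1 / T-infinity
};

// Assumes nodes are listed in topological order, i.e. every predecessor
// index is smaller than the index of the node that depends on it.
Metrics analyze(const std::vector<Strand>& dag) {
    std::vector<double> finish(dag.size(), 0.0);  // earliest finish time per node
    double work = 0.0;
    double span = 0.0;
    for (std::size_t i = 0; i < dag.size(); ++i) {
        double start = 0.0;
        for (std::size_t p : dag[i].preds) {
            assert(p < i);  // topological-order precondition
            start = std::max(start, finish[p]);
        }
        finish[i] = start + dag[i].cost;
        work += dag[i].cost;
        span = std::max(span, finish[i]);
    }
    return {work, span, span > 0.0 ? work / span : 0.0};
}

// Hypothetical fork-join "diamond": node 0 forks into nodes 1 and 2,
// which both feed node 3. Every node costs one time unit.
Metrics diamond_example() {
    std::vector<Strand> dag = {
        {1.0, {}}, {1.0, {0}}, {1.0, {0}}, {1.0, {1, 2}}
    };
    return analyze(dag);
}
```

For the diamond, the work is 4 and the span is 3, so the parallelism is about 1.33: even two cores could not be kept fully busy.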

  17. Data Races: A New Type of Bug in Multicore Programming

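  18. Race Bugs Definition. A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write. • Example (graph nodes A through D): A: int x = 0; B: x++; C: x++; D: assert(x == 2); where B and C are logically parallel. • The same example expanded into loads and stores, numbered in one possible execution order: 1: x = 0; 2: r1 = x; 3: r1++; 4: r2 = x; 5: r2++; 6: x = r2; 7: x = r1; 8: assert(x == 2);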
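To make the failure concrete, the following C++ sketch replays the numbered steps from the slide on a single thread, so the outcome is deterministic. The function names are hypothetical; the code only simulates one possible schedule and does not itself contain a race.

```cpp
#include <cassert>

// Replays steps 1-8 in the numbered order: both "parallel" increments
// load x before either one stores its result back.
int replay_racy_interleaving() {
    int x = 0;    // step 1
    int r1 = x;   // step 2: first increment loads x (reads 0)
    r1++;         // step 3
    int r2 = x;   // step 4: second increment also loads x (still 0)
    r2++;         // step 5
    x = r2;       // step 6: x becomes 1
    x = r1;       // step 7: x is overwritten with 1; one update is lost
    return x;     // value that step 8's assert(x == 2) would see
}

// Replays the same instructions with one increment finishing
// before the other begins.
int replay_serial_order() {
    int x = 0;
    int r1 = x;
    r1++;
    x = r1;       // first increment completes: x == 1
    int r2 = x;
    r2++;
    x = r2;       // second increment completes: x == 2
    return x;
}

// Small self-check of the two schedules.
void check_replays() {
    assert(replay_serial_order() == 2);
    assert(replay_racy_interleaving() == 1);
}
```

In the interleaved schedule the final value is 1, so the slide's `assert(x == 2)` would fail; in the serial schedule it holds. Which schedule a real parallel run follows can vary from run to run, which is what makes such bugs hard to reproduce.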

  19. Coping with Race Bugs • Although locking can “solve” race bugs, lock contention can destroy all parallelism. • Making local copies of the nonlocal variables can remove contention, but at the cost of restructuring program logic. • Cilk++ provides hyperobjects to mitigate data races on nonlocal variables without the need for locks or code restructuring. IDEA: Different parallel branches may see different views of the hyperobject.
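The "different views" idea can be approximated by hand in standard C++: give each parallel branch its own private accumulator and merge the accumulators after the branches join. The sketch below is only an analogy using `std::thread`; it is not the Cilk++ hyperobject API, and it performs exactly the manual restructuring that the slide says hyperobjects are meant to avoid. The function name `sum_with_views` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using one private accumulator ("view") per worker thread.
// No worker touches another worker's view, so no lock is needed; the
// views are combined only after every worker has been joined.
long long sum_with_views(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0);
    std::vector<long long> views(workers, 0);  // one view per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &views, w, chunk] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            views[w] = local;  // each worker writes only its own slot
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
    // Merge step: runs serially, after all parallel branches are done.
    return std::accumulate(views.begin(), views.end(), 0LL);
}
```

Because addition is associative, the merged result matches the serial sum regardless of how the data is split across workers.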

  20. 20 Questions to Ask http://www.cilk.com/resource-library/going-multicore-20-questions-to-ask/

  21. Development Time • To multicore-enable my application, how much logical restructuring of my application must I do? • Can I easily train programmers to use the multicore software platform? • Can I maintain just one code base, or must I maintain separate serial and parallel versions? • Can I avoid rewriting my application every time a new processor generation increases the core count? • Can I easily multicore-enable ill-structured and irregular code, or is the multicore software platform limited to data-parallel applications? • Does the multicore software platform properly support modern programming paradigms, such as objects, templates, and exceptions? • What does it take to handle global variables in my application?

  22. Application Performance • How can I tell if my application exhibits enough parallelism to exploit multiple processors? • Does the multicore software platform address response-time bottlenecks, or just offer more throughput? • Does application performance scale up linearly as cores are added, or does it quickly reach diminishing returns? • Is my multicore-enabled code just as fast as my original serial code when run on a single processor? • Does the multicore software platform's scheduler load-balance irregular applications efficiently to achieve full utilization? • Will my application "play nicely" with other jobs on the system, or do multiple jobs cause thrashing of resources? • What tools are available for detecting multicore performance bottlenecks?

  23. Software Reliability • How much harder is it to debug my multicore-enabled application than to debug my original application? • Can I use my standard, familiar debugging tools? • Are there effective debugging tools to identify and localize parallel-programming errors, such as data-race bugs? • Must I use a parallel debugger even if I make an ordinary serial programming error? • What changes must I make to my release-engineering processes to ensure that my delivered software is reliable? • Can I use my existing unit tests and regression tests?

  24. Programming Tools & Techniques

  25. Parallel C++ Options Pthreads & WinAPI threads • An API for creating and manipulating O/S threads. • Programmer writes thread-interaction protocols. Intel’s Threading Building Blocks • A C++ template library with automatic scheduling of tasks. • Programmer writes explicit “continuations.” OpenMP • Open-source language extensions to C++. • Programmer inserts pragmas into code. Cilk++ • Faithful extension of C++. • Programmer inserts keywords into code that do not destroy serial semantics. • Provably good scheduler and a race-detection tool.

  26. Cilk++: Smooth Path to Multicore for Legacy Applications

  27. Cilk++ • Cilk++ is a remarkably simple set of extensions for C++ and a powerful runtime system for multicore applications. • Cilk++ provides a smooth evolution from serial programming to parallel programming.

  28. CILK ARTS Solution Application Performance • Best-in-class performance • Linear scaling as cores are added • Minimal overhead on a single core Development Time • Minimal application changes • Can be learned in days by programmers without multithreading expertise • Seamless path forward (and backward) Software Reliability • Multithreaded version as reliable as the original • No fundamental change to release engineering

  29. CILK ARTS Solution • Cilk++ source: int fib (int n) { if (n<2) return (n); else { int x,y; x = cilk_spawn fib(n-1); y = fib(n-2); cilk_sync; return (x+y); } } • Serial code: int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } [Figure: toolchain diagram, numbered 1 through 5, with components labeled Cilk++ Compiler, Conventional Compiler, Cilk++ Hyperobject Library, Linker, Cilk++ Race Detector, Binary, Cilk++ Runtime System, Parallel Regression Tests, and Conventional Regression Tests, and outcomes labeled Reliable Single-Threaded Code, Reliable Multi-Threaded Code, and Exceptional Performance]
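The `cilk_spawn` and `cilk_sync` keywords on the slide require the Cilk++ compiler. For readers without it, a loosely analogous fork-join structure can be sketched in standard C++ with `std::async`. This is an assumption-laden stand-in, not how the Cilk++ runtime schedules work: each `std::async` call here starts a thread, so a hypothetical `depth` cutoff is added to limit how many are created.

```cpp
#include <cassert>
#include <future>

// Plain serial version, matching the "Serial code" on the slide.
int fib_serial(int n) {
    if (n < 2) return n;
    return fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version. `depth` (a hypothetical parameter, not from the
// slide) limits how many levels of the recursion start new tasks;
// below the cutoff the code falls back to the serial version.
int fib_parallel(int n, int depth = 3) {
    assert(n >= 0);
    if (n < 2) return n;
    if (depth <= 0) return fib_serial(n);

    // Roughly where the slide uses cilk_spawn: start fib(n-1) as a
    // separate task that may run in parallel with the code below.
    std::future<int> x = std::async(std::launch::async, [n, depth] {
        return fib_parallel(n - 1, depth - 1);
    });
    int y = fib_parallel(n - 2, depth - 1);

    // Roughly where the slide uses cilk_sync: wait for the spawned task.
    return x.get() + y;
}
```

Calling `fib_parallel(20)` returns the same value as `fib_serial(20)`. Note the contrast with the slide: here the parallel version needs extra machinery (futures, a lambda, a cutoff), whereas the Cilk++ version differs from the serial code only by two keywords.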

  30. Thank You! • Free e-Book www.cilk.com/multicore-e-book/ • We are currently accepting applications for our Early Visibility program • For more info about Cilk++ and resources for multicoders: • duncan@cilk.com • www.cilk.com
