Scalable Internet Services

Presentation Transcript


  1. Scalable Internet Services Cluster Lessons and Architecture Design for Scalable Services BR01, TACC, Porcupine, SEDA and Capriccio

  2. Outline • Overview of cluster services • lessons from giant-scale services (BR01) • SEDA (staged event-driven architecture) • Capriccio

  3. Scalable Servers • Clustered services • natural platform for large web services • search engines, DB servers, transactional servers • Key benefit • low cost of computing, COTS vs. SMP • incremental scalability • load balance traffic/requests across servers • Extension from single server model • reliable/fast communication, but partitioned data

  4. Goals • Failure transparency • hot-swapping components without loss of availability • homogeneous functionality and/or replication • Load balancing • partition data / requests for maximum service rate • need to colocate requests with their associated data • Scalability • aggregate performance should scale with the number of servers

  5. Two Different Models • Read-mostly data • web servers, DB servers, search engines (query) • replicate across servers + (RR DNS / redirector) • [Diagram: many clients reach the replicated servers across the IP network (WAN) via round-robin DNS]

  6. Two Different Models … • Read-write model • mail servers, e-commerce sites, hosted services • small(er) replication factor for stronger consistency • [Diagram: clients reach the replicas across the IP network (WAN) via a load redirector]

  7. Key Architecture Challenges • Providing high availability • availability across component failures • Handling flash crowds / peak load • need support for massive concurrency • Other challenges • upgradability: maintaining availability and minimal cost during upgrades in S/W, H/W, functionality • error diagnosis: fast isolation of failures / performance degradation

  8. Nuggets • Definitions • uptime = (MTBF – MTTR) / MTBF • yield = queries completed / queries offered • harvest = data available / complete data • MTTR • at least as important as MTBF • much easier to tune and quantify • DQ principle • data per query × queries per second ≈ constant • physical bottlenecks limit overall throughput
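As a worked example of these definitions (all numbers below are made up purely for illustration), a few lines of C compute the three metrics:

    #include <stdio.h>

    /* Hypothetical numbers, only to illustrate the BR01 definitions. */
    int main(void) {
        double mtbf = 1000.0, mttr = 1.0;              /* hours */
        double offered = 1000000.0, completed = 999000.0;
        double avail_data = 95.0, complete_data = 100.0;

        double uptime  = (mtbf - mttr) / mtbf;         /* 0.999 */
        double yield   = completed / offered;          /* 0.999 */
        double harvest = avail_data / complete_data;   /* 0.95  */

        printf("uptime=%.3f yield=%.3f harvest=%.2f\n", uptime, yield, harvest);
        return 0;
    }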

  9. Staged Event-driven Architecture • SEDA (SOSP’01)

  10. Break… • Come back in 5 mins • more on threads vs. events…

  11. Tapestry Software Architecture • [Layer diagram, top to bottom: applications → application programming interface → dynamic Tapestry core router / Patchwork distance map / network → SEDA event-driven framework → Java Virtual Machine]

  12. Impact of Correlated Events • web / application servers • independent requests • maximize individual throughput of the event handler • correlated requests: A + B + C → D • e.g. online continuous queries, sensor aggregation, p2p control layer, streaming data mining • [Diagram: events A, B, and C arrive separately from the network and must all reach one event handler before the combined result D can be produced]

  13. Capriccio • User-level lightweight threads (SOSP’03) • Argument • threads are the natural programming model • current problems are a result of implementation, not a fundamental flaw • Approach • aim for massive scalability • compiler assistance • linked stacks, blocking-graph scheduling

  14. The Price of Concurrency • Why is concurrency hard? • Race conditions • Code complexity • Scalability (no O(n) operations) • Scheduling & resource sensitivity • Inevitable overload • Performance vs. Programmability • No good solution • [Plot: ease of programming vs. performance; both threads and events fall short of the ideal]

  15. The Answer: Better Threads • Goals • Simple programming model • Good tools & infrastructure • Languages, compilers, debuggers, etc. • Good performance • Claims • Threads are preferable to events • User-level threads are key

  16. “But Events Are Better!” • Recent arguments for events • Lower runtime overhead • Better live state management • Inexpensive synchronization • More flexible control flow • Better scheduling and locality • All true but… • Lauer & Needham duality argument • Criticisms of specific threads packages • No inherent problem with threads!

  17. Criticism: Runtime Overhead • Criticism: Threads don’t perform well for high concurrency • Response • Avoid O(n) operations • Minimize context switch overhead • Simple scalability test • Slightly modified GNU Pth • Thread-per-task vs. single thread • Same performance!

  18. Criticism: Synchronization • Criticism: Thread synchronization is heavyweight • Response • Cooperative multitasking works for threads, too! • Also presents same problems • Starvation & fairness • Multiprocessors • Unexpected blocking (page faults, etc.) • Both regimes need help • Compiler / language support for concurrency • Better OS primitives
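To see why synchronization becomes cheap under cooperative scheduling, here is a minimal sketch of a mutex for non-preemptive user-level threads: a thread can only lose the CPU at a yield point, so no atomic instructions are needed. The thread_block_on / thread_wake_one scheduler hooks are hypothetical names, not from any real package:

    typedef struct waitqueue waitqueue_t;            /* opaque scheduler wait queue */
    extern void thread_block_on(waitqueue_t **wq);   /* park the current thread; yields */
    extern void thread_wake_one(waitqueue_t **wq);   /* make one waiting thread runnable */

    typedef struct { int held; waitqueue_t *waiters; } coop_mutex_t;

    void coop_lock(coop_mutex_t *m) {
        while (m->held)                  /* no race: no preemption between test and set */
            thread_block_on(&m->waiters);
        m->held = 1;
    }

    void coop_unlock(coop_mutex_t *m) {
        m->held = 0;
        thread_wake_one(&m->waiters);
    }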

  19. Criticism: Scheduling • Criticism: Thread schedulers are too generic • Can’t use application-specific information • Response • 2D scheduling: task & program location • Threads schedule based on task only • Events schedule by location (e.g. SEDA) • Allows batching • Allows prediction for SRCT • Threads can use 2D, too! • Runtime system tracks current location • Call graph allows prediction • [Diagram: 2D scheduling space of task × program location; threads cover the task axis, events the program-location axis]

  20. The Proof’s in the Pudding • User-level threads package • Subset of pthreads • Intercept blocking system calls • No O(n) operations • Support > 100K threads • 5000 lines of C code • Simple web server: Knot • 700 lines of C code • Similar performance • Linear increase, then steady • Drop-off due to poll() overhead • [Graph: throughput in Mbits/second vs. concurrent clients (1 to 16384) for KnotC (favor connections), KnotA (favor accept), and Haboob]
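As a flavor of what intercepting blocking system calls looks like, the sketch below wraps read(): it retries a non-blocking read and parks the calling thread whenever the descriptor is not ready. This is not the actual Knot/Capriccio code; thread_block_on_fd is a hypothetical scheduler hook:

    #include <errno.h>
    #include <poll.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern void thread_block_on_fd(int fd, short events);  /* yield until poll() reports fd ready */

    ssize_t knot_read(int fd, void *buf, size_t len) {
        for (;;) {
            ssize_t n = read(fd, buf, len);          /* fd assumed opened O_NONBLOCK */
            if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
                return n;                            /* data, EOF, or a real error */
            thread_block_on_fd(fd, POLLIN);          /* scheduler runs other threads meanwhile */
        }
    }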

  21. Arguments For Threads • More natural programming model • Control flow is more apparent • Exception handling is easier • State management is automatic • Better fit with current tools & hardware • Better existing infrastructure

  22. Why Threads: Control Flow • Events obscure control flow • For programmers and tools • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]
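Written thread-per-connection, the same path reads top to bottom, so the control flow the call graph shows is explicit in the source. A sketch with illustrative helper names (not taken from the Knot sources):

    #include <unistd.h>

    struct request { char path[1024]; };
    struct cache_entry;                                     /* a pinned file in the cache */

    extern int  read_request(int fd, struct request *r);    /* returns 0 on success */
    extern struct cache_entry *pin_cache(const char *path);
    extern void write_response(int fd, struct cache_entry *e);
    extern void unpin_cache(struct cache_entry *e);

    void handle_connection(int client_fd) {
        struct request req;
        if (read_request(client_fd, &req) == 0) {           /* control flow is explicit ... */
            struct cache_entry *e = pin_cache(req.path);
            if (e) {                                        /* ... and so is error handling */
                write_response(client_fd, e);
                unpin_cache(e);
            }
        }
        close(client_fd);                                   /* cleanup happens in one place */
    }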

  24. Why Threads: Exceptions • Exceptions complicate control flow • Harder to understand program flow • Cause bugs in cleanup code • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]

  25. Why Threads: State Management • Events require manual state management • Hard to know when to free • Use GC or risk bugs • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]

  26. Why Threads: Existing Infrastructure • Lots of infrastructure for threads • Debuggers • Languages & compilers • Consequences • More amenable to analysis • Less effort to get working systems

  27. Building Better Threads • Goals • Simplify the programming model • Thread per concurrent activity • Scalability (100K+ threads) • Support existing APIs and tools • Automate application-specific customization • Mechanisms • User-level threads • Plumbing: avoid O(n) operations • Compile-time analysis • Run-time analysis

  28. Case for User-Level Threads • Decouple programming model and OS • Kernel threads • Abstract hardware • Expose device concurrency • User-level threads • Provide clean programming model • Expose logical concurrency • Benefits of user-level threads • Control over concurrency model! • Independent innovation • Enables static analysis • Enables application-specific tuning • [Diagram: application atop user-level threads atop the OS]

  29. Case for User-Level Threads • Decouple programming model and OS • Kernel threads • Abstract hardware • Expose device concurrency • User-level threads • Provide clean programming model • Expose logical concurrency • Benefits of user-level threads • Control over concurrency model! • Independent innovation • Enables static analysis • Enables application-specific tuning • Similar argument to the design of overlay networks • [Diagram: application atop user-level threads atop the OS]

  30. Capriccio Internals • Cooperative user-level threads • Fast context switches • Lightweight synchronization • Kernel Mechanisms • Asynchronous I/O (Linux) • Efficiency • Avoid O(n) operations • Fast, flexible scheduling
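A toy illustration of a cooperative user-level context switch, using the POSIX ucontext API for brevity; Capriccio’s own switch code is hand-tuned and faster, so this only shows the shape of a user-level yield with no kernel crossing:

    #include <ucontext.h>

    static ucontext_t scheduler_ctx;   /* set up by the scheduler before running a thread */
    static ucontext_t current_ctx;     /* context of the currently running user thread */

    /* Cooperative yield: save this thread's registers and resume the scheduler,
       entirely in user space. */
    void thread_yield(void) {
        swapcontext(&current_ctx, &scheduler_ctx);
    }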

  31. Safety: Linked Stacks • The problem: fixed stacks • Overflow vs. wasted space • LinuxThreads: 2MB/stack • Limits thread numbers • The solution: linked stacks • Allocate space as needed • Compiler analysis • Add runtime checkpoints • Guarantee enough space until next check • [Figure: fixed stacks either overflow or waste space; a linked stack grows in small chunks as needed]

  32. Linked Stacks: Algorithm • Parameters • MaxPath • MinChunk • Steps • Break cycles • Trace back • checkpoints limit unchecked path length to MaxPath • Special Cases • Function pointers • External calls • Use large stack • [Call graph annotated with per-function stack sizes (3, 3, 2, 5, 2, 4, 3, 6); checkpoints are placed so no unchecked path exceeds MaxPath = 8]
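At run time, each compiler-inserted checkpoint only has to guarantee that the longest unchecked call path, bounded by MaxPath, still fits in the current stack chunk. A minimal sketch, with hypothetical runtime hooks standing in for Capriccio’s internals:

    #include <stddef.h>

    extern size_t stack_remaining(void);                   /* bytes left in the current chunk */
    extern void   stack_link_new_chunk(size_t min_bytes);  /* allocate a chunk and switch to it */

    /* Inserted at a checkpoint whose worst-case unchecked path needs max_path_bytes. */
    static inline void stack_checkpoint(size_t max_path_bytes) {
        if (stack_remaining() < max_path_bytes)
            stack_link_new_chunk(max_path_bytes);
    }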

  38. Special Cases • Function pointers • categorize function pointers by number and type of arguments • “guess” which functions can be called • External functions • users annotate trusted stack bounds on libraries • or (re)use a small number of large stack chunks • Result • use/reuse stack chunks much like virtual memory • can efficiently share stack chunks • memory-touch benchmark: factor of 3 reduction in paging cost

  39. Scheduling: Blocking Graph • Lessons from event systems • Break app into stages • Schedule based on stage priorities • Allows SRCT scheduling, finding bottlenecks, etc. • Capriccio does this for threads • Deduce stage with stack traces at blocking points • Prioritize based on runtime information • [Blocking graph for a web server: Accept → Read → Open → Read → Close, Write → Close]
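One way to deduce the stage is to key it on where the thread blocks. The simplified sketch below uses only the immediate caller, via a GCC builtin; Capriccio actually walks several stack frames:

    #include <stdint.h>

    /* Return an identifier for the blocking-graph node at which the current
       thread is about to block; used as a key into per-node statistics. */
    uintptr_t blocking_node_id(void) {
        return (uintptr_t)__builtin_return_address(0);
    }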

  40. Resource-Aware Scheduling • Track resources used along blocking-graph edges • Memory, file descriptors, CPU • Predict future from the past • Algorithm • Increase use when underutilized • Decrease use near saturation • Advantages • Operate near the knee without thrashing • Automatic admission control • [Blocking graph for a web server: Accept → Read → Open → Read → Close, Write → Close]
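A rough sketch of the increase-when-underutilized, back-off-near-saturation rule; the thresholds and the resource_usage probe are illustrative assumptions, not Capriccio’s actual policy:

    extern double resource_usage(void);   /* e.g. fraction of memory or file descriptors in use */

    static int max_admitted = 64;         /* threads allowed past this blocking point */

    void adjust_admission(void) {
        double u = resource_usage();
        if (u > 0.90 && max_admitted > 1)
            max_admitted /= 2;            /* near saturation: back off quickly */
        else if (u < 0.70)
            max_admitted += 4;            /* underutilized: probe upward slowly */
    }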

  41. Pitfalls • What is the maximum amount of a resource? • depends on workload • e.g. disk thrashing depends on sequential vs. random seeks • use early signs of thrashing to indicate maximum capacity • Detecting thrashing • can only estimate it via a “productivity/overhead” ratio • productivity itself is estimated from proxies (threads created, files opened/closed)

  42. Thread Performance • Slightly slower thread creation • Faster context switches • Even with stack traces! • Much faster mutexes • [Table: time of thread operations in microseconds]

  43. Runtime Overhead • Tested Apache 2.0.44 • Stack linking • 78% slowdown for null call • 3–4% overall • Resource statistics • 2% (always on) • 0.1% (with sampling) • Stack traces • 8% overhead

  44. Microbenchmark: Producer / Consumer

  45. Web Server Performance

  46. Example of a “Great Systems Paper” • observe a higher-level issue • threads vs. events as programming abstractions • use previous work (duality) to identify the problem • why are threads not as efficient as events? • good systems design • call-graph analysis for linked stacks • resource-aware scheduling • good execution • full, solid implementation • analysis leading to full understanding of detailed issues • cross-area approach (help from PL research)

  47. Acknowledgements • Many slides “borrowed” from the respective talks / papers: • Capriccio (Rob von Behren) • SEDA (Matt Welsh) • Brewer01: “Lessons…”
