Scalable Internet Services

Presentation Transcript


  1. Scalable Internet Services Cluster Lessons and Architecture Design for Scalable Services BR01, TACC, Porcupine, SEDA and Capriccio

  2. Outline • Overview of cluster services • lessons from giant-scale services (BR01) • SEDA (staged event-driven architecture) • Capriccio

  3. Scalable Servers • Clustered services • natural platform for large web services • search engines, DB servers, transactional servers • Key benefit • low cost of computing, COTS vs. SMP • incremental scalability • load balance traffic/requests across servers • Extension from single server model • reliable/fast communication, but partitioned data

  4. Goals • Failure transparency • hot-swapping components without loss of availability • homogeneous functionality and/or replication • Load balancing • partition data / requests for maximum service rate • need to colocate requests with their associated data • Scalability • aggregate performance should scale with the number of servers

  5. Two Different Models • Read-mostly data • web servers, DB servers, search engines (query) • replicate across servers + (RR DNS / redirector) • [Diagram: many clients reach the replicated servers across the IP network (WAN) via round-robin DNS]

  6. Two Different Models … • Read-write model • mail servers, e-commerce sites, hosted services • small(er) replication factor for stronger consistency • [Diagram: clients reach the replicas across the IP network (WAN) via a load redirector]

  7. Key Architecture Challenges • Providing high availability • availability across component failures • Handling flash crowds / peak load • need support for massive concurrency • Other challenges • upgradability: maintaining availability and minimal cost during upgrades in S/W, H/W, functionality • error diagnosis: fast isolation of failures / performance degradation

  8. Nuggets • Definitions • uptime = (MTBF – MTTR) / MTBF • yield = queries completed / queries offered • harvest = data available / complete data • MTTR • at least as important as MTBF • much easier to tune and quantify • DQ principle • data per query × queries per second ≈ constant • physical bottlenecks limit overall throughput
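As a worked example of these definitions (all numbers below are made up purely for illustration), a few lines of C compute the three metrics:

    #include <stdio.h>

    /* Hypothetical numbers, only to illustrate the BR01 definitions. */
    int main(void) {
        double mtbf = 1000.0, mttr = 1.0;              /* hours */
        double offered = 1000000.0, completed = 999000.0;
        double avail_data = 95.0, complete_data = 100.0;

        double uptime  = (mtbf - mttr) / mtbf;         /* 0.999 */
        double yield   = completed / offered;          /* 0.999 */
        double harvest = avail_data / complete_data;   /* 0.95  */

        printf("uptime=%.3f yield=%.3f harvest=%.2f\n", uptime, yield, harvest);
        return 0;
    }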

  9. Staged Event-driven Architecture • SEDA (SOSP’01)

  10. Break… • Come back in 5 mins • more on threads vs. events…

  11. Tapestry Software Architecture • [Layer diagram, top to bottom: applications → application programming interface → dynamic Tapestry core router / Patchwork distance map / network → SEDA event-driven framework → Java Virtual Machine]

  12. Impact of Correlated Events • web / application servers • independent requests • maximize individual throughput of the event handler • correlated requests: A + B + C → D • e.g. online continuous queries, sensor aggregation, p2p control layer, streaming data mining • [Diagram: events A, B, and C arrive separately from the network and must all reach one event handler before the combined result D can be produced]

  13. Capriccio • User-level lightweight threads (SOSP’03) • Argument • threads are the natural programming model • current problems are a result of implementation, not a fundamental flaw • Approach • aim for massive scalability • compiler assistance • linked stacks, blocking-graph scheduling

  14. The Price of Concurrency • Why is concurrency hard? • Race conditions • Code complexity • Scalability (no O(n) operations) • Scheduling & resource sensitivity • Inevitable overload • Performance vs. Programmability • No good solution • [Plot: ease of programming vs. performance; both threads and events fall short of the ideal]

  15. The Answer: Better Threads • Goals • Simple programming model • Good tools & infrastructure • Languages, compilers, debuggers, etc. • Good performance • Claims • Threads are preferable to events • User-level threads are key

  16. “But Events Are Better!” • Recent arguments for events • Lower runtime overhead • Better live state management • Inexpensive synchronization • More flexible control flow • Better scheduling and locality • All true but… • Lauer & Needham duality argument • Criticisms of specific threads packages • No inherent problem with threads!

  17. Criticism: Runtime Overhead • Criticism: Threads don’t perform well for high concurrency • Response • Avoid O(n) operations • Minimize context switch overhead • Simple scalability test • Slightly modified GNU Pth • Thread-per-task vs. single thread • Same performance!

  18. Criticism: Synchronization • Criticism: Thread synchronization is heavyweight • Response • Cooperative multitasking works for threads, too! • Also presents same problems • Starvation & fairness • Multiprocessors • Unexpected blocking (page faults, etc.) • Both regimes need help • Compiler / language support for concurrency • Better OS primitives
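To see why synchronization becomes cheap under cooperative scheduling, here is a minimal sketch of a mutex for non-preemptive user-level threads: a thread can only lose the CPU at a yield point, so no atomic instructions are needed. The thread_block_on / thread_wake_one scheduler hooks are hypothetical names, not from any real package:

    typedef struct waitqueue waitqueue_t;            /* opaque scheduler wait queue */
    extern void thread_block_on(waitqueue_t **wq);   /* park the current thread; yields */
    extern void thread_wake_one(waitqueue_t **wq);   /* make one waiting thread runnable */

    typedef struct { int held; waitqueue_t *waiters; } coop_mutex_t;

    void coop_lock(coop_mutex_t *m) {
        while (m->held)                  /* no race: no preemption between test and set */
            thread_block_on(&m->waiters);
        m->held = 1;
    }

    void coop_unlock(coop_mutex_t *m) {
        m->held = 0;
        thread_wake_one(&m->waiters);
    }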

  19. Criticism: Scheduling • Criticism: Thread schedulers are too generic • Can’t use application-specific information • Response • 2D scheduling: task & program location • Threads schedule based on task only • Events schedule by location (e.g. SEDA) • Allows batching • Allows prediction for SRCT • Threads can use 2D, too! • Runtime system tracks current location • Call graph allows prediction • [Diagram: 2D scheduling space of task × program location; threads cover the task axis, events the program-location axis]

  20. The Proof’s in the Pudding • User-level threads package • Subset of pthreads • Intercept blocking system calls • No O(n) operations • Support > 100K threads • 5000 lines of C code • Simple web server: Knot • 700 lines of C code • Similar performance • Linear increase, then steady • Drop-off due to poll() overhead • [Graph: throughput in Mbits/second vs. concurrent clients (1 to 16384) for KnotC (favor connections), KnotA (favor accept), and Haboob]
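As a flavor of what intercepting blocking system calls looks like, the sketch below wraps read(): it retries a non-blocking read and parks the calling thread whenever the descriptor is not ready. This is not the actual Knot/Capriccio code; thread_block_on_fd is a hypothetical scheduler hook:

    #include <errno.h>
    #include <poll.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern void thread_block_on_fd(int fd, short events);  /* yield until poll() reports fd ready */

    ssize_t knot_read(int fd, void *buf, size_t len) {
        for (;;) {
            ssize_t n = read(fd, buf, len);          /* fd assumed opened O_NONBLOCK */
            if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
                return n;                            /* data, EOF, or a real error */
            thread_block_on_fd(fd, POLLIN);          /* scheduler runs other threads meanwhile */
        }
    }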

  21. Arguments For Threads • More natural programming model • Control flow is more apparent • Exception handling is easier • State management is automatic • Better fit with current tools & hardware • Better existing infrastructure

  22. Why Threads: Control Flow • Events obscure control flow • For programmers and tools • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]
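Written thread-per-connection, the same path reads top to bottom, so the control flow the call graph shows is explicit in the source. A sketch with illustrative helper names (not taken from the Knot sources):

    #include <unistd.h>

    struct request { char path[1024]; };
    struct cache_entry;                                     /* a pinned file in the cache */

    extern int  read_request(int fd, struct request *r);    /* returns 0 on success */
    extern struct cache_entry *pin_cache(const char *path);
    extern void write_response(int fd, struct cache_entry *e);
    extern void unpin_cache(struct cache_entry *e);

    void handle_connection(int client_fd) {
        struct request req;
        if (read_request(client_fd, &req) == 0) {           /* control flow is explicit ... */
            struct cache_entry *e = pin_cache(req.path);
            if (e) {                                        /* ... and so is error handling */
                write_response(client_fd, e);
                unpin_cache(e);
            }
        }
        close(client_fd);                                   /* cleanup happens in one place */
    }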

  24. Why Threads: Exceptions • Exceptions complicate control flow • Harder to understand program flow • Cause bugs in cleanup code • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]

  25. Why Threads: State Management • Events require manual state management • Hard to know when to free • Use GC or risk bugs • [Call graph: Web Server → AcceptConn. → ReadRequest → PinCache → ReadFile → WriteResponse → Exit]

  26. Why Threads: Existing Infrastructure • Lots of infrastructure for threads • Debuggers • Languages & compilers • Consequences • More amenable to analysis • Less effort to get working systems

  27. Building Better Threads • Goals • Simplify the programming model • Thread per concurrent activity • Scalability (100K+ threads) • Support existing APIs and tools • Automate application-specific customization • Mechanisms • User-level threads • Plumbing: avoid O(n) operations • Compile-time analysis • Run-time analysis

  28. Case for User-Level Threads • Decouple programming model and OS • Kernel threads • Abstract hardware • Expose device concurrency • User-level threads • Provide clean programming model • Expose logical concurrency • Benefits of user-level threads • Control over concurrency model! • Independent innovation • Enables static analysis • Enables application-specific tuning • [Diagram: application atop user-level threads atop the OS]

  29. Case for User-Level Threads • Decouple programming model and OS • Kernel threads • Abstract hardware • Expose device concurrency • User-level threads • Provide clean programming model • Expose logical concurrency • Benefits of user-level threads • Control over concurrency model! • Independent innovation • Enables static analysis • Enables application-specific tuning • Similar argument to the design of overlay networks • [Diagram: application atop user-level threads atop the OS]

  30. Capriccio Internals • Cooperative user-level threads • Fast context switches • Lightweight synchronization • Kernel Mechanisms • Asynchronous I/O (Linux) • Efficiency • Avoid O(n) operations • Fast, flexible scheduling
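A toy illustration of a cooperative user-level context switch, using the POSIX ucontext API for brevity; Capriccio’s own switch code is hand-tuned and faster, so this only shows the shape of a user-level yield with no kernel crossing:

    #include <ucontext.h>

    static ucontext_t scheduler_ctx;   /* set up by the scheduler before running a thread */
    static ucontext_t current_ctx;     /* context of the currently running user thread */

    /* Cooperative yield: save this thread's registers and resume the scheduler,
       entirely in user space. */
    void thread_yield(void) {
        swapcontext(&current_ctx, &scheduler_ctx);
    }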

  31. Safety: Linked Stacks • The problem: fixed stacks • Overflow vs. wasted space • LinuxThreads: 2MB/stack • Limits thread numbers • The solution: linked stacks • Allocate space as needed • Compiler analysis • Add runtime checkpoints • Guarantee enough space until next check • [Figure: fixed stacks either overflow or waste space; a linked stack grows in small chunks as needed]

  32. Linked Stacks: Algorithm • Parameters • MaxPath • MinChunk • Steps • Break cycles • Trace back • checkpoints limit unchecked path length to MaxPath • Special Cases • Function pointers • External calls • Use large stack • [Call graph annotated with per-function stack sizes (3, 3, 2, 5, 2, 4, 3, 6); checkpoints are placed so no unchecked path exceeds MaxPath = 8]
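At run time, each compiler-inserted checkpoint only has to guarantee that the longest unchecked call path, bounded by MaxPath, still fits in the current stack chunk. A minimal sketch, with hypothetical runtime hooks standing in for Capriccio’s internals:

    #include <stddef.h>

    extern size_t stack_remaining(void);                   /* bytes left in the current chunk */
    extern void   stack_link_new_chunk(size_t min_bytes);  /* allocate a chunk and switch to it */

    /* Inserted at a checkpoint whose worst-case unchecked path needs max_path_bytes. */
    static inline void stack_checkpoint(size_t max_path_bytes) {
        if (stack_remaining() < max_path_bytes)
            stack_link_new_chunk(max_path_bytes);
    }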

  38. Special Cases • Function pointers • categorize function pointers by number and type of arguments • “guess” which functions can be called • External functions • users annotate trusted stack bounds on libraries • or (re)use a small number of large stack chunks • Result • use/reuse stack chunks much like virtual memory • can efficiently share stack chunks • memory-touch benchmark: factor of 3 reduction in paging cost

  39. Scheduling: Blocking Graph • Lessons from event systems • Break app into stages • Schedule based on stage priorities • Allows SRCT scheduling, finding bottlenecks, etc. • Capriccio does this for threads • Deduce stage with stack traces at blocking points • Prioritize based on runtime information • [Blocking graph for a web server: Accept → Read → Open → Read → Close, Write → Close]
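One way to deduce the stage is to key it on where the thread blocks. The simplified sketch below uses only the immediate caller, via a GCC builtin; Capriccio actually walks several stack frames:

    #include <stdint.h>

    /* Return an identifier for the blocking-graph node at which the current
       thread is about to block; used as a key into per-node statistics. */
    uintptr_t blocking_node_id(void) {
        return (uintptr_t)__builtin_return_address(0);
    }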

  40. Resource-Aware Scheduling • Track resources used along blocking-graph edges • Memory, file descriptors, CPU • Predict future from the past • Algorithm • Increase use when underutilized • Decrease use near saturation • Advantages • Operate near the knee without thrashing • Automatic admission control • [Blocking graph for a web server: Accept → Read → Open → Read → Close, Write → Close]
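A rough sketch of the increase-when-underutilized, back-off-near-saturation rule; the thresholds and the resource_usage probe are illustrative assumptions, not Capriccio’s actual policy:

    extern double resource_usage(void);   /* e.g. fraction of memory or file descriptors in use */

    static int max_admitted = 64;         /* threads allowed past this blocking point */

    void adjust_admission(void) {
        double u = resource_usage();
        if (u > 0.90 && max_admitted > 1)
            max_admitted /= 2;            /* near saturation: back off quickly */
        else if (u < 0.70)
            max_admitted += 4;            /* underutilized: probe upward slowly */
    }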

  41. Pitfalls • What is the maximum amount of a resource? • depends on workload • e.g. disk thrashing depends on sequential vs. random seeks • use early signs of thrashing to indicate maximum capacity • Detecting thrashing • can only estimate it via a “productivity/overhead” ratio • productivity itself is estimated from proxies (threads created, files opened/closed)

  42. Thread Performance • Slightly slower thread creation • Faster context switches • Even with stack traces! • Much faster mutexes • [Table: time of thread operations in microseconds]

  43. Runtime Overhead • Tested Apache 2.0.44 • Stack linking • 78% slowdown for null call • 3–4% overall • Resource statistics • 2% (always on) • 0.1% (with sampling) • Stack traces • 8% overhead

  44. Microbenchmark: Producer / Consumer

  45. Web Server Performance

  46. Example of a “Great Systems Paper” • observe a higher-level issue • threads vs. events as programming abstractions • use previous work (duality) to identify the problem • why are threads not as efficient as events? • good systems design • call-graph analysis for linked stacks • resource-aware scheduling • good execution • full, solid implementation • analysis leading to full understanding of detailed issues • cross-area approach (help from PL research)

  47. Acknowledgements • Many slides “borrowed” from the respective talks / papers: • Capriccio (Rob von Behren) • SEDA (Matt Welsh) • Brewer01: “Lessons…”
