
Dynamic Resource Management in Internet Hosting Platforms


Presentation Transcript


  1. Dynamic Resource Management in Internet Hosting Platforms Ph.D. Thesis Defense Bhuvan Urgaonkar Advisor: Prashant Shenoy

  2. Internet Applications • Proliferation of Internet applications • E.g., auction sites, online games, online retail stores • Growing significance in personal and business affairs • Focus: Internet server applications

  3. Hosting Platforms • Data Centers • Clusters of servers • Storage devices • High-speed interconnect • Hosting platforms: • Rent resources to third-party applications • Performance guarantees in return for revenue • Benefits: • Applications: don’t need to maintain their own infrastructure • Rent server resources, possibly on demand • Platform provider: generates revenue by renting resources

  4. Goals of a Hosting Platform • Meet service-level agreements • Satisfy application performance guarantees • E.g., average response time, throughput • Maximize revenue • E.g., maximize the number of hosted applications • Question: How should a hosting platform manage its resources to meet these goals?

  5. Challenge #1: Dynamic Workloads [Plots: request arrival rate vs. time over hours and days, peaking near 140K arrivals/min] • Multi-time-scale variations • Time-of-day, hour-of-day • Overloads • E.g., flash crowds • User threshold for response time: 8-10 s • Key issue: How to provide good response time under varying workloads?

  6. Challenge #2: Complexity of Applications • Complex software architecture • Diverse software components • Web servers, Java application servers, databases • Multiple classes of clients • How to provide differentiated service? • Replicable components • How many replicas to have? • Tunable configuration parameters • E.g., MaxClients in Apache • How to set these parameters? • Key issue: How to capture all this complexity?

  7. Talk Outline • Motivation • Thesis contributions • Application modeling • Dynamic provisioning • Scalable request policing • Conclusions

  8. Hosting Platform Models • Small applications • Require only a fraction of a server • E.g., shared Web hosting: $20/month to run your own Web site • Shared hosting: multiple applications on a server • Co-located applications compete for server resources

  9. Hosting Platform Models • Large applications • May span multiple servers • eBay site uses thousands of servers! • Dedicated hosting: at most one application per server • Allocation at the granularity of a single server

  10. Thesis Contributions Dynamic resource management in hosting platforms Shared Hosting • Statistical multiplexing and under-provisioning [OSDI 2002] • Application placement [PDCS 2004] Dedicated Hosting • Analytical model for an Internet application [SIGMETRICS 2005] • Dynamic provisioning [Autonomic Computing 2005] • Scalable request policing [PODC 2004, WWW 2005]

  11. Talk Outline • Motivation • Thesis contributions • Application modeling • Dynamic provisioning • Scalable request policing • Conclusions

  12. Internet Application Architecture [Diagram: request processing in an online bookstore; a search query for “moby” flows through the HTTP, J2EE, and database tiers, and the response returns Melville’s ‘Moby Dick’ and music CDs by Moby] • Multi-tier architecture • Each tier uses services provided by its successor • Session-based workloads

  13. Baseline Application Model [SIGMETRICS ’05] • Model consists of two components • Sub-system to capture behavior of clients • Sub-system to capture request processing inside the application [Diagram: clients interacting with the application]

  14. Modeling Clients • Clients think between successive requests • Infinite server system to capture think time Z • Captures independence of Z from processing in the application [Diagram: N clients, each with think time Z, feeding queue Q0 in front of the application]

  15. Modeling Request Processing [Diagram: tandem queues Q1 … QM for tiers 1 … M, with service times S1 … SM and transition probabilities p1 … pM (pM = 1)] • Transitions defined to capture circulation of requests • Request may move to next queue or previous queue • Multiple requests are processed concurrently at tiers • Processor-sharing scheduling discipline • Caching effects get captured implicitly!

  16. Putting It All Together • A closed queuing model that captures a given number of simultaneous sessions being served [Diagram: the client subsystem (think time Z, queue Q0) closed around the tier queues Q1 … QM, with service times S1 … SM and transition probabilities p1 … pM = 1]

  17. Mean-Value Analysis • Product-form closed queuing network • Lm: average length of Qm • Am: average number of clients in Qm seen by an arriving client • Arrival theorem: Am(n+1) = Lm(n) • Iterative algorithm to compute mean queue lengths and sojourn times (see the sketch below) [Diagram: the closed network with n+1 clients; at each queue, Am(n+1) = Lm(n)]
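The iterative algorithm referred to on this slide is classical exact MVA. Below is a minimal Python sketch for the network of this section: M tiers with service times S and visit ratios V, plus the think-time station Z. The function and variable names and the example parameters are illustrative, not taken from the thesis code.

```python
def mva(S, V, Z, N):
    """Exact MVA for a closed network of M queues plus a think-time station.

    S[m]: mean service time at tier m; V[m]: visit ratio of tier m;
    Z: mean think time; N: number of concurrent sessions.
    """
    M = len(S)
    L = [0.0] * M                     # mean queue lengths, L_m(0) = 0
    R_total, X = 0.0, 0.0
    for n in range(1, N + 1):
        # Arrival theorem: a client arriving when n clients circulate sees
        # L_m(n-1) others, so its sojourn time at tier m is S_m * (1 + L_m).
        R = [S[m] * (1.0 + L[m]) for m in range(M)]
        R_total = sum(V[m] * R[m] for m in range(M))
        X = n / (Z + R_total)         # system throughput with n clients
        L = [X * V[m] * R[m] for m in range(M)]
    return R_total, X, L              # response time, throughput, queue lengths

# Example: a three-tier application (web, application, database tiers)
R, X, L = mva(S=[0.005, 0.020, 0.015], V=[1.0, 0.8, 2.0], Z=1.0, N=100)
```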

  18. Parameter Estimation • Visit ratios • Equivalent to transition probabilities for MVA • Vi ≈ λi / λreq, with λreq measured at the sentry and λi taken from tier logs • Service times • Use residence time Xi logged at tier i • For the last tier, SM ≈ XM • Otherwise, Si = Xi − (Vi+1 / Vi) · Xi+1 • Think time • Measured at the application sentry
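As code, these estimation rules might look like the sketch below. Applying the Si formula to every tier except the last follows the slide's pattern; the names are mine.

```python
def estimate_parameters(lam, lam_req, X):
    """Estimate visit ratios and per-tier service times from logs.

    lam[i]: request rate seen in tier i's logs; lam_req: request rate
    at the sentry; X[i]: mean residence time logged at tier i.
    """
    M = len(lam)
    V = [lam[i] / lam_req for i in range(M)]   # visit ratios: V_i ~ lam_i / lam_req
    S = [0.0] * M
    S[M - 1] = X[M - 1]                        # last tier: S_M ~ X_M
    for i in range(M - 2, -1, -1):
        # Residence time at tier i includes time spent waiting on tier i+1.
        S[i] = X[i] - (V[i + 1] / V[i]) * X[i + 1]
    return V, S
```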

  19. Evaluation of Baseline Model [Plot: observed vs. basic-model average response time (msec) as the number of sessions grows from 0 to 500] • Auction site RUBiS • One server per tier [Diagram: Apache, JBOSS, Mysql, with labels 150 and 75] • Concurrency limits not captured

  20. Handling Concurrency Limits • Requests may be dropped due to concurrency limits • Need to model the finiteness of queues! [Diagram: the closed network with queues Q1 … QM; requests dropped at the full queues]

  21. Handling Concurrency Limits • Approach: subsystems to capture dropped requests • Distinguish the processing of dropped requests [Diagram: each tier i gains a drop subsystem, a queue Qi-drop with service time Si-drop, reached with probability pi-drop]

  22. Estimating Drop Probabilities and Delay Values • Drop probability • Step 1: Estimate throughput t using MVA assuming no concurrency limits • Step 2: Estimate pi-drop as the drop probability of an M/M/1/Ki queue, so that t · (1 − pi-drop) of the load is served and t · pi-drop is dropped • Delay value for tier i • Subject the application to an offline workload that causes the limit to be exceeded only at tier i; record the response time of failed requests
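The M/M/1/K drop probability in Step 2 has a standard closed form, sketched below. Treating the MVA throughput estimate as the offered load λ, with μ as tier i's service rate, is my reading of the slide; the thesis may parameterize it differently.

```python
def mm1k_drop_probability(lam, mu, K):
    """Blocking probability of an M/M/1/K queue (standard closed form).

    lam: offered arrival rate (e.g., the throughput t estimated by MVA);
    mu: service rate of the tier; K: concurrency limit K_i.
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-9:
        return 1.0 / (K + 1)          # limiting case rho -> 1
    return (1.0 - rho) * rho**K / (1.0 - rho**(K + 1))
```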

  23. Response Time Prediction [Plot: observed vs. basic-model and enhanced-model average response time (msec) for 0 to 500 sessions] • Enhanced model can capture concurrency limits

  24. Replication and Load Imbalances • Causes of imbalance • “Sticky” sessions • Variation in session durations and resource requirements • Imbalance factor for the jth most-loaded replica of tier i: imbalance(i, j) = num_arrivals(i, j) / num_arrivals(i) • Scale the visit ratio: Vi,j = Vi · imbalance(i, j) [Diagram: Apache in front of a replicated JBOSS tier and Mysql]
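In code, the scaling rule is a one-liner per replica; a small sketch follows, with illustrative arrival counts.

```python
def replica_visit_ratios(V_i, arrivals_per_replica):
    """Scale a tier's visit ratio by each replica's imbalance factor."""
    total = sum(arrivals_per_replica)            # num_arrivals(i)
    return [V_i * (a / total) for a in arrivals_per_replica]

# Example: a JBOSS tier with V_i = 0.8 whose sticky sessions skew the load
V_ij = replica_visit_ratios(0.8, arrivals_per_replica=[600, 300, 100])
```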

  25. Capturing Load Imbalance [Plots: per-replica request counts over time for the least, medium, and most loaded JBOSS replicas, and observed vs. predicted average response times (msec) under perfect load balancing and under the enhanced model] • Session affinity causes load imbalance • Imbalance shifts among replicas • Our enhancement helps improve response time prediction

  26. Talk Outline • Motivation • Thesis contributions • Application modeling • Dynamic provisioning • Scalable request policing • Conclusions

  27. Dynamic Provisioning [Autonomic Computing ’05] • Key idea: increase or decrease allocated servers to handle workload fluctuations • Monitor incoming workload • Compute current or future demand • Match number of allocated servers to demand [Diagram: monitor workload, compute current/future demand, adjust allocation, repeat]
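One step of this loop might look like the sketch below. In the thesis the required server count comes from the queuing model of the previous section; here a fixed per-server capacity stands in for that computation, so per_server_capacity and the example numbers are assumptions.

```python
import math

def servers_needed(arrival_rate, per_server_capacity):
    # Servers required for the observed rate, assuming each server
    # sustains per_server_capacity requests/sec (model stand-in).
    return max(1, math.ceil(arrival_rate / per_server_capacity))

def provisioning_step(arrival_rate, per_server_capacity, current):
    # Positive result: add servers; negative: release servers.
    return servers_needed(arrival_rate, per_server_capacity) - current

delta = provisioning_step(arrival_rate=1200.0, per_server_capacity=150.0, current=6)
```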

  28. Dynamic Provisioning at Multiple Time-scales • Predictive provisioning • Certain Internet workload patterns can be predicted • E.g., time-of-day effects, increased workload during Thanksgiving • Provision using the model at a time-scale of hours or days • Reactive provisioning • Applications may see unpredictable fluctuations • E.g., increased workload to news sites after an earthquake • Detect such anomalies and react fast (minutes)
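A sketch of how the two time-scales might be combined follows. The hour-of-day seasonal average as a predictor and the 2x deviation factor for triggering the reactor are illustrative assumptions, not the thesis algorithms.

```python
from collections import defaultdict

history = defaultdict(list)   # hour of day -> arrival rates seen on past days

def predicted_rate(hour):
    # Predictive path: seasonal average over past days at this hour.
    past = history[hour]
    return sum(past) / len(past) if past else 0.0

def reactive_trigger(observed, predicted, factor=2.0):
    # Reactive path: fire when the workload deviates badly from prediction.
    return predicted == 0.0 or observed > factor * predicted
```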

  29. Request Policing [Diagram: a sentry polices incoming requests and drops the excess] • Key idea: if the incoming request rate exceeds current capacity, turn away excess requests • Why police when you can provision? • Provisioning is not instantaneous • Residual sessions on reallocated servers • Application and OS installation and configuration overheads • Overhead of several (5-30) minutes

  30. Existing Work • Lots of existing work on request policing • [Kanodia00, Li00, Verma03, Welsh03, Abdelzaher99, …] • Shortcomings of existing work: • Does not attempt to integrate policing and provisioning • Does not address scalability of the policer! • The policer itself may become the bottleneck during overloads

  31. Policer: Design Goals • Each class should sustain its guaranteed admission rate • Class-based differentiation and revenue maximization • Challenging due to online nature of the problem • An admitted request may cause a more important request arriving later to be dropped • Approach: Preferential admission to higher class requests • Scalability • The policer should remain operational even under extremely high arrival rates

  32. Overview of Policer Design [PODC ’04 / WWW ’05] • Our policer has three components • Request classifier and per-class leaky buckets • Class-specific queues • Admission control [Diagram: the classifier feeds gold, silver, and bronze leaky buckets and class-specific queues with delays dgold, dsilver, dbronze; admission control admits or drops requests]

  33. Class-based Differentiation [Policer diagram as on slide 32] • Each incoming request undergoes classification • Per-class leaky buckets ensure that the rates guaranteed in the SLA are admitted
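A minimal sketch of a per-class bucket follows, assuming a token-bucket formulation: each class's bucket refills at its SLA-guaranteed rate, so that rate always gets through even under overload. Class names, rates, and burst sizes are illustrative.

```python
import time

class LeakyBucket:
    """Token-bucket sketch: admits up to `rate` requests/sec with `burst` slack."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def conforms(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True      # within the guaranteed rate: forward to class queue
        return False         # exceeds the guaranteed rate

buckets = {"gold": LeakyBucket(500, 50), "silver": LeakyBucket(200, 20)}
```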

  34. Revenue Maximization [Policer diagram as on slide 32] • Idea: different delays in processing requests of different classes • More important requests processed more frequently • Methodology to compute delay values in an online manner • Bounds the probability of a request denying admission to a more important request [Appendix B of thesis]

  35. Admission Control [Policer diagram as on slide 32] • Goal: ensure that an admitted request meets its response time target • Measurement-based admission control algorithm • Use information about the current load on servers and the estimated size of the new request to make the decision
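As a deliberately simplified sketch of a measurement-based test: admit a request only if the work it adds keeps measured utilization below a level at which response-time targets still hold. The window, threshold, and the exact form of the test are assumptions; the thesis algorithm may differ.

```python
def admit(current_utilization, est_request_size_secs, window_secs, threshold=0.9):
    """Measurement-based test: admit if the added work keeps utilization safe.

    current_utilization: measured busy fraction of the allocated servers;
    est_request_size_secs: estimated service demand of the new request;
    window_secs: length of the measurement window.
    """
    added = est_request_size_secs / window_secs
    return current_utilization + added <= threshold
```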

  36. Scalability of Admission Control • Idea #1: Reduce the per-request admission control cost • Admission control on every request may be expensive • Bursty arrivals during overloads => batches get formed • Delays for class-based differentiation => batches get formed • Admission control that operates on batches instead of individual requests • Idea #2: Sacrifice accuracy to reduce computational overhead • When batch-based processing becomes prohibitive • Threshold-based scheme • E.g., admit all Gold requests, drop all Silver and Bronze requests • Thresholds chosen based on observed arrival rates and service times • Extremely efficient • Wrong threshold => bad response times or fewer requests admitted
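Both ideas fit in a few lines; the sketch below uses a dict per request and a rank order over classes, both illustrative. Batch mode amortizes one admission-control decision over a whole batch; threshold mode admits every request whose class is at or above a precomputed cutoff.

```python
CLASS_RANK = {"gold": 0, "silver": 1, "bronze": 2}

def police_batch(batch, admit_fn):
    # Idea #1: one admission-control test amortized over the whole batch.
    return list(batch) if admit_fn(len(batch)) else []

def police_threshold(requests, threshold_class):
    # Idea #2: skip admission control; admit classes at or above the cutoff.
    cutoff = CLASS_RANK[threshold_class]
    return [r for r in requests if CLASS_RANK[r["class"]] <= cutoff]

batch_admitted = police_batch([{"class": "gold"}] * 40,
                              admit_fn=lambda size: size <= 100)
admitted = police_threshold(
    [{"class": "gold"}, {"class": "bronze"}], threshold_class="silver")
# -> only the gold request is admitted
```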

  37. Scaling Even Further … • Protocol processing overheads will saturate sentry resources at extremely high arrival rates • Indiscriminate dropping of requests will occur • Important requests may be turned away without even undergoing the admission control test • Loss in revenue! • Sentry should still be able to process each arriving request! • Idea: Dynamic capacity provisioning for sentry • Pull in an additional sentry if CPU utilization of existing sentries exceeds a threshold (e.g., 90%) • Round-robin DNS to load balance among sentries
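The scaling rule on this slide reduces to a small check; the 90% threshold comes from the slide, while averaging utilization across the active sentries is my assumption.

```python
def need_more_sentries(cpu_utilizations, threshold=0.90):
    # Pull in an additional sentry when the existing ones run too hot;
    # round-robin DNS then spreads arrivals across the larger set.
    return sum(cpu_utilizations) / len(cpu_utilizations) > threshold
```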

  38. Class-based Differentiation [Plot: fraction of requests admitted over time (sec) for the Gold, Silver, and Bronze classes] • Three classes of requests: Gold, Silver, Bronze • Policer successful in providing preferential admission to important requests

  39. Threshold-based: Higher Scalability • Threshold-based processing allows the policer to handle up to 4 times higher arrival rates • A single sentry can handle about 19,000 req/s

  40. Threshold-based: Loss of Accuracy • Higher scalability comes at a loss in accuracy of admission control • More violations of response time targets

  41. Talk Outline • Motivation • Thesis contributions • Application modeling • Dynamic provisioning • Scalable request policing • Summary and Future Research

  42. Thesis Contributions Dynamic resource management in hosting platforms Shared Hosting • Statistical multiplexing and under-provisioning [OSDI 2002] • Application placement [PDCS 2004] Dedicated Hosting • Analytical model for Internet applications [SIGMETRICS 2005] • Dynamic provisioning [Autonomic Computing 2005] • Scalable request policing [PODC 2004, WWW 2005]

  43. Future Research Directions • Virtual machine based hosting • Recent research has shown the feasibility of migrating VMs across nodes • Adds a new dimension to the capacity provisioning problem • Characterizing multi-tier workloads • Workloads for standalone Web servers are well-characterized • E.g., what are typical service times at the Java tier, or query processing times? • Offshoot of this study: workload generators for multi-tier applications • Automated determination of provisioning parameters • Predictor and reactor are invoked at manually chosen frequencies • System administrators use rules of thumb => error-prone

  44. Thanks to … • Advisor Prashant Shenoy • Thesis committee Emery Berger, Jim Kurose, Don Towsley, Tilman Wolf • Collaborators Abhishek Chandra, Pawan Goyal, Giovanni Pacifici, Timothy Roscoe, Arnold Rosenberg, Mike Spreitzer, Asser Tantawi • All my teachers Paul Cohen, Mani Krishna, Don Towsley • Friends and family

  45. Questions or comments?

  46. Query Caching at the Database • Caching effects • Captured by tuning Vi and/or Si • Bulletin-board site RUBBoS • 50 sessions • SELECT SQL_NO_CACHE prevents Mysql from caching the response to a query

  47. Agile Switching Using Virtual Machine Monitors • VMMs allow multiple “virtual” machines on a server • E.g., Xen, VMWare, … • Use VMMs to enable fast switching of servers • Switching time only limited by residual sessions [Diagram: two servers, each with a VMM hosting VMs that flip between dormant and active]

  48. Prototype Data Center [Diagram: server nodes, each running application capsules over a nucleus and OS; sentries; a control plane handling resource monitoring, parameter estimation, application placement, and dynamic provisioning] • 40+ Linux servers • Gigabit switches • Multi-tier applications • Auction (RUBiS) • Bulletin-board (RUBBoS) • Apache, JBOSS (replicable) • Mysql database

  49. Sentry Provisioning (XXX) [Plot: arrival rate (req/s) over time (sec), showing the total arrival rate and the arrivals at sentry 1]

  50. System Overview [Diagram: as on slide 48, server nodes with capsules, nuclei, sentries, and the control plane] • Control Plane • Centralized resource manager • Nucleus • Per-server measurements and resource management • Sentry • Per-application admission control • Capsule • Component of an application running on a server
