Black-box and Gray-box Strategies for Virtual Machine Migration

Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif† (Univ. of Massachusetts Amherst, †Intel) • 4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 2007)

Introduction • Applications are operated in data centers. • Data center resources must be managed effectively while meeting SLAs • Virtualization • Benefits of virtualization • Application isolation • Server consolidation (multiplexing) • Handling workload dynamics

Motivation • Efficient data center resource management • Live migration • However, detecting workload hotspots and initiating a migration is currently handled manually • Manual handling lacks the agility to respond to sudden workload changes • Multiple resources need to be considered • CPU, network, and memory

Solution • Automated black-box and gray-box strategies for virtual machine migration (Sandpiper) • Monitoring system resource usage • Hotspot detection • Determining a new mapping • Initiating the necessary migrations

The Sandpiper Architecture • Determine: • Which virtual servers should migrate • Where to move them • How much of a resource to allocate to the virtual servers after migration • Monitor usage profiles to detect hotspots. • Hotspot: any resource exceeds a threshold (or an SLA is violated) for a sustained period • Construct resource usage profiles for each virtual server (predict PM workload) • Gather resource usage statistics on each server • Gather processor, network and memory swap statistics for each VM • Implement a daemon to gather OS-level statistics and application logs

Black-box monitoring(1/4) • VM resource usage is inferred solely from external observations. • Observations are made from Domain-0. • Monitored parameters: • CPU usage • Network bandwidth • Memory swap rate • Monitoring interval

Black-box monitoring(2/4)-CPU monitoring • VM CPU usage can be determined by tracking scheduling events in the hypervisor. • This does not include the CPU overhead of the VM’s disk I/O and network processing. • This overhead is accounted to Domain-0. • Each VM is therefore charged: • Domain-0’s CPU usage × (VM I/O requests / total I/O requests) • Assumption: the overhead of the monitoring engine and the nucleus is negligible
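The charging rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the sample numbers are assumptions, and utilizations are taken to be fractions of one CPU.

```python
def charge_dom0_overhead(vm_cpu, vm_io_requests, dom0_cpu):
    """Add each VM's share of Domain-0 CPU usage to its own CPU usage.

    vm_cpu:         {vm_name: CPU utilization seen by the hypervisor scheduler}
    vm_io_requests: {vm_name: number of I/O requests issued in the interval}
    dom0_cpu:       Domain-0 CPU utilization in the same interval
    """
    total_io = sum(vm_io_requests.values())
    charged = {}
    for vm, cpu in vm_cpu.items():
        # Share of Domain-0's work attributed to this VM (0 if nobody did I/O).
        share = vm_io_requests.get(vm, 0) / total_io if total_io else 0.0
        charged[vm] = cpu + dom0_cpu * share
    return charged


# Hypothetical sample: vm1 issues three quarters of all I/O requests.
usage = charge_dom0_overhead(
    vm_cpu={"vm1": 0.30, "vm2": 0.10},
    vm_io_requests={"vm1": 300, "vm2": 100},
    dom0_cpu=0.08,
)
print(usage)
```

With these sample values, vm1 is charged three quarters of Domain-0's usage on top of its own, and vm2 the remaining quarter.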

Black-box monitoring(3/4)-Network monitoring • Background: • Domain-0 in Xen implements the network interface driver • VMs access the driver via clean device abstractions (virtual firewall-router (VFR) interface) • The monitoring engine can use the Linux /proc interface to read each virtual NIC’s usage • /proc/net/dev
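As a rough sketch of how per-interface byte counters could be read from /proc/net/dev, the Python below parses the file's text and turns two samples into a bandwidth figure. The function names, the sample text, and the interface name `vif1.0` are illustrative assumptions; the parser takes a string so it can be tried without a Xen host.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:          # first two lines are headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:                    # 8 receive + 8 transmit columns
            continue
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def bandwidth(prev, curr, interval_s):
    """Return {iface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    return {
        iface: ((rx - prev[iface][0]) / interval_s,
                (tx - prev[iface][1]) / interval_s)
        for iface, (rx, tx) in curr.items()
        if iface in prev
    }


# Hypothetical snapshot in the /proc/net/dev layout.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
vif1.0:    5000      50    0    0    0     0          0         0     9000      90    0    0    0     0       0          0
"""

first = parse_net_dev(SAMPLE)
second = {"vif1.0": (25000, 19000)}             # pretend sample taken 10 s later
rates = bandwidth(first, second, interval_s=10)
print(rates)
```

On a real host the text would come from reading `/proc/net/dev` once per monitoring interval.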

Black-box monitoring(4/4)-Memory monitoring • Challenge: • Domain-0 cannot directly monitor each VM’s actual memory usage/utilization. • It only knows the amount of memory assigned to the VM. • Solution: • Working set sizes can be inferred by observing swap activity from Domain-0 [11]. [11] S. Jones, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. Geiger: Monitoring the buffer cache in a virtual machine environment. In Proc. ASPLOS’06, pages 13–23, October 2006.

Gray-box monitoring • Motivation: • Black-box monitoring targets cases where it is not feasible to “peek inside” a VM to gather usage statistics; when it is feasible, more can be measured. • Solution: • Install a light-weight monitoring daemon inside each virtual server • Use the /proc interface to gather OS-level statistics • CPU, network, memory • Application-level statistics • The daemon gets statistics from functions provided by the application itself • E.g. web/database server: request rate, request drop rate, service time

Profile Generation(1/2) • Profile: a compact description of a server’s resource usage over a sliding time window W. • Profile content: • Black-box parameters: • CPU utilization, network bandwidth utilization, and memory swap rate • Gray-box parameters: • memory utilization, service time, request drop rate and incoming request rate (assuming an Apache web server)

Profile Generation(2/2) • Profile types: • Distribution profile: • The probability distribution of the resource usage over the window W. • Time series profile: • Captures the temporal fluctuations; it is simply a list of all reported observations within the window W.
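A minimal sketch of the two profile types, assuming utilization is reported as a percentage and the distribution is a histogram with fixed-width bins. The class name, the bin width, and the method names are assumptions; the default window of 200 mirrors the value given on the Implementation slide.

```python
from collections import Counter, deque


class ResourceProfile:
    """Sliding-window profile of one resource on one server (illustrative)."""

    def __init__(self, window=200, bin_width=10):
        self.samples = deque(maxlen=window)   # oldest observations fall out
        self.bin_width = bin_width            # histogram bin width, in percent

    def observe(self, utilization_pct):
        self.samples.append(utilization_pct)

    def time_series(self):
        """Time series profile: every observation in the window, in order."""
        return list(self.samples)

    def distribution(self):
        """Distribution profile: {bin index: probability} over the window."""
        n = len(self.samples)
        if n == 0:
            return {}
        counts = Counter(int(u // self.bin_width) for u in self.samples)
        return {b: c / n for b, c in sorted(counts.items())}


profile = ResourceProfile(window=3, bin_width=10)
for u in (10, 25, 35, 38):                    # the first sample is evicted
    profile.observe(u)
print(profile.time_series())
print(profile.distribution())
```

Bin index 2 covers 20 to 29 percent, bin index 3 covers 30 to 39 percent, and so on.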

Hotspot detection • Goal: • Signal a need for VM migration whenever SLA violations are detected. • A hotspot is flagged only if thresholds or SLAs are exceeded for a sustained time: • at least k of the n most recent observations and the next predicted value exceed a threshold. • Uses the time series profile • Formula: (auto-regressive family of predictors, AR(1))
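The k-out-of-n rule with a one-step prediction can be sketched as follows. The slide's formula is not preserved in this transcript, so the predictor below is the textbook AR(1) form, predicted value = μ + φ·(u_k − μ), with φ estimated as the lag-1 autocorrelation; that choice and all function names are assumptions. The defaults k=3, n=5 and the 75% threshold are taken from the Implementation slide.

```python
def ar1_predict(series):
    """One-step-ahead AR(1) prediction: mu + phi * (last - mu)."""
    n = len(series)
    mu = sum(series) / n
    # Lag-1 autocovariance over variance gives the AR(1) coefficient phi.
    num = sum((series[i] - mu) * (series[i + 1] - mu) for i in range(n - 1))
    den = sum((u - mu) ** 2 for u in series)
    phi = num / den if den else 0.0
    return mu + phi * (series[-1] - mu)


def is_hotspot(series, threshold=75.0, k=3, n=5):
    """Flag a hotspot if >= k of the last n readings AND the prediction exceed the threshold."""
    recent = series[-n:]
    violations = sum(1 for u in recent if u > threshold)
    return violations >= k and ar1_predict(series) > threshold


sustained = [50, 60, 80, 85, 90, 88, 92]   # load stays high
spike = [40, 40, 40, 40, 40, 40, 95]       # single transient reading
print(is_hotspot(sustained), is_hotspot(spike))
```

The transient spike is ignored because only one of the last five readings exceeds the threshold, which is the point of requiring a sustained violation.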

Resource Provisioning • Goal: • Ensures that the SLAs are not violated even in the presence of peak workloads. • Estimate the peak CPU, network and memory requirement of each overloaded VM • Black-box provisioning • Gray-box provisioning

Black-box provisioning(1/3) • Estimation of peak CPU&Network bandwidth needs: • Distribution profile • Use historical data to predict the peak. • Challenge: Estimation error!! • Background: • Both the CPU scheduler and the network packet scheduler in Xen are work-conserving.

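Black-box provisioning(2/3) • Estimation error: • Example: • Two virtual machines are assigned CPU weights of 1:1 (50% each) • Assume that VM1 is overloaded and requires 70% of the CPU to meet its peak needs. • Because the scheduler is work-conserving, VM1 can reach 70% only while VM2 leaves that capacity unused; if VM2 is also busy, VM1 is held to 50% and its observed peak underestimates its true need.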

Black-box provisioning(3/3) • Handling the estimation error: • Add a constant Δ to scale up the estimate. • Estimation of peak memory needs: • If swap activity exceeds the threshold, • the current allocation is deemed insufficient and is increased by a constant amount Δm
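A hedged sketch of the black-box estimates described on these slides. The use of the 95th percentile, the value of Δ, and the function names are assumptions made for illustration; the 32 MB default for Δm matches the step size mentioned in Experiment 3.

```python
def percentile(samples, pct):
    """Nearest-rank style percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = round(pct / 100 * (len(ordered) - 1))
    return ordered[max(0, min(len(ordered) - 1, rank))]


def blackbox_peak(samples, allocation, pct=95, delta=0.10):
    """Estimate peak CPU or network need from the distribution profile.

    If the observed peak already reaches the allocation, the true demand is
    hidden by the cap, so a constant delta is added to the estimate.
    """
    peak = percentile(samples, pct)
    if peak >= allocation:
        peak += delta
    return peak


def blackbox_memory(allocation_mb, swap_rate, swap_threshold, delta_m_mb=32):
    """Grow the memory allocation by delta_m when swapping exceeds the threshold."""
    if swap_rate > swap_threshold:
        return allocation_mb + delta_m_mb
    return allocation_mb


capped = blackbox_peak([0.2, 0.3, 0.5, 0.5, 0.5], allocation=0.5)
uncapped = blackbox_peak([0.1, 0.2, 0.3], allocation=0.5)
print(capped, uncapped, blackbox_memory(256, swap_rate=5, swap_threshold=0))
```

In the capped case the VM is already using its full 50% share, so the estimate is raised by Δ rather than trusted as the real peak.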

Gray-box provisioning(1/3) • The gray-box approach has access to application-level logs. • It can estimate the peak resource needs of the application even when the resource is fully utilized. • Estimating peak CPU needs: • An application model is necessary to estimate the peak CPU needs.

Gray-box provisioning(2/3) • Estimating peak CPU needs(cont.): • Applications such as web and database servers can be modeled as G/G/1 queuing systems [23]. • G/G/1 queuing system behavior [13]: λ_cap ≥ [s + (σ_a² + σ_b²) / (2·(d − s))]⁻¹ • s = mean service time (obtained from the server log) • d = mean response time of requests (obtained from the SLA) • λ_cap = request arrival rate • σ_a² = variance of inter-arrival time (obtained from the server log) • σ_b² = variance of service time (obtained from the server log) [23] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning for multi-tier internet applications. In Proc. ICAC ’05, June 2005. [13] L. Kleinrock. Queueing Systems, Volume 2: Computer Applications. John Wiley and Sons, Inc., 1976.

Gray-box provisioning(3/3) • Estimating peak CPU needs(cont.): • We can map the current CPU usage to the request rate λ_cap that it can sustain; the peak CPU usage can then be calculated from the peak request rate • Estimating peak network bandwidth • b = mean requested file size
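The equations on these slides did not survive in the transcript, so the sketch below relies on assumptions that should be checked against the paper: the sustainable request rate is taken from the standard G/G/1 waiting-time bound, λ_cap = 1 / (s + (σ_a² + σ_b²) / (2(d − s))); the CPU allocation is scaled by λ_peak/λ_cap when the peak rate exceeds it; and peak bandwidth is the request rate times the mean file size b. All function and parameter names are hypothetical.

```python
def sustainable_rate(s, d, var_a, var_b):
    """Request rate a G/G/1 server can sustain while keeping response time <= d.

    s:     mean service time (seconds)
    d:     target mean response time from the SLA (seconds)
    var_a: variance of inter-arrival time
    var_b: variance of service time
    """
    if d <= s:
        raise ValueError("target response time must exceed mean service time")
    return 1.0 / (s + (var_a + var_b) / (2.0 * (d - s)))


def peak_cpu(current_cpu, lambda_peak, lambda_cap):
    """Scale the current CPU allocation by lambda_peak / lambda_cap (never down)."""
    return current_cpu * max(1.0, lambda_peak / lambda_cap)


def peak_network(lambda_peak, mean_file_size_bytes):
    """Peak bandwidth in bytes/s as request rate times mean requested file size."""
    return lambda_peak * mean_file_size_bytes


lam_cap = sustainable_rate(s=0.01, d=0.05, var_a=0.0004, var_b=0.0004)
print(lam_cap, peak_cpu(0.4, lambda_peak=100, lambda_cap=lam_cap))
print(peak_network(100, 2000))
```

With these made-up numbers the server sustains about 50 requests per second, so a peak of 100 requests per second calls for roughly double the current CPU allocation.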

Hotspot mitigation(1/3) • Hotspot mitigation algorithm: • Goal: • Determine which VMs should be migrated, and to where, to dissipate the hotspot. • Challenge: • NP-hard: a multi-dimensional bin packing problem • Bin = physical server, dimensions = resource constraints • Solution: • A heuristic that decides: • Which overloaded VMs to migrate • Where to migrate them such that migration overhead is minimized. • Migration overhead cannot be neglected

Hotspot mitigation(2/3) • Hotspot mitigation algorithm(cont.): • Intuition: • Move load from the most overloaded servers to the least-loaded servers, • while minimizing the data copying incurred during migration • Volume: captures the degree of load along multiple dimensions in a unified fashion: Vol = 1/(1 − cpu) × 1/(1 − net) × 1/(1 − mem) • where cpu, net and mem are the corresponding utilizations of that resource for the virtual or physical server
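A small sketch of the volume metric, assuming the product form Vol = 1/(1 − cpu) × 1/(1 − net) × 1/(1 − mem) with utilizations expressed as fractions in [0, 1). The function name and the input check are illustrative choices.

```python
def volume(cpu, net, mem):
    """Combine CPU, network and memory utilization into one load figure.

    Each utilization is a fraction in [0, 1); the closer any one of them gets
    to 1, the faster the volume grows.
    """
    for u in (cpu, net, mem):
        if not 0.0 <= u < 1.0:
            raise ValueError("utilization must be in [0, 1)")
    return 1.0 / ((1.0 - cpu) * (1.0 - net) * (1.0 - mem))


print(volume(0.0, 0.0, 0.0))   # idle server
print(volume(0.5, 0.5, 0.5))   # half loaded on every dimension
print(volume(0.9, 0.1, 0.1))   # one dimension close to saturation
```

A server that is nearly saturated on a single resource gets a larger volume than one that is moderately loaded on all three, which is why the metric is useful for ranking overloaded servers.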

Hotspot mitigation(3/3) • Hotspot mitigation algorithm(cont.): • Volume-to-size ratio (VSR): • Volume/Size (Size = the memory size of the VM) • Migration decision: • Take the highest-VSR VM from the highest-volume server and determine whether it can be housed on the lowest-volume physical server. • Swap decision (only 2-way swaps are considered): • Activated when simple migration cannot resolve the hotspot. • Swap the highest-VSR VM on the highest-volume hotspot server with the k lowest-VSR VMs on the lowest-volume server • If a swap cannot be found, the next least-loaded server is considered • Note: a swap may require a third server (RAM issue)
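The migration decision can be sketched as a greedy search. This is a simplified illustration under several assumptions: all physical servers are identical, VM demands are fractions of one server's capacity, a VM fits if no resource on the target would exceed the threshold, and swaps are left out. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

RESOURCES = ("cpu", "net", "mem")


def volume(cpu, net, mem):
    # Clamp at 0.99 so an overloaded server still gets a finite volume.
    cpu, net, mem = (min(u, 0.99) for u in (cpu, net, mem))
    return 1.0 / ((1.0 - cpu) * (1.0 - net) * (1.0 - mem))


@dataclass
class VM:
    name: str
    cpu: float
    net: float
    mem: float
    size_mb: int                      # memory footprint copied on migration

    def vsr(self):
        return volume(self.cpu, self.net, self.mem) / self.size_mb


@dataclass
class PM:
    name: str
    vms: list = field(default_factory=list)

    def load(self, resource):
        return sum(getattr(vm, resource) for vm in self.vms)

    def vol(self):
        return volume(*(self.load(r) for r in RESOURCES))

    def fits(self, vm, threshold):
        return all(self.load(r) + getattr(vm, r) <= threshold for r in RESOURCES)


def plan_migration(hot_pm, others, threshold=0.75):
    """Return (vm_name, target_name) for the first feasible move, else None."""
    for vm in sorted(hot_pm.vms, key=VM.vsr, reverse=True):   # highest VSR first
        for target in sorted(others, key=PM.vol):             # least loaded first
            if target.fits(vm, threshold):
                return vm.name, target.name
    return None


pm1 = PM("pm1", [VM("vm_a", 0.5, 0.1, 0.2, 256), VM("vm_b", 0.4, 0.1, 0.2, 1024)])
pm2 = PM("pm2", [VM("vm_c", 0.6, 0.1, 0.2, 512)])
pm3 = PM("pm3")
print(plan_migration(pm1, [pm2, pm3]))
```

Dividing volume by memory size favors VMs that shed a lot of load for little data copied, which is how the heuristic keeps migration overhead low.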

Implementation • Virtualization platform: • Xen • Sandpiper control plane: • Runs on the control node (Python) • Profiling engine: • Uses the past 200 measurements to generate a profile • Hotspot trigger: • 3 of the past 5 readings (k/n) plus the next predicted value over the threshold • Default threshold: • 75% • Monitoring engine: • Gray-box monitoring daemon: • Linux OS daemon, Apache module (service time, request rate, drop rate, file size)

Evaluation Environment • Data center: • 20 servers (2.4 GHz Pentium-4) • Connected with gigabit Ethernet • At least 1 GB RAM • OS • Linux 2.6.16 + Xen 3.0.2-3 • Workload generator • A cluster of Pentium-3 Linux servers

Experiment 1-Migration Effectiveness • Experiment 1 uses 3 physical servers and 5 VMs with memory allocations as follows. • All VMs run Apache serving dynamic PHP web pages. • httperf is used to inject the workload

Experiment 1-Migration Effectiveness(cont.) • t=166: hotspot detected; VM1 has the highest VSR; PM3 has the lowest volume • t=362: hotspot detected; VM4 has the second-highest VSR (no PM has enough capacity to host VM3); PM1 has the lowest volume • Final phase: VM1 and VM5 have the same volume, but VM5 uses less memory; PM2 has the lowest volume

Experiment 2- Virtual Machine Swaps • Experiment setting: • As before, clients use httperf to request dynamic PHP pages.

Experiment 2- Virtual Machine Swaps(cont.) • Hotspot detected on PM1. • The only viable solution is to swap VM2 with VM4 (a three-party swap). • VM4 uses the least memory, so it is the VM that is migrated twice. • When the migration of VM4 completes, VM2 starts migrating to PM2. • When the migration of VM2 completes, VM4 starts migrating to PM1. • Migration overhead is marked in the figure.

Experiment 3- Mixed resource workloads • Experiment setting: • VM2 is a database that stores its tables in memory • PM2 has more physical memory

Experiment 3- Mixed resource workloads(cont.) • PM1 has a network hotspot and PM2 has a CPU hotspot • Sandpiper swaps a network intensive VM for a CPU-intensive VM at t=130

Experiment 3- Mixed resource workloads(cont.) • Sandpiper responds by increasing the RAM allocation in steps of 32 MB every time swapping is observed. • When no additional RAM is available, the VM is swapped to the second physical server at t=430. • The swap involves two network-intensive VMs (VM1 and VM2)

Experiment 4- Gray vs. Black: Memory Allocation • Goal: • Compare the effectiveness of the black-box and gray-box approaches in mitigating memory hotspots • The SPECjbb2005 benchmark is used to generate memory usage. • Settings:

Experiment 4- Gray vs. Black: Memory Allocation(cont.) • Experiment Result: • Observation: • The gray-box system can reduce or eliminate swapping without significant overprovisioning of memory.

Experiment 4- Gray vs. Black: Apache Performance • Settings: • We use httperf to generate requests for CPU-intensive PHP scripts on all VMs.

Experiment 4- Gray vs. Black: Apache Performance • Black-box strategy: incorrect guesses (steps 1 to 4 marked in the figure)

Experiment 4- Gray vs. Black: Apache Performance • Comparing the gray-box strategy with the black-box strategy: • The gray-box strategy can migrate VM3 to PM2 and VM1 to PM3 concurrently

Experiment 5-Prototype Data Center Evaluation • Data center environment • 16 servers that run a total of 35 VMs. • 1 additional server runs the control plane • 1 additional server is reserved as a scratch node for swaps. • Settings: • Six physical servers running a total of 14 VMs are overloaded • Four servers see a CPU hotspot and two see a network hotspot

Experiment 5-Prototype Data Center Evaluation • Result: • Sandpiper eliminates hotspots on all six servers by interval 60. • Migration overhead is marked in the figure.

Sandpiper overhead and scalability • Sandpiper’s CPU and network overhead: • depends on the number of PMs and VMs in the data center. • The overhead of the gray-box strategy may also be affected by the size of the application-level statistics gathered

Sandpiper overhead and scalability(cont.) • Nucleus overhead: • Network: • Each report uses only 288 bytes per VM • The resulting overhead on a gigabit LAN is negligible • CPU usage: • Compare the performance of a CPU benchmark with and without our resource monitors running. • On a single physical server running 24 concurrent VMs, • Nucleus overheads reduce the CPU benchmark by approximately 1%.

Sandpiper overhead and scalability(cont.) • Control plane scalability: • Source of computational complexity: • Computing a new mapping of virtual machines to physical servers after detecting a hotspot

Conclusion & future work • In this paper, we proposed Sandpiper, an automated system which can: • monitor and detect hotspots • determine a new mapping of physical to virtual resources • initiate the necessary migrations • We discussed a black-box strategy and a gray-box strategy. • Evaluation showed that Sandpiper can eliminate hotspots rapidly in data center environments. • Future work: • Support replicated services • Automatically determine whether to migrate a VM or to spawn a replica.

Comment • Advantages: • Separating the monitoring strategy into black-box and gray-box is a good idea. • Sandpiper’s architecture and strategy may fit our “Plan A” • Shortcomings: • The relationship between CPU utilization and request rate may not be linear • The hotspot mitigation algorithm only considers evening out the workload across physical machines • It should also consider how to drive PMs to the highest utilization without creating hotspots
