Presentation Transcript


  1. Everyone Knows it’s Windy (The Association) UCSD

  2. Forecasting Resource Performance with The Network Weather Service
     Rich Wolski, UCSD and UT, Knoxville

  3. Where should I run? [Diagram: candidate resources labeled SDSC (C T94, SP-2, -Farm, Sun, T-3E, C) and UCSD PCL, connected via the vBNS, the Internet, and the AAI/ATD Net]

  4. The Problem
     • How much of each resource can I get?
       • Who am I? => my priority
       • Who owns the resource? => local management policy
       • Who else is using it? => contention
     • Idea: use performance history to produce a quantifiable measure of availability
       • Large body of prediction theory
       • Allows disparate resources to be compared
       • Can be evaluated by a computer program => dynamic scheduling

  5. Forecasting Resource Performance
     • The Network Weather Service (NWS)
       • Monitors the deliverable performance available from a distributed resource set.
       • Forecasts future performance levels using statistical forecasting models.
       • Reports monitoring and forecast data to interested clients: schedulers, applications, visual interfaces, etc.
     • Generally available Grid service
       • portable, extensible, robust, scalable, etc-able

  6. Architecture [Diagram: CPU, memory, and network sensors on each machine feed Persistent State; Forecasting and Reporting components expose monitor and forecast data through an API]

  7. Example: Predicting CPU Availability
     • Sensors
       • How accurate are the measurements?
     • Forecasts
       • What are the best prediction techniques?
       • How accurate are the forecasts they generate?
     Is it possible to predict CPU availability?

  8. CPU Measurements
     • Three measurers:
       • uptime -- one-minute smoothed run-queue length
       • vmstat -- idle, user-active, and system-active CPU percentages
       • NWS CPU sensor -- adaptive combination of both
     • Ground truth is a ten-second “spinning” process (sketched below)
     • Study: ten-second periodicity, 24-hour epoch, one busy grad student
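     A minimal sketch of the spinning-process idea, assumed for illustration (this is not the NWS sensor code): busy-wait for ten seconds of wall-clock time and report the fraction of a CPU the process actually obtained.

```python
import os
import time

def spin_fraction(duration=10.0):
    """Busy-wait for `duration` seconds of wall-clock time and return the
    fraction of a CPU this process actually received (cpu time / wall time)."""
    wall_start = time.time()
    cpu_start = sum(os.times()[:2])      # user + system CPU time consumed so far
    while time.time() - wall_start < duration:
        pass                             # spin: compete with other processes for the CPU
    wall = time.time() - wall_start
    cpu = sum(os.times()[:2]) - cpu_start
    return cpu / wall

if __name__ == "__main__":
    print("observed CPU availability: %.2f" % spin_fraction())
```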

  9. Uptime and Truth

  10. Vmstat and Truth

  11. NWS CPU Sensor and Truth

  12. Wrong in three ways
      • measurement error
        • the difference between what the process observed and what was measured via uptime, vmstat, or the NWS
      • forecasting error
        • the difference between a forecast in a time series and the value being forecast in that series
      • actual error
        • the difference between what was forecast and what the test process actually observed
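      The three quantities can be written down directly for aligned series; the series names used here (truth, measured, forecast) are illustrative assumptions, not names from the NWS.

```python
def error_breakdown(truth, measured, forecast):
    """Per-step errors for three aligned series of equal length:
       measurement error = measured - truth     (measurement tool vs. the spinning process)
       forecasting error = forecast - measured  (prediction vs. the series it predicts)
       actual error      = forecast - truth     (prediction vs. what the test process saw)"""
    measurement = [m - t for m, t in zip(measured, truth)]
    forecasting = [f - m for f, m in zip(forecast, measured)]
    actual = [f - t for f, t in zip(forecast, truth)]
    return measurement, forecasting, actual
```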

  13. Measurement Error

  14. Forecasting CPU Availability What will the next value be?

  15. Simple Techniques
      • Averaging
        • running average
        • sliding-window averages
        • adaptive-window averages
      • Noise rejection
        • sliding-window median
        • adaptive median average
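      A short sketch of two of these predictors; the window length of 10 is an assumed parameter, not a value taken from the NWS.

```python
from statistics import median

def sliding_window_average(history, window=10):
    """Predict the next value as the mean of the last `window` measurements."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def sliding_window_median(history, window=10):
    """Predict the next value as the median of the last `window` measurements;
    the median rejects occasional spikes that would drag an average around."""
    return median(history[-window:])
```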

  16. More Exotic Fortune Tellers
      • Exponential smoothing
        • used by the “Van Jacobson improvements” to estimate TCP packet round-trip time
        • Pred = (1 - G) * (previous Pred) + G * (current value)
      • Autoregressive moving average (ARMA) models
        • X(t) = Σ a(k)·X(t-k) + Σ b(k)·e(t-k), summing over lags k
      • State-transition models (aka hidden Markov models)
      Which model is best for a given series?
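      The exponential-smoothing rule from the slide, as a sketch; the gain G = 1/8 used as the default here is the value from the Van Jacobson TCP round-trip-time estimator and is only an assumed choice.

```python
def exponential_smoothing(history, gain=0.125):
    """Apply Pred = (1 - G) * (previous Pred) + G * (current value) over the
    whole history and return the prediction for the next measurement."""
    pred = history[0]
    for value in history[1:]:
        pred = (1.0 - gain) * pred + gain * value
    return pred
```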

  17. Dynamically Choosing a Model
      • Run all models in parallel and obtain a prediction from each for the next measurement.
      • Calculate the error made by each prediction when the measurement is eventually taken.
      • When a forecast is required, use the latest prediction made by the model with the lowest cumulative error measure.
        • Mean Square Error (MSE)
        • Mean Absolute Error (MAE)
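      A sketch of this selection scheme under assumed interfaces: a predictor is any function from a history to a prediction, and scoring here uses cumulative squared error (MSE up to a constant factor). The class name and method names are illustrative, not the NWS API.

```python
class AdaptiveForecaster:
    """Run several predictors side by side, score each against every new
    measurement, and forecast with whichever has the lowest cumulative error."""

    def __init__(self, predictors):
        # predictors: dict mapping a name to a function(history) -> prediction
        self.predictors = predictors
        self.cum_sq_err = {name: 0.0 for name in predictors}
        self.history = []
        self.pending = {}                 # predictions awaiting the next measurement

    def add_measurement(self, value):
        # Score the predictions that were made before this value arrived.
        for name, pred in self.pending.items():
            self.cum_sq_err[name] += (pred - value) ** 2
        self.history.append(value)
        # Every model now predicts the measurement after this one.
        self.pending = {name: fn(self.history)
                        for name, fn in self.predictors.items()}

    def forecast(self):
        # Report the latest prediction from the model with the lowest cumulative error.
        best = min(self.cum_sq_err, key=self.cum_sq_err.get)
        return self.pending[best], best
```

      For example, AdaptiveForecaster({"last": lambda h: h[-1], "window_avg": lambda h: sum(h[-10:]) / len(h[-10:])}) would race a last-value predictor against a sliding-window average; forecast() is meaningful only after at least one measurement has been added.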

  18. Actual Errors

  19. Quantiles

  20. A Cluster of Workstations: Actual Error

  21. Longer-Lived Predictions
      • Predicting the next value may be possible, but predicting specific future values is potentially difficult.
      • Applications often need a prediction of the average availability over a specified lifetime.
      Can we predict average availability over longer periods?
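      One simple way to pose the question in code, assumed for illustration and not necessarily the NWS method: aggregate the measurement history into lifetime-sized bins and forecast the next bin mean from the recent ones.

```python
def lifetime_average_forecast(history, lifetime_steps, window=5):
    """Forecast the average availability over the next `lifetime_steps`
    measurements: aggregate the history into lifetime-sized bins, then
    average the most recent `window` bin means."""
    bins = [history[i:i + lifetime_steps]
            for i in range(0, len(history) - lifetime_steps + 1, lifetime_steps)]
    bin_means = [sum(b) / len(b) for b in bins]
    recent = bin_means[-window:]
    return sum(recent) / len(recent)
```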

  22. Forecast Error Only

  23. Measurement Residuals

  24. Run at UCSD or SDSC? [Same resource diagram as slide 3: SDSC machines and the UCSD PCL connected via the vBNS, the Internet, and the AAI/ATD Net]

  25. Running on a Batch System

  26. So...
      • Predicting the next measurement is easy...
        • P. Dinda and D. O'Hallaron at CMU
        • M. Harchol-Balter and A. Downey, SIGMETRICS, 1996
        • AppLeS miscreants at UCSD
      • ...but making the next measurement is hard.
        • Uptime works better than expected
        • NWS sensor handles the “nice” problem
        • Measurement error can vary by method

  27. More “So…”
      • Longer-range forecasting of aggregate availability looks possible
        • Forecasting error does not explode
        • Measurement error is almost symmetric
      • For application scheduling, time-shared clusters are an attractive target, and (as yet) batch scheduling systems are not.
      • Open question: What fraction, if any, of Grid compute resources should be managed as time-shared clusters?
