Control-based Quality Adaptation in Data Stream Management Systems (DSMS)


  1. Control-based Quality Adaptation in Data Stream Management Systems (DSMS) Yicheng Tu†, Mohamed Hefeeda‡, Yuni Xia†, Sunil Prabhakar†, and Song Liu¥ †Department of Computer Sciences, Purdue University, USA ‡School of Computing Science, Simon Fraser University at Surrey, Canada ¥School of Mechanical Engineering, Purdue University, USA DEXA 2005

  2. Data Stream Management • Continuous data, discarded after being processed • Continuous query • Data-active query-passive model • Applications • Financial analysis • Mobile services • Sensor networks • Network monitoring • More …

  3. DSMS architecture • Network of query operators (O1 – O3) • Each operator has its own queue (q1 – q4) • Scheduler decides which operator to execute • Query results (Q1, Q2) pushed to clients • Example systems: • Aurora/Borealis • STREAM

  4. Quality-of-Service (QoS) in DSM • Data processing is QoS-critical in DSMS • Tuple delay is the major concern: results generated from old data are useless! • Highly dynamic environment → hard to maintain QoS • Bursty data input • Unpredictable unit processing cost • Overloading during spikes → degraded (delay) QoS • Solution: adjust the following (i.e., quality adaptation) • Sampling rate (source side) • Data loss (DSMS side) → load shedding

  5. Load Shedding • Eliminating excessive load by dropping data items → fewer QoS violations • Basic algorithm (Tatbul et al., 2003): periodically • CPU is the bottleneck resource • Key questions • When? • How much? • Where? • Which tuples?

  6. What’s missing? • Current solutions focus on steady-state performance • Assuming input level changes between stable states • However, arrivals are bursty in practice – always in transient state • Taking averages (baseline) wouldn’t work

  7. Our approach • View load shedding as a feedback control problem • Feedback control: manipulation of system behavior by adjusting system input based on system output • Cruise control of automobiles, room temperature control, etc. • The feedback control loop: • Plant • Monitor • Controller • Actuator • How it works • Error = measured output – desired output • Focal point: controller, which maps error to control signal

  8. Why Feedback Control? • Maintain system performance under internal/external uncertainties • Control theory provides tools to choose and tune the controller toward desired performance • Current load shedding solution is also feedback-based • Difference: we use control theory to guide the controller design • Steps of problem-solving using control theory • Mapping the problem to a feedback control loop, determining input/output • System identification: modeling the input/output relationship • Controller design: can be done analytically

  9. The feedback control loop • Plant: current DSMS • Input: load admitted • Output: delay QoS • Reference output: specified by DBA • Actuator • adaptor: load shedder • admission controller • Monitor: new • Controller: new • System dynamics: disturbances • Discrete control: control period T

  10. System identification • To build a dynamic model that describes the relationship between input and output • Most systems can be modeled by a linear difference equation with the following terms: • I(x): input at period x • O(x): output at period x • n: order of the equation • ai, bi: system-specific coefficients • Determine n, ai, bi by experiments using synthetic inputs
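
The identification step can be illustrated with a minimal sketch. It assumes the simplest case, a first-order model O(k) = a·O(k−1) + b·I(k−1), and fits a and b by ordinary least squares on logged input/output series. The function name `fit_first_order` and the least-squares method are illustrative assumptions, not the authors' procedure.

```python
def fit_first_order(inputs, outputs):
    """Least-squares fit of the assumed model O(k) = a*O(k-1) + b*I(k-1).

    inputs[k] and outputs[k] are the measured I(k) and O(k) for period k.
    Returns the pair (a, b).
    """
    if len(inputs) != len(outputs) or len(outputs) < 3:
        raise ValueError("need two equally long series with at least 3 samples")

    # Regressors x1 = O(k-1), x2 = I(k-1); target y = O(k).
    x1 = outputs[:-1]
    x2 = inputs[:-1]
    y = outputs[1:]

    # Entries of the 2x2 normal equations.
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    t1 = sum(u * w for u, w in zip(x1, y))
    t2 = sum(v * w for v, w in zip(x2, y))

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("input signal does not excite the system enough")

    # Solve the normal equations with Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b
```

A synthetic input that varies from period to period is needed; a constant input makes the two regressors nearly collinear and the fit ill-conditioned.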

  11. Controller design • PI controller: • E(k): error • g, r: controller coefficients • Id(k): desired input • More efficiently: • Transfer function of the PI controller: • For example, a second order system has TF: • Closed-loop TF (CLTF): • Determine g and r by pole placement of the CLTF (details skipped)
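
A PI controller of this kind can be sketched as follows. The sketch assumes one common parameterization, the incremental law Id(k) = Id(k−1) + g·[E(k) − r·E(k−1)], which corresponds to the transfer function g(z − r)/(z − 1); this form is an assumption and is not taken from the slide. The class name `PIController` and the parameters `target` and `initial_input` are hypothetical. The error follows the convention E = measured output − desired output, so for a plant whose delay grows with load, g would be negative.

```python
class PIController:
    """Incremental PI controller (assumed form, see lead-in).

    Id(k) = Id(k-1) + g * (E(k) - r * E(k-1)),  E(k) = measured - target
    """

    def __init__(self, g, r, target, initial_input=0.0):
        self.g = g                      # controller gain (assumed name)
        self.r = r                      # controller zero (assumed name)
        self.target = target            # reference output, e.g. target delay
        self.prev_error = 0.0           # E(k-1)
        self.prev_input = initial_input # Id(k-1)

    def update(self, measured):
        """Return the desired input Id(k) for the next control period."""
        error = measured - self.target
        desired = self.prev_input + self.g * (error - self.r * self.prev_error)
        # Only the previous error and input are stored, so each update is O(1).
        self.prev_error = error
        self.prev_input = desired
        return desired
```

The incremental form needs only the last error and the last control signal, which is why it is cheaper than re-summing all past errors every period.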

  12. Actuator (load shedder) design • Id(k) is the desired load (# of data tuples) entering the DSMS during the next control period k • If S(k) is the actual load during period k, we need to discard S(k) - Id(k) tuples • Two implementations of load shedder: • Admit the first Id(k) tuples during period k • Pros: easy to implement, generates a (100%) accurate control signal • Cons: skewed to the early arrivals • Sampling-based shedding: each tuple is discarded with probability 1-p, where p = Id(k) / S(k) • However, S(k) is unknown at the beginning of period k • Solution: use S(k-1) to estimate S(k); this does not affect controller performance (see backup slide)
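
The two shedder variants can be sketched as below. The function names are hypothetical, and two details are assumptions added for the sketch: p is clamped to [0, 1], and every tuple is admitted when no previous-period load is available.

```python
import random


def shed_first_n(tuples, desired_load):
    """Variant 1: admit only the first Id(k) tuples of the period."""
    n = max(0, int(desired_load))
    return tuples[:n]


def admit_probability(desired_load, previous_load):
    """p = Id(k) / S(k-1), using last period's load as the estimate of S(k).

    Clamping to [0, 1] and returning 1.0 when there is no estimate
    are assumptions of this sketch.
    """
    if previous_load <= 0:
        return 1.0
    return min(1.0, max(0.0, desired_load / previous_load))


def shed_by_sampling(tuples, desired_load, previous_load, rng=random):
    """Variant 2: keep each tuple with probability p, drop it with 1 - p."""
    p = admit_probability(desired_load, previous_load)
    return [t for t in tuples if rng.random() < p]
```

Sampling spreads the drops evenly over the period, at the cost of admitting only approximately Id(k) tuples.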

  13. Determining control period • Control period T is critical in controller design • Two primary concerns in setting T • Should be short enough to capture the changes of input rate • Nyquist-Shannon theorem of sampling • The shorter the better • Output signal (delay) is measured as an average of all data tuples in one control period • T is too short → small number of sampled tuples • T cannot be too short as the output signal may fail to represent real system status • We make tradeoffs between the above two factors and set T to one second
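
A minimal sketch of the monitor's measurement, assuming each processed tuple is logged as a `(departure_time, delay)` pair; the function name and the log format are hypothetical. It returns the average delay of the tuples that left the system during one control period, or `None` when the period contains no tuples, which is the situation a very short T makes likely.

```python
from statistics import mean


def period_average_delay(departures, period_start, period_length):
    """Average delay of tuples departing in [period_start, period_start + T).

    departures: iterable of (departure_time, delay) pairs (assumed format).
    Returns None when no tuple departed during the period.
    """
    period_end = period_start + period_length
    delays = [d for t, d in departures if period_start <= t < period_end]
    if not delays:
        return None
    return mean(delays)
```

With fewer tuples per period the average rests on fewer samples and fluctuates more, which is the second concern listed above.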

  14. Experiments • We evaluate our control-based solution by simulations • Set four classes of delays: 500 ms – 2000 ms • Operator scheduling policy: Earliest Deadline First • Input: CPU utilization • Output: deadline miss ratio • Small query network with 13 operators • Stream data: • Synthetic: Poisson, Pareto • Real: TCP traces • Comparison: static shedding • Amount of shedding follows a pre-determined STEPSIZE • Similar to TCP rate control
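
One plausible reading of the static-shedding baseline, not confirmed by the slides, is a fixed-step rule: raise the shed fraction by STEPSIZE when deadlines are missed and lower it by STEPSIZE otherwise. The function name, the use of a shed fraction, and the default step value are all assumptions of this sketch.

```python
def static_shedding_step(shed_fraction, deadline_missed, stepsize=0.05):
    """Move the shed fraction one fixed step per period (assumed baseline rule).

    shed_fraction: fraction of incoming tuples currently being dropped.
    deadline_missed: True if the last period saw deadline misses.
    """
    if deadline_missed:
        shed_fraction += stepsize   # overloaded: drop more
    else:
        shed_fraction -= stepsize   # healthy: drop less
    # Keep the result a valid fraction.
    return min(1.0, max(0.0, shed_fraction))
```

Under this reading the step is fixed in advance: a small step reacts slowly to a burst, while a large step overshoots once the burst ends.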

  15. Simulation results: Poisson inputs • Target deadline miss ratio (control goal) is set to zero • [Figure panels: Inputs, Outputs]

  16. Simulation results: bursty inputs • [Figure panels: a. Pareto, b. TCP trace] • Far fewer deadline misses than static shedding • The same or lower level of data loss (load shed) • Hard to get an appropriate STEPSIZE in static shedding – not a problem in the control-based approach

  17. Summary • Load shedding is an important quality adaptation method • Current solutions focusing on steady-state performance do not work well under bursty inputs • We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory • Initial experimental results by simulation show promising potential of our approach

  18. Verification of model • First-order linear model

  19. Simulation: unpredictable unit processing cost • Control-based method learns the real cost

  20. Controller stability after replacing S(k) with S(k-1) • Let Id’(k) be the input signal that results from using S(k-1) instead of S(k). Then Id’(k) = p S(k-1), and thus S(k-1) Id(k) = S(k) Id’(k). • In the z-domain, we get Id(k) = z Id’(k). • Plugging this into the CLTF shows that, according to control theory, the controller is still stable.

  21. Ongoing work • Performed all three steps in a real DSMS – the Borealis system • We set output to average delay • System identification gives a first-order model structure • Control function • Controller analysis gives the following set of parameters:

  22. Ongoing work: results • Control target: 2000ms • Comparison: • Adaptive: static shedding • BASELINE • NON-CTRL • Metrics: • Total delay violations • Total delayed tuples • Max delay • Load shed

  23. Ongoing work: results
