1 / 56

Queries over Streaming Sensor Data

Queries over Streaming Sensor Data. Sam Madden DB Lunch October 12, 2001. Outline. Background Server Side Solutions Fjords, Sensor Proxies, CACQ Sensor Side Solutions Catalog Management Aggregation Future Work. Background: Sensor Networks. Sensor Networks.

keola
Download Presentation

Queries over Streaming Sensor Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Queries over Streaming Sensor Data Sam Madden DB Lunch October 12, 2001

  2. Outline • Background • Server Side Solutions • Fjords, Sensor Proxies, CACQ • Sensor Side Solutions • Catalog Management • Aggregation • Future Work

  3. Background: Sensor Networks

  4. Sensor Networks • Small, low cost battery powered microprocessors with 1 –4 sensors • Light, temperature, vibration, acceleration, AC power, humidity. • 10 kBit – 1Mbit wireless networks, 100ft range. • “Ad-hoc” networking – no predefined routes. • Cal, MIT, UCLA OS and networking communities committed

  5. SmartDust • Sensor nets motivated by “SmartDust Vision” – millimeter scale microprocessors, sensor, and wireless communication for pennies. • Deployed in thousands, no concern for reliability of a single sensor. • Requires: position detection, fault tolerance, aggregation, etc.

  6. Rene / Mica Motes • SmartDust stand-in • ~2cm x 3cm, OTS.

  7. TinyOS • Lightweight OS for sensors • Event-based • Active-message, multi-hop networking • Auto-idling • Network reprogramming, time synchronization, etc. [18] J. Hill, R. Szewczyk, A. Woo, S. Hollar, and D. C. K. Pister. System architecture directions for networked sensors. In Proceedingsof the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, November 2000.

  8. Applications of Sensor Nets • Space Monitoring • Power, light, temp in buildings • Temperature, humidity • Traffic • Military • Structural • Personal Networks

  9. Database Opportunities • All applications depend on data processing • Declarative query language over sensors attractive • Want “to combine and aggregate data streaming from motes.” • Sounds like a database…

  10. Database Challenges • Sensors unreliable • Come on and offline, variable bandwidth • Sensors push data • Sensors stream data • Sensors have limited memory, power, bandwidth • Sensors have processors

  11. Outline • Background • Server Side Solutions • Fjords, Sensor Proxies, CACQ • Sensor Side Solutions • Catalog Management • Aggregation • Future Work

  12. Fjords • Query Plan Abstraction to handle lack of reliability and streaming, push based data • Combine push and pull in arbitrary combinations • Use connectors between operators to isolate them from flow direction • “Bracket Model” – Graefe ‘93

  13. Fjords (Continued) • Operators assume non-blocking queue interface between each other. • Queues implement push vs. pull • Pull from A to B : Suspend A, schedule B until it produces data. A cannot go forward until B produces data. • Push from B to A : A polls, scheduler thread invokes B until it produces data. A can process other inputs while waiting for B. • Supports parallelism between operators via queues, state machines, and OS (e.g. NIC buffers, DMA) in operator transparent way.

  14.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  15.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  16.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  17.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  18.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  19.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  20.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  21.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  22.  Fjords Example Push Push  Pull Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, Feburary 2002.

  23. Fjords Applications • Combine traffic streams with web-based accident reports Francis Li, Sam Madden, Megan Thomas. Traffic Visualization. http://www.cs.berkeley.edu/~mct/infovis/project/traffic.html

  24. Operators for Streaming Data • Need special operators for dealing with streams (See P. Seshadri, et al. The design and implementation of a sequence database systems..VLDB ’96) • In particular, streams can’t be joined or sorted in the traditional sense • Solution: Use windows – e.g. “Zipper Join”

  25. Sensor Proxy • Energy-sensitive database operator • Buffer sensor tuples and route to multiple user queries to hide query load from sensors • Push aggregation operators into sensors to reduce communications load • Dynamically adjust sample rate based on user demand • Push results into Fjords so that other operators don’t block waiting on slow or dead sensors

  26. Some Results • Pushing predicates into sensors can vastly reduce costs: Atmel Simulator 100 samples / sec 5 vehicles / sec 7x power savings

  27.  Query 1 Query 2   Stocks. symbol = ‘APPL’ Stocks. symbol = ‘MSFT’ Stock Quotes ‘MSFT’ ‘APPL’ Stock Quotes CACQ • Expect hundreds to thousands of queries over same sensor sources • Continuously Adaptive Continuous Queries • Continuous Queries: Long running queries which combine selections and joins to improve efficiency (See Chen, NiagaraCQ, SIGMOD 2000)

  28. static dataflow eddy CACQ (Cont.) • Continuous Adaptivity From Eddies • Route tuples differently, depending on selectvity and cost estimates of operators

  29. CACQ (cont.) • Combining CA with CQ is a win: • CQ increases number of simultaneous queries • Adaptivity well suited to long running queries • Eddies allow us to avoid ugly query-optimization phase in traditional CQ • Eddies + Streams == few copies, unlike traditional CQ

  30. CACQ (cont) Look for a paper in SIGMOD 2002 (fingers crossed!)

  31. Outline • Background • Server Side Solutions • Fjords, Sensor Proxies, CACQ • Sensor Side Solutions • Catalog Management • Aggregation • Future Work

  32. Sensor Side Solutions • CACQ + Fjords provides interface + performance on QP, but sensors still need help: • Locate / identify sensors • Reduce power consumption • Take advantage of processors? • Improve responsiveness

  33. Cataloging Sensors • To query sensors, need a way to locate, identify properties, extract values • Goal: Drop a bunch of sensors around the DBMS, allow them to be queried without manual effort • Idea: Add a layer to each sensor which advertises its capabilities

  34. #temperature sensor field { name : "temp" #optional type : int units : celsius min : -20 max : 100 bits : 8 sample_cost : 10.0 J #optional -- for use in costing sample_time : 10.0 ms #optional -- for use in costing input : adc2 #optional : read from adc channel 1 sends : ondemand accessorEvent : GET_TEMPERATURE_DATA responseEvent : TEMPERATURE_DATA_READY } Compiled in 27 bytes of memory Layer to register with telegraph Can be “push” or “pull” Catalog (Continued)

  35. Aggregating Over Sensors • Sensor Proxy combines user queries, pushes down aggregates • Goal: Save energy, increase efficiency • Idea: Take advantage of the routing hierarchy (example soon!)

  36. Why bother with aggregation • Individual sensor readings are of limited use • Interest in higher level properties, e.g. what vehicles drove through, what is the spread of temperatures in the building • We have a processor & network on board, lets use it • We cannot survive without aggregation • Delivering a message to all nodes much easier than delivering a message from each node to a central point • Delivering a large amount of data from every node harder still, vide connectivity experiment • Forwarding raw information too expensive • Scarce energy • Scarce bandwidth • Multihop performance penalty

  37. Aggregation challenges • Inherently unreliable environment, certain information unavailable or expensive to obtain • how many nodes are present? • how many nodes are supposed to respond? • what is the error distribution (in particular, what about malicious nodes?) • Trying to build an infrastructure to remove all uncertainty from the application may not be feasible – do we want to build distributed transactions? • Information trickles in one message at a time • Never have a complete and up-to-date information about the neighborhood • What type of information should we expect from aggregation • Streams • Robust estimates

  38. 3 4 5 1 2 Scenario: Count

  39. 1 2 3 4 5 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  40. 1 2 3 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  41. 1 2 3 4 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  42. 1 2 3 4 5 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  43. 1 2 3 4 5 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  44. 1 2 3 4 5 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  45. 1 2 3 4 5 Sensor # Time Goal: Count the number of nodes in the network. Number of children is unknown. Scenario: Count

  46. Counting Lessons • Take advantage of redundancy to improve accuracy (reply to all parents, not just one) • Use broadcast to reduce number of messages • Result is a stream of values: much more robust to failures, movement, or collision than a single value.

  47. Aggregation in network programming • Network programming problem • Reliable delivery of a large number of messages to all nodes in range, while exploiting the broadcast nature of the medium • Basic setup • Broadcast a known number of idempotent program fragments • Each node keeps a bitmap of fragments received (1=packet received) • Two stages of the problem: single hop, and multihop • Solutions • Single hop, dense cell • Broadcasting the program – trivial, the central node broadcasts • Feedback from nodes – broadcast a request from the central node: Is anyone missing packets in this packet range? • Convergence: no replies to the request

  48. Aggregation in multihop network programming • Broadcasting the program – use flooding • Remember the last 8 packets forwarded, use that cache to decide whether to forward or not • Feedback from nodes • Distribute requests for feedback using the flooding • After some delay, respond if any packets are missing locally • Responses from children: AND with the local bitmap, store the result locally, forward the request • Suboptimal because there is no local fixups • Convergence • No replies to the request

  49. Aggregation over streams • Inherent uncertainty of the system • Can nodes communicate, do they have enough power, have they moved? • computing a complete single answer can be very expensive, and may not be possible • Partial estimates have their own value • Aggregation over streams • Values reflect the current best estimates • Self stabilizing: in the absence of changes converges to a desired value within N steps

  50. What does it mean to aggregate(The DB Perspective) • General purpose solution: apply standard aggregation operators like COUNT, MIN, MAX, AVERAGE, and SUM to any set of sensors. • Previous example are application specific • In sensors, operators may be arbitrary signal processing functions • Provide grouping semantics: e.g. ‘select avg(temp) group by trunc(light/10)’ • In sensor networks, groups may be random samples t1 t2 t3 t4 t5 t6 t7 t8 t9

More Related