Context-Aware Sensors Eiman Elnahrawy and Badri Nath Department of Computer Science, Rutgers University EWSN, January 19, 2004
Outline • Introduction, Motivations, Related Work • Context-Awareness • Approach: Modeling and Learning • Applications • Preliminary Evaluations • Challenges and Research Directions • Conclusion
Introduction • Sensors expected to become a major source of information • Applications • Monitoring: sometimes remote harsh environments • Habitat, climate, contamination • Agriculture and crops • Quality of food • Structures (response to earthquakes) • Tracking and military applications • Traffic control • Industry (control at assembly lines) • Medical (smart medicine cabinets)
Limitations of Wireless Sensor Networks • Limited battery life (a major design goal): if abused, sensors last only a few days; used carefully, they may last up to a few months • Limited communication bandwidth • Limited processing capability
High Rate of Packet Loss • Poor communication links • Connection failures • Fading signal strength • Packet collisions between multiple transmitters • Constant or sporadic interference • > 10% of the links suffer an average loss rate > 50% • Packet loss of most links fluctuates over time, with an estimated variance of 9%–17% • Topology changes continuously (node failure, node mobility)
Limitations cause many data quality problems… 1. Outliers: either serious events or bogus readings (e.g., at low battery levels) 2. Missing values • Low-level solutions to tolerate loss don't usually work; the problem persists • Limited resources: can we sample?
Inevitable! • (Uncontrollable) harsh environmental conditions, hardware and radio problems • Current technology: cheap, low-quality sensors that vary in their tolerance to quality problems • Industry's focus is on even cheaper sensors -> lower quality that varies with the cost of the sensor
Serious… • Incompleteness/imperfection/uncertainty • Need to distinguish a real event from a malicious sensor • Seriously affects decision making and triggers • False positives, false negatives, misleading answers • May cost you money • May jeopardize the application: e.g., routing based on gradient
“I can’t rely on this sensor data anymore. It has too many problems!” • Missing information • Is this a malicious sensor? • Something strange, or has the sensor gone bad? • Can we sample? • Noise • Bias • Limitations result in many data quality problems — serious for immediate decision making or actuator triggers!
General Approach • Relatively dense networks (coverage, connectivity, robustness, etc.) • Correlated and/or redundant readings • Spatial and temporal dependencies • Why don’t we exploit these spatio-temporal relationships among sensors (contextual information)?
Related Work • Spatio-temporal correlations in sensor data • Dimensions [Ganesan et al. 2002] • Premon [Goel et al. 2001] • Geospatial data analysis [Heidemann et al. 2001] • Assume the existence of such correlations without attempting to explicitly quantify them • Other data quality problems • Reducing the effect of noise [Elnahrawy et al. 2003] • Calibration (a post deployment technique) [Bychkovskiy et al. 2003]
In-network aggregation [Madden et al. 2002, 2003, Zhao et al. 2002] • Motivated our online in-network learning of relationships • Spatial and temporal data [Shekhar et al. 2003] • Graphical models in computer vision and image processing [Smyth et al. 1998, Freeman 1999]
Two Concepts • Contextual information: encodes spatial as well as temporal dependencies; enables sensors to locally predict their current readings • Context-awareness: sensors are aware of their context (neighborhood and history); given the contextual information, sensors can infer (predict) their readings
Learning the Contextual Information • Probabilistic approach based on Bayes classifiers • Mapping: learning and utilizing the contextual information = learning the parameters of a Bayes classifier and then making inferences • Scalable (distributed) and energy-efficient procedure for online learning • Inference computed locally at the node
Modeling the Contextual Information • Markovian model (short-range dependencies): last reading, immediate neighbors • [Diagram: sensor S at time T+1 depends on its own last reading H (time T) and the current readings N of its immediate neighbors; timeline T, T+1, T+2 shown]
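In symbols, a sketch of the Markov assumption the diagram implies (notation taken from the diagram's labels; this restatement is ours, not a formula from the slides):

$$P(S_{T+1} \mid \text{all readings of all sensors up to } T+1) = P(S_{T+1} \mid H, N)$$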
Why Bayesian? • Simple training and inference (sensors can afford it) • Bayesian models have been used in the literature (image processing, spatial data) • Gives good results and (sometimes) outperforms more sophisticated classifiers • Has a very nice “decomposability and progressive learning” property -> distributed learning
Bayesian and Sensor Networks • Features: h, n • Last reading h of the sensor (temporal information) • Current readings n of some immediate neighbors (spatial information) • In our preliminary work we used 2 neighbors • Quantization: R = {r_i} • Divide the range of possible values into a finite set of non-overlapping subintervals, not necessarily of equal length; each subinterval = a class
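A minimal sketch of the quantization step, assuming fixed, hand-chosen bin edges (the slides leave the subinterval choice open; the edges and m = 10 classes here are illustrative only):

```python
import bisect

# Illustrative bin edges: m = 10 classes are the half-open subintervals
# (-inf, 10), [10, 20), ..., [90, +inf). Not necessarily of equal length.
BIN_EDGES = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

def quantize(reading: float) -> int:
    """Map a raw sensor reading to its class index r_i in {0, ..., m-1}."""
    return bisect.bisect_right(BIN_EDGES, reading)

# Example: a reading of 37.2 falls into class 3, i.e. the subinterval [30, 40).
assert quantize(37.2) == 3
```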
Prediction in Bayesian Classifiers • MAP (maximum a posteriori): calculate the most likely class r_MAP of the current sensor reading given • The observed features h, n (spatio-temporal information) • The parameters θ (conditional probability tables)
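Written out, the standard MAP rule in the slide's notation (our restatement; the second equality follows from Bayes' rule, since P(h, n) does not depend on the class):

$$r_{\mathrm{MAP}} = \arg\max_{r_i \in R} P(r_i \mid h, n; \theta) = \arg\max_{r_i \in R} P(h, n \mid r_i)\,P(r_i)$$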
Naive Bayes • Features are conditionally independent given the target class • Under conditional independence, the parameters θ become • The 2 conditional probability tables P(h | r_i), P(n | r_i) • The prior probability of each class, P(r_i)
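Under that assumption the MAP rule factorizes into the standard naive Bayes form (stated here for completeness, consistent with the tables above):

$$r_{\mathrm{MAP}} = \arg\max_{r_i \in R} P(r_i)\,P(h \mid r_i)\,P(n \mid r_i)$$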
Parameters are just ratios of counters! • P(r_1) = frequency of r_1 = |r_1| / |D| • P(n = (r_2, r_2) | r_2) = #[n = (r_2, r_2) ∧ current reading r_2] / |r_2| • P(h = r_2 | r_1) = #[h = r_2 ∧ current reading r_1] / |r_1| • Total number of counters: 1 + m + (3/2)m² + (1/2)m³
Learning the Parameters • Data is free: most networks are readily used for collecting learning data (e.g., monitoring) • 2 phases: learning and testing • In-network, in a distributed fashion using in-network aggregation • Sensors collect training data and estimate the parameters locally (1 + m + (3/2)m² + (1/2)m³ counters) • Parameters (counters) are then aggregated while propagating up the routing tree (SUM aggregate) • Overall counters are flooded to every sensor — see the sketch of the per-node bookkeeping below
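A minimal sketch of the per-node counter bookkeeping under the naive Bayes model above, with m classes and the 2-neighbor feature; the class name, dictionary layout, and merge helper are our illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

M = 10  # number of classes (quantization bins)

class NodeCounters:
    """Local sufficient statistics; every parameter is a ratio of these."""
    def __init__(self):
        self.total = 0               # |D|
        self.per_class = Counter()   # |r_i|
        self.h_table = Counter()     # #[h = r_j AND current reading r_i]
        self.n_table = Counter()     # #[n = (r_j, r_k) AND current reading r_i]

    def observe(self, current: int, last: int, neighbors: tuple[int, int]):
        """Update counters from one training instance (all values are class indices)."""
        self.total += 1
        self.per_class[current] += 1
        self.h_table[(last, current)] += 1
        self.n_table[(tuple(sorted(neighbors)), current)] += 1  # unordered pair

def merge(a: NodeCounters, b: NodeCounters) -> NodeCounters:
    """SUM aggregate: counters from two subtrees combine by simple addition,
    which is what lets learning ride on in-network aggregation."""
    out = NodeCounters()
    out.total = a.total + b.total
    out.per_class = a.per_class + b.per_class
    out.h_table = a.h_table + b.h_table
    out.n_table = a.n_table + b.n_table
    return out
```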
Stationary vs. Non-Stationary • Perfect stationarity: use in-network aggregation — most efficient • Handling dynamic correlations requires a priori knowledge of the dynamics • Over time: re-learn the parameters dynamically at each change • Over space: cluster the network into geographical regions where the “stationarity in space” assumption holds inside each region • Time and space: hybrid approach
Analysis: In-network vs. Centralized • Both apply, but with different communication costs • Roughly measured by the size of the learning data • Varies from one application to another • Depends on the required accuracy and the routing mechanism • More experiments needed (future work) • Non-stationary (space): centralized is inferior
Analysis: (Imperfectly) Stationary • In-network learning • Distributive summary aggregate: k × O(m³) × O(n), for k epochs, m classes, and n nodes • O(m³) summary aggregated k times • Effectively reduces traffic • Centralized learning • Centralized aggregate (detailed set): p × O(n²), for p training instances (application-dependent) • p centralized aggregates • Significant traffic • Examples show centralized learning is an order of magnitude higher
Applications • Solving the inference problem enables: • Predicting any missing value • Detecting malicious sensors • Discovering outliers • Super-resolution (sampling) • A sketch of the local MAP inference step follows below
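A minimal sketch of local MAP inference from the learned counters, continuing the illustrative NodeCounters layout above; the Laplace smoothing constants are our added assumption to avoid zero counts, not something the slides specify:

```python
def predict(c: NodeCounters, last: int, neighbors: tuple[int, int]) -> int:
    """Return r_MAP = argmax_i P(r_i) * P(h | r_i) * P(n | r_i), computed locally."""
    pair = tuple(sorted(neighbors))
    best_class, best_score = 0, -1.0
    for r in range(M):
        prior = (c.per_class[r] + 1) / (c.total + M)                 # P(r_i)
        p_h = (c.h_table[(last, r)] + 1) / (c.per_class[r] + M)      # P(h | r_i)
        p_n = (c.n_table[(pair, r)] + 1) / (c.per_class[r] + M * M)  # P(n | r_i)
        score = prior * p_h * p_n
        if score > best_score:
            best_class, best_score = r, score
    return best_class

# A reading can then fill a missing value directly, or be flagged as a
# suspected outlier when it falls far from the predicted class, e.g.
# abs(observed_class - predict(...)) above some threshold.
```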
Evaluations • Synthetic data (tracking data set) • Phenomenon with sharp boundaries • Shockwave propagating around a center, based on Euclidean distance • 10,000 sensors over a 100 × 100 grid • Divided the range of readings into 10 bins (classes) • Added outliers at rates of 10%–90%
As the % of outliers increases • The classifier takes more time (iterations) to learn • The prediction error increases, then remains constant at 7% • Sensors rely more on the temporal correlations
As the % of outliers increases • We were able to detect about 90% of the added outliers • Incorrect predictions were off by less than 1
Evaluations • Real data: Great Duck Island (GDI) • Intel's project off the shore of Maine • Subset of the nodes (2, 12, 13, 15, 18, 24, 32, 46, 55, and 57), spatially adjacent • 5 sensors per node (light, temperature, thermopile, thermistor, humidity) • Readings from August 6 to September 9, 2002 (about 140,000 per sensor) • Acknowledgement: Robert Szewczyk @ Berkeley
[Result plots: prediction performance over time for the light, temperature, thermistor, and humidity sensors]
Evaluations • The error becomes small enough in a relatively short time • > 90% accuracy in most of the cases • Data was stationary, with random imprecision, noise, and outliers in the testing phase
Challenges • Dynamic correlations • Heterogeneity • Number of neighbors, selection criteria • Efficient routing • Dealing with rare events • Avoid quantization -> Regression models • Multi-dimensional
Future Work • Prototype and more evaluations • Preliminary evaluations to investigate efficiency • Extremely valuable in highlighting major decisions and potential deployment problems • Characterization • Overall cost • Integration • Integrating noise, calibration, and context-awareness • Important to ensure learning of accurate correlations
Conclusion • Dealing with data quality problems is very important • Context-awareness: learning and making inferences • Works well • Applications: missing values, outliers, sampling • Many open problems and future work directions