
Multi-Terminal Information Theory Problems in Sensor Networks

This presentation explores general issues and basic tools of information theory in sensor networks, discussing topics such as data fusion, cooperative communication, and routing.


Presentation Transcript


  1. Multi-Terminal Information Theory Problems in Sensor Networks Gregory J Pottie UCLA Electrical Engineering Department pottie@icsl.ucla.edu

  2. Outline • General issues • Basic tools of information theory • Multi-terminal information theory • Applications: data fusion, cooperative communication • My pet ideas

  3. Sensor Network Operation • Cooperative communication • Data fusion • Routing • Basic goal: reliable detection/identification of sources, and timely notification of the end user

  4. Basic Information Theoretic Concepts [Block diagram: message W → channel encoder → X^n → channel p(y|x) → Y^n → decoder] • Typical set (of sufficiently long sequences of iid variables): • has probability nearly 1 • its elements are nearly equally probable • the number of elements is nearly 2^nH • Aims of a communications system: • minimize errors due to noise in the channel • maximize data rate • minimize bandwidth and power (the resources) • Shannon capacity establishes the fundamental limits
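The typical-set claim above is easy to check numerically: by the AEP, −(1/n) log₂ p(Xⁿ) converges to the entropy H, so long iid sequences all have probability near 2^(−nH). A minimal sketch (not from the slides; a Bernoulli(0.3) source is assumed for illustration):

```python
import math
import random

def entropy(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def empirical_log_prob(seq, p):
    """-(1/n) log2 p(x^n) for an iid Bernoulli(p) sequence x^n."""
    n = len(seq)
    ones = sum(seq)
    return -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n

random.seed(0)
p = 0.3
n = 100_000
seq = [1 if random.random() < p else 0 for _ in range(n)]
# By the AEP, the empirical value is close to H(0.3) ≈ 0.881 bits.
print(empirical_log_prob(seq, p), entropy(p))
```

For n this large the two printed numbers agree to a few decimal places, which is exactly the "probability nearly 2^(−nH)" statement.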

  5. Basic Information Theoretic Concepts [Block diagram: message W → channel encoder → X^n → channel p(y|x) → Y^n → decoder] • Capacity C is the maximum mutual information I(X;Y) with respect to p(x); that is, choose the input distribution leading to the largest mutual information • Capacity C is the largest rate at which information can be transmitted with arbitrarily small error probability • Jointly typical set: from among the typical input and output sequences, choose the ones for which −(1/n) log p(x^n, y^n) is close to H(X,Y) • The size of the jointly typical set is about 2^nI(X;Y), so there are about this many distinguishable signals (codewords) in X^n • These codewords necessarily contain redundancy: the size of the set is smaller than the alphabet would imply; properly chosen sequences perform better than isolated symbols
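As a concrete instance of "capacity = max of I(X;Y) over p(x)", one can sweep input distributions for a binary symmetric channel; the BSC is an assumption added for illustration, not a slide example. The maximum lands at the uniform input and recovers C = 1 − H(ε):

```python
import math

def h2(p):
    """Binary entropy in bits, with the 0*log0 = 0 convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(px1, eps):
    """I(X;Y) for a binary symmetric channel with crossover eps and P(X=1)=px1."""
    py1 = px1 * (1 - eps) + (1 - px1) * eps
    return h2(py1) - h2(eps)   # I(X;Y) = H(Y) - H(Y|X)

eps = 0.1
# Sweep input distributions; the maximum (the capacity) occurs at px1 = 0.5.
best = max(bsc_mutual_information(q / 100, eps) for q in range(1, 100))
print(best, 1 - h2(eps))
```

The sweep makes the "choose the set X leading to the largest mutual information" step explicit for a channel simple enough to check by hand.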

  6. Jointly Typical Sequences [Diagram: input sequences X1^n, X2^n in X^n map to output images in Y^n] • The output set is in general larger due to additive noise • Output images of inputs may overlap due to noise

  7. Gaussian Channel Capacity • Discrete-time channel; the channel adds noise with a Gaussian distribution (zero mean, variance N) • Input sequence (codeword) power is set to P • Capacity is the maximum of I(X;Y) over p(x) such that E[X²] satisfies the power constraint • C = (1/2) log₂(1 + P/N) bits per transmission • The more usual form considers a channel of bandwidth W and noise power spectral density N₀; then C = W log₂(1 + P/(N₀W)) bits per second
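Both capacity formulas can be evaluated directly; a small sketch, where the parameter values are illustrative assumptions:

```python
import math

def awgn_capacity_per_use(P, N):
    """C = 1/2 log2(1 + P/N) bits per channel use."""
    return 0.5 * math.log2(1 + P / N)

def awgn_capacity_bps(W, P, N0):
    """C = W log2(1 + P/(N0*W)) bits per second for bandwidth W in Hz."""
    return W * math.log2(1 + P / (N0 * W))

print(awgn_capacity_per_use(15, 1))        # 2.0 bits/use at SNR = 15
print(awgn_capacity_bps(1e6, 1e-6, 1e-14)) # ≈ 6.66e6 bit/s at SNR = 100
```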

  8. Capacity and Coding • Shannon capacity is the maximum rate at which information may be reliably sent over a channel (arbitrarily small decoding error probability) • Given bandwidth and bit rate, one can compute the SNR at which capacity is achieved • Practical channel codes seek to reduce the SNR to the minimum required to achieve transmission at some finite bit error rate, with reasonable decoding complexity [Figure: log P(e) vs. SNR (dB) for uncoded digital modulation and coded modulation, with the capacity limit C marked]
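The "SNR at which capacity is achieved" follows from inverting C = W log₂(1 + SNR): SNR = 2^(R/W) − 1. A sketch (function names are illustrative); as R/W → 0 this reproduces the well-known −1.59 dB Eb/N0 limit:

```python
import math

def min_snr(rate_bps, bandwidth_hz):
    """Smallest SNR at which capacity equals the target rate: 2^(R/W) - 1."""
    return 2 ** (rate_bps / bandwidth_hz) - 1

def min_ebn0_db(rate_bps, bandwidth_hz):
    """Corresponding Eb/N0 in dB, using Eb/N0 = SNR * W / R."""
    snr = min_snr(rate_bps, bandwidth_hz)
    return 10 * math.log10(snr * bandwidth_hz / rate_bps)

print(min_snr(1e6, 1e6))     # 1 bit/s/Hz needs SNR = 1 (0 dB)
print(min_ebn0_db(1.0, 1e6)) # ≈ -1.59 dB in the wideband limit
```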

  9. Parallel Channels • The water-filling power distribution maximizes the capacity of parallel Gaussian channels with noise variances Ni • Capacity is the sum of those of the subchannels to which power is allocated • This extends to continuously variable channels (in some combination of time and frequency), such as radio channels experiencing multipath fading • Practical algorithms must cope with channel dynamics; channel state must be conveyed to the transmitter to approach capacity [Figure: power allocations P_i fill subchannels with noise levels N1…N5 up to a common water level; the noisiest subchannels receive no power]
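A minimal water-filling sketch, solving for the common water level by bisection; the noise values are illustrative:

```python
import math

def water_fill(total_power, noise_vars):
    """Water-filling: P_i = max(mu - N_i, 0) with sum(P_i) = total_power.
    Finds the water level mu by bisection on a bracketing interval."""
    lo, hi = min(noise_vars), min(noise_vars) + total_power
    for _ in range(100):
        mu = (lo + hi) / 2
        used = sum(max(mu - n, 0.0) for n in noise_vars)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    return [max(mu - n, 0.0) for n in noise_vars]

noise = [1.0, 2.0, 5.0]
powers = water_fill(6.0, noise)
cap = sum(0.5 * math.log2(1 + p / n) for p, n in zip(powers, noise))
# The noisiest subchannel lies above the water level and gets zero power.
print(powers, cap)
```

With this budget the allocation is [3.5, 2.5, 0.0]: the third subchannel is exactly the "subchannel to which no power is allocated" case from the slide.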

  10. Multi-Terminal Information Theory • The preceding discussion assumed a single transmitter and receiver • Multi-terminal information theory considers maximization of mutual information for the following possibilities: • Multiple senders and one receiver (the multiple access channel) • One sender and multiple receivers (the broadcast channel) • One sender and one receiver, but intervening transducers that can assist (the relay channel) • Composite combinations of these basic types • Estimation theory also aims to maximize mutual information, except the senders do not cooperate and usually there is a fidelity constraint: • One sender and multiple receivers (the data fusion problem) • Multiple senders and receivers (the source separation problem) • Delay and resource usage may also be included

  11. Gaussian Multiple Access Channel • m transmitters with power P sharing the same noisy channel • C(P/N) = (1/2) log₂(1 + P/N) bits per channel use for an isolated sender • The achievable rate region is bounded by, for every subset S of senders, Σ_{i∈S} Ri ≤ C(|S|P/N) • The last (sum-rate) inequality dominates when the rates are the same • Capacity increases with more users (there is more power) • The result is dual to Slepian-Wolf encoding of correlated sources
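Under the dominant sum-rate constraint with equal rates, each user gets C(mP/N)/m, and the sum rate grows with m because total received power grows. A small numerical check (parameter values assumed):

```python
import math

def C(snr):
    """Gaussian capacity function, bits per channel use."""
    return 0.5 * math.log2(1 + snr)

def symmetric_mac_rate(m, P, N):
    """Per-user rate under the dominant sum constraint: C(m*P/N) / m."""
    return C(m * P / N) / m

P, N = 1.0, 1.0
for m in (1, 2, 4, 8):
    # Sum rate m * R grows with m even though each user's share shrinks.
    print(m, m * symmetric_mac_rate(m, P, N))
```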

  12. Gaussian Broadcast Channel • One sender of power P and two receivers, one with noise N1 and one with noise N2, N1 < N2 • The two codebooks are coordinated to exploit commonality of information transmitted, otherwise capacity does not exceed simple multiplexing
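The coordinated codebooks on this slide are superposition coding. The standard rate pair (from Cover and Thomas, the deck's first reference; the power split α and the parameter values below are assumptions for illustration) can be swept numerically:

```python
import math

def C(snr):
    return 0.5 * math.log2(1 + snr)

def bc_rates(alpha, P, N1, N2):
    """Superposition coding: a fraction alpha of power P goes to the good
    receiver (noise N1 < N2); the weak receiver decodes its own message
    treating the strong receiver's signal as extra noise."""
    R1 = C(alpha * P / N1)
    R2 = C((1 - alpha) * P / (alpha * P + N2))
    return R1, R2

P, N1, N2 = 10.0, 1.0, 4.0
# Sweeping alpha traces the boundary of the broadcast capacity region.
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(a, bc_rates(a, P, N1, N2))
```

The endpoints α = 0 and α = 1 are the single-user corner points; interior values strictly dominate time-sharing, which is the "capacity exceeds simple multiplexing" claim.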

  13. Relay Channel • One sender, one relay, and one receiver; the relay transmits X1 based only on its observations Y1 [Diagram: X → relay (Y1 : X1) → Y] • Combines a broadcast channel and a multiple access channel • Networks are composed of multiple relay channels, which may further induce delay

  14. General Multi-Terminal Networks • m nodes; node j has an associated transmit variable X(j) and receive variable Y(j) • Node 1 transmits to node m; what is the maximum achievable rate? [Diagram: nodes (X1,Y1) … (Xm,Ym)] • Bounds are derived from information flow across multiple cut sets, but are generally not achievable • The source-channel coding separation theorem fails because the capacity of multiple access channels increases with correlation, while source encoding eliminates correlation

  15. Now let it move… • Nodes move within a bounded region according to some random distribution; what is the capacity subject to an energy constraint on messages? [Diagram: Node 1 and Node m at Time 1 and Time 2] • The answer depends on the delay constraint; eventually the nodes will collide, implying near-zero path loss and thus unbounded capacity • Other questions: • probability that the nodes have a connecting path of the required rate • probability of a message arriving within the required delay

  16. Some Recent Research • Data fusion in sensor networks • Cooperative communications in sensor networks • Wild speculation on signal locality, scaling, and hierarchy

  17. But first, a Rate Distortion Primer • The rate distortion function R(D) can be interpreted as • the minimum rate at which a source can be represented subject to a distortion D = d(X,Y) • the minimum distortion that can be achieved given a maximum rate constraint R • Interesting dual results to capacity; here we get to determine the distortion • Applies to compression of real-valued sequences [Figure: R(D) curve with the achievable region above it]
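For a Gaussian source under mean-squared distortion the curve is explicit, R(D) = ½ log₂(σ²/D), with the dual form D(R) = σ²·2^(−2R). A small sketch of both interpretations from the slide:

```python
import math

def gaussian_rd(sigma2, D):
    """R(D) = 1/2 log2(sigma^2 / D) for 0 < D <= sigma^2, else 0
    (Gaussian source, squared-error distortion)."""
    if D >= sigma2:
        return 0.0
    return 0.5 * math.log2(sigma2 / D)

def gaussian_dr(sigma2, R):
    """Dual form: the distortion achievable at rate R is sigma^2 * 2^(-2R)."""
    return sigma2 * 2 ** (-2 * R)

print(gaussian_rd(1.0, 0.25))  # one bit of rate quarters the distortion
print(gaussian_dr(1.0, 1.0))
```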

  18. Rate Distortion and Data Fusion • Can identify resource use (energy/number of bits transmitted) with rate, and decision reliability (false alarm rate, missed detection probability) with distortion • Operate at different points on the rate distortion curve depending on the values of the cost function • Location of the fusion center, numerical resolution, number of sensors, length of records, routing, and distribution of processing all affect R(D)

  19. A Simple Algorithm • Nodes are activated to send requests for information from other nodes based on SNR • If above a threshold T, the decision is reliable, and activity by neighbors is suppressed • Otherwise, increase the likelihood of requesting help based on proximity to T • In likelihood, the higher-SNR nodes form the cluster • Bits of resolution are related to SNR (e.g., for use in maximal ratio combining) [Diagram, node states: 1 = high SNR, initiates; 2 = activated, requests further information; 3 = SNR too low to respond]
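A hedged simulation sketch of the activation rule above; the join-probability rule and all names are illustrative assumptions, not the authors' implementation:

```python
import random

def cluster_response(snrs, T, seed=0):
    """Sketch of the slide's rule: a node at or above threshold T decides
    alone and suppresses its neighbours; otherwise each node joins the
    cluster with probability growing with its proximity to T, so the
    high-SNR nodes are the likely cluster members."""
    rng = random.Random(seed)
    if max(snrs) >= T:
        # Reliable local decision: only the best node reports.
        return [snrs.index(max(snrs))]
    members = []
    for i, s in enumerate(snrs):
        p_join = max(s / T, 0.0)   # proximity to threshold as join probability
        if rng.random() < p_join:
            members.append(i)
    return members

print(cluster_response([12.0, 3.0, 0.5], T=10.0))  # node 0 decides alone
print(cluster_response([6.0, 5.0, 0.5], T=10.0))   # probabilistic cluster
```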

  20. The n-helper Gaussian Scenario [Diagram: nodes X, Y1, Y2, Y3, …, Yn report to a gateway/fusion center] • Multiple sensors observe an event and generate correlated Gaussian data. One node (X) is the main data source (e.g., closest to the phenomenon), and the n additional nodes (Y1–Yn) are the 'helpers'. • The problem: what codes and data rates allow the gateway/data-fusion center to reproduce the data from the main node, using the remaining nodes as sources of partial side information, subject to some distortion criterion?

  21. Main Result • We do not care about reproduction of the Y variables; rather, they act as helpers to reproduce X • This problem was previously solved for the 2-node case • Our solution: for an admissible rate tuple (Rx, R1, …, Rn), and for some Di > 0, the n-helper system data rates can be fused to yield an effective data rate (with respect to source X) satisfying a rate distortion bound in which σ² is the variance and ρ the correlation [bound shown on slide]
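The slide's n-helper bound was lost in transcription. As background only, the previously solved 2-node (one-helper) Gaussian case, which the n-helper result generalizes, has the closed form D = σ²·2^(−2Rx)·(1 − ρ² + ρ²·2^(−2R1)); treat this as a stand-in, not the slide's expression:

```python
def one_helper_distortion(sigma2, rho, Rx, R1):
    """One-helper Gaussian bound (the previously solved 2-node case):
    D = sigma^2 * 2^(-2*Rx) * (1 - rho^2 + rho^2 * 2^(-2*R1)).
    R1 = 0 recovers the no-helper value sigma^2 * 2^(-2*Rx); as R1 grows
    the distortion falls toward the conditional-variance floor."""
    return sigma2 * 2 ** (-2 * Rx) * (1 - rho ** 2 + rho ** 2 * 2 ** (-2 * R1))

# More helper rate means less distortion on X at the same Rx.
for R1 in (0.0, 0.5, 1.0, 2.0):
    print(R1, one_helper_distortion(1.0, 0.9, 1.0, R1))
```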

  22. Comments • Other source distributions are analytically difficult, but many are likely to be convex optimizations • A generalization would consider instances of relay/broadcast channels in conveying information to the fusion center with minimum energy • Sensor network detection problems are inherently local: even though the expression may be complicated, the number of helpers will usually be small due to the decay of signals as a power of distance

  23. Problem Definition of Cooperative Communication • Many low-power, low-cost wireless sensors cooperate with each other to achieve more reliable and higher-rate communications • The dominant constraint is peak power; bandwidth is not the main concern • Multiplexing (FDMA, TDMA, CDMA, OFDM) is the standard approach: each sensor has a unique channel • We focus on schemes where multiple sensors occupy the same channel

  24. Example: Space-Time Coding • N transmit antennas and N receive antennas • The channel transition matrix displays independent Rayleigh (complex Gaussian) fading in each component • With properly designed codes, capacity is N times that of a single Rayleigh channel • Note this implicitly assumes synchronization among Tx and Rx array elements, which requires special effort in sensor networks • A coordinated transmission, not a multiple access situation
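The N-fold capacity scaling can be illustrated with a simplified stand-in: N independent parallel Rayleigh-fading subchannels rather than a true N×N space-time code (an assumption made so the sketch stays self-contained):

```python
import math
import random

def avg_rayleigh_capacity(n_channels, snr, trials=20000, seed=1):
    """Monte-Carlo mean capacity (bits/use) of n parallel iid Rayleigh-fading
    channels: |h|^2 is exponential with mean 1, so each subchannel contributes
    log2(1 + snr * |h|^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(math.log2(1 + snr * rng.expovariate(1.0))
                     for _ in range(n_channels))
    return total / trials

c1 = avg_rayleigh_capacity(1, 10.0)
c4 = avg_rayleigh_capacity(4, 10.0)
# Mean capacity scales linearly in the number of independent subchannels.
print(c1, c4, c4 / c1)
```

The ratio comes out close to 4, the linear scaling the slide attributes to well-designed space-time codes.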

  25. Context • The cooperative reception problem is very similar to the multi-node fusion problem; the same initiation procedure is required to create the cluster, but here we can choose the channel code • Cooperative transmission and reception is similar to multi-target multi-node fusion, but more can be done: beacons, space-time coding • Use it to overcome gaps in the network, or to communicate with devices outside the sensor network (e.g., a UAV)

  26. Channel Capacity • Channel state information: known at the transmitter side, or at both sides • If channel state information is known at the transmit side, RF synchronization can be achieved • Channels: AWGN and fading channels with unequal path loss • General formula [equation shown on slide]

  27. Channel Capacity (cont'd) • Receive diversity [formula shown on slide] • Transmit diversity [formula shown on slide] • Combined transmit-receive diversity • RF synchronization
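Since the slide's formulas were lost in transcription, here is a hedged sketch of the coherent-combining gains usually quoted for these cases: receive MRC multiplies the SNR by Mr, and a phase-aligned transmit cluster of Mt elements with fixed total power gains a factor Mt. These are textbook AWGN idealizations, not necessarily the slide's exact expressions:

```python
import math

def C(snr):
    """Gaussian capacity function, bits per channel use."""
    return 0.5 * math.log2(1 + snr)

def rx_diversity(snr, Mr):
    """Maximal-ratio combining at the receiver: effective SNR is Mr * snr."""
    return C(Mr * snr)

def tx_diversity(snr, Mt):
    """Phase-aligned transmit cluster, total power fixed: array gain Mt."""
    return C(Mt * snr)

def combined(snr, Mt, Mr):
    """Combined transmit/receive clusters: gains multiply."""
    return C(Mt * Mr * snr)

# Phase synchronization turns a cluster into a large effective SNR gain.
print(rx_diversity(1.0, 4), tx_diversity(1.0, 4), combined(1.0, 2, 2))
```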

  28. Comments • Capacity is much higher if phase synchronization within the transmitter and receiver clusters can be achieved • We have investigated practical methods for satellite/ground sensor synchronization • Beacons (e.g., GPS) can greatly simplify the synchronization problem for ground/ground cooperative communications • Recent network capacity results do not take into account possibilities for cooperation by nodes as transmitter/receiver clusters

  29. Implications of Signal Locality • Severe decay of signals with distance (second to fourth power) • Mutual information to source dominated by small set of nodes • Cooperative communication clusters for ground to ground transmission will likely be small • Implications: • Local processing is nearly optimal; do not need to convey raw data over long distances very frequently • Consequently, lowest layers of processing/network formation, etc. are the most important, since most frequently invoked (“typical”) • Practical example: • Specialized local transmission schemes (e.g., for forming ad hoc clusters), but long range might use conventional methods such as TCP/IP
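The locality claim can be made concrete: with power-law path loss, the per-node capacity falls so fast with distance that the nearest couple of nodes carry almost all of the total mutual information. A small sketch (all parameter values are assumptions):

```python
import math

def node_capacity(P, d, alpha, N=1.0):
    """Per-node Gaussian capacity with power-law path loss: SNR = P / (N * d^alpha)."""
    return 0.5 * math.log2(1 + P / (N * d ** alpha))

# Nodes at geometrically increasing distance, fourth-power decay.
distances = [1, 2, 4, 8, 16]
caps = [node_capacity(100.0, d, alpha=4) for d in distances]
share_of_top2 = sum(caps[:2]) / sum(caps)
print(caps, share_of_top2)  # the two nearest nodes dominate
```

With these numbers the two nearest nodes account for well over 90% of the total, which is the quantitative content of "mutual information to source dominated by a small set of nodes".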

  30. Hierarchy • For dealing with the network as a whole, the number of variations of topology is immense • At small scale: distributed algorithms exploiting locality of events • At large scale: use of ensembles for deriving bounds • In between, consider layers of hierarchy, each of which may be amenable to a conventional optimization technique

  31. Information Processing Hierarchy [Figure: processing pyramid; the bottom cueing layer is low power, high false alarm rate, high duty cycle; queries for more information move up through beamforming and high-resolution processing at the base station; the top layer, which transmits the decision to the human observer, is high power, low false alarm rate, low duty cycle]

  32. Information Theory Challenges • Minimal energy to obtain reliable decision in a distributed network • Minimal energy to relay a decision across a distributed network (including gaps) • Minimal (average) delay in conveying information through network • Role of hierarchy; how much leads to what kinds of changes in information theoretic optimal behavior • At small scale can use brute force, at large scale ensembles; what can we do in between? • Limits of distributed vs. centralized approaches

  33. References • T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991. • G. Pottie and W. Kaiser, "Wireless Integrated Network Sensors," Commun. ACM, May 2000. • M. Ahmed, Y.-S. Tu, and G. Pottie, "Cooperative detection and communication in wireless sensor networks," 38th Allerton Conf. on Comm., Control and Computing, Oct. 2000.
