
Basic Concepts of Discrete Probability

Basic Concepts of Discrete Probability. Elements of Probability Theory (continuation). Bayes’ Theorem.



  1. Basic Concepts of Discrete Probability. Elements of Probability Theory (continuation)

  2. Bayes’ Theorem • Bayes’ theorem (1763) solves the following problem: from a number of observations of the occurrence of an effect, one can make an estimate of the occurrence of the cause leading to that effect (it is also called the rule of inverse probability).

  3. Bayes’ Theorem • Let A1 and A2 be two mutually exclusive and exhaustive events: A1 ∩ A2 = ∅, A1 ∪ A2 = Ω. • Let both A1 and A2 have a subevent of E (E ∩ A1 and E ∩ A2, respectively). • The event E is of our special interest. It can occur only when A1 or A2 occurs.

  4. Bayes’ Theorem • Suppose we are given the a priori information that E has occurred, and that the conditional probabilities P{E|A1} and P{E|A2} (the a priori probabilities) are known. • The Bayes problem is formulated as follows: how likely is it that A1 (or A2) has occurred because of the occurrence of E (the a posteriori probabilities P{A1|E} and P{A2|E})?

  5. Bayes’ Theorem • For i = 1, 2: P{Ai|E} = P{Ai}P{E|Ai} / P{E}. The conditional probabilities P{E|Ai} are the a priori probabilities; the resulting P{Ai|E} are the a posteriori probabilities.

  6. Bayes’ Theorem • Since E = (E ∩ A1) ∪ (E ∩ A2), then for the mutually exclusive events: P{E} = P{E ∩ A1} + P{E ∩ A2} = P{A1}P{E|A1} + P{A2}P{E|A2}.

  7. Bayes’ Theorem (general case) • Let A1, A2, …, An be mutually exclusive and exhaustive events: Ai ∩ Aj = ∅ for i ≠ j, and A1 ∪ A2 ∪ … ∪ An = Ω; • Then P{Ai|E} = P{Ai}P{E|Ai} / (P{A1}P{E|A1} + P{A2}P{E|A2} + … + P{An}P{E|An}).
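
A minimal Python sketch of this general formula; the priors P{Ai} and likelihoods P{E|Ai} below are hypothetical values chosen purely for illustration:

```python
# Sketch of the general Bayes formula for mutually exclusive, exhaustive events A1..An.
# The priors P{Ai} and likelihoods P{E|Ai} are hypothetical illustration values.

def bayes_posteriors(priors, likelihoods):
    """Return the a posteriori probabilities P{Ai|E} for all i."""
    # Total probability of E: P{E} = sum over i of P{Ai} * P{E|Ai}
    p_e = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_e for p, l in zip(priors, likelihoods)]

# Two equally likely causes, with E much more likely under A1 than under A2.
print(bayes_posteriors([0.5, 0.5], [0.9, 0.1]))   # [0.9, 0.1]
```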

  8. Bayes’ Theorem and Communication Channels • Consider again the problem of sending a bit of information from sender to receiver over a binary symmetric channel: the transmitted bit arrives unchanged with probability 1–p and flipped with probability p. • Before a bit b ∈ {0,1} has been transmitted, the receiver has no information: p(0) = p(1) = ½. The transmission of the bit value changes these probabilities: if the bit value b′ = 0 has been received, we assign a higher probability that the transmitted bit value was b = 0, rather than b = 1. This probability is calculated using Bayes’ theorem.

  9. Bayes’ Theorem and Communication Channels • Let us apply Bayes’ theorem to the noisy channel where the sender’s bit is the random variable X and the received bit is Y (the channel maps 0→0 and 1→1 with probability 0.9 and flips the bit with probability 0.1). 1) Take p = 0.1 and use the channel without error correction. We have that P{X=0|Y=0} = P{X=1|Y=1} = 0.9 and P{X=1|Y=0} = P{X=0|Y=1} = 0.1. 2) If we use the code where we send each bit 3 times, we get P{X=0|Y=0} = P{X=1|Y=1} = 0.972 and P{X=1|Y=0} = P{X=0|Y=1} = 0.028. Thus, the information given by Bayes’ posterior distributions P{X|Y} is in any case less random than (½, ½).
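
These posterior values can be checked with a short Python sketch, assuming a binary symmetric channel with flip probability p, a uniform prior P{X=0} = P{X=1} = ½, and majority-vote decoding of the 3-fold repetition code:

```python
from math import comb

def bsc_posterior(p, prior0=0.5):
    """P{X=0 | Y=0} for a binary symmetric channel with flip probability p."""
    # Bayes: P{X=0|Y=0} = P{X=0}P{Y=0|X=0} / (P{X=0}P{Y=0|X=0} + P{X=1}P{Y=0|X=1})
    return prior0 * (1 - p) / (prior0 * (1 - p) + (1 - prior0) * p)

def majority_error(p, n=3):
    """Probability that majority-vote decoding of an n-fold repetition fails."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

p = 0.1
print(bsc_posterior(p))             # 0.9   -> P{X=0|Y=0} without coding
print(1 - majority_error(p, 3))     # 0.972 -> with the 3-fold repetition code
print(majority_error(p, 3))         # 0.028
```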

  10. Random Variables • A random variable is a real-valued function defined over the sample space of a random experiment: X: Ω → ℝ. • A random variable is called discrete if its range is either finite or countably infinite. • A random variable establishes the correspondence between a point ω of Ω and a point x in the “coordinate space” associated with the corresponding experiment.

  11. Discrete Probability Function and Distribution • Any discrete random variable X assumes different values in the coordinate space: x1, x2, …, xn, … • The probability distribution function (the cumulative distribution function, CDF) is defined as F(x) = P{X ≤ x} = Σ_{xi ≤ x} f(xi), where f(xi) = P{X = xi} is the probability function.

  12. Discrete Probability Function • Thus, the discrete random variable X ‘produces’ letters from a countable (typically finite) alphabet Ψ with the probability function f(x) = P{X = x}, x ∈ Ψ.

  13. Discrete Probability Distribution Function (CDF) • The following properties of the CDF follow from the axioms of probability: 0 ≤ F(x) ≤ 1, F(–∞) = 0, F(+∞) = 1. • F(x) is a nondecreasing function: if x1 ≤ x2, then F(x1) ≤ F(x2) for every pair x1, x2.
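
A small Python sketch of these definitions, building F(x) from a hypothetical probability function f on three integer values:

```python
# A hypothetical probability function f(xi) = P{X = xi} on three integer values.
f = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x, f=f):
    """CDF: F(x) = P{X <= x} = sum of f(xi) over all xi <= x."""
    return sum(p for xi, p in f.items() if xi <= x)

print(F(-1), F(0), F(1), F(2))   # 0 0.25 0.75 1.0  (nondecreasing, F(-inf)=0, F(+inf)=1)
```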

  14. Bivariate Discrete Distribution • In most engineering problems the interrelation between two random quantities (pairs of values, i.e. a vector-valued random variable) leads to a bivariate discrete distribution. • The joint probability function and distribution function (CDF) are, respectively: f(x, y) = P{X = x, Y = y} and F(x, y) = P{X ≤ x, Y ≤ y} = Σ_{xi ≤ x} Σ_{yj ≤ y} f(xi, yj).

  15. Bivariate Discrete Distribution • The marginal probability function and distribution function (CDF) are, respectively: fX(x) = Σ_y f(x, y) and FX(x) = P{X ≤ x} = Σ_{xi ≤ x} Σ_y f(xi, y) (and similarly fY(y) and FY(y) for Y).

  16. Bivariate Discrete Distribution • The marginal probability fX(x) is the probability of the occurrence of those events for which X = x, without regard to the value of Y. • If the random variables X and Y are such that f(xi, yj) = fX(xi) fY(yj) for all i, j, then the variables X and Y are said to be statistically independent.
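
A brief Python sketch, using a hypothetical joint table f(x, y), of how the marginals are obtained and how independence can be checked:

```python
# Marginal probability functions from a hypothetical joint table f(x, y),
# plus a check of statistical independence f(x, y) == fX(x) * fY(y).

joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p   # fX(x) = sum over y of f(x, y)
    fy[y] = fy.get(y, 0.0) + p   # fY(y) = sum over x of f(x, y)

independent = all(abs(p - fx[x] * fy[y]) < 1e-9 for (x, y), p in joint.items())
print(fx, fy, independent)   # fX = {0: 0.4, 1: 0.6}, fY = {0: 0.3, 1: 0.7} (up to rounding), True
```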

  17. Combinatorics and Probability • For example, suppose engineering students have Calculus (C), Physics (P), and Information Theory (I) classes today. How can we calculate the probability that I is the last class? • The following 6 arrangements are possible: CPI, CIP, PCI, PIC, ICP, IPC. Two of them are desirable: CPI and PCI. Thus, if all arrangements are equiprobable, then the probability is 2/6 = 1/3.

  18. Combinatorics and Probability • Suppose engineering students take Calculus (C), Physics (P), and Information Theory (I) classes this semester, two classes per day. How can we calculate the probability that I and P are taken on the same day and P is the first class? • There are 6 different arrangements of 2 objects selected from 3: CP, PC, CI, IC, IP, PI. One of them is desirable: PI. Thus, the probability is 1/6.

  19. Combinatorics and Probability • The number of different permutations of n objects is n!. • The number of different (ordered) arrangements of r objects selected from n is the number of all possible permutations of n objects (n!) divided by the number of all possible permutations of n–r objects ((n–r)!): n!/(n–r)!.

  20. Combinatorics and Probability • Suppose engineering students take Calculus (C), Physics (P), and Information Theory (I) classes this semester, two classes per day. How can we calculate the probability that I and P are taken on the same day? • There are 3 different combinations of 2 objects selected from 3: (CP = PC), (CI = IC), (IP = PI). One of them is desirable: (IP = PI). Thus, the probability is 1/3.

  21. Combinatorics and Probability • The number of different (not ordered) combinations of r objects selected from n is the number of all possible arrangements of r objects selected from n divided by the number of all possible permutations of r objects (r!): C(n, r) = n!/(r!(n–r)!).
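
The class-scheduling probabilities from the examples above can be reproduced with Python's built-in counting helpers (math.factorial, math.perm, math.comb):

```python
from math import factorial, perm, comb

# One day with all three classes: 3! orderings, two of them end with I.
print(factorial(3), 2 / factorial(3))     # 6  0.333...

# Ordered arrangements of 2 classes out of 3; only "P then I" is desirable.
print(perm(3, 2), 1 / perm(3, 2))         # 6  0.1666...

# Unordered combinations of 2 classes out of 3; only {I, P} is desirable.
print(comb(3, 2), 1 / comb(3, 2))         # 3  0.333...
```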

  22. Combinatorics and Probability • Binomial meaning: as it was discovered by I. Newton, the numbers C(n, r) are the coefficients of the binomial expansion: (p + q)^n = Σ_{r=0..n} C(n, r) p^r q^(n–r).

  23. Binomial Distribution • Let a random experiment have only two possible outcomes E1 and E2. Let the probabilities of their occurrence be p and q = 1–p, respectively. If the experiment is repeated n times and successive trials are independent of each other, the probability of obtaining E1 and E2 exactly r and n–r times, respectively, is C(n, r) p^r q^(n–r).

  24. Binomial Distribution • Let a random variable X take the value r if in a sequence of n trials E1 occurs exactly r times. Then: the probability function is f(r) = P{X = r} = C(n, r) p^r q^(n–r), and the probability distribution function (CDF), the binomial distribution function, is F(x) = P{X ≤ x} = Σ_{r ≤ x} C(n, r) p^r q^(n–r).
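
A minimal Python sketch of the binomial probability function and CDF, evaluated for the hypothetical values n = 10, p = 0.3:

```python
from math import comb

def binom_pmf(r, n, p):
    """f(r) = P{X = r} = C(n, r) * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(x, n, p):
    """F(x) = P{X <= x}, the binomial distribution function."""
    return sum(binom_pmf(r, n, p) for r in range(int(x) + 1))

n, p = 10, 0.3                      # hypothetical illustration values
print(binom_pmf(3, n, p))           # ~0.2668
print(binom_cdf(3, n, p))           # ~0.6496
print(binom_cdf(n, n, p))           # ~1.0 (all probabilities sum to one)
```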

  25. Poisson’s Distribution • A random variable X is said to have a Poisson probability distribution if f(x) = P{X = x} = e^(–λ) λ^x / x!, x = 0, 1, 2, … • The Poisson probability distribution function (CDF) is F(x) = P{X ≤ x} = Σ_{k ≤ x} e^(–λ) λ^k / k!.
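
A similar Python sketch for the Poisson probability function and CDF, with a hypothetical rate λ = 2:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = P{X = x} = e^(-lambda) * lambda^x / x!, for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """F(x) = P{X <= x} = sum of the pmf over k = 0..x."""
    return sum(poisson_pmf(k, lam) for k in range(int(x) + 1))

lam = 2.0                           # hypothetical rate parameter
print(poisson_pmf(0, lam))          # ~0.1353
print(poisson_cdf(3, lam))          # ~0.8571
```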

  26. Expected Value of a Random Variable • Let X be a discrete single-variate random variable that takes the values x1, x2, …, xn with the associated probability function f(xi) = P{X = xi}. • Then E(X) = Σ_i xi f(xi) is the average (statistical average) of X.

  27. Expected Value of a Random Variable • In general, if g(X) is a function of a random variable X (a weighting function), then its mean value E[g(X)] = Σ_i g(xi) f(xi) is referred to as the expected value of g(X). • E(X) is the expected value of X, and E(X + Y) is the expected value of X + Y.

  28. Expected Value of a Random Variable • When the function is of the form g(X) = X^j, where j > 0, its expected value is called the moment of jth order of X. • E(X) is the first-order moment (the mean), • E(X²) is the second-order moment, • … … … • E(X^j) is the jth-order moment.
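
A short Python sketch of the statistical average and higher-order moments, for a hypothetical probability function f:

```python
# A hypothetical probability function f(xi) = P{X = xi}.
f = {1: 0.2, 2: 0.5, 3: 0.3}

def expect(g, f=f):
    """E[g(X)] = sum of g(xi) * f(xi) over all xi."""
    return sum(g(x) * p for x, p in f.items())

mean = expect(lambda x: x)          # first-order moment E(X)
second = expect(lambda x: x**2)     # second-order moment E(X^2)
print(round(mean, 6), round(second, 6))   # 2.1  4.9
```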

  29. Basic Concepts of Information Theory A measure of uncertainty. Entropy.

  30. The amount of Information • How can we measure the information content of a discrete communication system? • Suppose we consider a discrete random experiment and its sample space Ω. Let X be a random variable associated with Ω. If the experiment is repeated a large number of times, the values of X, when averaged, will approach E(X).

  31. The amount of Information • Could we find some numeric characteristic associated with the random experiment that provides a “measure” of the surprise or unexpectedness of the occurrence of the outcomes of the experiment?

  32. The amount of Information • C. Shannon has suggested that the random variable –log P{Ek} is an indicative relative measure of the occurrence of the event Ek. The mean of this function is a good indication of the average uncertainty with respect to all outcomes of the experiment.

  33. The amount of Information • Consider the sample space Ω. Let us partition the sample space into a finite number of mutually exclusive events E1, E2, …, En with E1 ∪ E2 ∪ … ∪ En = Ω and probabilities p1, p2, …, pn, where p1 + p2 + … + pn = 1. • A probability space defined by such equations is called a complete finite scheme.

  34. The amount of Information. Entropy. • Our task is to associate a measure of uncertainty (a measure of “surprise”) with complete finite schemes. • C. Shannon and N. Wiener suggested the following measure of uncertainty, the Entropy: H(p1, p2, …, pn) = –Σ_k pk log pk.

  35. Entropy of a Bit (a simple communication channel) • A completely random bit with p = (½, ½) has H(p) = –(½ log ½ + ½ log ½) = –(–½ – ½) = 1. • A deterministic bit with p = (1, 0) has H(p) = –(1 log 1 + 0 log 0) = –(0 + 0) = 0 (with the convention 0 log 0 = 0). • A biased bit with p = (0.1, 0.9) has H(p) = 0.468996… • In general, the entropy, viewed as a function of 0 ≤ P{X=1} ≤ 1, reaches its maximum of 1 at P{X=1} = ½ and equals 0 at the endpoints.
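
These three entropy values can be reproduced with a few lines of Python, using base-2 logarithms and the convention 0 log 0 = 0:

```python
from math import log2

def entropy(probs):
    """H(p) = -sum pk * log2(pk), with the convention 0 * log2(0) = 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0       completely random bit
print(entropy([1.0, 0.0]))    # 0.0       deterministic bit
print(entropy([0.1, 0.9]))    # ~0.469    biased bit
```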

  36. The amount of Information. Entropy. • We have to investigate the principal properties of this measure with respect to statistical problems of communication systems. • We have to generalize this concept to two-dimensional probability schemes. • Then we have to consider the n-dimensional probability schemes.
