
Probability theory



  1. Probability theory Much inspired by the presentation of Kren and Samuelsson

  2. 3 views of probability • Frequentist • Mathematical • Bayesian (knowledge-based)

  3. Sample space • A universe of elementary outcomes. In elementary treatments, we pretend that we can come up with sets of equiprobable outcomes (dice, coins, ...). Outcomes are very small. • An event is a set of those outcomes. Events are bigger than outcomes -- more interesting.
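
The outcome/event distinction can be sketched with Python sets; the die-roll universe below is an illustrative assumption, not from the slides:

```python
# Sample space: the elementary outcomes of one fair die roll (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

# Events are sets of outcomes -- bigger than single outcomes, more interesting.
even = {2, 4, 6}
at_least_five = {5, 6}

def prob(event):
    # With equiprobable outcomes, an event's probability is
    # (outcomes in the event) / (outcomes in the sample space).
    return len(event & sample_space) / len(sample_space)

print(prob(even))           # 0.5
print(prob(at_least_five))  # ≈ 0.333
```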

  4. Probability measure • Every event (=set of outcomes) is assigned a probability, by a function we call a probability measure. • The probability of every set is between 0 and 1, inclusive. • The probability of the whole set of outcomes is 1. • If A and B are two events with no common outcomes, then the probability of their union is the sum of their probabilities.

  5. Cards • Our universe of outcomes is single card pulls. • Events: a red card (1/2); a jack (1/13).

  6. Other things to remember • The probability that event P will not happen (=event ~P will happen) is 1 - prob(P). • Prob (null outcome) = 0. • p(A ∪ B) = p(A) + p(B) - p(A ∩ B).
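
All three rules can be checked mechanically on a small sample space; the die events below are my own illustrative choices:

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one fair die roll (illustrative)

def p(event):
    return Fraction(len(event), len(sample_space))

A = {1, 2, 3}   # "low roll"
B = {2, 4, 6}   # "even roll"

# Complement rule: prob(~A) = 1 - prob(A)
assert p(sample_space - A) == 1 - p(A)

# The null (empty) event has probability 0
assert p(set()) == 0

# Inclusion-exclusion: p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
assert p(A | B) == p(A) + p(B) - p(A & B)
```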

  7. Independence (definition) • Two events A and B are independent if the probability of A ∩ B equals the probability of A times the probability of B (that is, p(A) * p(B)).
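
The two card events from slide 5 happen to be independent; a quick check (the deck construction is my own, assuming a standard 52-card deck):

```python
from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = set(product(ranks, suits))   # 52 equiprobable single-card pulls

def p(event):
    return Fraction(len(event), len(deck))

red  = {c for c in deck if c[1] in ('hearts', 'diamonds')}   # p = 1/2
jack = {c for c in deck if c[0] == 'J'}                      # p = 1/13

# Independence: p(red ∩ jack) = p(red) * p(jack) = 1/26
assert p(red & jack) == p(red) * p(jack) == Fraction(1, 26)
```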

  8. Conditional probability This means: what's the probability of A if I already know B is true? p(A|B) = p(A and B) / p(B) = p(A ∩ B) / p(B) Probability of A given B. p(A) is the prior probability; p(A|B) is called a posterior probability. Once you know B is true, the universe you care about shrinks to B.
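
The definition is easy to apply by counting; the die events here are assumptions for illustration:

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one fair die roll (illustrative)

def p(event):
    return Fraction(len(event), len(sample_space))

def p_given(A, B):
    # p(A|B) = p(A ∩ B) / p(B): the universe shrinks to B
    return p(A & B) / p(B)

six  = {6}
even = {2, 4, 6}

assert p(six) == Fraction(1, 6)              # prior
assert p_given(six, even) == Fraction(1, 3)  # posterior, once we know "even"
```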

  9. Bayes' rule • prob (A and B) = prob (B and A); so • prob (A|B) prob (B) = prob (B|A) prob (A) -- just using the definition of prob (X|Y); • hence prob (A|B) = prob (B|A) prob (A) / prob (B).
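
A numeric sketch of the rule in action; all the numbers below are made up for illustration (a rare condition A and a noisy test B):

```python
def bayes(p_b_given_a, p_a, p_b):
    # Bayes' rule: p(A|B) = p(B|A) * p(A) / p(B)
    return p_b_given_a * p_a / p_b

# Assumed numbers: condition A has prior 0.01; test B is positive
# with probability 0.9 given A, and 0.05 given not-A.
p_a = 0.01
p_b = 0.9 * 0.01 + 0.05 * 0.99   # total probability of a positive test
posterior = bayes(0.9, p_a, p_b)

print(round(posterior, 3))   # ≈ 0.154: the posterior is still small
```

Note how the small prior keeps the posterior low even though B strongly predicts A, which is exactly the "scientific reasoning" reading on the next slide.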

  10. Bayes’ rule as scientific reasoning • A hypothesis H which is supported by a set of data D merits our belief to the degree that: • 1. We believed H before we learned about D; • 2. H predicts data D; and • 3. D is unlikely.

  11. A random variable • a.k.a. stochastic variable. • A random variable isn't a variable. It's a function. It maps from the sample space to the real numbers. This is a convenience: it is our way of translating events (whatever they are) to numbers.

  12. Distribution function • Distribution function: F(x) = P(X ≤ x). • This is a function that takes a real number x as its input, and finds all those outcomes in the sample space that map onto x or anything less than x. • For a die, F(0) = 0; F(1) = 1/6; F(2) = 1/3; F(3) = 1/2; F(4) = 2/3; F(5) = 5/6; and F(6) = F(7) = 1.
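
The die's distribution function is short enough to write out directly; a sketch:

```python
def F(x):
    # Distribution function of one fair die: the probability mass of all
    # faces that are x or anything less than x.
    return sum(1 for face in range(1, 7) if face <= x) / 6

# Matches the values on the slide
assert F(0) == 0
assert F(1) == 1/6
assert F(3) == 1/2
assert F(6) == 1 and F(7) == 1
```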

  13. discrete distribution function

  14. discrete, continuous • If the set of values that the distribution function takes on is finite, or countable, then the random variable (which isn't a variable, it's a function) is discrete; otherwise it's continuous (also, it ought to be mostly differentiable).

  15. Distribution function aggregates • It's a little bit counterintuitive, in a way. What about a function P for a die that tells us that P(1) = 1/6, P(2) = 1/6, ..., P(6) = 1/6? • That's a frequency function, or probability function. We'll use the letter f for this. For the case of continuous variables, we don't want to ask what the probability of any single exact value is, because the answer is always 0...

  16. Rather, we ask what's the probability that the value is in the interval (a,b) -- that's OK. So for continuous variables, we care about the derivative of the distribution function at a point (that's the derivative of an integral, after all...). This is called a probability density function. The probability that a random variable has a value in a set A is the integral of the p.d.f. over that set A.
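
That integral can be sketched numerically; the exponential density used below is my own choice of example, picked because its integral has a simple closed form to check against:

```python
import math

def pdf(x):
    # Exponential density f(x) = e^(-x) for x >= 0 (illustrative choice)
    return math.exp(-x) if x >= 0 else 0.0

def prob_interval(a, b, steps=100_000):
    # P(a < X < b) = integral of the p.d.f. over (a, b),
    # approximated here by the midpoint rule
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

# Closed form for this density: e^(-a) - e^(-b)
approx = prob_interval(1.0, 2.0)
exact = math.exp(-1) - math.exp(-2)
assert abs(approx - exact) < 1e-6
```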

  17. Frequency function f • The sum of the values of the frequency function f must add up to 1! • The integral of the probability density function must be 1. • A set of numbers that adds up to 1 is called a distribution.
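
Any set of non-negative numbers can be turned into such a distribution by dividing through by its sum; the word counts below are made up for illustration:

```python
# Made-up word counts, normalized into a distribution that sums to 1.
counts = {'the': 6, 'cat': 3, 'sat': 1}
total = sum(counts.values())
f = {w: c / total for w, c in counts.items()}

assert abs(sum(f.values()) - 1.0) < 1e-12   # now a distribution
print(f['the'])                              # 0.6
```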

  18. Means that have nothing to do with meaning • The mean is the average; in everyday terms, we add all the values and divide by the number of items. The symbol is 'E', for 'expected' (why is the mean expected? What else would you expect?) • Since the frequency function f tells you how many there are of any particular value, the mean is E[X] = Σx x f(x).

  19. Weight a moment... • The mean is the first moment; the second moment is the variance, which tells you how much the random variable jiggles. It's the f-weighted sum of the squared differences from the mean (square those differences so they're positive): Σx (x - E[X])² f(x). The square root of this is the standard deviation. (We don't divide by N here; that's inside the f-function, remember?)
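
Both moments for the fair die, computed from its frequency function (exact fractions, to show there's no rounding involved):

```python
from fractions import Fraction

# Frequency function of a fair die
f = {x: Fraction(1, 6) for x in range(1, 7)}

# First moment: E[X] = sum of x * f(x)
mean = sum(x * p for x, p in f.items())

# Second (central) moment: variance = sum of (x - E[X])^2 * f(x).
# The 1/N lives inside f, so no extra division is needed.
var = sum((x - mean) ** 2 * p for x, p in f.items())

assert mean == Fraction(7, 2)    # 3.5
assert var == Fraction(35, 12)   # ≈ 2.917
```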

  20. Particular probability distributions: • Binomial • Gaussian, also known as normal • Poisson

  21. Binomial distribution If we run an experiment n times (independently: simultaneous or not, we don't care), and we care only about how many times altogether a particular outcome occurs -- that's a binomial distribution, with 2 parameters: the probability p of that outcome on a single trial, and n the number of trials.

  22. If you toss a coin 4 times, what's the probability that you'll get 3 heads? • If you draw a card 5 times (with replacement), what's the probability that you'll get exactly 1 ace? • If you generate words randomly, what's the probability that you'll have two the's in the first 10 words?

  23. In general, the answer is P(exactly k occurrences in n trials) = C(n, k) p^k (1 - p)^(n - k), where C(n, k) = n! / (k! (n - k)!) counts the ways of choosing which k trials succeed.
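
The formula answers the questions on slide 22 directly; a sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(exactly k occurrences in n independent trials)
    # = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 3 heads in 4 coin tosses: C(4,3) * (1/2)^4 = 4/16
assert binom_pmf(3, 4, 0.5) == 0.25

# exactly 1 ace in 5 draws with replacement: C(5,1) * (1/13) * (12/13)^4
print(round(binom_pmf(1, 5, 1/13), 3))   # ≈ 0.279
```

(The "two the's in 10 words" question needs an estimate of p for the, which the slides don't give, so it's left symbolic here.)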

  24. Normal or Gaussian distribution • Start off with something simple, like this: exp(-x²/2). That's symmetric around the y-axis (negative and positive x treated the same way): if x = 0, the value is 1, and it slides to 0 as you go off to infinity, either positive or negative.

  25. Gaussian or normal distribution • Well, x's average can be something other than 0: it can be any old μ -- replace x with (x - μ).

  26. And its variance (σ²) can be other than 1 -- divide the squared term by σ², giving exp(-(x - μ)²/(2σ²)).

  27. And then normalize -- so that it all adds up (integrates, really) to 1, we have to divide by a normalizing factor: σ√(2π). The result is the normal density f(x) = exp(-(x - μ)²/(2σ²)) / (σ√(2π)).
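
Putting slides 24-27 together as code; the integration grid below is my own choice, wide enough that the tails beyond it are negligible:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)),
    # divided by the normalizing factor sigma * sqrt(2 * pi)
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# Symmetric about the mean, and peaked there
assert abs(normal_pdf(0.5) - normal_pdf(-0.5)) < 1e-15
assert normal_pdf(1.0, mu=1.0) > normal_pdf(2.0, mu=1.0)

# Numerically integrates to (approximately) 1 over [-8, 8]
h = 0.001
total = sum(normal_pdf(-8 + (i + 0.5) * h) * h for i in range(16000))
assert abs(total - 1.0) < 1e-4
```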
