Probability Boot camp Joel Barajas October 13th 2008
Basic Probability • If we toss a coin twice • sample space of outcomes: {HH, HT, TH, TT} • Event – a subset of the sample space • Example event: exactly one head comes up, i.e. {HT, TH} • probability of this event: 2/4 = 1/2
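A minimal sketch of this coin-toss example using Python's standard library: it enumerates the sample space and counts the outcomes in the event to recover the 1/2 probability.

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space for two coin tosses.
sample_space = [''.join(toss) for toss in product('HT', repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# Event: exactly one head comes up.
event = [outcome for outcome in sample_space if outcome.count('H') == 1]

# With equally likely outcomes, P(event) = |event| / |sample space|.
probability = Fraction(len(event), len(sample_space))
print(probability)  # 1/2
```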
Permutations • Suppose that we are given n distinct objects and wish to arrange r of them in a line, where the order matters. • The number of arrangements is nPr = n! / (n − r)! • Example: the rankings of the schools
Combination • If we want to select r objects without regard to the order, then we use the combination. • It is denoted by nCr = n! / (r! (n − r)!) • Example: the toppings for the pizza
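For illustration, these counts can be checked with Python's standard library (math.perm and math.comb, available in Python 3.8+); the n = 10, r = 3 values are example inputs, not taken from the slides.

```python
import math

# Permutations: ordered arrangements of r objects chosen from n distinct objects.
n, r = 10, 3
print(math.perm(n, r))   # 720 = 10! / (10 - 3)!

# Combinations: unordered selections of r objects from n.
print(math.comb(n, r))   # 120 = 10! / (3! * 7!)

# Each combination of r objects corresponds to r! orderings:
assert math.perm(n, r) == math.comb(n, r) * math.factorial(r)
```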
Venn Diagram • Sample space S drawn as a rectangle containing two overlapping events A and B; the overlap is A ∩ B
Probability Theorems Theorem 1: The probability of an event lies between 0 and 1, i.e. 0 <= P(E) <= 1. Proof: Let S be the sample space and E the event. The number of elements in E cannot be negative, nor can it exceed the number of elements in S, so 0 <= n(E) <= n(S). Dividing by n(S) gives 0 <= n(E)/n(S) <= 1, i.e. 0 <= P(E) <= 1.
Probability Theorems Theorem 2: The probability of an impossible event is 0, i.e. P(E) = 0. Proof: Since E has no elements, n(E) = 0. From the definition of probability: P(E) = n(E)/n(S) = 0/n(S) = 0.
Probability Theorems Theorem 3: The probability of a sure event is 1, i.e. P(S) = 1, where S is the sure event. Proof: For a sure event, n(E) = n(S) [the number of elements in event E equals the number of elements in the sample space]. By the definition of probability: P(S) = n(S)/n(S) = 1.
Probability Theorems Theorem 4: If two events A and B are such that A ⊆ B, then P(A) <= P(B). Proof: Since A is a subset of B, set theory tells us that the number of elements in A cannot exceed the number of elements in B, so n(A) <= n(B). Dividing by n(S) gives n(A)/n(S) <= n(B)/n(S), i.e. P(A) <= P(B).
Probability Theorems Theorem 5: If E is any event and E′ is the complement of E, then P(E) + P(E′) = 1. Proof: Let S be the sample space; then n(E) + n(E′) = n(S). Dividing by n(S): n(E)/n(S) + n(E′)/n(S) = 1, i.e. P(E) + P(E′) = 1.
Computing Conditional Probabilities Conditional probability P(A|B) is the probability of event A, given that event B has occurred: P(A|B) = P(A ∩ B) / P(B) where P(A ∩ B) = joint probability of A and B, P(A) = marginal probability of A, P(B) = marginal probability of B
Computing Joint and Marginal Probabilities • The probability of a joint event, A and B: P(A and B) = P(A ∩ B) = (number of outcomes satisfying A and B) / (total number of outcomes) • Independent events: P(B|A) = P(B), which is equivalent to P(A and B) = P(A)P(B) • Bayes’ Theorem: P(Ai|B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)], where A1, A2, …, An are mutually exclusive and collectively exhaustive
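A minimal sketch of Bayes' Theorem for a partition A1, ..., An; the priors and likelihoods below are made-up illustrative numbers, not taken from the slides.

```python
# Bayes' Theorem over a partition of the sample space.
priors = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3): mutually exclusive, collectively exhaustive
likelihoods = [0.9, 0.5, 0.1]     # P(B | A1), P(B | A2), P(B | A3)

# Total probability of B: P(B) = sum_j P(B | Aj) P(Aj)
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior for each Ai: P(Ai | B) = P(B | Ai) P(Ai) / P(B)
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]
print(posteriors)  # the posteriors sum to 1
```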
Visualizing Events • Contingency table (full deck of 52 cards):
           Ace   Not Ace   Total
  Black     2      24       26
  Red       2      24       26
  Total     4      48       52
• Tree diagram: starting from the full deck of 52 cards, branch on Black Card vs. Red Card, then within each colour on Ace (2 cards) vs. Not an Ace (24 cards)
Joint Probabilities Using Contingency Table
  Event      B1            B2            Total
  A1       P(A1 ∩ B1)    P(A1 ∩ B2)     P(A1)
  A2       P(A2 ∩ B1)    P(A2 ∩ B2)     P(A2)
  Total      P(B1)         P(B2)           1
The cells in the body of the table are joint probabilities; the row and column totals are marginal (simple) probabilities.
Example • Of the cars on a used car lot, 70% have air conditioning (AC) and 40% have a CD player (CD). 20% of the cars have a CD player but not AC. • What is the probability that a car has a CD player, given that it has AC? • Answer: P(CD and AC) = P(CD) − P(CD and not AC) = 0.4 − 0.2 = 0.2, so P(CD | AC) = 0.2 / 0.7 ≈ 0.286
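A short worked check of this example in Python, using only the probabilities stated on the slide.

```python
# Probabilities taken from the used-car example.
p_ac = 0.70          # P(AC)
p_cd = 0.40          # P(CD)
p_cd_not_ac = 0.20   # P(CD and not AC)

# P(CD and AC) = P(CD) - P(CD and not AC)
p_cd_and_ac = p_cd - p_cd_not_ac

# Conditional probability: P(CD | AC) = P(CD and AC) / P(AC)
p_cd_given_ac = p_cd_and_ac / p_ac
print(round(p_cd_given_ac, 4))  # 0.2857
```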
Introduction to Probability Distributions • Random variable – represents a possible numerical value from an uncertain event • Random variables come in two types: discrete random variables and continuous random variables
Mean and Variance of a Discrete Random Variable • Mean (expected value): E(X) = Σi Xi P(Xi) • Variance: σ² = Σi [Xi − E(X)]² P(Xi), with standard deviation σ = √σ² where: E(X) = expected value of the discrete random variable X, Xi = the ith outcome of X, P(Xi) = probability of the ith outcome of X
Example: Toss 2 coins, X = # of heads (possible number of heads = 0, 1, or 2):
  X     P(X)
  0     0.25
  1     0.50
  2     0.25
• Expected value: E(X) = (0 × 0.25) + (1 × 0.50) + (2 × 0.25) = 1.0 • Standard deviation: σ = √[(0 − 1)²(0.25) + (1 − 1)²(0.50) + (2 − 1)²(0.25)] = √0.5 ≈ 0.707
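A small sketch that reproduces the expected value and standard deviation above directly from the definitions.

```python
import math

# Discrete distribution for X = number of heads in two coin tosses.
outcomes = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

# E(X) = sum of Xi * P(Xi)
mean = sum(x * p for x, p in zip(outcomes, probs))

# Var(X) = sum of (Xi - E(X))^2 * P(Xi); the standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))
std_dev = math.sqrt(variance)

print(mean, round(std_dev, 3))  # 1.0 0.707
```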
The Covariance The covariance measures the strength of the linear relationship between two variables. The covariance: σXY = Σi [Xi − E(X)][Yi − E(Y)] P(Xi, Yi) where: X = discrete variable X, Xi = the ith outcome of X, Y = discrete variable Y, Yi = the ith outcome of Y, P(Xi, Yi) = probability of occurrence of the ith outcome of X and the ith outcome of Y
Correlation Coefficient The measure of dependence of variables X and Y is given by ρ = σXY / (σX σY) • If ρ = 0 then X and Y are uncorrelated
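An illustrative sketch of the covariance and correlation formulas; the paired outcomes and probabilities are made-up numbers, chosen so the pairs are perfectly linear (ρ = 1).

```python
import math

# A small joint distribution over paired outcomes (Xi, Yi).
xs = [1, 2, 3]
ys = [10, 20, 30]
probs = [0.2, 0.5, 0.3]   # P(Xi, Yi) for each pair

mean_x = sum(x * p for x, p in zip(xs, probs))
mean_y = sum(y * p for y, p in zip(ys, probs))

# Covariance: sum of (Xi - E(X))(Yi - E(Y)) P(Xi, Yi)
cov_xy = sum((x - mean_x) * (y - mean_y) * p for x, y, p in zip(xs, ys, probs))

std_x = math.sqrt(sum((x - mean_x) ** 2 * p for x, p in zip(xs, probs)))
std_y = math.sqrt(sum((y - mean_y) ** 2 * p for y, p in zip(ys, probs)))

# Correlation coefficient: rho = cov / (std_x * std_y)
rho = cov_xy / (std_x * std_y)
print(round(cov_xy, 3), round(rho, 3))  # perfectly linear pairs give rho = 1.0
```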
Probability Distributions • Discrete probability distributions: Binomial, Poisson, Hypergeometric, Multinomial • Continuous probability distributions: Normal, Uniform, Exponential
Binomial Distribution Formula P(X = c) = [n! / (c! (n − c)!)] p^c (1 − p)^(n − c) P(X = c) = probability of c successes in n trials; the random variable X denotes the number of ‘successes’ in n trials (X = 0, 1, 2, ..., n) n = sample size (number of trials or observations) p = probability of “success” in a single trial (does not change from one trial to the next) Example: Flip a coin four times, let X = # heads: n = 4, p = 0.5, 1 − p = 0.5, X = 0, 1, 2, 3, 4
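A minimal sketch of the binomial formula using Python's math.comb (Python 3.8+), applied to the coin-flip example on the slide.

```python
import math

def binomial_pmf(c, n, p):
    """P(X = c): probability of c successes in n independent trials."""
    return math.comb(n, c) * p ** c * (1 - p) ** (n - c)

# Example from the slide: flip a coin four times, X = number of heads.
n, p = 4, 0.5
for c in range(n + 1):
    print(c, binomial_pmf(c, n, p))   # 0.0625, 0.25, 0.375, 0.25, 0.0625
```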
Binomial Distribution • The shape of the binomial distribution depends on the values of p and n • With n = 5 and p = 0.1, the distribution is concentrated near 0 and skewed to the right • With n = 5 and p = 0.5, the distribution is symmetric about its mean [bar charts of P(X) for X = 0, 1, ..., 5 omitted]
Binomial Distribution Characteristics • Mean: μ = E(X) = np • Variance and standard deviation: σ² = np(1 − p), σ = √[np(1 − p)] where n = sample size, p = probability of success, (1 − p) = probability of failure
Multinomial Distribution P(X1 = c1, ..., Xk = ck) = [n! / (c1! c2! ... ck!)] p1^c1 p2^c2 ... pk^ck = probability of observing ci outcomes of category i in n trials The random variable Xi denotes the number of outcomes of category i in n trials (Xi = 0, 1, 2, ..., n, with c1 + ... + ck = n) n = sample size (number of trials or observations) pi = probability of category i in a single trial (p1 + ... + pk = 1) Example: you have 5 red, 4 blue and 3 yellow balls; let Xi = # balls of each colour in n = 12 draws, with p = [5/12, 4/12, 3/12] ≈ [0.417, 0.333, 0.250]
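A sketch of the multinomial probability computed directly from the formula above; it evaluates the slide's example of 12 draws with category probabilities 5/12, 4/12 and 3/12, asking for exactly 5 red, 4 blue and 3 yellow outcomes.

```python
import math

def multinomial_pmf(counts, probs):
    """P(X1 = c1, ..., Xk = ck) for n = sum(counts) trials with category probabilities probs."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for c in counts:
        coefficient //= math.factorial(c)   # n! / (c1! c2! ... ck!)
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p ** c               # multiply by p1^c1 ... pk^ck
    return probability

print(multinomial_pmf([5, 4, 3], [5/12, 4/12, 3/12]))  # about 0.067
```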
The Normal Distribution • ‘Bell shaped’ • Symmetrical • Mean, median and mode are equal • Location is determined by the mean, μ • Spread is determined by the standard deviation, σ • The random variable has an infinite theoretical range: −∞ to +∞ [bell-curve figure: f(X) centred at μ = mean = median = mode, with spread σ]
The formula for the normal probability density function is f(X) = [1 / (σ√(2π))] e^(−(X − μ)² / (2σ²)) where e = the mathematical constant approximated by 2.71828, π = the mathematical constant approximated by 3.14159, μ = the population mean, σ = the population standard deviation, X = any value of the continuous variable Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normal distribution (Z) by converting X units into Z units: Z = (X − μ) / σ
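A minimal sketch of the density formula and the Z transformation; the X = 200, μ = 100, σ = 50 values anticipate the comparison on the next slide.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density f(X) with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

def standardize(x, mu, sigma):
    """Transform X units into Z units: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

print(normal_pdf(200, 100, 50))      # density of Normal(100, 50) at X = 200
print(standardize(200, 100, 50))     # Z = 2.0
```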
Comparing X and Z Units • X = 200 on the X scale (μ = 100, σ = 50) corresponds to Z = 2.0 on the Z scale (μ = 0, σ = 1) • Note that the distribution is the same; only the scale has changed. We can express the problem in original units (X) or in standardized units (Z)
Finding Normal Probabilities • Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6) = P(X < 8.0) + P(8.0 < X < 8.6) = 0.5 + P(8.0 < X < 8.6), since by symmetry P(X < 8.0) = 0.5
The Standardized Normal Table • The value within the table gives the probability from Z = 0 up to the desired Z value • The row shows the value of Z to the first decimal point; the column gives the value of Z to the second decimal point • Example: the entry in row 2.0, column 0.00 is 0.4772, so P(Z < 2.00) = 0.5 + 0.4772 = 0.9772
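Instead of a printed table, the same cumulative probabilities can be computed with the error function from Python's standard library; this sketch checks P(Z < 2.00) and the P(X < 8.6) example from the previous slide.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for the standardized normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Table lookup example: P(Z < 2.00) = 0.5 + 0.4772
print(round(standard_normal_cdf(2.00), 4))   # 0.9772

# Previous slide's example: X ~ Normal(8.0, 5.0), find P(X < 8.6).
z = (8.6 - 8.0) / 5.0                         # Z = 0.12
print(round(standard_normal_cdf(z), 4))       # about 0.5478
```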
Relationship between Binomial & Normal Distributions • If n is large and neither p nor q = 1 − p is too close to zero, the binomial distribution can be closely approximated by a normal distribution with standardized normal variable Z = (X − np) / √(npq), where X is the random variable giving the number of successes in n Bernoulli trials and p is the probability of success • Z is asymptotically normal
Normal Approximation to the Binomial Distribution • The binomial distribution is a discrete distribution, but the normal is continuous • To use the normal to approximate the binomial, accuracy is improved if you use a continuity correction adjustment • Example: • X is discrete in a binomial distribution, so P(X = 4) can be approximated with a continuous normal distribution by finding P(3.5 < X < 4.5)
Normal Approximation to the Binomial Distribution (continued) • The closer p is to 0.5, the better the normal approximation to the binomial • The larger the sample size n, the better the normal approximation to the binomial • General rule: • The normal distribution can be used to approximate the binomial distribution if np ≥ 5 and n(1 – p) ≥ 5
Normal Approximation to the Binomial Distribution (continued) • The mean and standard deviation of the binomial distribution are μ = np and σ = √[np(1 − p)] • Transform binomial to normal using the formula: Z = (X − μ) / σ = (X − np) / √[np(1 − p)]
Using the Normal Approximation to the Binomial Distribution • If n = 1000 and p = 0.2, what is P(X ≤ 180)? • Approximate P(X ≤ 180) using a continuity correction adjustment: P(X ≤ 180.5) • Transform to standardized normal: Z = (180.5 − np) / √[np(1 − p)] = (180.5 − 200) / √160 ≈ −1.54 • So P(X ≤ 180) ≈ P(Z ≤ −1.54) = 0.0618
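A sketch of this calculation; it reuses the erf-based standard normal CDF, so the result differs from the table value only by the rounding of Z.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for the standardized normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Normal approximation to Binomial(n=1000, p=0.2) with a continuity correction.
n, p = 1000, 0.2
mu = n * p                          # 200
sigma = math.sqrt(n * p * (1 - p))  # sqrt(160), about 12.65

z = (180.5 - mu) / sigma            # about -1.54
print(round(standard_normal_cdf(z), 4))   # about 0.062; the slide's table gives 0.0618 for Z = -1.54
```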
Poisson Distribution P(X = x) = (e^(−λ) λ^x) / x! where: X = discrete random variable (number of events in an area of opportunity), λ = expected number of events (constant), e = base of the natural logarithm system (2.71828...)
Poisson Distribution Characteristics • Mean: μ = λ • Variance and standard deviation: σ² = λ, σ = √λ where λ = expected number of events
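A minimal sketch of the Poisson probability mass function from the formula above; λ = 3.0 is an arbitrary illustrative rate.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam (lambda)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# A few probabilities for lambda = 3.0; the mean and variance both equal lambda.
lam = 3.0
print([round(poisson_pmf(x, lam), 4) for x in range(6)])
print(sum(x * poisson_pmf(x, lam) for x in range(100)))  # about 3.0 (the mean)
```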
Poisson Distribution Shape • The shape of the Poisson distribution depends on the parameter λ: for small λ (e.g. λ = 0.50) it is concentrated near 0 and right-skewed; for larger λ (e.g. λ = 3.00) it becomes more spread out and closer to symmetric
Relationship between Binomial & Poisson Distributions • In a binomial distribution, if n is large and p (the probability of success) is small, then it is approximated by a Poisson distribution with λ = np.
Relationship between Poisson & Normal Distributions • The Poisson distribution approaches the normal distribution as λ → ∞, with standardized normal variable given by Z = (X − λ) / √λ
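An illustrative check of this limit: for a large rate (λ = 100, an arbitrary choice), an exact Poisson tail probability is compared with its normal approximation.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def standard_normal_cdf(z):
    """P(Z < z) for the standardized normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# For a large rate, Poisson(lam) is close to Normal(mu=lam, sigma=sqrt(lam)).
lam = 100.0
exact = sum(poisson_pmf(x, lam) for x in range(91))          # P(X <= 90), exact
approx = standard_normal_cdf((90.5 - lam) / math.sqrt(lam))  # normal approx. with continuity correction
print(round(exact, 4), round(approx, 4))                     # the two values are close
```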
Are there any other distributions besides binomial and Poisson that have the normal distribution as the limiting case?
The Uniform Distribution • The uniform distribution is a probability distribution that has equal probabilities for all possible outcomes of the random variable • Also called a rectangular distribution
Uniform Distribution Example Example: uniform probability distribution over the range 2 ≤ X ≤ 6: f(X) = 1 / (b − a) = 1 / (6 − 2) = 0.25 for 2 ≤ X ≤ 6 [rectangular density: f(X) = 0.25 between X = 2 and X = 6]
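A minimal sketch of the uniform density on [a, b] = [2, 6] and an interval probability computed from it; the interval 3 ≤ X ≤ 5 is just an illustrative query.

```python
# Continuous uniform distribution on [a, b].
a, b = 2.0, 6.0

def uniform_pdf(x):
    """f(X) = 1 / (b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d):
    """P(c <= X <= d): the overlapping interval length times the constant density."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) * (1.0 / (b - a))

print(uniform_pdf(3.0))        # 0.25
print(uniform_prob(3.0, 5.0))  # 0.5
```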
Sampling Distributions • Sampling distribution of the mean • Sampling distribution of the proportion
Sampling Distributions • A sampling distribution is a distribution of all of the possible values of a statistic for a given size sample selected from a population
Developing a Sampling Distribution • Assume there is a population of four individuals, A, B, C and D • Population size N = 4 • Random variable, X, is the age of individuals • Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution (continued) Summary measures for the population distribution: μ = ΣXi / N = (18 + 20 + 22 + 24) / 4 = 21, σ = √[Σ(Xi − μ)² / N] = √5 ≈ 2.236 [bar chart: P(x) = 0.25 at each of x = 18, 20, 22, 24 – a uniform distribution]
Sampling Distribution of Means (continued) Now consider all possible samples of size n = 2: there are 16 possible samples (sampling with replacement), giving 16 sample means
Sampling Distribution of Means (continued) Summary measures of this sampling distribution: μ_X̄ = Σ X̄i / 16 = 21 (equal to the population mean μ) and σ_X̄ = √[Σ(X̄i − μ_X̄)² / 16] = σ / √n = √5 / √2 ≈ 1.58
Comparing the Population with its Sampling Distribution • Population (N = 4): μ = 21, σ ≈ 2.236; P(X) = 0.25 at each of X = 18, 20, 22, 24 • Sample means distribution (n = 2, 16 possible samples): μ_X̄ = 21, σ_X̄ ≈ 1.58; the distribution of X̄ over the values 18, 19, ..., 24 is concentrated around the mean
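A sketch that enumerates all 16 samples of size n = 2 and reproduces the summary measures above, confirming μ_X̄ = μ and σ_X̄ = σ/√n.

```python
import math
from itertools import product
from statistics import mean

# Population of four ages; enumerate every sample of size n = 2 drawn with replacement.
population = [18, 20, 22, 24]
samples = list(product(population, repeat=2))      # 16 possible samples
sample_means = [mean(s) for s in samples]

# Population summary measures.
mu = mean(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Summary measures of the sampling distribution of the mean.
mu_xbar = mean(sample_means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means))

print(mu, round(sigma, 3))            # 21 2.236
print(mu_xbar, round(sigma_xbar, 3))  # 21 1.581 (equals sigma / sqrt(2))
```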