
BU255: Final Exam-AID (updated) Taught by Greg Overholt


gwidon

Presentation Transcript


  1. BU255: Final Exam-AID (updated), taught by Greg Overholt

  2. Chapter 9: Sampling Distribution

  3. Chapter 9 • Sampling Distributions • Pop’s are usually TOO large to calculate accurate parameters. • SO, take samples, calculate statistics related to the parameter and make inferences based on that! • This is where the sampling distributions come in! • To get more accurate results, increase sample size!

  4. Sampling Distribution • This is reflected in the formula for the standard deviation of the sampling distribution, which is called the standard error: σx̄ = σ/√n. It shrinks as the sample size grows. • The mean of the sampling distribution is the same as the pop mean: μx̄ = μ.
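
A quick sketch of how the standard error shrinks as the sample size grows. The function name `standard_error` and the value σ = 0.3 are illustrative assumptions, not part of the slides.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 0.3  # illustrative population standard deviation (assumption)
for n in (1, 4, 16, 64):
    # Quadrupling the sample size halves the standard error.
    print(n, round(standard_error(sigma, n), 4))
```

Because n sits under a square root, halving the standard error takes four times as many observations.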

  5. Central Limit Theorem • If pop = normal, then the sampling distribution of X̄ = normal for all values of n. • If pop = non-normal, then the sampling distribution of X̄ = approximately normal only for larger values of n. • In most practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of X̄. • Central Limit Theorem: states that the sampling distribution of the sample mean will be approximately normal for sufficiently large sample sizes.

  6. Sampling Distribution of the Mean • Assuming the population is infinitely large (so no finite-population correction is needed), the sample mean can be standardized: Z = (X̄ – μ) / (σ/√n) • Any pop that is at least 20 times the sample size can be considered LARGE!

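  7. Questions! • Setup used in the calculations: the amount in a bottle is assumed normal with μ = 32.2 ounces and σ = .3 ounces. • If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces? • Want P(X > 32): Z > (32 – 32.2) / .3 = -.67, so P(Z > -.67) = .7486 (TABLE!) • If a customer buys four bottles, what is the probability that the mean of the 4 bottles will be more than 32 ounces? • Want P(X̄ > 32): Z > (32 – 32.2) / (.3 / √4) = -.2 / .15 = -1.33, so P(Z > -1.33) = .9082 (TABLE!)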
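
The two bottle questions can be checked with a minimal sketch that uses Python's standard-library `statistics.NormalDist`. The variable names are illustrative, and normality of the fill amounts is assumed, as in the slide's use of Z.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # values from the slide

# One bottle: X has standard deviation sigma.
one_bottle = 1 - NormalDist(mu, sigma).cdf(32)

# Mean of four bottles: the standard error is sigma / sqrt(4).
four_bottles = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(round(one_bottle, 4), round(four_bottles, 4))
```

The results differ from the table answers in the third decimal place because the table method rounds Z to two decimals first.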

  8. Sample Dist for Inference • You can rearrange the formula to find the interval where the sample mean should land: Question: The census indicated that people under 24 use Facebook 7 hours a week. You polled 100 university students and found that on average they used Facebook 8 hours a week with a standard deviation of 3 hours. Does the census average seem right at 95% confidence? P(7 – 1.96(3/√100) < X̄ < 7 + 1.96(3/√100)) = .95 P(6.412 < X̄ < 7.588) = .95 If the census data was correct, the sample mean should land between 6.41 and 7.59, 95% of the time. Yours is 8, which indicates that either the census mean is wrong or there were errors in your data/calculations/polling.
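
A small sketch of the same check. It assumes, as the slide does, that the sample standard deviation of 3 can stand in for σ. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

census_mean, s, n = 7, 3, 100  # from the slide
sample_mean = 8                # from the slide

z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% in the middle
margin = z * s / sqrt(n)
lower, upper = census_mean - margin, census_mean + margin

# True if the observed sample mean is consistent with the census figure.
inside = lower < sample_mean < upper
print(round(lower, 3), round(upper, 3), inside)
```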

  9. Normal Approximation to Binomial • A binomial distribution (with ‘p’ close to .5) can be approximated by a normal distribution: • If you have n = 20 and p = .5 (heads/tails) you can determine μ and σ: • μ = np = 20(.5) = 10 • σ² = np(1–p) = 20(.5)(.5) = 5 • σ = √5 = 2.24 • For the approximation to provide good results two conditions should be met: 1) np ≥ 5 2) n(1–p) ≥ 5

  10. Normal Approx to Binomial • You can use the normal curve to estimate the binomial P(X = 10), by taking the area between 9.5 and 10.5 under the normal curve. • In fact (exact binomial): P(X = 10) = .176 • While for the normal variable Y: P(9.5 < Y < 10.5) = .1742 • The approximation is quite good.
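
A minimal sketch comparing the exact binomial probability with the normal approximation for n = 20 and p = .5. The variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5  # from the slide

# Exact binomial probability of exactly 10 successes.
exact = comb(n, 10) * p**10 * (1 - p) ** (n - 10)

# Normal approximation: area between 9.5 and 10.5 (continuity correction).
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(10.5) - approx_dist.cdf(9.5)

print(round(exact, 4), round(approx, 4))
```

The unrounded approximation comes out a little higher than the slide's .1742, which is consistent with the slide rounding Z to .22 before using the table.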

  11. Sampling Distribution of a Sample Prop • When you have a sample proportion (so votes, heads/tails,..) you can determine the same set of statistics: • E(P̂) = p • V(P̂) = σ² = p(1–p)/n • σ = √(p(1–p)/n) • These can also be standardized to the standard normal dist: Z = (P̂ – p) / √(p(1–p)/n)

  12. Sampling Distribution of a Sample Prop Example: Suppose 60% of students use latex vs saran wrap for their condom choice. What is the probability of taking a random sample of 120 students and finding that 50% or less use latex? p = .60 p(hat) = .50 n = 120 Z = (.50 – .60) / √(.60(.40)/120) = -.1 / .0447 = -2.24 We aren’t done: we want the probability of less than -2.24, so a left tail. The table says -2.24 has .0125 in the left tail, so the answer is .0125 or 1.25%
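
The sample-proportion example can be reproduced with a short sketch. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, p_hat, n = 0.60, 0.50, 120  # from the slide

se = sqrt(p * (1 - p) / n)     # standard error of the sample proportion
z = (p_hat - p) / se
prob = NormalDist().cdf(z)     # left-tail area below z

print(round(se, 4), round(z, 2), round(prob, 4))
```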

  13. Sampling Dist: Difference of 2 means • When you have independent samples from 2 populations, you can compare them! • The difference of the two sample means will be (approximately) normally distributed if: • The two populations are both normally distributed, or • The two populations are not both normally distributed BUT the sample sizes are “large” (>30)

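  14. Sampling Dist: Difference of 2 means • You can find the difference of their means, X̄1 – X̄2! • Need independent samples from normal pop’s • If so, the difference X̄1 – X̄2 is normal with mean μ1 – μ2 and standard error √(σ1²/n1 + σ2²/n2) • Standardize with Z = ((X̄1 – X̄2) – (μ1 – μ2)) / √(σ1²/n1 + σ2²/n2)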

  15. EXAMPLE • There are two species of green beings on Mars. The mean height of Species 1 is 32 while the mean height of Species 2 is 22. The standard deviations of the two species are 60 and 70 respectively and the heights of both species are normally distributed. • You randomly sample 10 members of Species 1 and 14 members of Species 2. What is the probability that the mean of the 10 members of Species 1 will exceed the mean of the 14 members of Species 2 by 5 or more? • Standard error of X̄1 – X̄2 (the variances ADD, even for a difference): √(σ1²/n1 + σ2²/n2) = √(3600/10 + 4900/14) = √(360 + 350) = √710 = 26.65

  16. EXAMPLE • Z > (5 – (32 – 22)) / √(3600/10 + 4900/14) • Z > (5 – 10) / 26.65 • Z > -5/26.65 • Z > -.19 • Look up on normal dist table!! = .5753, or about a 57.5% chance that the mean of Species 1 will be greater than the mean of Species 2 by 5 or more.
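
A sketch of the Mars example, using the rule that the variances of two independent sample means add when you take their difference. The inputs are the values stated in the question (standard deviations of 60 and 70). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, mu2 = 32, 22  # population means from the question
sd1, sd2 = 60, 70  # population standard deviations from the question
n1, n2 = 10, 14    # sample sizes from the question

# Standard error of the difference: variances add for independent samples.
se = sqrt(sd1**2 / n1 + sd2**2 / n2)

z = (5 - (mu1 - mu2)) / se
prob = 1 - NormalDist().cdf(z)  # P(difference of sample means >= 5)

print(round(se, 2), round(z, 2), round(prob, 4))
```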

  17. Chapter 10: Estimation

  18. Binomial, Poisson, normal, and exponential distributions allow us to make probability statements about X (an individual member of the population). • To do so we need the population parameters. • Binomial: p • Poisson: μ • Normal: μ and σ • Exponential: λ

  19. Chapter 10 • Introduction to Statistical Inference • Estimation • Point and Interval Estimators • Properties of Estimators • Interval Estimation [ confidence intervals] • Determining Sample Size

  20. Estimation • Estimation: determining approximate value of pop parameter based on sample statistic. • 2 types: • Point Estimator • Estimates the parameter with a single value. Not much use on its own: it says nothing about how close it is to the parameter. • Interval Estimator • Used almost all the time. • Uses an interval to estimate the population parameter. • Provides % certainty that it is between a lower and upper bound

  21. Estimating μ when σ known • In Chapter 9 we saw this, providing confidence in the sample mean: P(μ – Zα/2 σ/√n < X̄ < μ + Zα/2 σ/√n) = 1 – α • You can rearrange it, to be a confidence interval for the population mean: X̄ – Zα/2 σ/√n < μ < X̄ + Zα/2 σ/√n

  22. Desirable Qualities of Estimators…Great MC q! • Unbiased: the expected value of an estimator is equal to that parameter. • Consistent: the difference between the estimator and the parameter grows smaller as the sample size grows larger. • Relative Efficiency: If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.

  23. EXAMPLE • Diageo sampled 85 Laurier students and determined the sample mean of alcohol consumption was 510 drinks a term. They previously calculated that the population standard deviation was 46. Create an interval estimate of the population mean with 95% confidence. X(bar) = 510 n = 85 σ = 46 Zα/2 = unknown, but we want 95% confidence: 95% in the middle, so that’s 2.5% in each tail, so we want the Z value for an area of .475 between 0 and Z, which is 1.96. 510 – 1.96(46/√85) < μ < 510 + 1.96(46/√85) 510 – 9.78 < μ < 510 + 9.78 500.22 < μ < 519.78 We are 95% confident that the average number of drinks for the population is between 500.22 and 519.78
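
A minimal sketch of the interval estimate when σ is known. The helper name `z_interval` is hypothetical; the inputs are the numbers from the example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = z_interval(510, 46, 85)
print(round(lower, 2), round(upper, 2))
```

Passing a higher `confidence` widens the interval; a larger `n` narrows it.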

  24. Selecting the sample size • The difference between the sample mean and the population mean is called the error of estimation. • You can make sure the error stays within a chosen bound by picking the sample size with another freaking formula: n = (Zα/2 σ / B)² • B = Bound on the error (given in q)

  25. EXAMPLE • I want to know how many students I need to interview to find out how many times a Laurier student Facebook stalks in 1 day. I want to be 95% certain that the estimate is within 2 of the true mean. It turns out the standard deviation of this stat is 5. GO: n = ? (what we want to find out) σ = 5 B = 2 Zα/2 = 95% confidence, which is 2.5% in each tail, which is a z value of 1.96 n = (1.96 * 5 / 2)² = (4.9)² = 24.01 (always round up, so need 25 people)
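
The sample-size calculation as a short sketch. The helper name `required_sample_size` is hypothetical; it rounds up because a fraction of a person cannot be sampled.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma: float, bound: float, confidence: float = 0.95) -> int:
    """Smallest n that keeps the error of estimation within `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)

n_needed = required_sample_size(sigma=5, bound=2)
print(n_needed)
```

Halving the bound roughly quadruples the required sample size, since the bound is squared in the formula.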

  26. Chapter 11: Intro Hypothesis Testing

  27. Hypothesis Testing • There are two procedures for making inferences: • Estimation. • Hypothesis testing. • The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.

  28. Hypothesis Testing • There are two hypotheses: • Null Hypothesis (H0) • Assumed to be true • Ex. The defendant is innocent • Alternative (or research) Hypothesis (H1) • Opposite of H0 • Ex. The defendant is guilty • NOTE: The null always states that the parameter equals the value specified in the alternative.

  29. Hypothesis Testing Process • Step 1: State the Null and Alternative • Eg: You want to see if the exam average will be greater than 75%. • H0: μ = 75 • H1: μ > 75 • Step 2: randomly sample the pop and calculate a test statistic (in this case based on the sample mean) • The procedure begins with the NULL BEING TRUE (and the goal is to see if there is enough evidence to say that the alternative is true). • Step 3: Make statement about hypo • If the test statistic value is inconsistent with the null hypo, we reject the null → conclude the alternative is true.

  30. Hypo Testing Decisions • Reject the null in favour of the alternative • Sufficient evidence to support the alternative • Do not reject the null in favour of alt. • Does not mean ‘accepting the null’ (just not enough evidence) • Ex. Can’t prove that the defendant is guilty does not mean that he is innocent

  31. Hypo Testing Errors • Two types of errors are possible when making the decision whether to reject H0 (the null hypothesis) • Type 1 error (probability = alpha): reject a true null hypothesis – send an innocent man to jail (reject null when null is true!) MOST SERIOUS OF THE TWO!

  32. Hypo Testing Errors • Type 2 error (probability = beta): don’t reject a false null hypothesis (go with the safe null assumption.. Don’t have the balls to reject it!!) Guilty man goes free. (not rejected null when null is actually false) • It can be calculated .. (later). • Figure: “our original hypothesis” curve next to “our new assumption” curve. THIS EXAMPLE IS TESTING IF HYDRO BILLS ARE > estimated mean of 170. Sample bills were taken to get x bar.. etc

  33. 2 ways to Test: Rejection Region • Depending on whether you are looking for <, >, or not equal to, you define the rejection region • Level of significance = α

  34. Test It: P-value • The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true. • It is the smallest value of α for which H0 can be rejected. • Figure: Z = 2.46 gives p-value = .0069
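
A sketch of the p-value for the figure's Z = 2.46, assuming a one-tailed (upper-tail) test as in the hydro-bill example with H1: µ > 170. The α levels checked at the end are illustrative.

```python
from statistics import NormalDist

z = 2.46  # test statistic from the slide

# ASSUMPTION: upper-tail test, so the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
print(round(p_value, 4))

# Reject H0 at any significance level alpha larger than the p-value.
for alpha in (0.10, 0.05, 0.01, 0.005):
    print(alpha, p_value < alpha)
```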

  35. Type II Error Example • H0: µ = 170 • H1: µ > 170 • At a significance level of 5% we rejected H0 in favor of H1 since our sample mean (178) was greater than the critical value (175.34). In the question – they will have to give you the new mean to test. ($180 mean) • β = P(x̄ < 175.34, given that µ = 180), thus…

  36. Figure: our original hypothesis (µ = 170) next to our new assumption (µ = 180) • β = chance we set a guilty man free
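
A hedged sketch of the β calculation. The slides do not give σ or n, so the standard error is backed out of the quoted critical value under the ASSUMPTION that 175.34 was computed as 170 + z(.05) × SE. Treat the result as illustrative, not as the course's answer.

```python
from statistics import NormalDist

mu0, mu_new = 170, 180  # null mean and new mean, from the slides
critical = 175.34       # critical value, from the slides
alpha = 0.05            # significance level, from the slides

# ASSUMPTION: critical = mu0 + z_alpha * se, so se can be recovered from it.
z_alpha = NormalDist().inv_cdf(1 - alpha)
se = (critical - mu0) / z_alpha

# beta = P(x-bar < critical) when the true mean is mu_new.
beta = NormalDist(mu_new, se).cdf(critical)
print(round(se, 3), round(beta, 4))
```

Under this assumption β comes out somewhere around 7 to 8 percent: the chance of failing to reject H0 when the true mean is really 180.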

  37. Changing your confidence requirement!

  38. INCREASE THE SAMPLE SIZE!

  39. Test It with P-values • The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true. • It is the smallest value of α for which H0 can be rejected. • Figure: Z = 2.46 gives p-value = .0069

  40. Chapter 12: Inferences about a Population

  41. What will we ‘infer’? • Inference About: • Population Mean • Population Variance • Population Proportion • Inference About: • 1 population • 2 or more pop’s

  42. What’s different? • In past, we have known standard deviation of the population (which is unrealistic) • With it, we can use Z stat to make inferences • NOW, we don’t know st dev. So, have to use the ‘sample st dev’ – why we use the T-stat • GOT to have a normal (or approx) population dist!

  43. t-Distribution • Created by: William Sealy Gosset (MC?) • It has one parameter: degrees of freedom (df), ν • Degrees of freedom: number of observations that are free to vary after the sample mean has been found • How many degrees??? n – 1!! • (if 5 items, 4 degrees)

  44. EXCEL (good MC) • T-dist calculations can be done using excel: • TDIST(x,degrees_freedom,tails) • This is when you want the % in the tail(s). • TDIST(1.3,60,1) • 1.3 is your t-value (like your z-value), the curve is drawn with 60 degrees of freedom and you want the 1 tail test (vs 2). (ANSWER = 0.0992 (so about 10% in the 1 tail test)) • TINV(probability,degrees_freedom) • This is the inverse. Give it the % in the tails and it will give you the T-value. • NOTE: it expects the % for a 2-tail test!!!! • SO, if they wanted you to do the inverse of the q above to get a t-value of 1.3: • TINV(0.1984, 60) – you double the percentage for 2 tail!
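
For checking the Excel answers without Excel, here is a rough pure-Python sketch. The function names `t_tail` and `t_inv_two_tail` are hypothetical stand-ins that mimic what TDIST and TINV return; they are not Excel's implementation. The tail area is found by numerically integrating the t density with Simpson's rule, and the inverse by bisection.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t: float, df: int, tails: int = 1, steps: int = 2000) -> float:
    """P(T > t) for t >= 0 (tails=1) or P(|T| > t) (tails=2)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # 0.5 minus the area from 0 to t
    return tails * upper

def t_inv_two_tail(prob: float, df: int) -> float:
    """t-value with P(|T| > t) = prob, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_tail(mid, df, tails=2) > prob:
            lo = mid  # tail area still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_tail(1.3, 60, 1), 4))
print(round(t_inv_two_tail(0.1984, 60), 2))
```

Like TINV, `t_inv_two_tail` takes the combined area of both tails, so a one-tail percentage has to be doubled before it is passed in.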

  45. Estimating MEAN • Use the T-dist instead of normal & sample stdev and not population stdev. QUESTION: Tiger Woods is rumoured to pay his ‘girls’ $1million per year to stay quiet. A random sample of 7 of them was taken and the mean was $800,000 with a stdev of $100,000. Find the 95% interval estimate of the population mean. (assume pop is normal..) Degrees of freedom = n – 1 = 6 800K ± t(.025, 6) (100K/√7) = 800K ± 2.447(37,796) = 800,000 ± 92,487 RANGE between $707,513 and $892,487
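
A short sketch of the t-based interval. The critical value 2.447 is taken from the slide (t with .025 in the upper tail and 6 degrees of freedom) rather than computed, and the variable names are illustrative.

```python
from math import sqrt

xbar, s, n = 800_000, 100_000, 7  # from the slide
t_crit = 2.447                    # t(.025, 6) as quoted on the slide

margin = t_crit * s / sqrt(n)     # t-value times the estimated standard error
lower, upper = xbar - margin, xbar + margin
print(round(margin), round(lower), round(upper))
```

Carrying the unrounded standard error through shifts the bounds by about a dollar compared with the hand calculation.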
