Chapter 18 – Sampling Distribution Models
How accurate is our sample? Sometimes different polls show different results for the same question. Since each poll samples a different group of people, we should expect some variation in the results. We could try drawing lots of samples and looking at the variation amongst those samples.
Experiment: Simulating a sample • A recent US Census Bureau study reports that about 30% of Americans 25 or older have a Bachelor’s degree. • Open up a blank Minitab worksheet and let’s generate some random data: • Calc > Random Data > Bernoulli • Number of rows to generate: 200 • Store in columns: C1-C20 • Event probability: .3
Proportion estimates for samples of size 5 • We can treat each row as a sample and calculate the proportion of each sample using the mean. • Samples of size 5: • Calc > Row Statistics > Mean • Input Variables: C1 – C5 • Store result in: C21 • Look at these sample proportions. Are they close to the population proportion of 30%? • Draw a histogram of the sample proportions in C21
Proportion estimates for samples of size 10 • Samples of size 10: • Calc > Row Statistics > Mean • Input Variables: C1 – C10 • Store result in: C22 • Look at these sample proportions. Are they close to the population proportion of 30%? • Draw a histogram of the sample proportions in C22
Proportion estimates for samples of size 20 • Samples of size 20: • Calc > Row Statistics > Mean • Input Variables: C1 – C20 • Store result in: C23 • Look at these sample proportions. Are they close to the population proportion of 30%? • Draw a histogram of the sample proportions in C23
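For readers without Minitab, the same experiment can be sketched in Python. This is only an illustration, not the Minitab procedure itself: the function names `simulate_worksheet` and `row_proportions` and the seed are assumptions made for this example. Each row stands for one sample, and the proportion for a sample of size n is the mean of the first n columns of that row.

```python
import random
from statistics import mean, stdev

def simulate_worksheet(rows=200, cols=20, p=0.3, seed=1):
    """Build a rows x cols table of Bernoulli(p) draws (1 = has a degree)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    return [[1 if rng.random() < p else 0 for _ in range(cols)]
            for _ in range(rows)]

def row_proportions(worksheet, n):
    """Sample proportion from the first n columns of each row
    (the analogue of Calc > Row Statistics > Mean on C1..Cn)."""
    return [sum(row[:n]) / n for row in worksheet]

worksheet = simulate_worksheet()
for n in (5, 10, 20):
    props = row_proportions(worksheet, n)
    print(f"n={n:2d}  mean of proportions={mean(props):.3f}  "
          f"sd of proportions={stdev(props):.3f}")
```

The mean of the sample proportions should sit near 0.30 for every n, while their spread should shrink as n grows, which is what the three histograms are meant to show.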
Sampling Distribution Model for a Proportion • Our histogram of the sample proportions started to look like a Normal model • The larger our sample size gets, the better the Normal model works • Assumptions: • Independence: sampled values must be independent of each other • Sample Size: n must be large enough
Conditions to check for assumptions • Randomization Condition: • Experiments should have treatments randomly assigned • Survey samples should be a simple random sample or otherwise a representative, unbiased sample • 10% Condition: • Sample size n must be no more than 10% of the population • Success/Failure Condition: • Sample size needs to be large enough to expect at least 10 successes and 10 failures
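The two numeric conditions can be checked mechanically. The sketch below is an illustration only: the function name `check_proportion_conditions` and the population size of 1,000,000 are assumptions made for the example. The Randomization Condition depends on how the data were collected, so it cannot be checked by arithmetic.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return {
        # Sample must be no more than 10% of the population
        "ten_percent": n <= 0.10 * population_size,
        # Expect at least 10 successes and at least 10 failures
        "success_failure": expected_successes >= 10 and expected_failures >= 10,
    }

# Hypothetical population of 1,000,000 people, with p = 0.3 as in the experiment
print(check_proportion_conditions(n=20, p=0.3, population_size=1_000_000))
print(check_proportion_conditions(n=200, p=0.3, population_size=1_000_000))
```

With p = 0.3, a sample of 20 expects only 6 successes, so the Success/Failure Condition fails, while a sample of 200 expects 60 successes and 140 failures and passes.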
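Sampling Distribution Model for a Proportion If the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with: Mean: $\mu(\hat{p}) = p$ Standard Deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$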
Example: Proportion of Vegetarians • 7% of the US population is estimated to be vegetarian. If a random sample of 200 people resulted in 20 people reporting themselves as vegetarians, is this an unusually high proportion? • Conditions: • Randomization • 10% condition • Success/Failure
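Vegetarians Example continued Since our conditions were met, it’s ok to use a Normal model. $\hat{p}$ = 20/200 = .10 E($\hat{p}$) = p = .07 SD($\hat{p}$) = $\sqrt{(.07)(.93)/200}$ ≈ .01804 z = ($\hat{p}$ − p)/SD($\hat{p}$) = (.10 − .07)/.01804 ≈ 1.66 This result is within 2 SDs of the mean, so it is not unusual.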
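The arithmetic for the vegetarian example can be verified with a few lines of Python. This sketch assumes the usual standard deviation for a sample proportion, sqrt(p(1 − p)/n); the function name `proportion_z_score` is made up for this example.

```python
import math

def proportion_z_score(successes, n, p):
    """Return (p_hat, sd, z) for a sample proportion under the Normal model."""
    p_hat = successes / n
    sd = math.sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion
    z = (p_hat - p) / sd             # how many SDs p_hat lies from p
    return p_hat, sd, z

p_hat, sd, z = proportion_z_score(successes=20, n=200, p=0.07)
print(f"p_hat={p_hat:.2f}  sd={sd:.4f}  z={z:.2f}")
print("unusual" if abs(z) > 2 else "not unusual (within 2 SDs)")
```

The 2-SD cutoff used here follows the slide's own reasoning; it is a rule of thumb rather than a formal test.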
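68-95-99.7 Rule with Vegetarians Figure: Normal curve centered at p with the horizontal axis marked from -3σ to 3σ; 68% of sample proportions fall within 1σ of p, 95% within 2σ, and 99.7% within 3σ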
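To attach numbers to the rule, the sketch below computes the 1σ, 2σ, and 3σ bands for the vegetarian example (p = .07, n = 200) and confirms the coverage percentages with Python's `statistics.NormalDist`. The helper name `coverage` is an assumption made for this example.

```python
import math
from statistics import NormalDist

p, n = 0.07, 200
sigma = math.sqrt(p * (1 - p) / n)      # SD of the sample proportion
model = NormalDist(mu=p, sigma=sigma)   # Normal model for p_hat

def coverage(k):
    """Probability that p_hat falls within k standard deviations of p."""
    return model.cdf(p + k * sigma) - model.cdf(p - k * sigma)

for k in (1, 2, 3):
    low, high = p - k * sigma, p + k * sigma
    print(f"within {k} sigma: {low:.3f} to {high:.3f}  "
          f"coverage={coverage(k):.1%}")
```

The observed proportion of .10 falls inside the 2σ band, which agrees with the conclusion that it is not unusual.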
Sampling Distribution of a Mean Rolling dice simulation 10,000 individual rolls recorded Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean Rolling 2 dice 10,000 times and averaging the dice Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean Rolling 3 dice 10,000 times and averaging dice Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean Rolling 5 dice 10,000 times and averaging Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean Rolling 20 dice 10,000 times and averaging Once again, as sample size increases, Normal model appears Figure from DeVeaux, Intro to Stats
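The dice figures can be reproduced approximately with a short simulation. This is a sketch rather than the code behind the textbook figures: the function name `simulate_dice_averages` and the seed are assumptions for this example.

```python
import math
import random
from statistics import mean, stdev

def simulate_dice_averages(num_dice, trials=10_000, seed=18):
    """Roll num_dice fair dice `trials` times; return the average of each roll."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
            for _ in range(trials)]

results = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    results[num_dice] = (mean(averages), stdev(averages))
    print(f"{num_dice:2d} dice: mean={results[num_dice][0]:.3f}  "
          f"sd={results[num_dice][1]:.3f}")
```

The mean of the averages stays near 3.5 in every case, while the standard deviation shrinks as more dice are averaged. A histogram of `averages` for 20 dice would show the bell shape that the last figure illustrates.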
Central Limit Theorem • The sampling distribution of any mean becomes more nearly Normal as the sample size grows. • The larger the sample, the better the approximation will be • Observations need to be independent and collected with randomization.
CLT Assumptions • Assumptions: • Independence: sampled values must be independent • Sample Size: sample size must be large enough • Conditions: • Randomization • 10% Condition • Large enough sample
Which Normal Model to use? The Normal model depends on a mean and an SD. Sampling Distribution Model for a Mean When a random sample is drawn from any population with mean µ and standard deviation σ, its sample mean $\bar{y}$ has a sampling distribution with: Mean: $\mu(\bar{y}) = \mu$ Standard Deviation: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
Example: CEO compensation 800 CEOs Mean (in thousands) = 10,307.31 SD (in thousands) = 17,964.62 Samples of size 50 were drawn with: Mean = 10,343.93 SD = 2,483.84 Samples of size 100 were drawn with: Mean = 10,329.94 SD = 1,779.18 According to the CLT, what should the theoretical mean and SD be? Example from DeVeaux, Intro to Stats
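One way to answer the question is to apply the model for a mean directly: the theoretical mean equals the population mean, and the theoretical SD is σ divided by the square root of n. The sketch below uses the figures from the slide; the function name `sampling_sd_of_mean` is an assumption for this example.

```python
import math

def sampling_sd_of_mean(sigma, n):
    """Theoretical SD of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

population_mean = 10_307.31  # in thousands, from the slide
population_sd = 17_964.62    # in thousands, from the slide

# (sample size, observed mean, observed SD) as reported on the slide
for n, observed_mean, observed_sd in ((50, 10_343.93, 2_483.84),
                                      (100, 10_329.94, 1_779.18)):
    theoretical_sd = sampling_sd_of_mean(population_sd, n)
    print(f"n={n:3d}  theoretical mean={population_mean:,.2f} "
          f"(observed {observed_mean:,.2f})  "
          f"theoretical sd={theoretical_sd:,.2f} (observed {observed_sd:,.2f})")
```

The theoretical SDs come out to about 2,540.58 for n = 50 and 1,796.46 for n = 100, both close to the values observed in the simulated samples.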