Introducing Concepts of Statistical Inference Beth Chance, John Holcomb, Allan Rossman Cal Poly – San Luis Obispo, Cleveland State University
Ptolemaic Curriculum? “Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)
Is this feasible? • Experience at post-calculus level • Developed spiral curriculum with logic of inference (for 2×2 tables) in chapter 1 • ISCAM: Investigating Statistical Concepts, Applications, and Methods (Chance, Rossman) • New project (funded by NSF/CCLI) • Rethinking for lower mathematical level • More complete shift, including focus on entire statistical process as a whole
Workshop goals Enable you to: Re-examine how you introduce concepts of statistical inference to your students Help your students to understand fundamental concepts of statistical inference Develop students’ understanding of the process of statistical investigations Introduce normal-based methods of inference to complement randomization-based ones
Workshop goals (cont.) Enable you to: Implement activities based on real data from genuine studies Assess student understanding of inference concepts Make effective use of simulations, both tactile and computer-based
Agenda Mon pm: Inference for proportion Overview, introductions Statistical significance via simulation Exact binomial inference CI for proportion Transition to normal-based inference for proportion CAUSE Webinar April 2009
Agenda (cont.) Tues am: Inference for 2×2 table Simulating randomization test Fisher’s exact test Observational studies, confounding Independent random samples Tues pm: Comparing 2 groups with quant response Simulating randomization test Matched pairs designs
Agenda (cont.) Wed am: Assessment issues Strategies for assessing student understanding/learning Preliminary findings Wed pm: More inference scenarios Comparing several groups (ANOVA, chi-square) Correlation/regression Discussion of implementation issues
Some notes Agenda is always subject to change Already has changed some! We’ll discuss some assessment, implementation issues throughout Please offer questions, comments as they arise Be understanding when we don’t have all the answers! We’ll also discuss some thorny issues that we have not resolved among ourselves
Introductions Who are you? Where/what do you teach? Why interested in this topic?
Example 1: Helper/hinderer? • Sixteen infants were shown two videotapes with a toy trying to climb a hill • One where a “helper” toy pushes the original toy up • One where a “hinderer” toy pushes the toy back down • Infants were then presented with the two toys as wooden blocks • Researchers noted which toy infants chose • http://www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html
Example 1: Helper/hinderer? Data: 14 of the 16 infants chose the “helper” toy Core question of inference: Is such an extreme result unlikely to occur by chance (random selection) alone … … if there were no genuine preference (null model)?
Analysis options Could use a binomial probability calculation We prefer a simulation approach To emphasize issue of “how often would this happen in long run?” Starting with tactile simulation
Strategy Students flip a fair coin 16 times Count number of heads, representing choices of “helper” toy Fair coin represents null model of no genuine preference Repeat several times, combine results See how surprising it is to get 14 or more heads even with “such a small sample size” Approximate (empirical) P-value Turn to applet for large number of repetitions: http://statweb.calpoly.edu/bchance/applets/BinomDist3/BinomDist.html
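The coin-flipping strategy above can be sketched in a few lines of Python, as a stand-in for the applet rather than a reproduction of it. The function names, the number of repetitions, and the seed are illustrative assumptions; the 16 infants and the observed count of 14 come from the study. The exact binomial calculation is included alongside for comparison.

```python
import random
from math import comb

def simulate_p_value(n_infants=16, observed=14, reps=10_000, seed=1):
    """Approximate P-value: share of repetitions with at least `observed` heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    extreme = 0
    for _ in range(reps):
        # One repetition = 16 flips of a fair coin (null model: no preference)
        heads = sum(rng.random() < 0.5 for _ in range(n_infants))
        if heads >= observed:
            extreme += 1
    return extreme / reps

def exact_p_value(n=16, observed=14):
    """Exact binomial P(X >= observed) for a fair coin."""
    return sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

print("simulated:", simulate_p_value())
print("exact:    ", exact_p_value())  # (120 + 16 + 1) / 65536
```

Both numbers should be small, which is the sense in which 14 or more heads in 16 tosses is surprising under the null model.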
Results • Pretty unlikely to obtain 14 or more heads in 16 tosses of a fair coin, so … • Pretty strong evidence that infants do have genuine preference for helper toy and were not just picking at random
Example 1: Helper/hinderer Can do this on day 1 of course Logic of statistical inference/significance Null model, simulation, p-value, significance
Example 2: Kissing Study: 8 of 12 kissing couples lean to right Does this provide evidence against 50/50 model? Does this provide evidence against 75/25 model? What models does this provide evidence against?
Example 2: Kissing Many new ideas here: Students describe rather than perform simulation Non-significant result (8/12) Null model other than 50/50 Looking at lower tail Sample size effect Big idea: Interval of plausible values (CI) Effect of confidence level Importance of random sampling
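One way to make the "interval of plausible values" idea concrete is to test a grid of null models against the observed 8 of 12 and keep those that are not rejected. The sketch below is a minimal illustration, assuming a two-sided P-value computed as twice the smaller tail (other two-sided conventions exist) and a grid step of 0.01; the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(p, successes=8, n=12):
    """Twice the smaller tail probability, capped at 1 (an assumed convention)."""
    lower = sum(binom_pmf(k, n, p) for k in range(0, successes + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(successes, n + 1))
    return min(1.0, 2 * min(lower, upper))

def plausible_values(successes=8, n=12, alpha=0.05):
    """Null values p on a 0.01 grid that the data do not reject at level alpha."""
    grid = [i / 100 for i in range(1, 100)]
    return [p for p in grid if two_sided_p_value(p, successes, n) > alpha]

vals = plausible_values()
print("plausible p from", min(vals), "to", max(vals))
```

Rerunning `plausible_values` with a smaller `alpha` (higher confidence level) or a larger `n` at the same sample proportion shows the confidence-level and sample-size effects listed above.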
Transition to normal-based inference Two methods to find p-value for proportion: Approximation by simulation Exact binomial calculation Why should we present normal approx at all? Because it’s commonly used (not good reason) Because even minimally observant student will notice similarities of these simulated distributions Because z-scores convey additional information Distance from expected, measured in SDs
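The z-score as "distance from expected, measured in SDs" can be illustrated with the helper/hinderer counts (14 of 16 against a null value of 0.5). This is a minimal sketch with a hypothetical function name; with n = 16 the normal approximation itself is rough, so the point is the interpretation of z, not the resulting p-value.

```python
from math import sqrt

def z_score(successes, n, p0):
    """Standardized distance of the sample proportion from the null value p0."""
    p_hat = successes / n
    sd_null = sqrt(p0 * (1 - p0) / n)  # SD of the sample proportion under the null
    return (p_hat - p0) / sd_null

# Helper/hinderer: 14 of 16 infants, null model p = 0.5
print(z_score(14, 16, 0.5))
```

Here the sample proportion 0.875 lies 0.375 above the null value, and the null SD is 0.125, so the observed result sits three SDs above what the null model expects.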
Example 1: Baseball Big Bang Some non-trivial aspects Defining parameter Expressing hypotheses Sampling distribution z = -5.75 conveys more information than p-value ≈ 0 95% CI: Does this produce more/less understanding than forming CI by inverting test?
Example 2: Which tire? Which tire would you choose? Fun, simple in-class data collection Almost always in conjectured direction May or may not be significant Can use simulation or binomial or normal Investigate effect of sample size
Example 3: Cat Households Sensible to use normal approx here H0: p = 1/3, Ha: p ≠ 1/3 z = -10.4, p-value ≈ .0000 99% CI: (.312, .320) P-value and CI are complementary But provide different information Statistical vs practical significance
Example 4: Female Senators 95% CI for p: (.096, .244) Beware of biased sampling methods If you have access to entire population: no inference to be drawn!
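The slide gives only the interval, not the counts behind it. As a hedged illustration, the sketch below computes a normal-based (Wald) interval and ASSUMES 17 women among 100 senators, a choice made here only because it reproduces the stated endpoints; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-based interval: p_hat +/- z* x sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# ASSUMPTION: 17 of 100, not stated in the slides
low, high = wald_interval(17, 100)
print(round(low, 3), round(high, 3))
```

The arithmetic works, but the slide's warning still applies: when the 100 senators are the entire population, there is nothing left to infer.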
Example 2: Dolphin therapy? Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment Is dolphin therapy more effective than control? Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
Some approaches • Could calculate test statistic, P-value from approximate sampling distribution (z, chi-square) • But it’s approximate • But conditions might not hold • But how does this relate to what “significance” means? • Could conduct Fisher’s Exact Test • But there’s a lot of mathematical start-up required • But that’s still not closely tied to what “significance” means • Even though this is a randomization test
Alternative approach • Simulate random assignment process many times, see how often such an extreme result occurs • Assume no treatment effect (null model) • Re-randomize 30 subjects to two groups (using cards) • Assuming 13 improvers, 17 non-improvers regardless • Determine number of improvers in dolphin group • Or, equivalently, difference in improvement proportions • Repeat large number of times (turn to computer) • Ask whether observed result is in tail of distribution • Indicating saw a surprising result under null model • Providing evidence that dolphin therapy is more effective
Analysis http://www.rossmanchance.com/applets/Dolphins/Dolphins.html
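The card-shuffling procedure above translates directly into code. The sketch below is a minimal version, not the applet's implementation. The 30 subjects and 13 improvers come from the slide; the group size of 15 and the observed count of 10 improvers in the dolphin group are ASSUMPTIONS that the slides do not state, and the function name, repetitions, and seed are illustrative.

```python
import random

def simulate_randomization(n_improvers=13, n_total=30, group_size=15,
                           observed=10, reps=10_000, seed=1):
    """Share of re-randomizations with at least `observed` improvers
    landing in the dolphin group, under the null model of no effect."""
    rng = random.Random(seed)
    # One card per subject: 1 = improver, 0 = non-improver (fixed regardless of group)
    cards = [1] * n_improvers + [0] * (n_total - n_improvers)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)                 # shuffle the deck
        dolphin_group = cards[:group_size] # deal the first 15 cards to "dolphin"
        if sum(dolphin_group) >= observed:
            extreme += 1
    return extreme / reps

print(simulate_randomization())
```

A small proportion here means the assumed observed result sits in the upper tail of the randomization distribution.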
Non-simulation approach Exact randomization distribution Hypergeometric distribution Fisher’s Exact Test p-value = .0127
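The hypergeometric calculation behind this p-value can be sketched as follows. The group size of 15 and the observed count of 10 improvers in the dolphin group are ASSUMPTIONS not given in the slides; they are used here because, together with the stated 13 improvers among 30 subjects, they reproduce the reported .0127. The function name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(observed=10, n_improvers=13, n_total=30, group_size=15):
    """P(at least `observed` improvers in a randomly chosen group of `group_size`)."""
    n_non = n_total - n_improvers
    max_k = min(n_improvers, group_size)
    favorable = sum(comb(n_improvers, k) * comb(n_non, group_size - k)
                    for k in range(observed, max_k + 1))
    return favorable / comb(n_total, group_size)

print(round(hypergeom_upper_tail(), 4))
```

Each term counts the ways to deal exactly k improvers and 15 − k non-improvers into the dolphin group, out of all equally likely ways to choose 15 of the 30 subjects.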
Conclusion • Experimental result is statistically significant • And what is the logic behind that? • Observed result very unlikely to occur by chance (random assignment) alone (if dolphin therapy was not effective)
Example 2: Yawning What’s different here? Group sizes not the same So calculating success proportions more important Experimental result not significant Lack of surprising-ness is harder for students to spot than surprising-ness Well-stated conclusion is more challenging, subtle Don’t want to “accept null model”
Example 3: Murderous Nurse? Murder trial: U.S. vs. Kristin Gilbert Accused of giving patients fatal dose of heart stimulant Data presented for 18 months of 8-hour shifts Relative risk: 6.34
Example 3 (cont.) Structurally the same as dolphin and yawning examples, but with one crucial difference No random assignment to groups Observational study Allows many potential explanations other than “random chance” Confounding variables Perhaps she worked intensive care unit or night shift Is statistical significance still relevant? Yes, to see if “random chance” can plausibly be ruled out as an explanation Some statisticians disagree
Example 4: Native Californians? What’s different here? Not random assignment to groups Independent random sampling from populations So … Scope of conclusions differs Generalize to larger populations, but no cause/effect conclusions Use different kind of randomness in simulation To model use of randomness in data collection
Example 1: Lingering sleep deprivation? • Does sleep deprivation have harmful effects on cognitive functioning three days later? • 21 subjects; random assignment • Core question of inference: • Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
One approach • Calculate test statistic, p-value from approximate sampling distribution
Randomization approach • Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs • Start with tactile simulation using index cards • Write each “score” on a card • Shuffle the cards • Randomly deal out 11 for deprived group, 10 for unrestricted group • Calculate difference in group means • Repeat many times
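The index-card procedure above can be sketched as a general function for a difference in group means. The slides do not list the study's scores, so the usage example runs on a small set of clearly hypothetical numbers, not the sleep deprivation data; the function name, repetitions, and seed are also illustrative assumptions.

```python
import random
from statistics import mean

def randomization_p_value(group_a, group_b, reps=10_000, seed=1):
    """One-sided approximate p-value for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    cards = list(group_a) + list(group_b)  # write each score on a card
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)                                # shuffle the cards
        diff = mean(cards[:n_a]) - mean(cards[n_a:])      # deal and compare means
        if diff >= observed:
            extreme += 1
    return extreme / reps

# HYPOTHETICAL toy scores for illustration only (not the study's data)
toy_unrestricted = [10, 11, 12]
toy_deprived = [1, 2, 3]
print(randomization_p_value(toy_unrestricted, toy_deprived))
```

For the actual study, `group_a` would hold the 10 unrestricted scores and `group_b` the 11 deprived scores.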
Example 1: Sleep deprivation (cont.) • Conclusion: Fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later • Justification: Experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone, if there were no effect of the sleep deprivation
Exact randomization distribution • Exact p-value 2533/352716 = .0072
Example 2: Age discrimination? • Employee ages: • 25, 33, 35, 38, 48, 55, 55, 55, 56, 64 • Fired employee ages in bold: • 25, 33, 35, 38, 48, 55, 55, 55, 56, 64 • Robert Martin: 55 years old • Do the data provide evidence that the firing process was not “random”? • How unlikely is it that a “random” firing process would produce such a large average age?
Exact permutation distribution • Exact p-value: 6 / 120 = .05
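With only ten employees, the exact permutation distribution can be enumerated directly. The bold formatting that marked the fired employees did not survive in this transcript, so the sketch below ASSUMES three employees were fired, aged 55, 55, and 64; that choice is consistent with the 120 = C(10, 3) possible outcomes and reproduces the stated 6/120. The function name is hypothetical.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
fired = [55, 55, 64]  # ASSUMPTION: inferred, not stated in the surviving text

def exact_permutation_count(ages, fired):
    """Count equally likely firing sets whose average age is at least the observed one.
    Sums are compared instead of means to avoid floating-point ties."""
    observed_sum = sum(fired)
    all_sets = list(combinations(ages, len(fired)))
    extreme = sum(1 for s in all_sets if sum(s) >= observed_sum)
    return extreme, len(all_sets)

extreme, total = exact_permutation_count(ages, fired)
print(extreme, "/", total, "=", extreme / total)
```

Because `combinations` treats the three 55-year-olds as distinct people, each of the 120 sets is equally likely under a "random" firing process.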
Example 3: Memorizing letters • You will be given a string of 30 letters • Memorize as many as you can, in order, in 20 seconds
Confidence Intervals based on Randomization Tests (Quantitative) • Invert randomization test • Subtract d from all subjects in group B, re-randomize, add d back to all subjects in group B, compare to observed difference • Similar to binomial example (kissing study) • Get standard error from randomization distribution and use observed ± 2 SEs • Get percentiles from randomization distribution and use observed ± percentiles • t-interval • Bootstrapping
Series of Lab Assignments • Lab 1: Helper/Hinderer (Binomial test) • Lab 2: Dolphin Therapy (2×2 table) • Lab 3: Textbook prices (matched pairs from normal population) or JFK/JFKC (randomization on quantitative variable) • Lab 4: Random Babies • Lab 5: One-sample z-test for proportion (Reese’s Pieces) • Lab 6: Sleepless nights (t-test, confidence interval) • Lab 7: Sleep deprivation (randomization test) • Lab 8: Study Hours and GPA (regression with simulation and Minitab output)
Random Babies • Suppose that 4 mothers give birth to baby boys at the same hospital on the same night • Hospital staff returns babies to mothers at random! • How likely is it that … • … nobody gets the right baby? • … everyone gets the right baby? • …
Random Babies • Last Names First Names • Jones Jerry • Miller Marvin • Smith Sam • Williams Willy
Random Babies Last Names First Names Jones Marvin Miller Smith Williams
Random Babies Last Names First Names Jones Marvin Miller Willy Smith Williams
Random Babies Last Names First Names Jones Marvin Miller Willy Smith Sam Williams
Random Babies Last Names First Names Jones Marvin Miller Willy Smith Sam 1 match Williams Jerry
Random Babies 1234 1243 1324 1342 1423 1432 2134 2143 2314 2341 2413 2431 3124 3142 3214 3241 3412 3421 4123 4132 4213 4231 4312 4321
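The 24 arrangements listed above can be generated and tallied by machine, which answers the "how likely" questions exactly. This is a minimal sketch with a hypothetical function name; mothers and babies are labeled 0 to 3 instead of 1 to 4.

```python
from collections import Counter
from itertools import permutations

def match_distribution(n=4):
    """Tally the number of correct mother-baby matches over all n! equally likely returns."""
    counts = Counter()
    for perm in permutations(range(n)):
        # perm[mother] is the baby handed to that mother; a match is mother == baby
        matches = sum(1 for mother, baby in enumerate(perm) if mother == baby)
        counts[matches] += 1
    return counts

dist = match_distribution()
total = sum(dist.values())
for matches in sorted(dist):
    print(matches, "matches:", dist[matches], "/", total)
```

Exactly three matches never occurs, since three correct returns force the fourth to be correct as well.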