Statistics Workshop Principles of Hypothesis Testing J-term 2009 Bert Kritzer. Statistical Inference. Inference about populations from samples Inference about underlying processes Could the observed pattern have been generated by a random process?
Statistics Workshop: Principles of Hypothesis Testing • J-term 2009 • Bert Kritzer
Statistical Inference • Inference about populations from samples • Inference about underlying processes • Could the observed pattern have been generated by a random process? • Inference about systematic vs. random (“stochastic”) components • Observation = Systematic + Random • Sampling • Process • Observed statistics as random variables
Statistics As Random Variables • Population mean: μ = 0.564 • Mean of sample means = 0.566
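The idea that a sample statistic is itself a random variable can be illustrated with a small simulation. This is only a sketch: it assumes the population value 0.564 is a proportion (the slide does not say what kind of variable it is), and the sample size of 50, the 1,000 repeated samples, and the seed are arbitrary choices, not values from the workshop.

```python
import random
import statistics

def sample_mean(p, n, rng):
    """Mean of n draws of a 0/1 variable whose population mean is p."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

def simulate_sample_means(p=0.564, n=50, draws=1000, seed=2009):
    """Draw many samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sample_mean(p, n, rng) for _ in range(draws)]

means = simulate_sample_means()
mean_of_means = statistics.mean(means)  # should sit close to the population mean
spread = statistics.stdev(means)        # sampling variability of the statistic

print(f"population mean = 0.564, mean of means = {mean_of_means:.3f}")
print(f"standard deviation of the sample means = {spread:.3f}")
```

Each sample mean differs from 0.564, but the means cluster around it. That cluster is the sampling distribution that hypothesis tests rely on.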
Hypothesis Testing A procedure for drawing conclusions about characteristics in a population or about a process: • Is it reasonable to conclude that there is a relationship between two variables? • Is it reasonable to conclude that a population parameter exceeds some value? • Is it reasonable to conclude that the population parameter differs among two or more groups?
Hypothesis Testing: Research Hypotheses • Are African-Americans more likely to be stopped by the police? • Is the average number of stops for African-Americans different from that for Whites?
Hypothesis Testing: Null Hypotheses • Are African-Americans more likely to be stopped by the police? • Is the average number of stops greater for African-Americans than for Whites?
Why the Null Hypothesis? • We can state the null hypothesis with precision, which allows us to compute the probability of observing a particular result if the null hypothesis is true • Logically it is easier to ascertain what is untrue than what is true. If we can dichotomize the possibilities into A and B, and then determine that A is not true, it must be the case that B is true.
Hypothesis Testing and Legal Decisionmaking • H0: Innocent • HA: Guilty • Legal decisionmaking: Pr(Guilty|Evidence) • Hypothesis testing: Pr(Evidence|Innocent)
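The gap between Pr(Evidence|Innocent) and Pr(Guilty|Evidence) can be made concrete with Bayes' rule. The sketch below is illustrative only: the prior probabilities of guilt and the value of Pr(Evidence|Guilty) are hypothetical numbers chosen for the example, not figures from the workshop.

```python
def posterior_guilt(prior_guilty, p_evidence_given_guilty, p_evidence_given_innocent):
    """Pr(Guilty | Evidence) by Bayes' rule."""
    prior_innocent = 1.0 - prior_guilty
    joint_guilty = prior_guilty * p_evidence_given_guilty
    joint_innocent = prior_innocent * p_evidence_given_innocent
    return joint_guilty / (joint_guilty + joint_innocent)

# Pr(Evidence | Innocent) plays the role of the quantity a hypothesis test reports.
p_evidence_if_innocent = 0.01  # hypothetical
p_evidence_if_guilty = 0.90    # hypothetical

# The same evidence supports very different conclusions under different priors.
low_prior = posterior_guilt(0.001, p_evidence_if_guilty, p_evidence_if_innocent)
high_prior = posterior_guilt(0.5, p_evidence_if_guilty, p_evidence_if_innocent)

print(f"prior 0.001 -> Pr(Guilty|Evidence) = {low_prior:.3f}")
print(f"prior 0.5   -> Pr(Guilty|Evidence) = {high_prior:.3f}")
```

Under these assumed numbers, evidence that would arise only 1% of the time for an innocent person still leaves guilt unlikely when the prior is very low. A small Pr(Evidence|Innocent) is therefore not the same thing as a large Pr(Guilty|Evidence).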
Substantive vs. Statistical Significance • How big of a difference could be explained by random processes such as sampling? • Depends on sample size and characteristics of the underlying distribution • How big of a difference is enough that we should care about it? • Normative/policy question • Depends on “costs” associated with differences • Statistical significance refers only to the first of these
“Significance Level” • How careful do you want to be in reaching a conclusion about your hypothesis? • The process will ask whether your null hypothesis can be rejected • What probability of incorrectly rejecting a true null hypothesis are you willing to accept? • This probability is the significance level
Directionality • Is the research hypothesis directional? • Do you think that the salaries of men are greater on average than the salaries of women? • Do you think that the salaries paid to African-Americans differ from the salaries paid to Whites? • Is the coin loaded vs. is the coin loaded toward heads? • “One-tailed” (directional) vs. “Two-tailed” (nondirectional) hypotheses • If the directional hypothesis is correct, you are more likely to reject the null hypothesis with a one-tailed test
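The last point can be seen by computing both p-values for the same data. This is a sketch using the coin example, with a hypothetical result of 59 heads in 100 flips (numbers chosen for illustration, not taken from the workshop) and exact binomial probabilities for an honest coin.

```python
from math import comb

def honest_pmf(k, n):
    """Probability of exactly k heads in n flips of an honest coin."""
    return comb(n, k) * 0.5 ** n

def one_tailed_p(heads, n):
    """Pr(at least this many heads): the test for 'loaded toward heads'."""
    return sum(honest_pmf(k, n) for k in range(heads, n + 1))

def two_tailed_p(heads, n):
    """Pr(a result at least this far from n/2 in either direction): the test for 'loaded'."""
    distance = abs(heads - n / 2)
    return sum(honest_pmf(k, n) for k in range(n + 1) if abs(k - n / 2) >= distance)

flips, heads = 100, 59  # hypothetical data
one = one_tailed_p(heads, flips)
two = two_tailed_p(heads, flips)

print(f"one-tailed p = {one:.4f}, reject at .05: {one <= 0.05}")
print(f"two-tailed p = {two:.4f}, reject at .05: {two <= 0.05}")
```

For an honest coin the two-tailed p-value is twice the one-tailed value, so the same 59 heads clear the .05 threshold only under the directional hypothesis.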
Types of Error • TYPE I (α error): Rejecting a null hypothesis that is in fact true • Set by the “significance level” • TYPE II (β error): Failing to reject a null hypothesis that is in fact false • Depends on the significance level, the sample size, and how wrong the null is
Steps in Hypothesis Testing • State research (“alternative”) hypothesis HA • State null hypothesis H0 • Set decision rule • “level of significance” Pr(α error) • Directional or nondirectional (“one-tailed” or “two-tailed”) based on research hypothesis • Obtain data (set sample size) • Compute test statistic and get probability of observing it under H0 (“p-value”) • Make decision whether to reject H0
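The steps above can be traced in code for the coin example. This is a minimal sketch, assuming an exact binomial test of H0: Pr(head) = .5. The function names and the sample results (12 and 11 heads in 15 flips) are illustrative choices.

```python
from math import comb

def p_value(heads, flips, directional):
    """Probability, under H0 (honest coin), of a result at least as extreme as observed."""
    pmf = [comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    if directional:
        # One-tailed: at least this many heads.
        return sum(pmf[heads:])
    # Two-tailed: at least this far from flips/2 in either direction.
    distance = abs(heads - flips / 2)
    return sum(p for k, p in enumerate(pmf) if abs(k - flips / 2) >= distance)

def run_test(heads, flips, alpha=0.05, directional=True):
    # Steps 1-2: HA is "loaded toward heads" (directional) or "loaded" (nondirectional);
    #            H0 is "the coin is honest".
    # Step 3:    the decision rule is fixed by alpha and the directionality.
    # Step 4:    the data are the number of heads in a fixed number of flips.
    # Step 5:    compute the probability of the result under H0.
    p = p_value(heads, flips, directional)
    # Step 6:    reject H0 when the p-value is at or below the significance level.
    return {"p_value": p, "reject_null": p <= alpha}

print(run_test(12, 15))  # 12 heads in 15 flips, one-tailed
print(run_test(11, 15))  # 11 heads in 15 flips, one-tailed
```

Note that alpha and the directionality are arguments set before the data are examined, which mirrors the order of the steps.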
The Concept of Power • The probability that a hypothesis test will reject a false null hypothesis • β is the probability of a Type II error • 1-β is the “power” of a significance test
What Determines the Power of a Test? • The characteristics of a particular test • The sample size • The magnitude of the true effect (difference, regression coefficient, etc.) that we are trying to detect • 200 feet vs. 500 feet • A chickadee vs. a bluejay
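How sample size and effect size drive power can be sketched with the coin example. The code assumes a one-tailed exact binomial test at the .05 level. The grid of sample sizes and true probabilities of heads is an illustrative choice, not taken from the workshop charts.

```python
from math import comb

def upper_tail(k, n, p):
    """Pr(at least k heads in n flips) when the probability of a head is p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_heads(n, alpha=0.05):
    """Smallest number of heads that rejects H0: p = .5 in a one-tailed test."""
    for k in range(n + 2):
        if upper_tail(k, n, 0.5) <= alpha:
            return k

def power(n, p_true, alpha=0.05):
    """Pr(rejecting H0) when the true probability of a head is p_true."""
    return upper_tail(critical_heads(n, alpha), n, p_true)

for n in (15, 50, 100):
    row = ", ".join(f"p={p}: {power(n, p):.2f}" for p in (0.6, 0.67, 0.8))
    print(f"n={n:3d}  {row}")
```

Reading across a row shows the effect of a larger true departure from .5. Reading down a column shows the effect of more flips.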
Is the Coin Honest? • If the coin is honest, the probability of a head on any one flip is .5. • Do we suspect that it is loaded one direction or the other? • How dishonest do we think it is (i.e., what is the actual probability of a head if that probability is not .5)? • How big is the sample (number of flips)?
What are the probabilities of different outcomes for an honest coin? • The binomial distribution • Two parameters are the probability and the “sample size” (number of flips of the coin) • Choosing the sample size will affect our ability to make a correct decision about the coin • The bigger the sample size the better • Knowing the “alternative hypothesis” (how dishonest the coin is) can help in deciding the sample size
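The binomial probabilities for an honest coin can be computed directly from the two parameters named above. A short sketch, using 10 flips as an arbitrary illustrative sample size:

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when Pr(head) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

flips = 10  # illustrative sample size
distribution = [binomial_pmf(k, flips) for k in range(flips + 1)]

for k, prob in enumerate(distribution):
    print(f"{k:2d} heads: {prob:.4f}")
```

The probabilities sum to 1 and are symmetric around 5 heads because the coin is honest. Changing `p` gives the distribution under a specific alternative hypothesis.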
Other Sample Size Options: Probability of Outcomes for an Honest Coin • Probabilities shown are probabilities of a more extreme outcome
What Is Our Research Hypothesis? • Dishonest? • Loaded toward heads?
What Significance Level Do We Want to Use? • How willing are we to decide that the coin is dishonest (loaded toward heads?) when it is actually honest? • Some possibilities: .10 .05 .01 .001
How Big of a Sample? • How sure do we want to be able to reject the null hypothesis if the coin is in fact loaded toward heads? • Power • Do we have any idea of how dishonest the coin is?
15 Flips • We will need 12 or more heads on our 15 flips to reject the null at the .05 (one-tailed) level • According to the previous chart, if the coin has a true probability of heads of .67, we have about a 20% chance of rejecting the null if our research hypothesis is that the coin is loaded toward heads
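Both figures on this slide can be checked with exact binomial tail probabilities, and the same functions can be used to ask how many flips would be needed for higher power. This is a sketch: the 80% power target and the 500-flip search limit are assumptions for illustration, not values from the workshop.

```python
from math import comb

def upper_tail(k, n, p):
    """Pr(at least k heads in n flips) when the probability of a head is p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_heads(n, alpha=0.05):
    """Smallest number of heads that rejects H0: p = .5 in a one-tailed test."""
    return next(k for k in range(n + 2) if upper_tail(k, n, 0.5) <= alpha)

def power(n, p_true, alpha=0.05):
    """Pr(rejecting H0) when the true probability of a head is p_true."""
    return upper_tail(critical_heads(n, alpha), n, p_true)

def smallest_sample(p_true, target_power=0.80, alpha=0.05, max_flips=500):
    """First sample size whose power reaches the target (None if not found)."""
    for n in range(1, max_flips + 1):
        if power(n, p_true, alpha) >= target_power:
            return n
    return None

needed = critical_heads(15)          # heads required to reject with 15 flips
power_15 = power(15, 0.67)           # chance of rejecting if Pr(head) is really .67
n_needed = smallest_sample(0.67)     # flips needed for the assumed 80% target

print(f"heads needed with 15 flips: {needed}")
print(f"power with 15 flips at p = .67: {power_15:.2f}")
print(f"flips needed for 80% power at p = .67: {n_needed}")
```

With 15 flips the exact power at .67 is roughly 0.22, in line with the approximately 20% read from the chart. Because the binomial is discrete, power does not rise perfectly smoothly with sample size, so the search returns the first sample size that reaches the target.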
Three Frequently Used Methods of Hypothesis Testing • Direct comparison to hypothesized value • t-tests using t distribution • Z-tests using normal distribution • occasionally reported as a chi square • Comparisons to a set of hypothesized values based on a model • “Goodness-of-Fit” test using chi square • Reduction in predictive error • Analysis of Variance using F-ratio