Statistics Workshop Principles of Hypothesis Testing J-term 2009 Bert Kritzer


Statistical Inference • Inference about populations from samples • Inference about underlying processes • Could the observed pattern have been generated by a random process? • Inference about systematic vs. random (“stochastic”) components: Observation = Systematic + Random • Sampling • Process • Observed statistics as random variables

Statistics As Random Variables [Figure: distribution of sample means; population mean μ = 0.564, mean of the sample means = 0.566]
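A minimal simulation sketch of the idea on this slide: a sample statistic (here the sample mean) varies from sample to sample, and the mean of many sample means sits close to the population mean μ. The population below is hypothetical (uniform values from a fixed seed), not the population behind the figure, and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

# Hypothetical population (an assumption, not the slide's data):
# 10,000 values drawn once from a uniform(0, 1) distribution.
rng = random.Random(2009)
population = [rng.random() for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter

# Draw many samples and compute each sample's mean (a statistic);
# the collection of means shows the statistic behaving as a random variable.
sample_size = 50
n_samples = 2_000
sample_means = [
    statistics.fmean(rng.sample(population, sample_size))
    for _ in range(n_samples)
]
mean_of_means = statistics.fmean(sample_means)

print(f"population mean (mu):  {mu:.3f}")
print(f"mean of sample means:  {mean_of_means:.3f}")
print(f"SD of sample means:    {statistics.stdev(sample_means):.3f}")
```

The spread of the sample means (about σ/√n) is what later slides use to judge whether an observed difference could be due to sampling alone.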

Parameters, Statistics, and Estimators

Hypothesis Testing A procedure for drawing conclusions about characteristics of a population or about a process: • Is it reasonable to conclude that there is a relationship between two variables? • Is it reasonable to conclude that a population parameter exceeds some value? • Is it reasonable to conclude that the population parameter differs among two or more groups?

Hypothesis Testing: Research Hypotheses Are African-Americans more likely to be stopped by the police? Is the average number of stops for African-Americans different from that for Whites?

Hypothesis Testing: Null Hypotheses Are African-Americans more likely to be stopped by the police? Is the average number of stops greater for African-Americans than for Whites?

Why the Null Hypothesis? • We can state the null hypothesis with precision, and that allows us to compute the probability of observing a particular result if the null hypothesis is true • Logically, it is easier to ascertain what is untrue than what is true. If we can dichotomize the possibilities into A and B, and then determine that A is not true, it must be the case that B is true.

Hypothesis Testing and Legal Decisionmaking • H0: Innocent • HA: Guilty • Legal decisionmaking: Pr(Guilty | Evidence) • Hypothesis testing: Pr(Evidence | Innocent)
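The two conditional probabilities on this slide are not the same quantity. A small worked sketch with Bayes' rule makes the gap concrete; every number below is hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical inputs, for illustration only.
p_guilty = 0.10                   # prior Pr(Guilty)
p_evidence_given_guilty = 0.80    # Pr(Evidence | Guilty)
p_evidence_given_innocent = 0.05  # Pr(Evidence | Innocent): the hypothesis-testing quantity

p_innocent = 1 - p_guilty

# Total probability of seeing the evidence
p_evidence = (p_evidence_given_guilty * p_guilty
              + p_evidence_given_innocent * p_innocent)

# Bayes' rule: the legal-decisionmaking quantity
p_guilty_given_evidence = p_evidence_given_guilty * p_guilty / p_evidence

print(f"Pr(Evidence | Innocent) = {p_evidence_given_innocent:.2f}")  # 0.05
print(f"Pr(Guilty | Evidence)   = {p_guilty_given_evidence:.2f}")    # 0.64
```

Evidence that would be unlikely (5%) under innocence leaves, with these inputs, only a 64% probability of guilt, because the answer also depends on the prior and on Pr(Evidence | Guilty). A p-value is the analogue of Pr(Evidence | Innocent), not of Pr(Guilty | Evidence).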

Substantive vs. Statistical Significance • How big a difference could be explained by random processes such as sampling? • Depends on sample size and characteristics of the underlying distribution • How big a difference is enough that we should care about it? • Normative/policy question • Depends on “costs” associated with differences • Statistical significance refers only to the first of these
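A short sketch of why statistical significance depends on sample size: the same observed difference (a proportion of .52 against a hypothesized .50) is tested at two sample sizes with a z-test for a proportion. The test choice and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def two_sided_p(p_hat: float, p0: float, n: int) -> float:
    """Two-sided z-test p-value for a sample proportion p_hat against H0: p = p0."""
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same two-point difference (0.52 vs. 0.50), two different sample sizes
for n in (100, 10_000):
    print(n, round(two_sided_p(0.52, 0.50, n), 5))
```

With n = 100 the difference is easily explained by sampling (p ≈ .69); with n = 10,000 it is highly significant (p < .001). Whether a two-point difference matters substantively is a separate, normative question.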

“Significance Level” • How careful do you want to be in reaching a conclusion about your hypothesis? • The process asks whether your null hypothesis can be rejected • What probability of incorrectly rejecting a true null hypothesis are you willing to accept? • This probability is the significance level
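A Monte Carlo sketch of what the significance level means: when the null hypothesis is true, a test at α = .05 rejects it in about 5% of repeated samples. The setup (normal data with known σ = 1, a two-tailed z-test of H0: μ = 0, n = 25) is an assumption made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
rng = random.Random(1)

n, sims = 25, 20_000
rejections = 0
for _ in range(sims):
    # H0 is true: data come from N(0, 1); sigma = 1 is treated as known
    sample = [rng.gauss(0, 1) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)   # z = (mean - 0) / (1 / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1                # a Type I (alpha) error

print(f"share of true nulls rejected: {rejections / sims:.3f}")
```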

Is the Coin Honest? H0: An Honest Coin

Directionality • Is the research hypothesis directional? • Do you think that the salaries of men are greater on average than the salaries of women? • Do you think that the salaries paid to African-Americans differ from the salaries paid to Whites? • Is the coin loaded vs. is the coin loaded toward heads? • “One-tailed” (directional) vs. “two-tailed” (nondirectional) hypotheses • If the directional hypothesis is correct, you are more likely to reject the null hypothesis with a one-tailed test
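A sketch of the one-tailed vs. two-tailed difference using an exact binomial calculation for a coin. The data (20 heads in 30 flips) are hypothetical. For an honest coin the distribution is symmetric, so the two-tailed p-value here is simply twice the upper tail.

```python
from math import comb

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, heads = 30, 20                  # hypothetical data
p_one = upper_tail(heads, n)       # HA: loaded toward heads (directional)
p_two = min(1.0, 2 * p_one)        # HA: loaded either way (symmetric because p = .5)

print(f"one-tailed p = {p_one:.4f}")   # about 0.049
print(f"two-tailed p = {p_two:.4f}")   # about 0.099
```

The same result rejects H0 at .05 under the directional hypothesis but not under the nondirectional one.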

Types of Error • TYPE I (α error): Rejecting a null hypothesis that is in fact true; set by the “significance level” • TYPE II (β error): Failing to reject a null hypothesis that is in fact false; depends on the significance level, the sample size, and how wrong the null is

Steps in Hypothesis Testing • State research (“alternative”) hypothesis HA • State null hypothesis H0 • Set decision rule • “level of significance” Pr(α error) • Directional or nondirectional (“one-tailed” or “two-tailed”) based on research hypothesis • Obtain data (set sample size) • Compute test statistic and get probability of observing it under H0 (“p-value”) • Make decision whether to reject H0
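The steps above can be sketched for the coin example with a one-tailed exact binomial test. The function name `binomial_test_heads` and the data (12 heads in 15 flips) are hypothetical; rejecting when p ≤ α is one common convention.

```python
from math import comb

def binomial_test_heads(heads: int, flips: int, alpha: float = 0.05):
    """One-tailed exact binomial test of H0: Pr(head) = .5 vs. HA: Pr(head) > .5.

    Returns (p_value, reject_h0).
    """
    # p-value: probability under H0 of a result at least as extreme as observed
    p_value = sum(comb(flips, i) for i in range(heads, flips + 1)) / 2**flips
    return p_value, p_value <= alpha

# Steps 1-2: HA: coin loaded toward heads; H0: honest coin, Pr(head) = .5
# Step 3: decision rule: alpha = .05, one-tailed because HA is directional
# Step 4: obtain data (hypothetical): 12 heads in 15 flips
# Steps 5-6: compute the p-value and decide
p, reject = binomial_test_heads(heads=12, flips=15, alpha=0.05)
print(f"p-value = {p:.4f}; reject H0: {reject}")
```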

Hypothesis Testing Example

The Concept of Power • The probability that a hypothesis test will reject a false null hypothesis • β is the probability of a Type II error • 1 − β is the “power” of a significance test

What Determines the Power of a Test? • The characteristics of a particular test • The sample size • The magnitude of the true effect (difference, regression coefficient, etc.) that we are trying to detect • Analogy: 200 feet vs. 500 feet; a chickadee vs. a bluejay
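A sketch of how sample size and effect size drive power, using the analytic power of a one-tailed z-test of H0: μ = 0 vs. HA: μ > 0 with known σ. Under HA the z statistic is normal with mean δ√n/σ, so power = 1 − Φ(z₁₋α − δ√n/σ). The test, the function name `power_one_tailed_z`, and the effect sizes are assumptions chosen for illustration.

```python
from statistics import NormalDist

def power_one_tailed_z(effect: float, n: int, sigma: float = 1.0, alpha: float = 0.05) -> float:
    """Power of a one-tailed z-test of H0: mu = 0 vs. HA: mu > 0 with known sigma.

    effect is the true mean under HA (the size of the effect we try to detect).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)     # reject H0 when z > z_crit
    shift = effect * n**0.5 / sigma               # mean of z when HA is true
    return 1 - NormalDist().cdf(z_crit - shift)   # Pr(z > z_crit | HA)

for effect in (0.2, 0.5):        # smaller vs. larger true effect
    for n in (25, 100):          # smaller vs. larger sample
        print(effect, n, round(power_one_tailed_z(effect, n), 2))
```

Power rises with both the sample size and the size of the true effect; lowering α (for example to .01) reduces it.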

Power Curves

Power Curves: One-Tailed

Is the Coin Honest? • If the coin is honest, the probability of a head on any one flip is .5. • Do we suspect that it is loaded one direction or the other? • How dishonest do we think it is (i.e., what is the actual probability of a head if that probability is not .5)? • How big is the sample (number of flips)?

What are the probabilities of different outcomes for an honest coin? • The binomial distribution • Two parameters are the probability and the “sample size” (number of flips of the coin) • Choosing the sample size will affect our ability to make a correct decision about the coin • The bigger the sample size the better • Knowing the “alternative hypothesis” (how dishonest the coin is) can help in deciding the sample size
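A sketch of the binomial distribution for an honest coin, using its two parameters: the probability of a head (.5) and the number of flips (12 here, matching the next slide). The helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(n: int, p: float = 0.5):
    """List of Pr(X = k) for k = 0..n, where X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Honest coin (p = .5) flipped 12 times
for heads, prob in enumerate(binomial_pmf(12)):
    print(f"{heads:2d} heads: {prob:.4f}")
```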

Sample Size of 12 (H0: An Honest Coin)

Other Sample Size Options: Probability of Outcomes for an Honest Coin (probabilities shown are probabilities of a more extreme outcome)
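A sketch that rebuilds this kind of table for several sample sizes. It reports the upper-tail probability Pr(X ≥ k), that is, k or more heads from an honest coin; the sample sizes listed are an assumption, since the original table's values did not survive extraction.

```python
from math import comb

def upper_tail(k: int, n: int) -> float:
    """Pr(X >= k) heads for an honest coin flipped n times."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Tail probabilities for the five most extreme head counts at each sample size
for n in (10, 12, 15, 20):
    row = ", ".join(f"{k}+: {upper_tail(k, n):.3f}" for k in range(n - 4, n + 1))
    print(f"n = {n:2d}  {row}")
```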

What Is Our Research Hypothesis? • Dishonest? • Loaded toward heads?

What Significance Level Do We Want to Use? • How willing are we to decide that the coin is dishonest (or loaded toward heads) when it is actually honest? • Some possibilities: .10, .05, .01, .001

How Big a Sample? • How sure do we want to be that we will reject the null hypothesis if the coin is in fact loaded toward heads? • Power • Do we have any idea of how dishonest the coin is?

Power Curves (.05, one-tailed)

15 Flips • We will need 12 or more heads on our 15 flips to reject the null at the .05 (one-tailed) level • According to the previous chart, if the coin has a true probability of heads of .67, we have roughly a 20% chance of rejecting the null if our research hypothesis is that the coin is loaded toward heads
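The numbers on this slide can be checked with an exact binomial sketch: find the smallest number of heads whose upper-tail probability under H0 is at most .05, then compute the probability of reaching it when the true probability of heads is .67.

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, alpha = 15, 0.05
# Smallest head count whose upper-tail probability under H0 (p = .5) is <= alpha
critical = min(k for k in range(n + 1) if upper_tail(k, n, 0.5) <= alpha)
actual_alpha = upper_tail(critical, n, 0.5)  # below .05 because the distribution is discrete
power = upper_tail(critical, n, 0.67)        # Pr(reject H0 | true Pr(head) = .67)

print(critical, round(actual_alpha, 4), round(power, 3))
```

This gives a critical value of 12 heads, an actual α of about .018, and power of about .22, consistent with the roughly 20% read from the chart.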

Three Frequently Used Methods of Hypothesis Testing • Direct comparison to a hypothesized value (t-tests using the t distribution; Z-tests using the normal distribution, occasionally reported as a chi square) • Comparisons to a set of hypothesized values based on a model (“goodness-of-fit” test using chi square) • Reduction in predictive error (analysis of variance using the F-ratio)
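A sketch of why a Z-test is sometimes reported as a chi square: for a two-category outcome (heads vs. tails), the chi-square goodness-of-fit statistic equals z². The data (20 heads in 30 flips) are hypothetical, and the z-test is the normal approximation without a continuity correction, so its p-value will not match an exact binomial calculation.

```python
import math
from statistics import NormalDist

heads, n, p0 = 20, 30, 0.5   # hypothetical data; H0: Pr(head) = .5

# Z-test for a proportion (normal approximation to the binomial)
z = (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

# Chi-square goodness-of-fit: observed vs. expected counts of heads and tails
observed = [heads, n - heads]
expected = [n * p0, n * (1 - p0)]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-square = {chi_sq:.3f}, p = {p_two_tailed:.4f}")
```

With one degree of freedom, the chi-square p-value is the same as the two-tailed z p-value.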
