Testing hypotheses using model selection

Eric D. Stolen, InoMedic Health Applications, Ecological Program, Kennedy Space Center, Florida; NASA Environmental Management Branch.


  1. Testing hypotheses using model selection Eric D. Stolen InoMedic Health Applications, Ecological Program, Kennedy Space Center, Florida NASA Environmental Management Branch

  2. We hve inv st d a l t of t m ndeffrt in cratng R, pls cte it whn usng it fr dtnlyss.

  3. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis.

  4. “The human understanding, once it has adopted an opinion, collects any instances that confirm it, and though the contrary instances may be more numerous and more weighty, it either does not notice them or else rejects them, in order that this opinion will remain unshaken.” - Francis Bacon (1620)

  5. Outline • Science issues • The method of multiple working hypotheses • Statistical models as science tools • Making inference in science • Information-theoretic model selection • Multi-model inference

  6. Science: What is it?

  7. Science is the organized process of creating testable explanations of how the natural world works.

  8. Understanding Theory Hypothesis

  9. Hypothetico-deductive model • Generate hypothesis (from theory) • Make a prediction from the hypothesis • Conduct experiment to test prediction • Decide whether or not the theory is supported

  10. Hypothetico-deductive model • Taught from primary school through graduate school • Not the way science is done in many fields • Modern science is largely inductive

  11. Null hypothesis testing • H0: No effect • HA: Effect of interest • Probability{ data | H0 } • Is this what we want to know?

  12. Null hypothesis testing • Karl Pearson (1857-1936) • Jerzy Neyman (1894-1981) • R. A. Fisher (1890-1962) • Known as the frequentist approach • Not what Fisher, Neyman, or Pearson intended!

  13. Oops (c) Ian Britton - FreeFoto.com

  14. NHT problems • Some problems: • Silly nulls • Slow progress • Many systems not amenable • Inference dependent upon the sample space • Fosters unthinking approaches

  15. An alternative: Probability{ HA | data }

  16. Multiple working hypotheses • Thomas C. Chamberlin (1843-1928) • Geologist • President of the University of Wisconsin • Director of the Walker Museum and Chair of the Dept. of Geology at the University of Chicago • President of the American Association for the Advancement of Science • Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 15:92-96 (reprinted 1965, Science 148:754-759)

  17. Reality Theory Data Alternative Hypotheses

  18. Wading bird group foraging behavior

  19. Multiple working hypotheses: Wading bird group foraging • H1: No effect • H2: Group effect same for all species • H3: Group effect differs by species • H4: (Group by species) + prey density • H5: Group + prey density • H6: (Group by species) + prey + habitat

  20. Mathematical models in science “Nature's great book is written in mathematics.” - Galileo Galilei

  21. Mathematical models in science • Empirical Models • Mechanistic Models • Fields shown: Ecology; Chemistry in 19th Century; Climatology; Physics; Modern Chemistry; Molecular biology

  22. Generalized Linear Model • Three parts • Probability distribution (error): $Y_i \sim N(\mu_i, \sigma^2)$ • Link function: $E(Y_i) = \mu_i$ • Linear equation: $\mu_i = \eta(x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq})$

  23. Generalized Linear Model Y = b0 + b1X1 + b2X2 + e • Linear regression and ANOVA • Link function – Identity link • linear equation • error distribution – Normal Distribution (Gaussian)

  24. Generalized Linear Model Logit(p) = b0 + b1X1 + b2X2 • Logistic Regression • Link function – Logit link: ln(p / (1-p)) • linear equation • error distribution – Binomial Distribution
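The logit link on slide 24 can be sketched in a few lines of Python. This is only an illustration: the coefficient values b0, b1, b2 and the helper names are made up here, not taken from the presentation.

```python
import math


def logit(p):
    """Logit link: the log-odds ln(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: map a value of the linear equation back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = -1.0, 0.8, 0.5


def predicted_probability(x1, x2):
    """Evaluate the linear equation on the logit scale, then back-transform."""
    eta = b0 + b1 * x1 + b2 * x2
    return inv_logit(eta)


p = predicted_probability(1.0, 2.0)
print(f"predicted probability: {p:.3f}")
```

The linear equation can take any real value, while the back-transformed probability always stays between 0 and 1, which is the reason for using a link function with binomial data.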

  25. Maximum likelihood estimation • R. A. Fisher (1890-1962) • The parameter estimates that are most likely, given the data and the model • Example • Receive a cookie from the cafeteria on each of 11 days • Observe 7 chocolate chip and 4 oatmeal raisin • What is the best estimate of p = proportion chocolate chip (given the observed data)?

  26. Maximum likelihood estimation “CC” “CC” “OR” “CC” “CC” “OR” “OR” “CC” “OR” “CC” “CC”

  27. Maximum likelihood estimation “CC” “CC” “OR” “CC” “CC” “OR” “OR” “CC” “OR” “CC” “CC”

  28. Proportion Chocolate Chip

  29. Proportion Chocolate Chip

  30. Proportion Chocolate Chip

  31. Proportion Chocolate Chip

  32. Proportion Chocolate Chip

  33. Proportion Chocolate Chip
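The cookie example from slides 25 to 27 can be worked through numerically. The sketch below assumes the 11 cookies are independent draws with a constant proportion p (a binomial model, which the slides imply but do not state) and uses a simple grid search; the function name is illustrative.

```python
import math

# Observed sequence from the slides: CC = chocolate chip, OR = oatmeal raisin.
cookies = ["CC", "CC", "OR", "CC", "CC", "OR", "OR", "CC", "OR", "CC", "CC"]
n = len(cookies)          # number of days
k = cookies.count("CC")   # number of chocolate chip cookies


def log_likelihood(p, k, n):
    """Binomial log-likelihood of p, omitting the constant that does not depend on p."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# Evaluate the log-likelihood on a grid of candidate values and keep the best one.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, k, n))

# For the binomial model the maximum also has a closed form: k / n.
p_closed = k / n

print(f"grid-search estimate: {p_grid:.3f}")
print(f"closed-form estimate: {p_closed:.3f}")
```

Both routes give an estimate close to 7/11, the value of p under which the observed sequence is most probable.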

  34. Multiple working hypotheses: Wading bird group foraging • H1: No effect • H2: Group effect same for all species • H3: Group effect differs by species • H4: (Group by species) + prey density • H5: Group + prey density • H6: (Group by species) + prey + habitat

  35. Multiple working hypotheses: Wading bird group foraging • H1: Foraging rate = b0 + e • H2: Group effect same for all species • H3: Group effect differs by species • H4: (Group by species) + prey density • H5: Group + prey density • H6: (Group by species) + prey + habitat

  36. Multiple working hypotheses: Wading bird group foraging • H1: No effect • H2: FR = b0 + Group * b1 + e • H3: Group effect differs by species • H4: (Group by species) + prey density • H5: Group + prey density • H6: (Group by species) + prey + habitat
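Slides 35 and 36 turn the first two hypotheses into equations. A minimal sketch of fitting both by least squares follows. The foraging rates are invented for illustration and are not data from the study, and Group is assumed to be coded 0 (solitary) or 1 (in a group).

```python
# Hypothetical foraging rates; NOT data from the presentation.
group = [0, 0, 0, 0, 1, 1, 1, 1]                   # 0 = solitary, 1 = in a group
rate = [1.0, 1.2, 0.8, 1.0, 1.6, 1.4, 1.5, 1.5]    # foraging rate (FR)


def mean(values):
    return sum(values) / len(values)


def fit_h1(y):
    """H1: FR = b0 + e. The least-squares estimate of b0 is the overall mean."""
    b0 = mean(y)
    rss = sum((yi - b0) ** 2 for yi in y)
    return b0, rss


def fit_h2(x, y):
    """H2: FR = b0 + Group * b1 + e, fitted by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss


b0_h1, rss_h1 = fit_h1(rate)
b0_h2, b1_h2, rss_h2 = fit_h2(group, rate)

print(f"H1: b0 = {b0_h1:.2f}, residual sum of squares = {rss_h1:.2f}")
print(f"H2: b0 = {b0_h2:.2f}, b1 = {b1_h2:.2f}, residual sum of squares = {rss_h2:.2f}")
```

Because H1 is nested in H2, the residual sum of squares for H2 can never exceed that of H1, so goodness of fit alone cannot decide between the two hypotheses.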

  37. Approaches to science Observational Study Experimental Study Strength of Inference

  38. Experimental study: What is the effect of a particular treatment (or series of treatments) on a particular aspect of the system?

  39. Experimental study • Treatments: A, B, C, D • Replicates: 1, 2, 3, …, n • A: 1, 4, 5, 38, 62, 99 • B: 10, 15, 41, 44, 88 • C: 7, 22, 21, 54, 67, 81 • D: 6, 29, 33, 61, 77, 79 • control: 11, 12, 69, 74, 91, 92

  40. Experimental study • Treatments: A, B, C, D • Replicates: 1, 2, 3, …, n • Randomization • A: 1, 4, 5, 38, 62, 99 • B: 10, 15, 41, 44, 88 • C: 7, 22, 21, 54, 67, 81 • D: 6, 29, 33, 61, 77, 79 • control: 11, 12, 69, 74, 91, 92
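Random assignment of units to treatments, as on slide 40, might look like the sketch below. The number of units (30), the equal group size, and the function name are assumptions made for this example; the slide does not specify how its assignment was generated.

```python
import random

treatments = ["A", "B", "C", "D", "control"]
units = list(range(1, 31))  # hypothetical: 30 experimental units, 6 per treatment


def randomize(units, treatments, seed=None):
    """Shuffle the units and deal them out in equal-sized groups, one per treatment."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = units[:]         # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    return {
        t: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, t in enumerate(treatments)
    }


assignment = randomize(units, treatments, seed=42)
for treatment, members in assignment.items():
    print(treatment, members)
```

Because chance alone decides which unit receives which treatment, unmeasured differences among units cannot line up systematically with the treatments, which is what an observational study cannot guarantee.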

  41. Observational study • Treatments: A, B, C, D • Replicates: 1, 2, 3, …, n • Bias • A: 1, 4, 5, 38, 62, 99 • B: 10, 15, 41, 44, 88 • C: 7, 22, 21, 54, 67, 81 • D: 6, 29, 33, 61, 77, 79 • control: 11, 12, 69, 74, 91, 92

  42. Approaches to science Observational Study Confirmatory Study Experimental Study Strength of Inference

  43. Confirmatory study • Make predictions a priori • Design collection of observational data including as much replication and control as possible • Weakness is still lack of randomization (not assigning treatment)

  44. Summary so far • Science is a process to postulate and refine reliable descriptions (explanations) of reality • The method of multiple working hypotheses is a particularly useful science tool • Mathematics is the language of science • Experiments are golden, confirmatory studies are helpful

  45. Next… • Statistical model selection theory • Information-theoretic tools • R • Model selection in practice • Multi-model inference

  46. Precision-Bias Trade-off • Y = b0 + b1X1 + b2X2 + e • Bias² • Model Complexity – increasing number of Parameters

  47. Precision-Bias Trade-off • Y = b0 + b1X1 + b2X2 + e • Variance • Model Complexity – increasing number of Parameters

  48. Precision-Bias Trade-off • Y = b0 + b1X1 + b2X2 + e • Variance • Bias² • Model Complexity – increasing number of Parameters
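The two curves on slides 46 to 48 correspond to the two terms in the standard decomposition of mean squared error. The symbols below are introduced here for illustration and do not appear on the slides: $\hat{\theta}$ is an estimator of a true quantity $\theta$.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\!\left[(\hat{\theta} - \theta)^2\right]
  = \underbrace{\left(E[\hat{\theta}] - \theta\right)^2}_{\text{Bias}^2}
  + \underbrace{E\!\left[\left(\hat{\theta} - E[\hat{\theta}]\right)^2\right]}_{\text{Variance}}
```

In the usual reading of such a plot, adding parameters tends to lower the bias term and raise the variance term, so total error is smallest at some intermediate model complexity.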

  49. Kullback-Leibler information • Solomon Kullback (1907-1994) • Richard Leibler (1914-2003) • Kullback, S., and R. A. Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22:79-86

  50. Kullback-Leibler information divergence • Full Truth • G1 (best model in set) • G2 • G3
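The picture on slide 50 can be illustrated with discrete distributions. In the sketch below the "full truth" and the three candidate models are invented probability vectors; in real problems the truth is unknown, so this only shows what the quantity measures.

```python
import math


def kl_information(f, g):
    """Kullback-Leibler information I(f, g) = sum of f_i * ln(f_i / g_i).

    Interpreted as the information lost when model g is used to approximate f.
    Terms with f_i == 0 contribute nothing and are skipped.
    """
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)


# Hypothetical "full truth" over three outcomes (unknown in practice).
truth = [0.5, 0.3, 0.2]

# Hypothetical candidate models, named after the labels on the slide.
models = {
    "G1": [0.45, 0.35, 0.20],
    "G2": [0.60, 0.20, 0.20],
    "G3": [1 / 3, 1 / 3, 1 / 3],
}

losses = {name: kl_information(truth, g) for name, g in models.items()}
best = min(losses, key=losses.get)

for name, loss in losses.items():
    print(f"{name}: information lost = {loss:.4f}")
print("best model in set:", best)
```

With these made-up numbers G1 loses the least information, matching its label on the slide as the best model in the set. The best model is only the closest of the candidates; it is not the truth itself.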
