
Why multiple tests are a problem?




  1. Why multiple tests are a problem? Rafael A. Irizarry

  2. Other names • Multiple comparisons • Data snooping • Others?

  3. References • H. Scheffé (1953), “A method for judging all contrasts in the analysis of variance”, Biometrika 40:87-104 • D.B. Duncan (1965), “A Bayesian approach to multiple comparisons”, Technometrics 7:171-222 • J.W. Tukey (1953), “The problem of multiple comparisons”, reprinted in CWJWT Vol. VIII (1994) • R.G. Miller, Simultaneous Statistical Inference, 2nd ed. (Springer 1981)

  4. Thanks to Yoav Benjamini • Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, J. R. Stat. Soc. Ser. B

  5. Example • E. Giovannucci, A. Ascherio, E. Rimm, M. Stampfer, G. Colditz, W. Willett: “Intake of Carotenoids and Retinol in Relation to Risk of Prostate Cancer”, Journal of the National Cancer Institute 87(23):1767-1776 (6 Dec 1995).

  6. “Using responses to a validated, semiquantitative food frequency questionnaire mailed to participants in the Health Professionals Follow-up Study in 1986, we assessed dietary intake for a 1-year period for a cohort of 47,894 eligible subjects initially free of diagnosed cancer.... We calculate the relative risk (RR) for each of the upper categories of intake of a specific food or nutrient by dividing the incidence of prostate cancer among men in each of these categories by the rate among men in the lowest intake level....”

  7. “Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four, tomato sauce (P for trend = 0.001), tomatoes (P for trend = 0.03), and pizza (P for trend = 0.05), but not strawberries, were primary sources of lycopene.”

  8. BUT the Methods section one page later states: “For each of 131 food and beverage items listed ...” And the (presumably strongest) carotenoids and p-values are listed in Table 2 (p. 1770): • Tomato sauce: 0.001 • Tomatoes: 0.03 • Tomato juice: 0.67 • Pizza: 0.05 • “Our findings ... suggest that tomato-based foods may be especially beneficial regarding prostate cancer risk.”

  9. What is a p-value again? When nothing protects (that is, every null hypothesis is true), we expect 131 x 0.05 ≈ 7 foods/nutrients to have p-values < 0.05
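The arithmetic on this slide can be checked with a short simulation. The sketch below is illustrative only: it assumes the 131 tests are independent (the slides do not say so, and foods in a diet questionnaire are unlikely to be), and the function names are made up for this example. It uses the fact that a p-value is uniform on [0, 1] when its null hypothesis is true.

```python
import random


def expected_false_positives(m, alpha):
    """Expected number of p-values below alpha when all m nulls are true."""
    return m * alpha


def prob_at_least_one(m, alpha):
    """P(at least one p-value < alpha); ASSUMES the m tests are independent."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_mean_count(m, alpha, n_rep=2000, seed=0):
    """Average count of p-values below alpha over n_rep simulated null studies."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        # Under a true null hypothesis, a p-value is uniform on [0, 1].
        total += sum(rng.random() < alpha for _ in range(m))
    return total / n_rep


m, alpha = 131, 0.05  # values taken from the slide
print("expected count:", expected_false_positives(m, alpha))
print("P(at least one), if independent:", prob_at_least_one(m, alpha))
print("simulated mean count:", simulate_mean_count(m, alpha))
```

The expected count does not depend on independence (expectations add regardless), but the probability of at least one small p-value does.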

  10. Microarrays • When no genes are changing between two groups, we expect 20,000 x 0.01 = 200 genes to have p-value < 0.01 • However, false positives are not as bad as in other fields

  11. What can we do? • p-values no longer mean what they used to… no argument • A histogram of the p-values is a useful plot • What can we do… lots of argument
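To see why a histogram of p-values is informative, here is a minimal sketch under stated assumptions: the test statistics are independent normal z-scores, 20,000 genes are tested (the count from the microarray slide), and in the second scenario 2,000 of them have a mean shift of 3. The 2,000 and the shift of 3 are arbitrary choices for illustration, not values from the talk. When every null is true the histogram is roughly flat; true alternatives pile up in the bin nearest zero.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_p_values(m_null, m_alt, shift, seed=0):
    """p-values from m_null true nulls (mean 0) and m_alt alternatives (mean shift)."""
    rng = random.Random(seed)
    nulls = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(m_null)]
    alts = [two_sided_p(rng.gauss(shift, 1.0)) for _ in range(m_alt)]
    return nulls + alts


def bin_counts(p_values, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # min() keeps p == 1.0 in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def print_histogram(counts, width=50):
    """Print a text histogram, one row per bin."""
    n_bins = len(counts)
    biggest = max(counts)
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / biggest)
        print(f"[{i / n_bins:.1f}, {(i + 1) / n_bins:.1f}) {c:6d} {bar}")


all_null = bin_counts(simulate_p_values(20000, 0, 0.0))    # no gene changes
mixed = bin_counts(simulate_p_values(18000, 2000, 3.0))    # hypothetical mix
print("All nulls true:")
print_histogram(all_null)
print("\n2,000 of 20,000 genes changed (assumed):")
print_histogram(mixed)
```

The height of the flat part of the second histogram gives a rough sense of how many of the small p-values are expected from the nulls alone.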

  12. Multiple Hypothesis Testing • Null = Equivalent Expression; Alternative = Differential Expression

  13. Error Rates (m = number of hypotheses tested, V = number of Type I errors, R = number of rejected null hypotheses) • Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses: PCER = E(V)/m • Per family error rate (PFER): the expected number of Type I errors: PFER = E(V) • Family-wise error rate (FWER): the probability of at least one Type I error: FWER = Pr(V ≥ 1) • False discovery rate (FDR): rate that false discoveries occur: FDR = E(V/R; R>0) = E(V/R | R>0) Pr(R>0) • Positive false discovery rate (pFDR): rate that discoveries are false: pFDR = E(V/R | R>0) • Many others
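The definitions on this slide can be estimated by simulation, which also makes the relationships among them concrete. The sketch below is not from the talk: it assumes independent z-tests, m = 20 hypotheses of which 2 are true alternatives with a mean shift of 3, and an unadjusted threshold of 0.01 on each p-value. All of those numbers and the function names are hypothetical. Here V counts rejected true nulls (Type I errors) and R counts all rejections.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_error_rates(m=20, m_alt=2, shift=3.0, alpha=0.01,
                         n_rep=5000, seed=0):
    """Monte Carlo estimates of PCER, PFER, FWER, FDR and pFDR.

    Every hypothesis is rejected when its p-value is below alpha
    (no multiplicity adjustment). All parameter values are assumptions.
    """
    rng = random.Random(seed)
    m_null = m - m_alt
    sum_v = 0        # total Type I errors over all replicates
    n_any_v = 0      # replicates with V >= 1
    n_any_r = 0      # replicates with R > 0
    sum_ratio = 0.0  # sum of V/R over replicates with R > 0
    for _ in range(n_rep):
        # V: rejections among true nulls; s: rejections among alternatives.
        v = sum(two_sided_p(rng.gauss(0.0, 1.0)) < alpha for _ in range(m_null))
        s = sum(two_sided_p(rng.gauss(shift, 1.0)) < alpha for _ in range(m_alt))
        r = v + s
        sum_v += v
        if v >= 1:
            n_any_v += 1
        if r > 0:
            n_any_r += 1
            sum_ratio += v / r
    pfer = sum_v / n_rep
    return {
        "PCER": pfer / m,
        "PFER": pfer,
        "FWER": n_any_v / n_rep,
        "FDR": sum_ratio / n_rep,  # V/R counted as 0 when R = 0
        "pFDR": sum_ratio / n_any_r if n_any_r else float("nan"),
        "Pr(R>0)": n_any_r / n_rep,
    }


rates = simulate_error_rates()
for name, value in rates.items():
    print(f"{name:8s} {value:.4f}")
```

In any run of this sketch the estimates satisfy FDR = pFDR x Pr(R>0) and FDR ≤ FWER ≤ PFER, which mirrors the ordering implied by the definitions.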

  14. Conclusions • Let's do a multiple comparison of the different beers sold by the IF
