Why multiple tests are a problem? Rafael A. Irizarry

Other names
• Multiple comparisons
• Data snooping
• Others?

References
• H. Scheffé (1953), “A method for judging all contrasts in the analysis of variance”, Biometrika 40:87-104.
• D.B. Duncan (1965), “A Bayesian Approach to multiple comparisons”, Technometrics 7:171-222.
• J.W. Tukey (1953), “The problem of multiple comparisons”, reprinted in CWJWT Vol. VIII (1994).
• R.G. Miller, Simultaneous Statistical Inference, 2nd ed. (Springer 1981).

Thanks to Yoav Benjamini
• Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, J. R. Stat. Soc. Ser. B

Example
E. Giovannucci, A. Ascherio, E. Rimm, M. Stampfer, G. Colditz, W. Willett: ‘‘Intake of Carotenoids and Retinol in Relation to Risk of Prostate Cancer’’, Journal of the National Cancer Institute 87(23):1767--1776 (6 Dec 1995).

‘‘Using responses to a validated, semiquantitative food frequency questionnaire mailed to participants in the Health Professionals Follow-up Study in 1986, we assessed dietary intake for a 1-year period for a cohort of 47,894 eligible subjects initially free of diagnosed cancer....We calculate the relative risk (RR) for each of the upper categories of intake of a specific food or nutrient by dividing the incidence of prostate cancer among men in each of these categories by the rate among men in the lowest intake level....

‘‘Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four --- tomato sauce (P for trend = 0.001), tomatoes (P for trend = 0.03), and pizza (P for trend = 0.05), but not strawberries --- were primary sources of lycopene.’’

BUT the Methods section one page later states: ‘‘For each of 131 food and beverage items listed ...’’

And the (presumably strongest) carotenoids and p-values are listed in Table 2 (p. 1770):

Food            p-value
Tomato sauce    0.001
Tomatoes        0.03
Tomato juice    0.67
Pizza           0.05

‘‘Our findings ... suggest that tomato-based foods may be especially beneficial regarding prostate cancer risk.’’

What is a p-value again?
When nothing protects, we expect 131 x 0.05 ≈ 7 foods/nutrients to have p-values < 0.05

Microarrays
When no genes are changing between two groups, we expect 20,000 x 0.01 = 200 genes to have p-value < 0.01.
However, false positives are not as bad as in other fields.
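The two expected counts on the preceding slides are simply (number of tests) x (threshold). A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the "at least one false positive" calculation additionally assumes the tests are independent, which the slides do not claim.

```python
def expected_false_positives(m, alpha):
    """Expected number of p-values below alpha when all m null hypotheses are true."""
    return m * alpha


def prob_at_least_one(m, alpha):
    """Chance of one or more p-values below alpha under the null.

    Assumes the m tests are independent (an illustrative assumption).
    """
    return 1.0 - (1.0 - alpha) ** m


foods = expected_false_positives(131, 0.05)     # about 6.55, i.e. roughly 7 foods
genes = expected_false_positives(20000, 0.01)   # about 200 genes
any_food = prob_at_least_one(131, 0.05)         # very close to 1

print(f"foods: {foods:.2f}, genes: {genes:.0f}, P(at least one food): {any_food:.4f}")
```

Under these assumptions, finding four "significant" foods out of 131 is fewer than chance alone would be expected to produce.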

What can we do?
• p-values no longer mean what they used to… no argument
• A histogram of p-values is a useful plot
• What can we do… lots of argument
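To illustrate why a histogram of p-values is informative, here is a small simulation sketch in Python. It is not taken from the slides: the two-sided z-test, the effect size `shift`, the 10% fraction of changed genes, and all function names are assumptions chosen for illustration. When every null hypothesis is true the p-values are uniform, so the histogram is flat; real effects show up as an excess of small p-values.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_p_values(m, m_alt, shift, rng):
    """Simulate m tests; the first m_alt statistics have mean `shift`, the rest are null."""
    p_values = []
    for i in range(m):
        mu = shift if i < m_alt else 0.0
        p_values.append(two_sided_p(rng.gauss(mu, 1.0)))
    return p_values


def histogram(p_values, bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


rng = random.Random(0)
null_counts = histogram(simulate_p_values(20000, 0, 0.0, rng))      # no genes changing
mixed_counts = histogram(simulate_p_values(20000, 2000, 3.0, rng))  # 10% of genes changing

for name, counts in [("all null", null_counts), ("10% changed", mixed_counts)]:
    print(name)
    for b, c in enumerate(counts):
        print(f"  [{b / 10:.1f}, {(b + 1) / 10:.1f}) {'#' * (c // 100)} {c}")
```

In the all-null run each bin holds roughly 2,000 p-values; in the mixed run the first bin is clearly taller than the rest.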

Multiple Hypothesis Testing
Null = Equivalent Expression; Alternative = Differential Expression

Error Rates
Notation: m is the number of hypotheses tested, V the number of Type I errors (false positives), and R the number of rejected hypotheses.
• Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses. PCER = E(V)/m
• Per family error rate (PFER): the expected number of Type I errors. PFER = E(V)
• Family-wise error rate (FWER): the probability of at least one Type I error. FWER = Pr(V ≥ 1)
• False discovery rate (FDR): rate that false discoveries occur. FDR = E(V/R; R>0) = E(V/R | R>0) Pr(R>0)
• Positive false discovery rate (pFDR): rate that discoveries are false. pFDR = E(V/R | R>0)
• Many others
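The error rates above can be estimated by simulation. The Python sketch below is only an illustration under assumed settings that do not come from the slides: 100 independent two-sided z-tests, 90 of them true nulls, an effect size of 3 for the rest, and the naive rule "reject when p < 0.05". The function and parameter names are hypothetical. V/R is taken to be 0 when R = 0, which is what E(V/R; R>0) means.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_error_rates(m=100, m0=90, shift=3.0, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo estimates of PCER, PFER, FWER, FDR and pFDR.

    Illustrative setup: the first m0 hypotheses are true nulls, the remaining
    m - m0 have test statistics with mean `shift`; reject when p < alpha.
    """
    rng = random.Random(seed)
    sum_v = 0          # total Type I errors over all simulations
    any_v = 0          # simulations with at least one Type I error
    sum_ratio = 0.0    # sum of V/R over simulations with R > 0
    n_r_pos = 0        # simulations with R > 0
    for _ in range(n_sim):
        v = 0
        r = 0
        for i in range(m):
            is_null = i < m0
            z = rng.gauss(0.0 if is_null else shift, 1.0)
            if two_sided_p(z) < alpha:
                r += 1
                if is_null:
                    v += 1
        sum_v += v
        if v >= 1:
            any_v += 1
        if r > 0:
            sum_ratio += v / r
            n_r_pos += 1
    pfer = sum_v / n_sim
    return {
        "PCER": pfer / m,
        "PFER": pfer,
        "FWER": any_v / n_sim,
        "FDR": sum_ratio / n_sim,  # V/R counted as 0 when R = 0
        "pFDR": sum_ratio / n_r_pos if n_r_pos else float("nan"),
    }


rates = estimate_error_rates()
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these settings PFER should come out near 90 x 0.05 = 4.5 and FWER close to 1, while FDR and pFDR nearly coincide because some hypothesis is rejected in almost every simulated experiment.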

Conclusions
• Let's do a multiple comparison of the different beers sold by the IF
