
P Values - part 2 Samples & Populations

P Values - part 2: Samples & Populations. Robin Beaumont, 2011. With much help from Professor Chris Wild's material, University of Auckland. Topics: probability, aspects of the P value, sampling, statistic, rule. A P value is a conditional probability considering a range of outcomes.


Presentation Transcript


  1. P Values - part 2: Samples & Populations. Robin Beaumont, 2011. With much help from Professor Chris Wild's material, University of Auckland.

  2. probability Aspects of the P value P Value sampling statistic Rule

  3. Resume: A P value is a conditional probability considering a range of outcomes. P value = P(observed summary value + those more extreme | population value = x), where the observed summary value is the sample value and x is the hypothesised population value.

  4. Populations and samples. Population value = parameter: ever constant, at least for your study! Sample value = statistic: an estimate of the parameter.

  5. One sample

  6. Size matters – single samples

  7. Size matters – multiple samples

  8. We only have a rippled mirror

  9. Standard deviation - individual level. The standard deviation = a measure of variability. 'Standard Normal distribution': total area = 1; about 68% of the area lies within 1 SD of the mean and about 95% within 2 SD. Between + and - three standard deviations from the mean = 99.7% of area, therefore only 0.3% of the area (scores) is more than 3 standard deviations ('units') away. But this does not take into account sample size = t distribution, whose area is defined by a sample size aspect ~ df. Area! Wait and see.

  10. Sampling level - 'accuracy' of estimate. Talking about means here. We can predict the accuracy of your estimate (mean) by just using the SEM formula, SEM = SD/√n, from a single sample. With SD = 5: for n = 5, SEM = 5/√5 = 2.236; for n = 25, SEM = 5/√25 = 1. From: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
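
The claim on this slide can be checked by simulation, in the spirit of the sampling-distribution simulator linked above. The sketch below is only an illustration: it assumes a normally distributed population with mean 0 and SD 5 (the population shape is not stated on the slide), and the function names are hypothetical. It compares the SD of many simulated sample means with the value predicted by SD/√n.

```python
import math
import random
import statistics

def sem(sd, n):
    """Standard error of the mean predicted from one sample: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def simulated_sd_of_means(sd, n, reps=10000, seed=1):
    """SD of `reps` sample means, each from a random sample of size n.

    Assumption: the population is normal with mean 0 and the given SD.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.pstdev(means)

for n in (5, 25):
    # Predicted SEM next to the simulated spread of the sample means
    print(n, round(sem(5, n), 3), round(simulated_sd_of_means(5, n), 3))
```

The two columns should agree closely, which is the point of the slide: a single sample's SD and size are enough to predict how much sample means vary.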

  11. Example - Bradford Hill (Bradford Hill, 1950, p. 92) • Mean systolic blood pressure for 566 males around Glasgow = 128.8 mm; standard deviation = 13.05. • Determine the 'precision' of this mean (SEM = 13.05/√566 = 0.5485). • "We may conclude that our observed mean may differ from the true mean by as much as ± 2.194 (.5485 x 4) but not more than that in around 95% of samples." (p. 93, edited) [Figure axis: all possible values of the POPULATION mean]
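
The arithmetic behind the Bradford Hill figures can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted on the slide; the variable names are illustrative.

```python
import math

n = 566      # males in the Glasgow sample
sd = 13.05   # sample standard deviation of systolic blood pressure

sem = sd / math.sqrt(n)   # standard error of the mean
print(round(sem, 4))      # 0.5485
print(round(4 * sem, 3))  # 2.194, the figure quoted on the slide
```

As a general property of the normal distribution, rather than something stated on the slide, about 95% of sample means fall within roughly 2 SEM either side of the population mean, so 4 x SEM corresponds to the full width of that band.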

  12. Sampling summary • The SEM formula allows us to: • predict the accuracy of our estimate (i.e. the mean value of our sample) • from a single sample • It assumes a random sample

  13. Variation: what have we ignored? On to probability now

  14. Probabilities are relative frequencies. All outcomes at any one time sum to 1

  15. Probability Density Function. [Figure: histogram of 48 scores; horizontal axis: Scores (33 to 87); vertical axis: Probability Density; two shaded regions, A and B.] The total area = 1. p(score < 45) = area A; p(score > 50) = area B. Multiple outcomes at any one time: P(score < 45 or score > 50) = area A + area B. Just add up the individual outcomes.

  16. What happens in the past affects the present = Conditional Probability. [Tree diagram: the first branch splits into male and female; each then splits into Disease X and No Disease X.] P(disease AND male) = P(male) x P(disease X | male). Rearranged: P(disease AND male) / P(male) = P(disease X | male). Multiply along each branch of the tree to get the end value.

  17. Screening Example. 0.1% of the population carry a particular faulty gene. A test exists for detecting whether an individual is a carrier of the gene. In people who actually carry the gene, the test provides a positive result with probability 0.9. In people who don't carry the gene, the test provides a positive result with probability 0.01. Let G = person carries gene, P = test is positive for gene, N = test is negative for gene. If someone gets a positive result when tested, find the probability that they actually are a carrier of the gene. We want to find P(G | P). First, P(P) = P(G and P) + P(G' and P) = 0.0009 + 0.00999 = 0.01089, so P(G | P) = P(G and P) / P(P) = 0.0009 / 0.01089 ≈ 0.083. Errors: P(P | G) ≠ P(G | P). ORDER MATTERS
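
The screening calculation can be written out step by step. This is a small sketch using only the probabilities given on the slide; the variable names are illustrative.

```python
p_gene = 0.001              # P(G): 0.1% of the population carry the gene
p_pos_given_gene = 0.9      # P(P | G)
p_pos_given_no_gene = 0.01  # P(P | G')

# Multiply along each branch of the tree
p_gene_and_pos = p_gene * p_pos_given_gene               # P(G and P)
p_no_gene_and_pos = (1 - p_gene) * p_pos_given_no_gene   # P(G' and P)

# All routes to a positive test
p_pos = p_gene_and_pos + p_no_gene_and_pos               # P(P)

# Reverse the conditioning: carrier given a positive test
p_gene_given_pos = p_gene_and_pos / p_pos                # P(G | P)

print(round(p_pos, 5))             # 0.01089
print(round(p_gene_given_pos, 4))  # 0.0826
```

P(P | G) is 0.9 but P(G | P) is only about 0.08, which is why the order of conditioning matters.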

  18. Survival analysis • Each year's survival depends on previous ones, or does it?

  19. Probability summary • All outcomes at any one time add up to 1 • Probability histogram = area under curve =1 • -> specific areas = set of outcomes • -> specific areas = ‘equal to or more extreme’ • Conditional probability – present dependent on past – ORDER MATTERS

  20. sampling Putting it all together P Value probability statistic Rule

  21. Statistics • Summary measure – SEM, average etc. • T statistic – different types, simplest: t = (observed sample mean − hypothesised population mean) / SEM. So when t = 0, i.e. 0/anything, the estimated and hypothesised population means are equal. When t = 1, the observed difference is the same size as the SEM. When t = 10, the observed difference is much greater than the SEM.

  22. T statistic example Serum amylase values from a random sample of 15 apparently healthy subjects. The mean = 96 SD= 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120. (taken from Daniel 1991 p.202 adapted) GIVEN the population value = the null hypothesis This looks like a rare occurrence? But for what
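
The t statistic for the serum amylase example follows directly from the quoted summary values. A minimal sketch, assuming the simplest one-sample form t = (sample mean − hypothesised mean) / SEM; the function name is hypothetical.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, hypothesised_mean):
    """Return (t, SEM) for a one-sample t statistic."""
    sem = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesised_mean) / sem
    return t, sem

t, sem = one_sample_t(sample_mean=96, sample_sd=35, n=15, hypothesised_mean=120)
print(round(sem, 3), round(t, 3))  # 9.037 -2.656
```

The observed mean is therefore about 2.7 standard errors below the hypothesised population mean of 120.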

  23. What does the shaded area mean? [Figure: t density for n = 15, standard error of the mean = 9.037; horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188.] Given that the sample was obtained from a population with a mean of 120, a sample with a t (n = 15) statistic of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time = just under two samples per hundred on average. Equivalently, given that the sample was obtained from a population with a mean of 120, a sample of 15 producing a mean of 96 (120 - x where x = 24) or 144 (120 + x where x = 24), or one more extreme, will occur 1.8% of the time, that is just under two samples per hundred on average. (Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120? Taken from Daniel 1991 p.202, adapted.) But is this not a P value? P value = 2 · P(t(n−1) < t | H0 is true) = 2 · [area to the left of t under a t distribution with df = n − 1]

  24. P value and probability for t statistic. P value = 2 x P(t(n−1) values more extreme than the observed t(n−1) | H0 is true) = 2 · [area to the left of t under a t distribution with n − 1 shape]. A P value is a special type of probability with: multiple outcomes + conditional upon the specified parameter value.
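
The tail area in this definition can be computed numerically. The sketch below is one possible approach, not the method used to produce the slides: it integrates the Student t density with Simpson's rule using only the Python standard library, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """2 x (area beyond |t| in one tail), via Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3      # area between 0 and |t|
    return 2 * (0.5 - area_zero_to_t)   # both tails beyond |t|

# Serum amylase example: mean 96, SD 35, n = 15, hypothesised mean 120
t = (96 - 120) / (35 / math.sqrt(15))
p = two_sided_p(t, df=15 - 1)
print(round(p, 3))  # close to the shaded area of 0.0188 quoted on the slides
```

The result is conditional on the hypothesised mean of 120 and covers the observed t plus everything more extreme, which are the two features of a P value listed above.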

  25. sampling Putting it all together P Value probability statistic Rule Do we need it!

  26. Rules. [Figure: t density for n = 15, standard error of the mean = 9.037; horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188.] Set a level of acceptability = critical value (CV)! Say one in twenty (1/20 = 0.05), or 1/100, or 1/1000, or . . . If our result has a P value of less than our level of acceptability, reject the parameter value. Say 1 in 20 (i.e. CV = 0.05): given that the sample was obtained from a population with a mean (parameter value) of 120, a sample with a t (n = 15) statistic of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time. This is less than one in twenty, therefore we dismiss the possibility that our sample came from a population with a mean of 120. What do we replace it with?

  27. Fisher – only know and only consider the model we have, i.e. the parameter we have used in our model. When we reject it we accept that any value but that one can replace it. Neyman and Pearson + Gosset – must have an alternative specified value for the parameter.

  28. Power – sample size • Effect size – indication of clinical importance. If there is an alternative, what is it? Another distribution! (Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120? Taken from Daniel 1991 p.202, adapted.)

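  29. α = the reject region. [Figure: two sampling distributions, one centred on a mean of 96 and one on a mean of 120, with the regions leading to correct and incorrect decisions marked.]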

  30. Insufficient power – never get a significant result, even when the effect size is large. Too much power – get a significant result with a trivial effect size.
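
Both situations can be illustrated by simulation. The sketch below makes several assumptions that are not in the slides: the population is normal with SD 35, the alternative means (96 for a large effect, 117 for a trivial one) and the sample sizes are chosen for illustration only, and the two-sided 5% critical t values (2.776 for df = 4, 2.145 for df = 14, about 1.961 for df = 1999) are taken from standard t tables. The function name is hypothetical.

```python
import math
import random

def rejection_rate(true_mean, null_mean, sd, n, t_crit, reps, seed=1):
    """Proportion of simulated samples whose |t| exceeds t_crit (estimated power)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        t = (mean - null_mean) / (s / math.sqrt(n))
        if abs(t) > t_crit:
            rejections += 1
    return rejections / reps

# Large effect (true mean 96 vs hypothesised 120) with a very small sample
small = rejection_rate(96, 120, 35, n=5, t_crit=2.776, reps=2000)
# Same effect with the sample size from the serum amylase example
medium = rejection_rate(96, 120, 35, n=15, t_crit=2.145, reps=2000)
# Trivial effect (true mean 117 vs 120) with a very large sample
huge = rejection_rate(117, 120, 35, n=2000, t_crit=1.961, reps=300)

print(small, medium, huge)
```

Under these assumptions the large effect is usually missed with n = 5, while the trivial 3-unit difference is declared significant in nearly every sample of 2000.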

  31. Life after P values • Confidence intervals • Effect size • Description / analysis • Bayesian statistics - qualitative approach by the back door! • Planning to do statistics for your dissertation? • see: My medical statistics courses: • Course 1: • www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html • YouTube videos to accompany course 1: • http://www.youtube.com/playlist?list=PL9F0EBD42C0AB37D0 • Course 2: • www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html • YouTube videos to accompany course 2: • http://www.youtube.com/playlist?list=PL05FC4785D24C6E68

  32. Your attitude to your data

  33. Where do they fit in!

  34. Students bloomers • The p value did not indicate much statistic significance • Given that the population comes from one population • The p value is 0.003 thus rejecting the null hypothesis and there is a statistical significance • Correlation = 0.25 (p<0.001) indicating that assuming that the data come from a bivariate normal distribution with a correlation of zero you would obtain a correlation of <0.000. There is 95% chance that the relationship among the variables is not due to chance
