
Exact Logistic Regression

Exact Logistic Regression. Larry Cook. Outline: review the logistic regression model; explore an example where model assumptions fail; brief algebraic interlude; explore an example with a different issue where logistic regression fails; computational considerations; example SAS code.


Presentation Transcript


  1. Exact Logistic Regression Larry Cook

  2. Outline • Review the logistic regression model • Explore an example where model assumptions fail • Brief algebraic interlude • Explore an example with a different issue where logistic regression fails • Computational considerations • Example SAS code

  3. Logistic Regression • Model a binary outcome, Y, with one or more predictors • Success/failure • Disease/not disease • Model outcome in terms of the log odds of a success • log(odds of Yi = 1) = a + bxi

  4. Why Log Odds? • Canonical link function • Makes a binary outcome continuous • Solves this problem • Probability is constrained to [0,1] • Odds are constrained to [0, ∞) • Log odds are in (-∞, ∞) • Exponentiating coefficients gives us estimates of odds ratios
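
The three scales on this slide can be illustrated with a short sketch. The function names and the coefficient value below are illustrative assumptions, not taken from the presentation.

```python
import math

def odds(p):
    """Convert a probability in [0, 1) to odds in [0, inf)."""
    return p / (1.0 - p)

def log_odds(p):
    """Convert a probability in (0, 1) to log odds in (-inf, inf)."""
    return math.log(odds(p))

def inverse_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}  odds = {odds(p):.3f}  log odds = {log_odds(p):+.3f}")

# Hypothetical coefficient b; exponentiating gives the odds ratio per unit of x.
b = -1.5
print(f"odds ratio = exp(b) = {math.exp(b):.3f}")
```

Because the log odds are unbounded, a linear predictor a + bx can take any value without producing an impossible probability.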

  5. Example: Motor Vehicle Crash Fatalities • What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? • Outcome: Hospitalized/killed or not • Covariate: safety belt use

  6. Hospitalized/Killed * Restraint Use • OR = 0.22, p-value < 0.001

  7. Example: Motor Vehicle Crash Fatalities • What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? • Outcome: Hospitalized/killed or not • Covariates: safety belt use, gender, age, alcohol, rural area

  8. Logistic Regression Output

  9. Assumptions • Conditional probabilities follow a logistic function of the independent variables • Observations are independent • Asymptotics • Sample size is large enough • Minimum of 50 to 100 observations • 10 successes/failures per variable

  10. Corneal Graft Rejections • What if we are studying a rare disease? • Data for eight kids in the young age group and eight in the older age group • Hypothesis is that rejection is more likely in older children

  11. Graft Rejections • OR = 21, p-value = 0.012 • 100% of cells have expected counts < 5! • Fisher’s exact test p-value: 0.0406 (2-sided), 0.0203 (1-sided)
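
The figures on this slide can be checked by hand. The sketch below assumes the 2x2 table implied by slides 10 and 24 (eight children per age group, six rejections among the older children and one among the younger); the variable names are illustrative.

```python
from math import comb

# Counts implied by slides 10 and 24: 8 children per age group,
# 6 rejections among the older children and 1 among the younger.
old_rej, old_ok = 6, 2
young_rej, young_ok = 1, 7

odds_ratio = (old_rej * young_ok) / (old_ok * young_rej)

n = old_rej + old_ok + young_rej + young_ok
rows = (old_rej + old_ok, young_rej + young_ok)   # group sizes
cols = (old_rej + young_rej, old_ok + young_ok)   # rejections, non-rejections
expected = [[r * c / n for c in cols] for r in rows]

def hypergeom_pmf(k):
    """P(k rejections among the older children | total rejections fixed)."""
    return comb(rows[0], k) * comb(rows[1], cols[0] - k) / comb(n, cols[0])

k_min = max(0, cols[0] - rows[1])
k_max = min(rows[0], cols[0])
p_obs = hypergeom_pmf(old_rej)
one_sided = sum(hypergeom_pmf(k) for k in range(old_rej, k_max + 1))
# Two-sided: total probability of tables no more likely than the observed one.
two_sided = sum(hypergeom_pmf(k) for k in range(k_min, k_max + 1)
                if hypergeom_pmf(k) <= p_obs * (1 + 1e-9))

print(odds_ratio, expected)
print(round(one_sided, 4), round(two_sided, 4))
```

Every expected count is 3.5 or 4.5, which is why the large-sample p-value of 0.012 is not trustworthy here.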

  12. Let’s Tackle the Graft Rejection Example as Logistic Regression

  13. Graft Rejections • Sample size << 50! • Don’t have 10 successes or 10 failures!

  14. Exact (Conditional) Logistic Regression • Rather than using the unconditional logistic regression, we will condition on nuisance parameters • Use conditional maximum likelihood for estimation and inference

  15. Warning Algebra Ahead Proceed with Caution

  16. Logistic Model

  17. Likelihood of a Sample

  18. Sufficient Statistics
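
The formulas on slides 16 to 18 did not survive in this transcript. A standard form, written in the notation of slide 3 and offered as a reconstruction rather than the author's exact slides, is:

```latex
\begin{align*}
\Pr(Y_i = 1 \mid x_i) &= \frac{e^{a + b x_i}}{1 + e^{a + b x_i}}, \\
L(a, b) &= \prod_{i=1}^{n} \frac{e^{y_i (a + b x_i)}}{1 + e^{a + b x_i}}
         = \frac{\exp(a\, t_0 + b\, t_1)}{\prod_{i=1}^{n} \bigl(1 + e^{a + b x_i}\bigr)}, \\
t_0 &= \sum_{i=1}^{n} y_i, \qquad t_1 = \sum_{i=1}^{n} x_i y_i .
\end{align*}
```

The outcomes enter the likelihood only through t0 and t1, which is what makes them sufficient statistics for a and b.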

  19. Conditioning • If we are only trying to describe the relationship between rejection and age, do we care about the value of the intercept? • Remove the intercept, a, from the likelihood by conditioning on its sufficient statistic, t0 = Σyi • Let S(t0) = the set of all tables with Σyi = t0 and the observed sample sizes

  20. Conditional Likelihood
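
The formula on this slide is missing from the transcript. Conditioning on t0 = Σyi, with t1 = Σxiyi, leads to the following standard form, given here as a reconstruction; c(u) is a symbol introduced for illustration.

```latex
\[
\Pr(T_1 = t_1 \mid T_0 = t_0)
  = \frac{c(t_1)\, e^{b\, t_1}}{\sum_{u} c(u)\, e^{b\, u}},
\qquad
c(u) = \#\Bigl\{\, y \in S(t_0) : \sum_i x_i y_i = u \Bigr\}.
\]
```

The intercept a cancels from the numerator and denominator, so the conditional likelihood depends on b alone.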

  21. Estimation

  22. Inference

  23. End of Algebra Back to Example

  24. Graft Rejections Sufficient Statistics • t0 = Σyi = # of rejections = 7 • t1 = Σxiyi = 0*(# of rejections in young) + 1*(# of rejections in old) = 0*1 + 1*6 = 6

  25. Conditional Distribution for Graft Rejection • Need to calculate all possible tables that have exactly 7 rejections • Calculate how often each of the tables occurs • Calculate the CMLE • Calculate how rare our table is to obtain a p-value
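
The four steps above can be sketched for the graft data as follows. This is a minimal illustration, assuming the counts from slides 10 and 24 and a probability-based two-sided p-value; the function names and the bisection solver are choices made for this sketch, not the method used by any particular package.

```python
from math import comb, exp

# Graft data from slides 10 and 24: 8 children per group, t0 = 7, t1 = 6.
n_young, n_old, t0, t1_obs = 8, 8, 7, 6

# Reference set: number of tables with u rejections among the older children
# and t0 - u among the younger ones.
support = range(max(0, t0 - n_young), min(n_old, t0) + 1)
counts = {u: comb(n_old, u) * comb(n_young, t0 - u) for u in support}

def cond_pmf(b):
    """Conditional distribution of t1 given t0 at slope b."""
    weights = {u: c * exp(b * u) for u, c in counts.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def cond_mean(b):
    return sum(u * p for u, p in cond_pmf(b).items())

def cmle(lo=-10.0, hi=10.0, tol=1e-10):
    """Solve E_b[t1] = observed t1 by bisection (the conditional score equation)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cond_mean(mid) < t1_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b_hat = cmle()
null = cond_pmf(0.0)
p_one_sided = sum(p for u, p in null.items() if u >= t1_obs)
p_two_sided = sum(p for p in null.values() if p <= null[t1_obs] * (1 + 1e-9))

print(sum(counts.values()))        # size of the reference set
print(b_hat, exp(b_hat))           # CMLE of b and the conditional odds ratio
print(p_one_sided, p_two_sided)    # exact p-values under b = 0
```

The bisection works because the conditional mean of t1 increases with b. It only has a solution when the observed t1 lies strictly inside its possible range, as it does here.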

  26. Reference Set

  27. Estimate b and Find a p-value

  28. Estimate and p-value

  29. Confidence Interval • Lower bound, b- • If t1 = t1,min, then b- = -∞ • Otherwise, b- is the value of b that produces an upper p-value of α/2 • Upper bound, b+ • If t1 = t1,max, then b+ = ∞ • Otherwise, b+ is the value of b that produces a lower p-value of α/2
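
The rule above can be sketched numerically for the graft data. This is a minimal sketch, assuming the counts from slides 10 and 24 and a 95% level; the helper names and the bisection search range are illustrative choices.

```python
from math import comb, exp, inf

n_young, n_old, t0, t1_obs = 8, 8, 7, 6   # graft data, slides 10 and 24
alpha = 0.05

support = list(range(max(0, t0 - n_young), min(n_old, t0) + 1))
counts = {u: comb(n_old, u) * comb(n_young, t0 - u) for u in support}

def tail_prob(b, upper):
    """P_b(t1 >= observed) if upper is True, else P_b(t1 <= observed)."""
    weights = {u: c * exp(b * u) for u, c in counts.items()}
    total = sum(weights.values())
    tail = [w for u, w in weights.items()
            if (u >= t1_obs if upper else u <= t1_obs)]
    return sum(tail) / total

def solve(upper, target, lo=-15.0, hi=15.0, tol=1e-10):
    """Bisection for the b at which the chosen tail probability equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        below = tail_prob(mid, upper) < target
        # The upper tail increases with b; the lower tail decreases with b.
        if below == upper:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Infinite bounds apply only when the observed t1 sits at an end of its range.
b_lower = -inf if t1_obs == support[0] else solve(upper=True, target=alpha / 2)
b_upper = inf if t1_obs == support[-1] else solve(upper=False, target=alpha / 2)
print(b_lower, b_upper)
```

Exponentiating both bounds gives the corresponding interval for the odds ratio.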

  30. Final Stats for Graft Rejection

  31. Example 2 PECARN C-Spine Study

  32. Case Control Study Any problems estimating the odds ratio? Could exact logistic regression help?

  33. What sufficient statistics are needed? • Σy = 2 • Σxy = 0

  34. Conditional Density • One-sided p-value = 0.438 • Two-sided p-value = 2*0.438 = 0.876 • 95% confidence interval: (-∞, 2.345) • Point estimate?

  35. Median Unbiased Estimate
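
The content of this slide is missing from the transcript. One common definition, assumed here, takes the median unbiased estimate to be the value of b at which the relevant tail probability equals 0.5 when the observed statistic sits at the end of its range and the CMLE does not exist. The sketch below uses t0 = 2 and t1 = 0 from slide 33, but the group sizes are HYPOTHETICAL because the PECARN table is not in the transcript, so the printed numbers will not match slide 34.

```python
from math import comb, exp

# HYPOTHETICAL group sizes: the transcript does not give the PECARN table.
# Only t0 = 2 and t1 = 0 (slide 33) come from the source.
n_unexposed, n_exposed = 20, 10
t0, t1_obs = 2, 0

support = list(range(max(0, t0 - n_unexposed), min(n_exposed, t0) + 1))
counts = {u: comb(n_exposed, u) * comb(n_unexposed, t0 - u) for u in support}

def lower_tail(b):
    """P_b(t1 <= observed t1) under the conditional distribution."""
    weights = {u: c * exp(b * u) for u, c in counts.items()}
    return sum(w for u, w in weights.items() if u <= t1_obs) / sum(weights.values())

def solve_lower_tail(target, lo=-15.0, hi=15.0, tol=1e-10):
    """Bisection: the lower tail decreases as b increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lower_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# t1 is at its minimum, so the conditional likelihood has no finite maximizer.
at_minimum = t1_obs == support[0]
b_mue = solve_lower_tail(0.5)          # median unbiased estimate (assumed definition)
b_upper_95 = solve_lower_tail(0.025)   # finite upper 95% confidence bound
print(at_minimum, b_mue, b_upper_95)
```

With the real group sizes, the same calculation would yield a finite point estimate to report alongside the one-sided interval.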

  36. One More Example Dose Response

  37. Toxicology Experiment • 400 mice randomized to one of four levels of a drug • Drug administered to each animal • Outcome is the number of deaths in each dose level • Σy = 19 • Σxy = 3 + 10 + 30 = 43

  38. Exact vs. Unconditional Exact Unconditional Estimate = 0.712 SE = 0.246 OR = 2.04 CI = (1.26, 3.30) p-value = 0.004 • Estimate = 0.710 • SE = 0.246 • OR = 2.03 • CI = (1.26, 3.52) • p-value = 0.002

  39. Computational Issues

  40. Counting All the Tables • One of the main hurdles for conditional logistic regression is counting all the tables in the sample space • Graft rejections: 11,440 possibilities • PECARN C-Spine: 1,277,601 • Toxicology: 2.79 × 10^33 • Obviously, we don't want to generate tables one at a time

  41. Network Algorithm • Graphical representation of the sample space • Nodes represent a partial sum of the sufficient statistic • Arcs have combinatorial weighting value • One path through the graph represents a table in the sample space
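
The idea can be sketched as a small dynamic program in the spirit of the network algorithm. It is not the LogXact implementation: the function name is hypothetical, and the group sizes used for the dose example are assumptions because the transcript does not give them.

```python
from math import comb
from collections import defaultdict

def conditional_counts(groups, t0):
    """Count tables by t1 = sum(x * y) among those with sum(y) = t0.

    groups is a list of (x, n) pairs: covariate value and group size.
    Each stage adds one group; a node is a (partial t0, partial t1) pair
    and an arc choosing y events in a group carries weight C(n, y).
    """
    nodes = {(0, 0): 1}
    for x, n in groups:
        nxt = defaultdict(int)
        for (s0, s1), ways in nodes.items():
            for y in range(min(n, t0 - s0) + 1):
                nxt[(s0 + y, s1 + x * y)] += ways * comb(n, y)
        nodes = nxt
    return {s1: ways for (s0, s1), ways in sorted(nodes.items()) if s0 == t0}

# Graft example (slides 10 and 24): x = 0 for young, x = 1 for old, 8 per group.
graft = conditional_counts([(0, 8), (1, 8)], t0=7)
print(sum(graft.values()), graft[6])

# Doses 1 to 4 with HYPOTHETICAL group sizes of 3 (sizes are not in the transcript).
doses = conditional_counts([(1, 3), (2, 3), (3, 3), (4, 3)], t0=4)
print(doses)
```

Paths that reach the same partial sums are merged into one node, so the work grows with the number of distinct nodes rather than with the number of tables.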

  42. Example Sufficient Statistics • t0 = Σyi = 4 • t1 = Σxiyi = 1*0 + 2*1 + 3*1 + 4*2 = 13

  43. Network Representation of the Sample Space

  44. What About Multiple Covariates? More Conditioning!

  45. Osteogenic Sarcoma (LogXact Manual) • 46 patients surgically treated for osteogenic sarcoma and then observed for disease recurrence within 3 years • Covariates • Sex: Male = 1, Female = 0 • Any Osteoid Pathology (AOP) • Present = 1, absent = 0 • Interested in the effect of AOP

  46. Osteogenic Sarcoma

  47. Estimating the Effect of AOP • New statistics to condition on • Group sizes • Sufficient statistic for the intercept, Σy = 17 • Sufficient statistic for the coefficient for sex, Σx1y = 15 • Calculate the conditional distribution of Σx2y • Sufficient statistic for the coefficient for AOP • Number of recurrence cases with AOP (= 13) • Given exactly 17 with recurrence, 15 of which are males
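
With two covariates the conditional distribution takes the same shape as in the one-covariate case, with one more statistic held fixed. A sketch in standard notation, where b2 is the AOP coefficient, t0 = Σy, t1 = Σx1y, t2 = Σx2y, and c(·) is a counting symbol introduced for illustration:

```latex
\[
\Pr(T_2 = t_2 \mid T_0 = t_0,\, T_1 = t_1)
  = \frac{c(t_0, t_1, t_2)\, e^{b_2 t_2}}{\sum_{u} c(t_0, t_1, u)\, e^{b_2 u}},
\]
```

Here c(t0, t1, u) is the number of outcome vectors with Σy = t0, Σx1y = t1 and Σx2y = u. Both the intercept and the sex coefficient cancel, leaving a distribution that depends only on b2.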
