
Advanced Statistical Methods: Beyond Linear Regression


Presentation Transcript


  1. Advanced Statistical Methods: Beyond Linear Regression
     John R. Stevens, Utah State University
     Notes 2: Statistical Methods I
     Mathematics Educators Workshop, 28 March 2009
     http://www.stat.usu.edu/~jrstevens/pcmi

  2. What would your students know to do with these data?

     Obs  Flight  Temp  Damage
       1  STS1     66   NO
       2  STS9     70   NO
       3  STS51B   75   NO
       4  STS2     70   YES
       5  STS41B   57   YES
       6  STS51G   70   NO
       7  STS3     69   NO
       8  STS41C   63   YES
       9  STS51F   81   NO
      10  STS4     80
      11  STS41D   70   YES
      12  STS51I   76   NO
      13  STS5     68   NO
      14  STS41G   78   NO
      15  STS51J   79   NO
      16  STS6     67   NO
      17  STS51A   67   NO
      18  STS61A   75   YES
      19  STS7     72   NO
      20  STS51C   53   YES
      21  STS61B   76   NO
      22  STS8     73   NO
      23  STS51D   67   NO
      24  STS61C   58   YES

     (Damage was not recorded for Obs 10.)

  3. Two-Sample t-test

     data: Temp by Damage
     t = 3.1032, df = 21, p-value = 0.005383
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
       2.774344 14.047085
     sample estimates:
       mean in group NO   mean in group YES
               72.12500            63.71429
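     In R, output of this form comes from a call like the one below (the data
     frame name `shuttle` is an assumption; the integer df = 21 = 23 − 2 points
     to a pooled-variance test on the 23 flights with recorded damage):

       # Pooled-variance two-sample t-test of temperature by damage status
       # (data frame name assumed; df = 21 matches var.equal = TRUE)
       t.test(Temp ~ Damage, data = shuttle, var.equal = TRUE)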

  4. Does the t-test make sense here?
     • Traditional: Treatment Group mean vs. Control Group mean
     • What is the response variable?
       • Temperature? [Quantitative, Continuous]
       • Damage? [Qualitative]

  5. Traditional Statistical Model 1
     • Linear Regression: predict a continuous response from [quantitative] predictors
       • Y = weight, X = height
       • Y = income, X = education level
       • Y = first-semester GPA, X = parent’s income
       • Y = temperature, X = damage (0 = no, 1 = yes)
     • Can also “control for” other [possibly categorical] factors (“covariates”):
       • Sex
       • Major
       • State of Origin
       • Number of Siblings

  6. Traditional Statistical Model 2
     • Logistic Regression: predict a binary response from [quantitative] predictors
       • Y = ‘graduate within 5 years’ = 0 vs. Y = ‘not’ = 1; X = first-semester GPA
       • Y = 0 (no damage) vs. Y = 1 (damage); X = temperature
       • Y = 0 (survive) vs. Y = 1 (death); X = dosage (dose-response model)
     • Can also “control” for other factors, or “covariates”:
       • Race, Sex
       • Genotype
     • p = P(Y = 1 | relevant factors) = probability that Y = 1, given the state of the relevant factors

  7. Traditional Dose-Response Model
     • p = probability of “death” at dose d, modeled by the logistic curve

       p(d) = \frac{e^{\beta_0 + \beta_1 d}}{1 + e^{\beta_0 + \beta_1 d}}

     • Look at what affects the shape of the curve, the LD50 (the dose lethal to 50% of subjects), etc.
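     For the logistic curve above, the LD50 follows in one line: p(d) = 1/2
     exactly when the linear predictor is zero, so

       \beta_0 + \beta_1\,\mathrm{LD}_{50} = 0 \quad\Longrightarrow\quad \mathrm{LD}_{50} = -\beta_0/\beta_1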

  8. “Fitting” the Dose-Response Model
     • Why “logistic” regression? The log-odds are linear in the dose:

       \log\frac{p}{1-p} = \beta_0 + \beta_1 d

       • β0 = place-holder constant (intercept)
       • β1 = effect of “dosage” d
     • To estimate parameters: Newton-Raphson iterative process to “maximize the likelihood” of the model
     • Compare Y = 0 (no damage) with Y = 1 (damage) groups

  9. Likelihood Function (to be maximized)

       L(\beta_0, \beta_1) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}

     • p_i^{y_i}(1 - p_i)^{1 - y_i} is the likelihood for obs. i
     • the probabilities multiply across observations (independence)

  10. Estimation by IRLS
      • Iteratively Reweighted Least Squares
      • Equivalent: the Newton-Raphson algorithm for iteratively solving the “score” equations (a sketch follows below)
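      A minimal sketch of IRLS for logistic regression, as referenced above (all
      names are mine; it assumes a full-rank design matrix X whose first column
      is ones, and a 0/1 response y):

        # One Newton-Raphson step per iteration, written as weighted least squares
        irls_logistic <- function(X, y, tol = 1e-8, max_iter = 25) {
          beta <- rep(0, ncol(X))
          for (iter in 1:max_iter) {
            eta <- as.vector(X %*% beta)      # linear predictor
            p   <- 1 / (1 + exp(-eta))        # fitted probabilities
            w   <- p * (1 - p)                # IRLS weights (binomial variance)
            z   <- eta + (y - p) / w          # "working" response
            beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))
            if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
            beta <- beta_new
          }
          as.vector(beta)
        }

      Each pass solves the weighted least-squares system (X'WX)β = X'Wz, which
      is exactly one Newton-Raphson step on the score equations.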

  11. Coefficients:
                   Estimate  Std. Error  z value  Pr(>|z|)
      (Intercept)   15.0429      7.3786    2.039    0.0415 *
      Temp          -0.2322      0.1082   -2.145    0.0320 *
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
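      Output in this form is what R's summary() prints for a logistic regression
      fit; a call along these lines would produce it (the data frame name
      `shuttle` is an assumption):

        # Logistic regression of O-ring damage (0/1) on launch temperature
        fit <- glm(Damage ~ Temp, family = binomial, data = shuttle)
        summary(fit)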

  12. What if the data were even “better”?
      • Complete separation of points
      • What should happen to our “slope” estimate?

  13. Coefficients:
                  Estimate  Std. Error  z value  Pr(>|z|)
      (Intercept)    928.9    913821.4    0.001         1
      Temp           -14.4     14106.7   -0.001         1

  14. Failure?
      • Shape of the likelihood function: under complete separation the likelihood keeps increasing as the slope grows, so it has no finite maximizer
      • Large standard errors
      • Solution only in 2006
      • Rather than maximizing the likelihood L(β), consider a penalized version; a Firth-type penalty (the form matching the penalized-ML output on the next slide) multiplies the likelihood by the square root of the Fisher information determinant:

        L^*(\beta) = L(\beta)\,|I(\beta)|^{1/2}

  15. Model fitted by Penalized ML
      Confidence intervals and p-values by Profile Likelihood
                         coef    se(coef)     Chisq             p
      (Intercept)  30.4129282  16.5145441  11.35235  0.0007535240
      Temp         -0.4832632   0.2528934  13.06178  0.0003013835
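      The header “Model fitted by Penalized ML … Profile Likelihood” matches the
      print format of the R package logistf, which implements Firth-type
      penalized logistic regression; a sketch of the call (data frame name
      assumed, as before):

        # Firth-type penalized-likelihood fit; finite estimates even under separation
        library(logistf)
        fit_firth <- logistf(Damage ~ Temp, data = shuttle)
        summary(fit_firth)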

  16. Beetle Data

  17. Dose-response model
      • Recall the simple model:
        • pij = Pr(Y = 1 | dosage level j and genotype level i)
      • But – when is genotype (covariate Gi) observed?
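      One plausible way to write such a model (a sketch only; the slide's own
      equation is not recoverable, and γ_i is my notation for a genotype effect):

        \mathrm{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 d_j + \gamma_i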

  18. Coefficients:
                      Estimate  Std. Error    z value  Pr(>|z|)
      (Intercept)   -2.657e+01   8.901e+04  -2.98e-04         1
      dose          -7.541e-26   1.596e+07  -4.72e-33         1
      G1+           -3.386e-28   1.064e+05  -3.18e-33         1
      G2B           -1.344e-14   1.092e+05  -1.23e-19         1
      G2H           -3.349e-28   1.095e+05  -3.06e-33         1
      dose:G1+       7.541e-26   1.596e+07   4.72e-33         1
      dose:G2B       3.984e-12   3.075e+07   1.30e-19         1
      dose:G2H       7.754e-26   2.760e+07   2.81e-33         1
      G1+:G2B        1.344e-14   1.465e+05   9.17e-20         1
      G1+:G2H        3.395e-28   1.327e+05   2.56e-33         1
      dose:G1+:G2B  -3.984e-12   3.098e+07  -1.29e-19         1
      dose:G1+:G2H  -7.756e-26   2.763e+07  -2.81e-33         1

      Before we “fix” this, first a little detour …

  19. A Multivariate Gaussian Mixture
      • Component j is MVN(μj, Σj), with mixing proportion πj
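      Written out, the mixture density (a standard form; the slide displayed it
      graphically):

        f(x) = \sum_{j=1}^{K} \pi_j\, \phi(x;\, \mu_j, \Sigma_j), \qquad \sum_{j=1}^{K} \pi_j = 1,

      where \phi(\cdot;\, \mu_j, \Sigma_j) denotes the MVN(μj, Σj) density.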

  20. The Maximum Likelihood Approach
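      For data x1, …, xn, the mixture log-likelihood to be maximized is the
      standard one (written from the definitions above):

        \ell(\pi, \mu, \Sigma) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \pi_j\, \phi(x_i;\, \mu_j, \Sigma_j)

      The sum inside the logarithm makes direct maximization awkward, which
      motivates the work-around on the next slide.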

  21. A Possible Work-Around
      • Keys here:
        • the true group memberships Δ are unknown (latent)
        • statisticians specialize in unknown quantities

  22. A reasonable approach
      1. Randomly assign group memberships Δ, and estimate group means μj, covariance matrices Σj, and mixing proportions πj
      2. Given those values, calculate (for each obs.) ξj = E[Δj | θ] = P(obs. in group j)
      3. Update estimates for μj, Σj, and πj, weighting each observation by these ξj
      4. Repeat steps 2 and 3 to convergence (a code sketch follows below)
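      A minimal numerical sketch of steps 1–4 for a univariate K-component
      Gaussian mixture, as promised above (all function and variable names are
      mine; the workshop's setting is multivariate, but the mechanics are the
      same):

        # EM for a univariate Gaussian mixture (illustrative sketch)
        em_mixture <- function(x, K = 2, max_iter = 200, tol = 1e-8) {
          n <- length(x)
          # Step 1: random initialization (random means, a common variant of
          # randomly assigning group memberships)
          mu     <- sample(x, K)
          sigma2 <- rep(var(x), K)
          pi_k   <- rep(1 / K, K)
          loglik_old <- -Inf
          for (iter in 1:max_iter) {
            # Step 2 (E-step): xi[i, j] = P(obs. i in group j | current parameters)
            dens <- sapply(1:K, function(j) pi_k[j] * dnorm(x, mu[j], sqrt(sigma2[j])))
            xi   <- dens / rowSums(dens)
            # Step 3 (M-step): update parameters, weighting each obs. by xi
            Nk     <- colSums(xi)
            mu     <- colSums(xi * x) / Nk
            sigma2 <- sapply(1:K, function(j) sum(xi[, j] * (x - mu[j])^2) / Nk[j])
            pi_k   <- Nk / n
            # Step 4: repeat to convergence of the observed-data log-likelihood
            loglik <- sum(log(rowSums(dens)))
            if (abs(loglik - loglik_old) < tol) break
            loglik_old <- loglik
          }
          list(mu = mu, sigma2 = sigma2, pi = pi_k, xi = xi)
        }

      For example, em_mixture(c(rnorm(100, 0), rnorm(100, 5)), K = 2) typically
      recovers component means near 0 and 5.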

  23. Plotting character and color indicate most likely component

  24. The EM (Baum-Welch) Algorithm – maximization made easier
      • Zm = latent (unobserved) data; T = (Z, Zm) = complete data, with Z the observed data
      1. Start with initial guesses for the parameters, θ(0)
      2. Expectation: at the kth iteration, compute Q(\theta \mid \theta^{(k)}) = E[\log L(\theta; T) \mid Z, \theta^{(k)}]
      3. Maximization: obtain the estimate θ(k+1) by maximizing Q(\theta \mid \theta^{(k)}) over θ
      4. Iterate steps 2 and 3 to convergence

  25. Beetle Data – Notation
      • Observed values
      • Unobserved (latent) values
      • If Nij had been observed:
      • How Nij can be [latently] considered:

  26. Likelihood Function
      • Parameters θ = (p, P) and complete data T = (n, N)
      • After simplification:
      • Mechanism of missing data suggests the EM algorithm

  27. Missing at Random (MAR)
      • Necessary assumption for usual EM applications
      • Covariate x is MAR if the probability of observing x does not depend on x or any other unobserved covariate, but may depend on the response and other observed covariates (Ibrahim 1990)
      • Here – genotype is observed only for survivors, and for all subjects at zero dosage

  28. Initialization Step
      • Two classes of marginal information here:
        • For all dosage levels j – observe the dose-level totals
        • At zero dosage level – observe counts for each genotype i, which allows an estimate of Pi
      • Consider the marginal distribution of the missing categorical covariate (genotype), using the zero dosage level
      • This is the key – the marginal distribution of the missing categorical covariate

  29. Expectation Step
      • Dropping “constants” (terms not involving the parameters) from the expected complete-data log-likelihood
      • Need to evaluate the conditional expectations of the latent counts given the observed data

  30. Expectation Step
      • Bayes’ formula gives the conditional distribution of the latent counts
      • That conditional distribution is multinomial
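      For reference, Bayes’ formula in its generic form (the slide’s own
      notation accompanied a figure and is not recoverable):

        P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

      Here it converts the model’s “forward” probabilities into the conditional
      distribution of the latent counts given the observed data.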

  31. Expectation Step
      • Some terms are not needed for maximization – they only affect the EM convergence rate
      • Direct calculation from the multinomial distribution is “possible” – but computationally prohibitive
      • Need to employ some approximation strategy
      • Second-order Taylor series, using Binet’s formula

  32. Expectation Step
      • Consider Binet’s formula (like Stirling’s) for the log-gamma terms in the multinomial likelihood
      • Apply a second-order Taylor series approximation to the resulting expression
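      A standard statement of Binet’s result, for reference (the slide’s exact
      version is not recoverable): the log-gamma function expands as

        \log \Gamma(z) = \left(z - \tfrac{1}{2}\right)\log z - z + \tfrac{1}{2}\log(2\pi) + \theta(z), \qquad 0 < \theta(z) < \frac{1}{12z},

      where θ(z) is Binet’s function; dropping θ(z) gives Stirling’s
      approximation.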

  33. Maximization Step
      • The portion of the expected log-likelihood involving the genotype proportions P: maximized via Lagrange multipliers (the proportions must sum to one)
      • The portion involving the dose-response parameters p: maximized by Newton-Raphson iterations, with some parameterization
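      A generic instance of the Lagrange-multiplier step for proportions (a
      sketch, not the paper’s exact objective): maximize \sum_i N_i \log P_i
      subject to \sum_i P_i = 1. Setting the derivative of the Lagrangian to
      zero,

        \frac{\partial}{\partial P_i}\left[\sum_i N_i \log P_i + \lambda\left(1 - \sum_i P_i\right)\right] = \frac{N_i}{P_i} - \lambda = 0 \quad\Longrightarrow\quad \hat{P}_i = \frac{N_i}{\sum_k N_k}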

  34. Convergence

  35. Dose Response Curves (log scale)

  36. EM Results
      • Test statistic for H0: no dosage effect
      • Separation of points …

  37. Topics Used Here
      • Calculus
        • Differentiation & Integration (including vector differentiation)
        • Lagrange Multipliers
        • Taylor Series Expansions
      • Linear Algebra
        • Determinants & Eigenvalues
        • Inverting [computationally/nearly singular] Matrices
        • Positive Definiteness
      • Probability
        • Distributions: Multivariate Normal, Binomial, Multinomial
        • Bayes Formula
      • Statistics
        • Logistic Regression
        • Separation of Points
        • [Penalized] Likelihood Maximization
        • EM Algorithm
      • Biology – a little time and communication
