1 / 48

SAS ® Global Forum 2014

SAS ® Global Forum 2014. Got Randomness? SAS TM for Mixed and Generalized Linear Mixed Models. March 23-26 Washington, DC. David A. Dickey NC State University. TM SAS and its products are the registered trademarks of SAS Institute, Cary, NC. Data: Challenger (Binomial with

oswald
Download Presentation

SAS ® Global Forum 2014

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAS® Global Forum 2014 Got Randomness? SASTM for Mixed and Generalized Linear Mixed Models March 23-26 Washington, DC David A. Dickey NC State University TM SAS and its products are the registered trademarks of SAS Institute, Cary, NC

  2. Data: Challenger (Binomial with random effects) Data: Ships (Poisson with offset) Data: Crab mating patterns (X rated) (Poisson Regression, ZIP model, Negative Binomial) Data: Typists (Poisson with random effects) Data: Flu samples (Binomial with random effects)

  3. “Generalized”  non normal distribution Binary for probabilities: Y=0 or 1 Mean E{Y}=p Variance p(1-p) Pr{Y=j}= pj(1-p)(1-j) Link: L=ln(p/(1-p)) = “Logit” Range (over all L): 0<p<1 Poisson for counts: Y in {0,1,2,3,4, ….} Mean count l Variance l Pr{Y=j} = exp(- l )(lj)/(j!) Link: L = log(l) Range (over all L): l >0 2/3 1/3 Like Dislike Pr{ Like }=2/3 l=0.4055 2/3 .055 .27 .007 .001 0 1 2 3 4 5

  4. Mixed (not generalized) Models: Fixed Effects and Random Effects

  5. Fixed? or Random? Replication : Same levels Different levels Inference for:Only ObservedPopulation of LevelsLevels Levels : Picked onPicked at Purpose Random Inference on:MeansVariances Example: Only These All Doctors Drugs All Clinics Example: Only These All Farms Fertilizers All Fields

  6. Generalized (not mixed) linear models. Use link L = g(E{Y}), e.g. ln(p/(1-p)) =ln(E{Y}/(1-E{Y}) Assume L is linear model in the inputs with fixed effects. Estimate model for L, e.g. L=g(E{Y})=bo + b1 X Use maximum likelihood Example: L = -1 + .18*dose Dose = 10, L=0.8, p=exp(0.8)/(1+exp(0.8))= “inverse link” = 0.86

  7. Challenger was mission 24 From 23 previous launches we have: 6 O-rings per mission Y=0 no damage, Y=1 erosion or blowby p = Pr {Y=1} = f{mission, launch temperature) Features: Random mission effects Logistic link for p procglimmix data=O_ring; class mission; model fail = temp/dist=binomial s; random mission; run; • Generalized • Mixed

  8. Estimated G matrix is not positive definite. Covariance Parameter Estimates Cov Standard Parm Estimate Error mission 2.25E-18 . Solutions for Fixed Effects Effect Estimate Error DF t Value Pr > |t| Intercept 5.0850 3.0525 21 1.67 0.1106 temp -0.1156 0.04702 115 -2.46 0.0154 We “hit the boundary”

  9. Likelihood

  10. Just logistic regression – no mission variance component

  11. Flu Data CDC Active Flu Virus Weekly Data % positive data FLU; input fluseasn year t week pos specimens; pct_pos=100*pos/specimens; logit=log(pct_pos/100/(1+(pct_pos/100))); label pos = "# positive specimens"; label pct_pos="% positive specimens"; label t = "Week into flu season (first = week 40)"; label week = "Actual week of year"; label fluseasn = "Year flu season started"; Empirical Logit % positive

  12. (1) GLM all effects fixed (harmonic main effects insignificant) “Sinusoids” S(j) = sin(2pjt/52) C(j)=cos(2pjt/52) PROCGLM DATA=FLU; class fluseasn; model logit = s1 c1 fluseasn*s1 fluseasn*c1 fluseasn*s2 fluseasn*c2 fluseasn*s3 fluseasn*c3 fluseasn*s4 fluseasn*c4; output out=out1 p=p; data out1; set out1; P_hat = exp(p)/(1+exp(p)); label P_hat = "Pr{pos. sample} (est.)"; run;

  13. (2) MIXED analysis on logits Random harmonics. Normality assumed Toeplitz(1) PROCMIXED DATA=FLU method=ml; ** reduced model; class fluseasn; model logit = s1 c1 /outp=outpoutpm=outpmddfm=kr; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); run;

  14. (3) GLIMMIX analysis Random harmonics. Binomial assumed (overdispersed– lab effects?) PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model pos/specimens = s1 c1 ; * s2 c2 s3 c3 s4 c4; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn; ** Toep(1) - no converge; random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); random _residual_; covtestglm; output out=out2 pred(ilinkblup)=pblup pred(ilinknoblup)=overall pearson= p_resid; run;

  15. output out=out2 pred(ilinkblup)=pblup pred(ilinknoblup)= overall pearson= p_resid; run; Pearson Residuals: Used to check fit when using default (pseudo-likelihood) Variance should be near 1 procmeansmeanvar; varp_resid; run; Without random _residual_; variance 3.63.  With random _residual_; variance 0.83.  --------------------------------------------------------------- Fit Statistics -2 Res Log Pseudo-Likelihood 341.46 Generalized Chi-Square 1707.29 Gener. Chi-Square / DF 4.59 ------------------------------------------------------------------------- Or… use method=quad

  16. Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F S1 1 8 34.93 0.0004 c1 1 8 25.49 0.0010 Tests of Covariance Parameters Based on the Residual Pseudo-Likelihood Label DF -2 Res Log P-LikeChiSq Pr > ChiSqNote Independence 6 1312.34 970.88 <.0001 MI MI: P-value based on a mixture of chi-squares Output due to covtestglm;  random _residual_ does not affect the fit (just standard errors)

  17. Could try 2 parameter Beta distribution instead: PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model f = s1 c1 /dist=beta link=logit s; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); output out=out3 pred(ilinkblup)=pblup pred(ilinknoblup)=overall pearson=p_residbeta; run;

  18. Binomial Assumption Without BLUPS and with BLUPS

  19. Beta Assumption Without BLUPS and with BLUPS

  20. Poisson Example: Wave induced damage incidents in 40 ships (ship groups) Variables: Factorial Effects (fixed, classificatory): Ship Type 5 levels A,B,C,D,E Year Constructed 4 levels Years of Operation 2 levels Covariate (“offset” - continuous) = Time in service (“Aggregate months”) Incidents (dependent, counts) Source” McCullough & Nelder (but I ignore cases where year constructed > period of operation)

  21. procglimmix data=ships; Title "Ignoring Ship Variance"; class operation construct shiptype; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; run; ln(service) Poisson:ln(l) – ln(service)= b0 + b1(operation) + b2(construct) + b3(ship_type) ln(l/service) • >1 • Ship variance? -2 Log Likelihood 136.56 (more fit statistics) Pearson Chi-Square 42.28 Pearson Chi-Square / DF 1.69

  22. Without ship variance component: Type III Tests of Fixed Effects NumDen Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 Construct 3 25 9.72 0.0002 Shiptype4 25 6.50 0.0010

  23. PROCGLIMMIX data=ships method=quad; class operation construct shiptype ship; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; covtest "no ship effect" glm; * random ship; random intercept / subject = operation*construct*shiptype; run; Covariance Parameter Estimates Standard CovParm Subject Estimate Error Intercept Operat*Constr*Shipty0.

  24. Fit Statistics -2 Log Likelihood 136.56 Fit Statistics for Conditional Distribution -2 log L(incidents | r. effects) 136.56 Pearson Chi-Square 42.27 Pearson Chi-Square / DF 1.24 Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 no Construct 3 25 9.72 0.0002  shiptype 4 25 6.50 0.0010 changes Tests of Covariance Parameters covtest "no ship effect" glm; Based on the Likelihood Label DF -2 Log Like ChiSqPr > ChiSqNote no ship effect 1 136.56 . 1.0000MI MI: P-value based on a mixture of chi-squares.

  25. Horseshoe Crab study (reference: SAS GLIMMIX course notes): Female nests have “satellite” males Count data – Poisson?Generalized Linear Features (predictors): Carapace Width, Weight, Color, Spine condition Random Effect: SiteMixed Model To nest  Go State

  26. Fit Statistics Gener. Chi-Square / DF 2.77 CovParm Subject Estimate Intercept site 0.1625 Effect Estimate Pr > |t| Intercept -1.1019 0.2527 weight 0.5042 0.0035 width 0.0318 0.5229 procglimmix data=crab; class site; model satellites = weight width / dist=poi solution ddfm=kr; random int / subject=site; output out=overdisppearson=pearson; run; procmeans data=overdisp n mean var; varpearson; run; procunivariate data=crab normal plot; var satellites; run; Histogram # Boxplot 15.5+* 1 0 .* 1 0 . 12.5+* 1 | .* 1 | .** 3 | 9.5+** 3 | .*** 6 | .** 4 | 6.5+******* 13 | .******** 15 +-----+ .********** 19 | | 3.5+********** 19 | | .***** 9 *--+--* .******** 16 | | 0.5+******************************* 62 +-----+ ----+----+----+----+----+----+- * may represent up to 2 counts N Mean Variance --------------------------- 173 -0.0258264 2.6737114 --------------------------- Zero Inflated ?

  27. Zero Inflated Poisson (ZIP) Q: Can zero inflation cause overdispersion (s2>m)? Recall: in Poisson, s2=m=l

  28. A: yes!

  29. Nice job Grandpa. That proof just about put everyone to sleep

  30. Dickey ncsu Zero Inflated Poisson - (ZIP code  ) SAS Global Forum Washington DC 20745 procnlmixed data=crab; parms b0=0bwidth=0bweight=0 c0=-2 c1=0s2u1=1 s2u2=1; x=c0+c1*width+u1; p0 = exp(x)/(1+exp(x)); * width affects p0; eta= b0+bwidth*width +bweight*weight +u2; lambda=exp(eta); if satellites=0 then loglike = log(p0 +(1-p0)*exp(-lambda)); else loglike = log(1-p0)+satellites*log(lambda)-lambda-lgamma(satellites+1); expected=(1-p0)*lambda; id p0 expected lambda; model satellites~general(loglike); Random U1 U2~N([0,0],[s2u1,0,s2u2]) subject=site; predict p0+(1-p0)*exp(-lambda) out=out1; run;

  31. Parameter Estimates Parameter Estimate t Pr>|t| Lower Upper b0 2.7897 2.55 0.0268 0.3853 5.1942 bwidth -0.0944 -1.65 0.1267-0.2202 0.0314 bweight 0.4649 2.38 0.0366 0.0347 0.8952 c0 13.3739 4.42 0.0010 6.7078 20.0401 c1 -0.5447 -4.61 0.0008 -0.8049 -0.2844 s2u1 0.5114 1.12 0.2852 -0.4905 1.5133 s2u2 0.1054 1.67 0.1239 -0.0339 0.2447 weight affects l width affects p0 Variance for p0 l

  32. From fixed part of model, compute Pr{count=j} and plot (3D) versus Weight, Carapace width

  33. Your talk seems better now Grandpa!

  34. Another possibility: Negative binomial Number of failures until kth success ( p=Prob{success} )

  35. Negative binomial: In SAS, k (scale) is our 1/k procglimmix data=crab; class site; model satellites = weight width / dist=nb solution ddfm=kr; random int / subject=site; run; Fit Statistics -2 Res Log Pseudo-Likelihood 539.06 Generalized Chi-Square 174.83 Gener. Chi-Square / DF 1.03 Covariance Parameter Estimates CovParm Subject Estimate Std. Error Intercept site 0.09527 0.07979 Scale 0.7659 0.1349 Standard Effect Estimate Error DF t Value Pr > |t| Intercept -1.2022 1.6883 168.5 -0.71 0.4774 weight 0.6759 0.3239 156.6 2.09 0.0386 width 0.01905 0.08943 166.2 0.21 0.8316

  36. Population average model vs. Individual Specific Model 8 typists Y=Error counts (Poisson distributed) ln(li)= ln(mean of Poisson) = m+Ui for typist i so li=em+Ui conditionally (individual specific) Distributions for Y, U~N(0,1) and m=1 l=em=e1=2.7183 = mean for “typical” typist (typist with U=0)

  37. Population average model Expectation ||||| | | of individual distributions averaged across population of all typists. Run same simulation for 8000 typists, compute mean of conditional population means, exp(m+U). The MEANS Procedure Variable N Mean Std Dev Std Error ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ lambda 80004.4280478 6.0083831 0.067175 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Z=(4.428-2.7183)/0.06718 = 25.46 !! Population mean is notem Conditional means, m+U, are lognormal. Log(Y)~N(1,1)  E{Y}=exp(m+0.5s2) = e1.5 = 4.4817

  38. Main points: • Generalized linear models with random effects are subject specific models. • Subject specific models have fixed effects that represent an individual with random effects 0 (individual at the random effect distributional means). • Subject specific models when averaged over the subjects do not give the model fixed effects. • Models with only fixed effects do give the fixed effect part of the model when averaged over subjects and are thus called population average models.

More Related