PROC GLIMMIX: Generalized Linear Mixed Models
Animal Science 500, Lecture No. 17-18, October 25, 2010
GLIMMIX Information • PROC GLIMMIX is a procedure for fitting generalized linear mixed models (GLMMs) • Generalized linear models (GLiMs, also written GLMs) allow for non-normal data; GLMMs add random effects to them • GLMMs also allow for correlation among responses An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
Getting GLIMMIX • SAS 9.1: download the add-on (Windows, Unix, Linux) from • http://support.sas.com • http://www.sas.com/statistics • Supported on a limited number of platforms and platform configurations • SAS 9.2: GLIMMIX is included in SAS/STAT (available now for most academic sites) An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
GLIMMIX overview • PROC GLIMMIX fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. • These models are known as generalized linear mixed models (GLMM). • The GLMMs, like linear mixed models, assume normal (Gaussian) random effects. • Conditional on these random effects, data can have any distribution in the exponential family
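In matrix form, the model described on this slide can be sketched as below. This is one standard way to write a GLMM, offered as an illustration rather than as the slides' own notation: g is the link function, β the fixed effects, γ the normally distributed random effects with covariance matrix G, and A an assumed diagonal matrix holding the variance function evaluated at each conditional mean.

```latex
E[\mathbf{Y} \mid \boldsymbol{\gamma}] = g^{-1}\!\left(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}\right),
\qquad \boldsymbol{\gamma} \sim N(\mathbf{0}, \mathbf{G}),
\qquad \operatorname{Var}[\mathbf{Y} \mid \boldsymbol{\gamma}] = \mathbf{A}^{1/2}\,\mathbf{R}\,\mathbf{A}^{1/2}
```

With a normal conditional distribution and the identity link, A is the identity matrix and the model reduces to the linear mixed model Y = Xβ + Zγ + ε with Var(ε) = R.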
GLIMMIX overview • The exponential family comprises many of the elementary discrete and continuous distributions and includes: • Binary (Bernoulli), • A single trial that can result in just two possible outcomes. We call one of these outcomes a success and the other a failure. • Repeating such a trial gives a binomial experiment: • The experiment consists of n repeated trials. • The probability of success, denoted by p, is the same on every trial. • The trials are independent; that is, the outcome of one trial does not affect the outcomes of the other trials. • Binomial, • Poisson, and • Negative binomial distributions
GLIMMIX overview • The exponential family comprises many of the elementary discrete and continuous distributions and includes: • Binomial, • The number of successes in n independent trials, each with just two possible outcomes whose fixed probabilities sum to one. • The success probability need not be 0.5: for example, a biased coin on which heads and tails have different probabilities. • These distributions are called binomial distributions. • Poisson, and • Negative binomial distributions
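To make the binary and binomial cases concrete, the SAS DATA step below tabulates binomial probabilities with the built-in PDF function and checks them against the formula C(n, y) p^y (1 - p)^(n - y). The values n = 10 and p = 0.3 and the data set name BINOM are hypothetical choices for illustration; setting n = 1 gives the binary (Bernoulli) case.

```sas
/* Binomial probabilities P(Y = y) for n = 10 trials with success
   probability p = 0.3 (hypothetical values chosen for illustration) */
data binom;
  n = 10;
  p = 0.3;
  do y = 0 to n;
    prob  = pdf('BINOMIAL', y, p, n);                    /* built-in PDF */
    check = comb(n, y) * p**y * (1 - p)**(n - y);        /* textbook formula */
    output;
  end;
run;

proc print data=binom noobs;
  var y prob check;
run;
```

The PROB and CHECK columns should agree, and the probabilities over y = 0, ..., 10 sum to one.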
GLIMMIX overview • The exponential family comprises many of the elementary discrete and continuous distributions and includes: • Poisson, • The Poisson distribution is an appropriate model for count data. • Examples of such data are mortality data, • the number of misprints in a book, • the number of bacteria on a plate, and • the number of activations of a Geiger counter. • Negative binomial distributions
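As a quick illustration of the Poisson model for counts, the sketch below computes P(Y = y) = exp(-λ) λ^y / y! both with SAS's PDF function and by hand. The mean λ = 2 (think of it as an assumed average number of misprints per page) and the data set name POIS are hypothetical.

```sas
/* Poisson probabilities for a hypothetical mean count lambda = 2 */
data pois;
  lambda = 2;
  do y = 0 to 6;
    prob   = pdf('POISSON', y, lambda);               /* built-in PDF */
    manual = exp(-lambda) * lambda**y / fact(y);      /* exp(-lambda)*lambda**y / y! */
    output;
  end;
run;

proc print data=pois noobs;
  var y prob manual;
run;
```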
GLIMMIX overview • The exponential family comprises many of the elementary discrete and continuous distributions and includes: • Negative binomial distributions, • The probability distribution of a negative binomial random variable is called a negative binomial distribution. • It is also known as the Pascal distribution. • Example: Suppose you flip a coin repeatedly and count the number of heads (successes). If you continue flipping until the coin has landed heads 2 times, you are conducting a negative binomial experiment. • The negative binomial random variable is the number of coin flips required to achieve 2 heads.
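The coin example can be computed directly. SAS's PDF('NEGBINOMIAL', m, p, n) gives the probability of m failures before the n-th success, so the number of flips needed for 2 heads is the number of tails plus 2. The sketch below assumes a fair coin (p = 0.5); the data set name NEGBIN is hypothetical.

```sas
/* Number of flips needed to get r = 2 heads with a fair coin */
data negbin;
  p = 0.5;   /* probability of heads on each flip (assumed fair coin) */
  r = 2;     /* stop after the 2nd head */
  do flips = r to 8;
    failures = flips - r;                            /* tails before the 2nd head */
    prob     = pdf('NEGBINOMIAL', failures, p, r);   /* P(flips needed = flips) */
    manual   = comb(flips - 1, r - 1) * p**r * (1 - p)**failures;
    output;
  end;
run;

proc print data=negbin noobs;
  var flips prob manual;
run;
```

For example, the probability of needing exactly 2 flips is 0.25, exactly 3 flips is 0.25, and exactly 4 flips is 0.1875. In PROC GLIMMIX, DIST=NEGBIN is typically used for overdispersed counts and is parameterized by the mean count rather than by the number of trials.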
GLIMMIX overview • The exponential family comprises many of the elementary discrete and continuous distributions and includes: • The previous distributions, which are the discrete members of this family. • The normal, beta, gamma, and chi-square distributions, which are representatives of the continuous distributions in this family.
GLIMMIX overview • The continuous distributions in the exponential family include: • Normal, also called Gaussian • Beta • The beta distribution has two main uses: • To describe uncertainty or random variation in a probability, fraction, or prevalence; • As a flexible distribution that can be rescaled and shifted to create distributions with a wide range of shapes over any finite range. As such, it is sometimes used to model expert opinion.
GLIMMIX overview • The continuous distributions in the exponential family include: • Gamma, • Applications based on intervals between events, which follow from the gamma variable being the sum of one or more independent exponentially distributed variables. In this form, examples of its use include queuing models, the flow of items through manufacturing and distribution processes, the load on web servers, and the many and varied forms of telecom exchange. • Because of its moderately skewed profile, it can be used as a model in a range of disciplines, including climatology, where it is a workable model for rainfall, and financial services, where it has been used to model insurance claims and the size of loan defaults and hence in probability-of-ruin and value-at-risk calculations. From http://www.brighton-webs.co.uk/distributions/gamma.asp
GLIMMIX overview • The continuous distributions in the exponential family include: • Chi-square distributions • The best-known use of the chi-square distribution is in the common chi-square goodness-of-fit tests, which compare an observed distribution with a known or theoretical distribution. • Example: comparing the observed distribution of movie ratings with an expected distribution of movie ratings • It can also be used to test the independence of two criteria of classification of qualitative data. • $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is an observed frequency and E is the corresponding expected frequency
GLIMMIX overview • The continuous distributions in the exponential family include: • Chi-square distributions • Hypotheses • H0: The distribution of observed frequencies equals the distribution of expected frequencies. • H1: The distribution of observed frequencies does not equal the distribution of expected frequencies. • Assumptions • Observations are independent (each subject can appear once and only once in a table). • Expected frequencies in each row are at least 5.
Example of a chi-square test • Example 1: Pepsi Challenge • Test whether cola preference among 220 college students in a simple random sample is equally distributed. • Each individual tastes each of the three colas. • Between tastes, subjects eat a soda cracker. • Each subject receives the colas in a different order. • Each subject then selects which soda he/she likes best. • Results: Pepsi 85, Coke 57, RC 78. • Use equal expected frequencies for each row: E = 220/3 = 73.33.

| Cola | O | E | O - E | (O - E)² | (O - E)²/E |
|---|---|---|---|---|---|
| Pepsi | 85 | 73.33 | 11.67 | 136.19 | 1.86 |
| Coke | 57 | 73.33 | -16.33 | 266.67 | 3.64 |
| RC | 78 | 73.33 | 4.67 | 21.81 | 0.30 |
| Totals | 220 | 219.99 | | | χ² = 5.8 |

• df = rows - 1 = 3 - 1 = 2. • Critical value of χ² = 5.99 at alpha = 0.05. • Observed value of χ² = 5.8. • Decision: fail to reject H0. Example from: http://www.philender.com/courses/intro/notes3/chi.html
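The same goodness-of-fit test can be run in SAS. The sketch below uses PROC FREQ, whose CHISQ option on a one-way table tests equal proportions by default, and then repeats the hand calculation in a DATA step. The data set names COLA and CHISQ are hypothetical; the counts are the ones from the slide.

```sas
/* Observed counts from the Pepsi Challenge */
data cola;
  input brand $ count;
  datalines;
Pepsi 85
Coke  57
RC    78
;
run;

/* One-way chi-square goodness-of-fit test (equal proportions by default) */
proc freq data=cola order=data;
  weight count;
  tables brand / chisq;
run;

/* The same statistic computed by hand */
data chisq;
  set cola end=last;
  expected = 220 / 3;                         /* equal expected frequencies */
  contrib  = (count - expected)**2 / expected;
  chisq + contrib;                            /* running sum of (O - E)**2 / E */
  if last then do;
    df     = 2;                               /* 3 categories - 1 */
    crit   = cinv(0.95, df);                  /* critical value at alpha = 0.05 */
    pvalue = 1 - probchi(chisq, df);
    output;
  end;
run;

proc print data=chisq noobs;
  var chisq df crit pvalue;
run;
```

Without the rounding used in the table, the statistic is about 5.79 with 2 df (p about 0.055), below the critical value of about 5.99, so the decision is unchanged.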
Distributions Supported in PROC GLIMMIX • Discrete • Binary • Binomial • Poisson • Geometric • Negative Binomial • Multinomial (nominal and ordinal) • Continuous • Beta • Normal • “Lognormal” • Gamma • Exponential • Inverse Gaussian • Shifted T Distributions specified through DIST= (and LINK=) options on the MODEL statement An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
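A minimal sketch of choosing the distribution and link on the MODEL statement follows. The simulated data set COUNTS, its treatment means, and the seed are hypothetical; only the DIST= and LINK= options come from the slide.

```sas
/* Hypothetical simulated counts: 5 plots in each of 2 treatments */
data counts;
  call streaminit(2010);              /* fixed seed for reproducibility */
  do trt = 'A', 'B';
    do plot = 1 to 5;
      mu = ifn(trt = 'A', 4, 7);      /* assumed true mean counts */
      y  = rand('POISSON', mu);
      output;
    end;
  end;
run;

/* DIST= picks the conditional distribution, LINK= the link function.
   Other combinations follow the same pattern, for example
   dist=negbin link=log, dist=gamma link=log, or dist=beta link=logit. */
proc glimmix data=counts;
  class trt;
  model y = trt / dist=poisson link=log solution;
run;
```

If LINK= is omitted, GLIMMIX uses the default link for the chosen distribution (the log link for the Poisson).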
GLIMMIX overview • In the absence of random effects, the GLIMMIX procedure fits generalized linear models (the same class of models fit by the GENMOD procedure). • GLMMs are useful for: • Estimating trends in disease rates, • Modeling CD4 counts in a clinical trial over time, • Modeling the proportion of infected plants on experimental units in a design with randomly selected treatments or randomly selected blocks, • Predicting the probability of high ozone levels in counties, • Modeling skewed data over time, • Analyzing customer preference, • Joint modeling of multivariate outcomes, etc.
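To see the first point in practice, the sketch below fits the same logistic model with and without random effects absent: once in PROC GLIMMIX with no RANDOM statement and once in PROC GENMOD. The small data set PENS is hypothetical.

```sas
/* Hypothetical binomial data: events out of trials for two treatments */
data pens;
  input trt $ events trials;
  datalines;
A 3 50
A 5 48
B 9 52
B 7 50
;
run;

/* No RANDOM statement: GLIMMIX fits an ordinary generalized linear model */
proc glimmix data=pens;
  class trt;
  model events/trials = trt / dist=binomial link=logit solution;
run;

/* The same model in PROC GENMOD */
proc genmod data=pens;
  class trt;
  model events/trials = trt / dist=binomial link=logit;
run;
```

The parameter estimates should agree, although the two procedures label and test them somewhat differently in their output.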
GLIMMIX overview • The syntax for PROC GLIMMIX is similar to what we have learned for PROC MIXED. • This includes the CLASS, MODEL, and RANDOM statements.
PROC GLIMMIX features • SUBJECT= and GROUP= options, which enable blocking of variance matrices and parameter heterogeneity • Choice of linearization about expected values or about current solutions of the best linear unbiased predictors (BLUPs) • Flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures • The CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis tests and estimable linear combinations of effects • The NLOPTIONS statement, which enables you to exercise control over the numerical optimization • You can choose techniques, update methods, line-search algorithms, convergence criteria, and more, or you can use the default optimization strategy selected for the particular class of model you are fitting.
PROC GLIMMIX features • Computed variables created with SAS programming statements inside PROC GLIMMIX (except for variables listed in the CLASS statement) • These computed variables can appear in the MODEL, RANDOM, WEIGHT, or FREQ statement. • User-specified link and variance functions • Choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich) estimators, to make the analysis robust against misspecification of the covariance structure and to adjust for small-sample bias • Joint modeling for multivariate data • For example, you can model binary and normal responses from a subject jointly and use random effects to relate (fuse) the two outcomes.
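A computed variable can be created inside the procedure and used directly on the MODEL statement, as in the sketch below. The dose-response data set DOSE and its values are hypothetical; LOGDOSE exists only within the PROC GLIMMIX step, not in the input data.

```sas
/* Hypothetical dose-response data */
data dose;
  input dose events trials;
  datalines;
1  2 40
2  5 40
4 11 40
8 20 40
;
run;

proc glimmix data=dose;
  logdose = log(dose);   /* programming statement: computed inside the procedure */
  model events/trials = logdose / dist=binomial link=logit solution;
run;
```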
Comparing the GLIMMIX and MIXED Procedures • The MIXED procedure is subsumed by the GLIMMIX procedure in the following sense: • Linear mixed models are a special case in the family of generalized linear mixed models; • A linear mixed model is a generalized linear mixed model in which the conditional distribution is normal and the link function is the identity function. • Most models that can be fit with the MIXED procedure can also be fit with the GLIMMIX procedure. • Despite this overlap in functionality, there are also some important differences between the two procedures. • Knowing these differences enables you to select the most appropriate tool when you have a choice between procedures and to identify situations where no choice exists.
Comparing the GLIMMIX and MIXED Procedures • The following REPEATED statement in PROC MIXED:
repeated / subject=id type=ar(1);
is equivalent to the following RANDOM statement in PROC GLIMMIX:
random _residual_ / subject=id type=ar(1);
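Putting the two statements into complete programs gives the sketch below. The repeated-measures data set RM (6 animals, 4 times each), its simulated means, and the seed are hypothetical. Because the response is normal with the identity link, the two fits should give matching covariance parameter and fixed-effect estimates.

```sas
/* Hypothetical repeated measures: 6 animals (id) measured at 4 times */
data rm;
  call streaminit(25);
  do id = 1 to 6;
    trt = mod(id, 2);
    do time = 1 to 4;
      y = 10 + 2*trt + 0.5*time + rand('NORMAL', 0, 1);
      output;
    end;
  end;
run;

/* PROC MIXED: R-side AR(1) structure through the REPEATED statement */
proc mixed data=rm;
  class id trt time;
  model y = trt time;
  repeated / subject=id type=ar(1);
run;

/* PROC GLIMMIX: the same R-side structure through RANDOM _RESIDUAL_ */
proc glimmix data=rm;
  class id trt time;
  model y = trt time;
  random _residual_ / subject=id type=ar(1);
run;
```

Both programs rely on the data being sorted by time within each animal, because neither statement names a repeated effect.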
Syntax: GLIMMIX Procedure • You can specify the following statements in the GLIMMIX procedure: • PROC GLIMMIX <options> ; • BY variables ; • CLASS variables ; • CONTRAST 'label' contrast-specification <, contrast-specification> <, ...> </ options> ; • COVTEST <'label'> <test-specification> </ options> ; • EFFECT effect-specification ; • ESTIMATE 'label' contrast-specification <(divisor=n)> <, 'label' contrast-specification <(divisor=n)>> <, ...> </ options> ; • FREQ variable ;
Syntax: GLIMMIX Procedure • ID variables ; • LSMEANS fixed-effects </ options> ; • LSMESTIMATE fixed-effect <'label'> values <divisor=n> <, <'label'> values <divisor=n>> <, ...> </ options> ; • MODEL response<(response-options)> = <fixed-effects> </ model-options> ; • MODEL events/trials = <fixed-effects> </ model-options> ; • NLOPTIONS <options> ; • OUTPUT <OUT=SAS-data-set> <keyword<(keyword-options)> <=name>> ... <keyword<(keyword-options)> <=name>> </ options> ; • PARMS (value-list) ... </ options> ; • RANDOM random-effects </ options> ;
Syntax: GLIMMIX Procedure • WEIGHT variable ; • Programming statements ; • The CLASS, CONTRAST, COVTEST, EFFECT, ESTIMATE, LSMEANS, LSMESTIMATE, and RANDOM statements and the programming statements can appear multiple times. • The PROC GLIMMIX and MODEL statements are required, and the MODEL statement must appear after the CLASS statement if a CLASS statement is included. • The EFFECT statements must appear before the MODEL statement.
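The ordering rules can be seen in the skeleton below: CLASS and EFFECT come before the required MODEL statement, and RANDOM follows. This sketch assumes a SAS release whose PROC GLIMMIX supports the EFFECT statement (as listed in the syntax above); the data set SK, the spline effect name SPL, and the simulated values are hypothetical.

```sas
/* Hypothetical data: 4 blocks, covariate x = 1..10, Poisson counts */
data sk;
  call streaminit(1);
  do block = 1 to 4;
    do x = 1 to 10;
      y = rand('POISSON', exp(0.5 + 0.1*x));
      output;
    end;
  end;
run;

proc glimmix data=sk;                  /* required */
  class block;                         /* CLASS before MODEL */
  effect spl = spline(x);              /* EFFECT before MODEL */
  model y = spl / dist=poisson;        /* required */
  random intercept / subject=block;    /* RANDOM may appear more than once */
run;
```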
Comparing MIXED and GLIMMIX

| PROC GLIMMIX | PROC MIXED |
|---|---|
| BY | BY |
| CLASS | CLASS |
| CONTRAST | CONTRAST |
| EFFECT | |
| ESTIMATE | ESTIMATE |
| FREQ | |
| ID | ID |
| LSMEANS | LSMEANS |
| LSMESTIMATE | |
| MODEL | MODEL |
| NLOPTIONS | |
| OUTPUT | |
| PARMS | PARMS |
| | PRIOR |
| RANDOM | RANDOM |
| | REPEATED |
| WEIGHT | WEIGHT |
| <Programming Statements> | |

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
Comparing MIXED and GLIMMIX MIXED uses the RANDOM statement for G-side effects and the REPEATED statement for R-side effects. An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
Comparing MIXED and GLIMMIX Both types of effects are specified with the RANDOM statement in GLIMMIX An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
Comparing MIXED and GLIMMIX What are G-side and R-side random effects? Recall from mixed models: Y = X*Beta + Z*Gamma + E • G-side effects enter through Z*Gamma • R-side effects apply to the covariance matrix of E • G-side effects are “inside” the link function, making them easier to interpret and understand • R-side effects are “outside” the link function and are more difficult to interpret An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX P. Gibbs, SAS Technical Support
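The difference shows up clearly with a binary response. In the sketch below, the first model puts a random animal intercept on the G side (inside the logit link), while the second puts a compound-symmetry working correlation on the R side (outside the link). The data set BIN, its simulated effects, and the seed are hypothetical.

```sas
/* Hypothetical binary outcomes: 20 animals, 3 occasions each */
data bin;
  call streaminit(31);
  do id = 1 to 20;
    trt = mod(id, 2);
    u = rand('NORMAL', 0, 0.8);                  /* animal effect */
    do time = 1 to 3;
      eta = -0.5 + 0.8*trt + u;                  /* linear predictor */
      y = rand('BERNOULLI', 1 / (1 + exp(-eta)));
      output;
    end;
  end;
run;

/* G-side: random animal intercept inside the link (subject-specific model) */
proc glimmix data=bin;
  class id trt;
  model y(event='1') = trt / dist=binary link=logit solution;
  random intercept / subject=id;
run;

/* R-side: working correlation among residuals, outside the link (marginal model) */
proc glimmix data=bin;
  class id trt;
  model y(event='1') = trt / dist=binary link=logit solution;
  random _residual_ / subject=id type=cs;
run;
```

The G-side estimates describe an individual animal, while the R-side estimates describe the population average, so the two treatment effects are generally not equal even on the same data.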
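Glimmix Example
Proc glimmix data=one;
  Class treatment date site load;
  /* Deads out of Pigs_Transported: binomial response with a logit link */
  Model deads/pigs_transported = treatment / dist=binomial link=logit solution;
  /* G-side random effects: site, date within site, load within date and site */
  Random site date(site) load(date*site);
  /* ILINK: back-transform to the probability scale; PDIFF: pairwise differences; CL: confidence limits */
  LSMeans treatment / ilink pdiff cl;
Run;
Quit;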
Glimmix Example: The GLIMMIX Procedure, Model Information

| Item | Value |
|---|---|
| Data Set | WORK.ONE |
| Response Variable (Events) | Deads |
| Response Variable (Trials) | Pigs_Transported |
| Response Distribution | Binomial |
| Link Function | Logit |
| Variance Function | Default |
| Variance Matrix | Not blocked |
| Estimation Technique | Residual PL |
| Degrees of Freedom Method | Containment |
Glimmix Example: Class Level Information

| Class | Levels | Values |
|---|---|---|
| Treatment | 2 | Blue Red |
| Date | 10 | 07/07/09 07/08/09 07/13/09 07/14/09 07/15/09 07/20/09 07/21/09 07/22/09 07/27/09 07/28/09 |
| Site | 2 | L&L1 LPB |
| Load | 27 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28 |

| Item | Value |
|---|---|
| Number of Observations Read | 54 |
| Number of Observations Used | 54 |
| Number of Events | 10 |
| Number of Trials | 4462 |
Glimmix Example: Dimensions

| Dimension | Value |
|---|---|
| G-side Cov. Parameters | 3 |
| Columns in X | 3 |
| Columns in Z | 40 |
| Subjects (Blocks in V) | 1 |
| Max Obs per Subject | 54 |
Glimmix Example: The GLIMMIX Procedure, Iteration History

| Iteration | Restarts | Subiterations | Objective Function | Change | Max Gradient |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 180.8730287 | 2.00000000 | 9.588598 |
| 1 | 0 | 0 | 226.21287482 | 0.17907168 | 5.707842 |
| 2 | 0 | 3 | 244.93049605 | 2.00000000 | 4.510821 |
| 3 | 0 | 2 | 241.99123222 | 0.24664831 | 4.378435 |
| 4 | 0 | 2 | 241.22432004 | 0.03671922 | 4.357186 |
| 5 | 0 | 1 | 241.08063527 | 0.00328332 | 4.35531 |
| 6 | 0 | 1 | 241.06655367 | 0.00015363 | 4.355223 |
| 7 | 0 | 0 | 241.06587398 | 0.00000000 | 4.355221 |

Convergence criterion (PCONV=1.11022E-8) satisfied.
Estimated G matrix is not positive definite.

• The "Estimated G matrix is not positive definite" message usually indicates that one or more variance components in the RANDOM statement were estimated to be zero; those components could (and usually should) be removed from the model.
Glimmix Example: Fit Statistics

| Statistic | Value |
|---|---|
| -2 Res Log Pseudo-Likelihood | 241.07 |
| Generalized Chi-Square | 47.41 |
| Gener. Chi-Square / DF | 0.91 |
Glimmix Example: Covariance Parameter Estimates

| Cov Parm | Estimate | Standard Error |
|---|---|---|
| Site | 0 | . |
| Date(Site) | 0 | . |
| Load(Date*Site) | 0.1569 | 0.7068 |
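Following the advice about zero variance components, a reduced model keeps only the Load(Date*Site) component. The sketch below also asks for a likelihood ratio test of that remaining component with COVTEST ZEROG. Two substitutions are assumptions for illustration: METHOD=LAPLACE replaces the default pseudo-likelihood (so the test is based on a true likelihood), and the simulated data set is a hypothetical stand-in because the lecture's data set ONE is not reproduced here.

```sas
/* Hypothetical stand-in for data set ONE: 2 sites x 5 dates x 3 loads */
data one;
  length treatment $ 4;
  call streaminit(17);
  do site = 1 to 2;
    do date = 1 to 5;
      do load = 1 to 3;
        treatment = ifc(mod(load, 2) = 1, 'Blue', 'Red');
        pigs_transported = 80;
        eta = -4 + rand('NORMAL', 0, 0.4);           /* load-level random effect */
        deads = rand('BINOMIAL', 1 / (1 + exp(-eta)), pigs_transported);
        output;
      end;
    end;
  end;
run;

proc glimmix data=one method=laplace;
  class treatment date site load;
  model deads/pigs_transported = treatment / dist=binomial link=logit solution;
  random intercept / subject=load(date*site);   /* Site and Date(Site) dropped */
  covtest zerog;                                /* H0: var[Load(Date*Site)] = 0 */
run;
```

RANDOM INTERCEPT / SUBJECT=LOAD(DATE*SITE) describes the same random effect as LOAD(DATE*SITE) on the original RANDOM statement, but lets the procedure process the data by subject.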
Glimmix Example: Solutions for Fixed Effects

| Effect | Treatment | Estimate | Standard Error | DF | t Value | Pr > abs(t) |
|---|---|---|---|---|---|---|
| Intercept | | -5.9213 | 0.4160 | 1 | -14.23 | 0.0447 |
| Treatment | Blue | -0.4067 | 0.6466 | 26 | -0.63 | 0.5348 |
| Treatment | Red | 0 | . | . | . | . |
Glimmix Example: Type III Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| Treatment | 1 | 26 | 0.40 | 0.5348 |
Glimmix Example: Treatment Least Squares Means

| Treatment | Estimate | Standard Error | DF | t Value | Pr > abs(t) | Alpha | Lower | Upper | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Blue | -6.3280 | 0.5066 | 26 | -12.49 | <.0001 | 0.05 | -7.3692 | -5.2867 | 0.001782 |
| Red | -5.9213 | 0.4160 | 26 | -14.23 | <.0001 | 0.05 | -6.7764 | -5.0662 | 0.002675 |

Treatment Least Squares Means (continued)

| Treatment | Standard Error Mean | Lower Mean | Upper Mean |
|---|---|---|---|
| Blue | 0.000901 | 0.000630 | 0.005033 |
| Red | 0.001110 | 0.001139 | 0.006267 |
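The Mean columns come from the ILINK option, which applies the inverse logit 1 / (1 + exp(-estimate)) to the LS-means and their confidence limits and uses the delta method for the standard error. The DATA step below reproduces those columns by hand from the logit-scale values in the first table; the data set name ILINK is hypothetical.

```sas
/* Back-transform the logit-scale LS-means to the probability scale */
data ilink;
  input treatment $ estimate stderr lower upper;
  mean      = 1 / (1 + exp(-estimate));     /* inverse logit */
  se_mean   = stderr * mean * (1 - mean);   /* delta method: d(mean)/d(eta) = mean*(1 - mean) */
  mean_low  = 1 / (1 + exp(-lower));        /* limits are back-transformed, not recomputed */
  mean_high = 1 / (1 + exp(-upper));
  datalines;
Blue -6.3280 0.5066 -7.3692 -5.2867
Red  -5.9213 0.4160 -6.7764 -5.0662
;
run;

proc print data=ilink noobs;
  var treatment mean se_mean mean_low mean_high;
run;
```

To rounding, this gives mortality probabilities of about 0.0018 (Blue) and 0.0027 (Red), matching the Mean, Standard Error Mean, Lower Mean, and Upper Mean columns.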
Glimmix Example: Differences of Treatment Least Squares Means

| Treatment | _Treatment | Estimate | Standard Error | DF | t Value | Pr > abs(t) | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|---|
| Blue | Red | -0.4067 | 0.6466 | 26 | -0.63 | 0.5348 | 0.05 | -1.7358 | 0.9224 |
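On the logit scale, a difference of LS-means is a log odds ratio, so exponentiating the estimate and its limits gives the odds ratio for Blue versus Red. The sketch below does this by hand with the values from the table; the data set name ODDSRATIO is hypothetical. Adding the ODDSRATIO option to the LSMEANS statement is another way to have GLIMMIX report these directly.

```sas
/* Convert the Blue - Red logit difference into an odds ratio */
data oddsratio;
  estimate = -0.4067;   /* log odds ratio, Blue vs Red */
  lower    = -1.7358;
  upper    =  0.9224;
  odds_ratio = exp(estimate);
  or_lower   = exp(lower);
  or_upper   = exp(upper);
run;

proc print data=oddsratio noobs;
  var odds_ratio or_lower or_upper;
run;
```

The odds ratio is about 0.67 with a 95% interval of about 0.18 to 2.52. Because the interval includes 1, this agrees with the nonsignificant test (p = 0.5348).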