
§❺ Metropolis-Hastings sampling and general MCMC approaches for GLMM




  1. §❺ Metropolis-Hastings sampling and general MCMC approaches for GLMM Robert J. Tempelman

  2. Genetic linkage example…again Recall the plant genetic linkage analysis problem, with cell probabilities (2+θ)/4, (1−θ)/4, (1−θ)/4, θ/4 for counts y1, y2, y3, y4. Suppose a flat constant prior (p(θ) ∝ 1) was used. Then the posterior is proportional to the likelihood: p(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4.

  3. Suppose the posterior density is not recognizable • Additionally, suppose there is no clear data augmentation strategy • Several solutions exist, e.g. adaptive rejection sampling (not discussed here) • One recourse is the Metropolis-Hastings algorithm, in which one generates a MCMC chain of random variates using a candidate (or proposal) density function q(θ′, θ″) • θ′: where you're at now, i.e., the value at the current MCMC cycle • θ″: proposed value for the next MCMC cycle

  4. Metropolis-Hastings • Say the MCMC chain is currently at value θ[t−1] from cycle t−1. • Draw a proposed value θ* from the candidate density for cycle t. • Accept the move from θ[t−1] to θ[t] = θ* with probability α(θ[t−1], θ*) = min(1, [p(θ*|y) q(θ*, θ[t−1])] / [p(θ[t−1]|y) q(θ[t−1], θ*)]) • Otherwise set θ[t] = θ[t−1]. Good readable reference? Chib and Greenberg (1995)

  5. How to compute this ratio “safely” • Always use logarithms whenever evaluating ratios of densities: compute log p(θ*|y) − log p(θ[t−1]|y) plus the log candidate-density terms • Once you compute this log ratio, back-transform (exponentiate) only at the end.
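To make the log-scale computation concrete, here is a minimal sketch in Python (a stand-in for the course's SAS/IML code, with function and argument names of my own choosing): the accept/reject decision is made entirely on the log scale, and only the final capped log ratio is back-transformed.

```python
import math
import random

def mh_accept(log_post_prop, log_post_curr,
              log_q_curr=0.0, log_q_prop=0.0, rng=random):
    """One MH accept/reject decision computed on the log scale.

    log_post_*: log posterior at the proposed / current value
    log_q_*:    log candidate density at the current / proposed value
                (defaults of 0 correspond to a symmetric proposal,
                where these terms cancel anyway)
    """
    # log of the MH ratio; ratios of tiny densities never underflow here
    log_ratio = (log_post_prop - log_post_curr) + (log_q_curr - log_q_prop)
    log_alpha = min(0.0, log_ratio)   # cap so that alpha <= 1
    if log_alpha >= 0.0:              # alpha = 1: accept without drawing U
        return True
    # back-transform only at the very end
    return rng.random() < math.exp(log_alpha)
```

Note that `math.exp` of a very negative log ratio simply underflows to 0.0 (a guaranteed rejection), whereas exponentiating the two posteriors separately before dividing could underflow both to 0 and make the ratio undefined.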

  6. Back to the plant genetics example Recall y1=1997, y2=906, y3=904, y4=32. Let’s use q(θ[t−1], θ*) = N(μ = 0.0357, σ² = 3.6338 × 10⁻⁵) as the candidate-generating density (based on a likelihood approximation): 1. Determine a starting value (i.e., 0th cycle) θ[0] 2. For t = 1, …, m (number of MCMC cycles): a) Generate θ* from q(θ[t−1], θ*) = N(0.0357, 3.6338 × 10⁻⁵) b) Generate U from a Uniform(0,1) distribution c) If U < α(θ[t−1], θ*) then set θ[t] = θ*, else set θ[t] = θ[t−1] • Note that this is an independence chain algorithm: q(θ[t−1], θ*) = q(θ*)

  7. Independence chain Metropolis • The candidate does not depend on θ[t−1], i.e. q(θ[t−1], θ*) = q(θ*) • However, in spite of this “independence” label, there is still serial autocorrelation between the samples. • IML code online. Generate output for 9000 draws after 1000 burn-in samples, saving every 10th.
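As an illustration, the independence-chain sampler above can be sketched in Python (the course uses SAS PROC IML; this is a hedged stand-in with my own function names). The log posterior under the flat prior is y1·log(2+θ) + (y2+y3)·log(1−θ) + y4·log(θ), and the candidate is the N(0.0357, 3.6338 × 10⁻⁵) likelihood approximation.

```python
import math
import random

y1, y2, y3, y4 = 1997, 906, 904, 32     # linkage counts from the slide
MU, VAR = 0.0357, 3.6338e-5             # likelihood-based normal candidate

def log_post(theta):
    """Log posterior under the flat prior p(theta) ~ constant, theta in (0,1)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return (y1 * math.log(2.0 + theta)
            + (y2 + y3) * math.log(1.0 - theta)
            + y4 * math.log(theta))

def log_q(theta):
    """Log of the N(MU, VAR) candidate density."""
    return -0.5 * math.log(2.0 * math.pi * VAR) - (theta - MU) ** 2 / (2.0 * VAR)

def independence_mh(n_cycles, theta0, seed=8723):
    rng = random.Random(seed)
    theta, draws = theta0, []
    for _ in range(n_cycles):
        prop = rng.gauss(MU, math.sqrt(VAR))   # candidate ignores theta[t-1]
        log_ratio = (log_post(prop) - log_post(theta)) + (log_q(theta) - log_q(prop))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = prop                        # accept; else keep theta[t-1]
        draws.append(theta)
    return draws

draws = independence_mh(10000, theta0=0.0357)   # keep 9000 after 1000 burn-in
post_mean = sum(draws[1000:]) / 9000.0
```

Because the candidate is a good approximation to the posterior, most proposals are accepted; the serial autocorrelation mentioned above comes from the occasional rejections, which repeat the previous value.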

  8. Key plots and summaries

  9. Monitoring MH acceptance rates over cycles for the genetic linkage example • Average MH acceptance rates (for every 10 cycles). Many acceptance rates are close to 1! Is this good? NO. Intermediate acceptance rates (0.25-0.5) are optimal for MH mixing.

  10. How to optimize Metropolis acceptance rates • Recall q(θ[t−1], θ*) = N(μ, σ²) with μ = 0.0357, σ² = 3.6338 × 10⁻⁵ • Suggestion: use q(θ[t−1], θ*) = N(μ, cσ²) and modify c (during burn-in) so that MH acceptance rates are intermediate • Increasing c decreases acceptance rates • Decreasing c increases acceptance rates.

  11. “Tuning” the MH sampler: my strategy • Every 10 MH cycles for the first half of burn-in, assess the following: • if the average acceptance rate > 0.80, set c = 1.2c • if the average acceptance rate < 0.20, set c = 0.7c • otherwise leave c unchanged. • SAS PROC MCMC has a somewhat different strategy. • Let’s rerun the same PROC IML code again but with this modification.
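The tuning rule above is simple enough to state as code; a minimal sketch (in Python rather than the course's IML):

```python
def tune_scale(c, avg_accept_rate):
    """One tuning step for the proposal-variance multiplier c,
    applied every 10 MH cycles during the first half of burn-in."""
    if avg_accept_rate > 0.80:
        return 1.2 * c   # accepting too often: widen the proposal
    if avg_accept_rate < 0.20:
        return 0.7 * c   # rejecting too often: narrow the proposal
    return c             # intermediate rate: leave c alone
```

The directions follow from the previous slide: widening the proposal (larger c) lowers the acceptance rate, and narrowing it raises the acceptance rate.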

  12. Average acceptance ratio versus cycle (during 400 burn-in cycles); a companion plot shows c versus cycle. One should finish the tuning process not much later than halfway through burn-in.

  13. Monitoring MH acceptance rates over cycles • Average MH acceptance rates (every 10 cycles) post burn-in (16000 cycles)

  14. Posterior density of θ.

  15. Random walk Metropolis sampling • More common than independence chain Metropolis (especially when proposals based on the likelihood function are not plausible). • The proposal density is chosen to be symmetric in θ* and θ[t−1], i.e. q(θ[t−1], θ*) = q(θ*, θ[t−1]) • Example: generate a random variate d from N(0, cσ²) and add it to the previous cycle's value θ[t−1] to generate θ* = θ[t−1] + d; this is the same as sampling θ* from N(θ[t−1], cσ²).

  16. Random walk Metropolis (cont’d) • Because of the symmetry of q(θ[t−1], θ*) in θ[t−1] and θ*, the MH acceptance ratio simplifies to α(θ[t−1], θ*) = min(1, p(θ*|y) / p(θ[t−1]|y)) • i.e., because q(θ[t−1], θ*) = q(θ*, θ[t−1]), the candidate-density terms cancel.

  17. Back to the example • Start again with σ = 0.00602 (i.e., σ² = 3.6338 × 10⁻⁵) and a starting value θ[0]. • Generate the proposed value θ* = θ[t−1] + d, where d is drawn from N(0, cσ²), and accept with probability min(1, p(θ*|y)/p(θ[t−1]|y)) • Tune c for intermediate acceptance rates during burn-in.
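A Python sketch of the random walk sampler for the same linkage posterior (again a stand-in for the course's IML code; `step_sd` is set to the likelihood-based standard deviation 0.00602, i.e., c = 1):

```python
import math
import random

y1, y2, y3, y4 = 1997, 906, 904, 32    # linkage counts from the slides

def log_post(theta):
    # Flat prior: log posterior = log likelihood up to an additive constant
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return (y1 * math.log(2.0 + theta)
            + (y2 + y3) * math.log(1.0 - theta)
            + y4 * math.log(theta))

def random_walk_mh(n_cycles, theta0, step_sd, seed=8723):
    rng = random.Random(seed)
    theta, draws, n_accepted = theta0, [], 0
    for _ in range(n_cycles):
        prop = theta + rng.gauss(0.0, step_sd)        # theta* = theta[t-1] + d
        # symmetric proposal: the candidate densities cancel in the MH ratio
        if rng.random() < math.exp(min(0.0, log_post(prop) - log_post(theta))):
            theta, n_accepted = prop, n_accepted + 1
        draws.append(theta)
    return draws, n_accepted / n_cycles

draws, accept_rate = random_walk_mh(10000, theta0=0.0357, step_sd=0.00602)
```

In practice `step_sd` would be multiplied by sqrt(c) and c tuned during burn-in exactly as in the tuning strategy described earlier.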

  18. Summary

  19. What about “canned” software? • WinBUGS • AD Model Builder • Various R packages (e.g. MCMCglmm) • SAS PROC MCMC • Will demonstrate shortly; it functions a bit like PROC NLMIXED (no CLASS statement) • They all work fine. • But sometimes they don’t recognize conjugacy in priors • i.e., they can’t distinguish between conjugate (Gibbs) and non-conjugate (Metropolis) sampling • So they often default to Metropolis (PROC MCMC: random walk Metropolis).

  20. Recall the old split plot in time example • Recall the “bunny” example from earlier. • We used PROC GLIMMIX and MCMC (SAS PROC IML) to analyze the data. • Our MCMC implementation involved recognizable full conditional densities (FCD) • Split plot in time assumption. • Other alternatives? • Marginal versus conditional specifications on CS • AR(1) • Others? • Some FCD are not recognizable • Metropolis updates are then necessary. • Let’s use SAS PROC MCMC.

  21. First create the dummy variables using PROC TRANSREG, since PROC MCMC does not have a CLASS statement (dataset called ‘recodedsplit’). This gives part of a full-rank X matrix; the names of the created dummy variables are stored in the macro variable &_trgind.

  22. SAS PROC MCMC (“conditional” specification)

  data _null_;
    call symputx('seed',8723);
    call symputx('nvar',12);
  run;

  proc mcmc data=recodedsplit outpost=ksu.postsplit propcov=quanew
            seed=&seed nmc=400000 thin=10
            monitor=(beta1-beta&nvar sigmae sigmag);
    array covar[&nvar] intercept &_trgind;  /* fixed effects dummy variables */
    array beta[&nvar];                      /* fixed effects */
    parms sige 1;                           /* residual sd; PARMS gives starting values */
    parms sigg 1;                           /* random effect sd */
    parms (beta1-beta&nvar) 1;
    prior beta: ~ normal(0, var=1e6);       /* b ~ N(0, 10^6); prior beta: ~ general(0); could also do this too */
    prior sige ~ general(0, lower=0);       /* Gelman prior: p(sig_e) ~ constant */
    prior sigg ~ general(0, lower=0);       /* Gelman prior: p(sig_u) ~ constant */

  Annotations: outpost= is where the MCMC samples are saved; propcov=quanew is the Metropolis implementation strategy; thin=10 controls how often samples are saved; nmc= is the total number of samples after burn-in; NBI=1000 is the default number of burn-in cycles.

  23. SAS PROC MCMC (conditional specification, continued)

    beginnodata;
      sigmae = sige*sige;
      sigmau = sigg*sigg;
    endnodata;
    call mult(covar, beta, mu);
    random u ~ normal(0, var=sigmau) subject=trtrabbit;
    model y ~ normal(mu + u, var=sigmae);
  run;

  24. PROC MCMC output

  25. Compare to conditional model results from § 82,84

  26. LSMEANS USING PROC MIXED

  27. “Least-squares means” using output from PROC MCMC • Cell means • Marginal means • Compare to Gibbs sampling results from § 85

  28. Posterior densities of σ²u and σ²e • Both are bounded below by 0, by definition.

  29. The marginal model specification (TYPE=CS) • SAS PROC MIXED code:

  title "Marginal Model: Compound Symmetry using PROC MIXED";
  proc mixed data=ear;
    class trt time rabbit;
    model temp = trt time trt*time / solution;
    repeated time / subject=rabbit(trt) type=cs rcorr;
    lsmeans trt*time;
  run;

  30. Now consider the marginal specification directly on the within-rabbit covariance matrix R (compound symmetry). • To ensure R is p.s.d., the CS correlation must satisfy ρ ≥ −1/(nt − 1) • nt: number of repeated measures per rabbit (here nt = 4, so the lower bound is −1/3).
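The p.s.d. constraint can be checked directly: a compound-symmetry block σ²[(1−ρ)I + ρJ] has eigenvalue σ²(1 + (nt−1)ρ) once and σ²(1−ρ) with multiplicity nt−1, and it is the first eigenvalue that forces ρ ≥ −1/(nt−1). A small illustrative Python check (not course code):

```python
def cs_eigenvalues(nt, sigma2, rho):
    """Eigenvalues of the nt x nt compound-symmetry covariance block
    sigma2 * [(1 - rho) * I + rho * J]: diagonal sigma2, off-diagonal sigma2*rho."""
    return [sigma2 * (1.0 + (nt - 1) * rho)] + [sigma2 * (1.0 - rho)] * (nt - 1)

def cs_is_psd(nt, rho, sigma2=1.0):
    """True when every eigenvalue is nonnegative, i.e. rho >= -1/(nt-1)."""
    return all(ev >= 0.0 for ev in cs_eigenvalues(nt, sigma2, rho))
```

With nt = 4 repeated measures the bound is −1/3, which matches the &lbound1 value used later as the lower bound of the prior on the intraclass correlation in PROC MCMC.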

  31. Need to format the data differently (dataset: recodedsplit1).

  32. I’ll keep the covariates in a different file too (dataset: covariates).

  33. PROC MCMC

  data a; run;   /* this data step is a little silly but it is required */

  /* PROC MCMC WITH COMPOUND SYMMETRY ASSUMPTION */
  title1 "Bayesian inference on compound symmetry";
  proc mcmc jointmodel data=a outpost=ksu.postcs propcov=quanew
            seed=&seed nmc=400000 thin=10;
    array covar[1] / nosymbols;
    array data[1] / nosymbols;
    array first1[1] / nosymbols;
    array last1[1] / nosymbols;
    array beta[&nvar];
    array mu[&nrec];
    array ytemp[&nrep];
    array mutemp[&nrep];
    array VCV[&nrep,&nrep];

  The JOINTMODEL option implies that each observation's contribution to the likelihood function is NOT conditionally independent.

  34.
    begincnst;
      rc = read_array("recodedsplit1", data, "y");
      rc = read_array("recodedsplit1", first1, "first");
      rc = read_array("recodedsplit1", last1, "last");
      rc = read_array("covariates", covar);
    endcnst;
    parms sige .25;     * residual sd;
    parms intrcl .3;    * intraclass correlation;
    parms (beta1-beta&nvar) 1;

  35.
    beginnodata;
      prior beta: ~ normal(0, var=1e6);
      prior sige ~ general(0, lower=0);   /* Gelman prior */
      prior intrcl ~ general(0, lower=&lbound1, upper=.999);
      sigmae = sige*sige;
      sigmag = intrcl*sigmae;
      call fillmatrix(VCV, sigmag);
      do i = 1 to &nrep;
        VCV[i,i] = sigmae;
      end;
      call mult(covar, beta, mu);
    endnodata;
    ljointpdf = 0;

  • &lbound1 = −1/3 (lower bound on the CS correlation when block size = 4)

  36.
    do irec = 1 to &nrec;
      if (first1[irec] = 1) then counter = 0;
      counter = counter + 1;
      ytemp[counter] = data[irec];
      mutemp[counter] = mu[irec];
      if (last1[irec] = 1) then do;
        ljointpdf = ljointpdf + lpdfmvn(ytemp, mutemp, VCV);
      end;
    end;
    model general(ljointpdf);
  run;

  37. PROC MCMC

  38. PROC MIXED vs. PROC MCMC (side-by-side output)

  39. Posterior marginal densities for σ²u and σ²e under the marginal model • Notice how much of the posterior density of σ²u is concentrated to the left of 0! • Potential “ripple effect” on inferences on K′b relative to the conditional specification? (Stroup and Littell, 2002)

  40. First-order autoregressive model (TYPE=AR(1)) • SAS PROC MIXED code:

  title "Marginal Model: AR(1) using PROC MIXED";
  proc mixed data=ear;
    class trt time rabbit;
    model temp = trt time trt*time / solution;
    repeated time / subject=rabbit(trt) type=AR(1) rcorr;
    lsmeans trt*time;
  run;

  CORRECTION!

  41. Specifying the VCV for AR(1) • Note: the (i,j) element of the within-subject block R_k(i) is σ² ρ^|i−j| • It might be easier to specify the (tridiagonal) inverse R_k(i)⁻¹ directly, especially for large R_k(i). Example MCMC code provided online.
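To make the AR(1) structure concrete, here is an illustrative Python sketch (function names are mine, not from the course code) that builds the σ²ρ^|i−j| block and its closed-form tridiagonal inverse, which is the cheap form to specify for large blocks:

```python
def ar1_vcv(n, sigma2, rho):
    """n x n AR(1) covariance block: element (i, j) is sigma2 * rho**|i-j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]

def ar1_precision(n, sigma2, rho):
    """Closed-form inverse of ar1_vcv: tridiagonal, so it is cheap to
    write down even for large n (no matrix inversion needed)."""
    scale = 1.0 / (sigma2 * (1.0 - rho * rho))
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # interior diagonal entries are (1 + rho^2); the two ends are 1
        P[i][i] = scale * (1.0 + rho * rho if 0 < i < n - 1 else 1.0)
        if i + 1 < n:
            P[i][i + 1] = P[i + 1][i] = -scale * rho
    return P
```

Multiplying the two matrices recovers the identity, which is a quick way to convince yourself the closed form is right.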

  42. Variance Component Inference MCMC PROC MIXED

  43. An example of a “sticky” situation • Consider a Poisson (count data) example. • Simulated data from a split plot design. • 4 whole plots for each of 3 levels of a whole-plot factor. • 3 subplots per whole plot, i.e., 3 levels of a subplot factor (B). • Whole-plot variance: σ²w = 0.50 • Overdispersion (G-side) B*wholeplot variance: σ²e = 1.00
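As a hedged sketch of how such data could be simulated (a Python stand-in; the course's simulation code is not shown, and the intercept `mu` and seed here are my own choices, with fixed treatment effects omitted for brevity):

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by CDF inversion (fine for the modest means here)."""
    y, p = 0, math.exp(-lam)
    cdf, u = p, rng.random()
    while u > cdf:
        y += 1
        p *= lam / y
        cdf += p
    return y

def simulate_split_plot_counts(mu=0.0, var_w=0.50, var_e=1.00,
                               n_a=3, n_wp=4, n_b=3, seed=1234):
    """Poisson split plot: 3 levels of A, 4 whole plots per level,
    3 subplots (levels of B) per whole plot.
    log-mean = mu + w (whole-plot, var 0.50) + e (G-side overdispersion, var 1.00).
    """
    rng = random.Random(seed)
    records = []
    for a in range(n_a):
        for wp in range(n_wp):
            w = rng.gauss(0.0, math.sqrt(var_w))       # whole-plot effect
            for b in range(n_b):
                e = rng.gauss(0.0, math.sqrt(var_e))   # subplot overdispersion
                records.append((a, wp, b, rpois(math.exp(mu + w + e), rng)))
    return records

data = simulate_split_plot_counts()   # 3 * 4 * 3 = 36 counts
```

The G-side overdispersion term `e` is exactly the extra per-observation normal effect on the log scale that makes the conditional Poisson model overdispersed marginally.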

  44. GLIMMIX code:

  proc glimmix data=splitplot method=laplace;
    class A B wholeplot subject;
    model y = A|B / dist=poisson solution;
    random wholeplot(A) B*wholeplot(A);
    lsmeans A B A*B / e ilink;
  run;

  45. Inferences on variance components: • PROC GLIMMIX

  46. Using PROC MCMC

  proc mcmc data=recodedsplit outpost=postout propcov=quanew
            seed=9548 nmc=400000 thin=10;
    array covar[&nvar] intercept &_trgind;
    array beta[&nvar];
    parms sigmau .5;
    parms sigmae .5;
    parms (beta1-beta&nvar) 1;
    prior beta: ~ normal(0, var=10E6);
    prior sigmae ~ igamma(shape=.1, scale=.1);
    prior sigmau ~ igamma(shape=.1, scale=.1);
    call mult(covar, beta, mu);
    random u ~ normal(0, var=sigmau) subject=plot;
    random e ~ normal(0, var=sigmae) subject=subject;
    lambda = exp(mu + u + e);
    model y ~ poisson(lambda);
  run;

  47. Some output • In the same ballpark as the PROC GLIMMIX solutions/VC estimates… but there is a PROBLEM (next slide).

  48. Pretty slow mixing

  (Trace plots for beta1, beta2, sigmag, and sigmae.)
