
Debuggin’




  1. Debuggin’

  2. Now comes the hard part • If the model is of moderate difficulty, we may run into problems in estimation. • Even if the model is easy, the BUGS code can be tricky. • So, what do you do?

  3. BUGS is a pain • Most of the time we can execute programs line by line. • In R or Stata we can see if specific lines are creating problems, and the errors we see are often informative. • We can also see what is available in memory or the workspace. • You can, for instance, check the dimensionality of objects.

  4. BUGS is a pain • This doesn’t work in BUGS. • BUGS either works as a program, or it doesn’t. • There are a couple of different sources of errors: • Syntax • Compiling • Syntax errors will choke early and tell you so.

  5. BUGS problems • Model not compiling • Too much information in a distribution argument, e.g. defining the mean of y.hat inside the distribution itself. BUGS error: “expected a comma” • An added space, as in pow (sigma, -2). BUGS error: “invalid or unexpected token scanned”

  6. BUGS problems • Parameter not defined: the mu.a definition is removed. BUGS error: “made use of undefined node mu.a” • Multiply defined parameters: y.hat <- a[county[i]] + b*x[i] should be y.hat[i]. BUGS error: “empty slot not allowed in variable name”

  7. BUGS problems • Data problems • Missing data: the first observation of x is NA. Error: undefined node • Wrong variable names: the data include x but the code says x3. Error: undefined variable • Leaving the data file open

  8. BUGS problems • Subscripting wrong: i should be j or vice versa; n should be J • Initial values: you can ask the initial values to generate values out of bounds (e.g., a negative draw for a parameter whose uniform prior is strictly positive). Error: “value of uniform sigma.a must be less than upper bound”
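The initial-values problem above can be avoided by generating inits that respect each prior's support. A minimal sketch, in Python rather than R, with parameter names (mu.a, sigma.a) taken from the slides and everything else made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_inits(n_chains=3):
    """Generate one set of initial values per chain, keeping each
    draw inside the support of its prior."""
    inits = []
    for _ in range(n_chains):
        inits.append({
            "mu.a": rng.normal(0, 1),
            # sigma.a has a uniform(0, 100) prior, so a raw normal draw
            # could be negative; abs() keeps it inside the support
            "sigma.a": abs(rng.normal(0, 1)),
        })
    return inits

inits = make_inits()
```

The same abs() trick is what G&amp;H-style R init functions use for scale parameters.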

  9. BUGS problems • Nonsense results • Remember that the second element in the normal definition is the precision, not the variance • Allowing the variance to vary as a function of the individual observations: y[i] ~ dnorm(y.hat[i], tau.y[i]) • List the data to make sure it is what you think it is
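The precision/variance confusion is worth seeing numerically. A short sketch (in Python rather than BUGS; the numbers are arbitrary) of converting between sigma, the precision tau that BUGS's dnorm expects, and the standard deviation that NumPy's normal sampler expects:

```python
import numpy as np

# BUGS's dnorm takes a *precision* tau = 1/sigma^2, not a variance.
sigma = 2.0
tau = sigma ** -2            # what pow(sigma, -2) computes in BUGS

rng = np.random.default_rng(1)
# NumPy parameterizes by standard deviation, so convert back:
draws = rng.normal(loc=0.0, scale=tau ** -0.5, size=100_000)
```

Passing sigma or sigma^2 where tau belongs is exactly the "nonsense results" failure mode on this slide.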

  10. General Solution • Go from an easy model to a hard one • Change line by line to make sure the code works • If all else fails, start over

  11. Speeding convergence • Thinning: the option in the bugs() call is n.thin • Sets the number of iterations between saves • Don’t save every iteration to memory • Cuts down on the autocorrelation • Helps in convergence
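Thinning is just keeping every k-th draw. A sketch (in Python rather than R; the AR(1) chain is a made-up stand-in for a slowly mixing sampler) showing that thinned draws are less autocorrelated:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1-D chain."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

rng = np.random.default_rng(2)
# Simulate a highly autocorrelated AR(1) chain, like a slow sampler
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.95 * chain[t - 1] + rng.normal()

thinned = chain[::10]        # what n.thin = 10 would keep
```

Thinned draws are ten iterations apart, so their lag-1 autocorrelation is roughly the original chain's lag-10 autocorrelation.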

  12. Convergence • Center the data • Cuts down on the correlation between the parameters • Speeds up convergence • The individual draws are from the conditional, but if the parameters are highly correlated then the draws are conditioned on the other estimates
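Why centering helps can be seen in the simplest case. For a regression with an intercept, the off-diagonal of (X'X)^-1 is driven by the mean of x, so centering makes the intercept and slope (nearly) uncorrelated. A sketch in Python with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(50, 100, size=200)   # far from zero: high correlation

def xtx_inv(x):
    """(X'X)^-1 for a design with an intercept column and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.inv(X.T @ X)

raw = xtx_inv(x)
centered = xtx_inv(x - x.mean())

# Implied correlation between the intercept and slope estimates
corr = lambda V: V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
```

With raw x the two estimates are almost perfectly (negatively) correlated; centered, the correlation is essentially zero, which is what lets the Gibbs draws move freely.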

  13. Redundant parameters • The idea is that some parameters are perfectly collinear with the problematic ones. • For instance, the group-level means are perfectly collinear with the overall mean.

  14. Redundant parameters • Example:

  15. Redundant parameters

model{
  for (i in 1:n){
    y[i] ~ dnorm(y.hat[i], tau.y)
    y.hat[i] <- mu + eta[county[i]]
  }
  mu ~ dnorm(0, .0001)
  tau.y <- pow(sigma.y, -2)
  sigma.y ~ dunif(0, 100)
  for (j in 1:n.county){
    eta[j] ~ dnorm(0, tau.eta)
  }
  tau.eta <- pow(sigma.eta, -2)
  sigma.eta ~ dunif(0, 100)
}

  16. Redundant parameters • The problem is that the eta estimates will get stuck a ways away from zero. • Adding a redundant mean parameter can be faster. • Change the loop:

mu.adj <- mu + mean(eta[])
for (j in 1:n.county){
  eta[j] ~ dnorm(mu.eta, tau.eta)
  eta.adj[j] <- eta[j] - mean(eta[])
}
mu.eta ~ dnorm(0, .0001)

  17. Redundant parameters • Need initial values for all of the parameters • Only need to monitor the adjusted parameters • These are easier to estimate and work better
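The adjustment above never changes the fitted values; it only re-labels them. A sketch in Python (made-up numbers; eta plays the role of the county effects) of the eta.adj / mu.adj identity:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 1.5
eta = rng.normal(3.0, 1.0, size=8)   # stuck "a ways away from zero"

# Shift the group effects by their mean, add that mean back to mu
mu_adj = mu + eta.mean()
eta_adj = eta - eta.mean()
```

mu + eta and mu_adj + eta_adj are identical observation by observation, but the adjusted etas are centered at zero, so they and mu_adj are well identified even when the raw parameters drift.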

  18. Summarizing results • In classical models we report: • Coefficients • Standard errors • Diagnostics (maybe) • Graph of interesting results • We make inferences by: • Specifying a null hypothesis • Looking at some test statistic • Using a knife-edge test (p < 0.05)

  19. Multilevel models • These models create difficulties in this framework • Too many parameters • We don’t always care about all of them • Do you care about the random differences in the intercepts from different groups? • If we are estimating a Bayesian model, inference is different. • There is both uncertainty and variability in the estimates

  20. Which parameters to present? • Which ones do you care about? • Generally, I don’t care about the higher-level random parameters. • I care about the gammas: these test hypotheses • I don’t care about the etas. These are to get the specification right. • We generally don’t care about proper names (effects attached to particular named groups)

  21. Example: Steenbergen and Jones • Steenbergen and Jones (2002), an intro to HLM in AJPS • Model: • Data: Eurobarometer from 15 countries in 1996 • DV is support for the EU • Individual-level IVs: low income, high income, ideology, opinion leadership, male, age • Macro-level IVs: party cue, tenure in the EU, trade • Note: two levels of macro: party and nation • Three sources of error: individual, party, and country • Three subscripts: individual, party, country • Subscripting increases alphabetically as aggregation increases

  22. Full equation

  23. Presentation • Note a couple of things • It is almost indistinguishable from the regression results • Variance components are set aside • Could have added subheadings to separate out the effects into different levels • No party- or country-level effects presented (or probably even calculated) • This is MUCH better than G&H’s style

  24. Random slope • Mean • Variance • Covariance with the random intercept

  25. Uncertainty versus variability • Even if we take this approach there is something different here. • There are two “variance” estimates in the model • Standard error of the gamma term (level two effects) • Variance in the level one effect (tau) • These are different

  26. What are they? • The standard error is the uncertainty we have in the estimate of the mean • The variance term is how much the “average” effect varies across the groups • One is a feature of the probabilistic enterprise of statistics • The second is the variability inherent in the world

  27. Huh? • Keep in mind the difference with OLS • In OLS we assume the taus are zero. • OLS is a special case of HLM • We still get standard errors • They are the variance estimates for our coefficients • Adding the random terms says that there is something about the groups (aside from the z’s) that explains either y (random intercept) or the link between x and y (random slope)

  28. Superpopulation and finite population variance • It gets worse • There are two ways to think about the group level random terms • Superpopulation—the variance in the probability distribution that the terms were assumed to be drawn from—this is the uncertainty we have about the value of a NEW group

  29. Finite population: these are the variances of the estimated error terms themselves • The distinction is similar to the standard deviation/standard error distinction, or the difference between sigma squared and s squared in regression

  30. The superpopulation variances are the ones reported by R. They are the variances in the probability distribution that we estimate • Finite-population variances are smaller. The equations are in G&H and I don’t really want to deal with them; I haven’t seen any examples of people who care about them. • But the distinction is important. The superpopulation estimates are what we are talking about: the broader probability distribution of our coefficients.

  31. Hypothesis testing • The key is to keep in mind what the null hypothesis is. • Do variables matter? • This is easy—these are the individual coefficients • Ignore the random stuff for a minute. • We are specifying linear, additive, and maybe interactive models. In HLM, these are the gammas • This is the irony—theory is all about the gammas. Difficulty is all about the tau’s

  32. Bayesian inference • With Bugs we are living in a different world. • Likelihood—calculate beta, get the standard error, compare the ratio to a z table • But think about what you are doing • Coefficients follow some distribution that you are summarizing • Bayesian is different • Parameter estimates are not assumed to follow any distribution. • Use the Monte Carlo estimates

  33. Bayesian inference • The “null hypothesis” is still basically the same • Effect is zero, compared to largely positive or negative effects • What does this mean? That the samples drawn from the posterior will fall at or below (above) zero with some regularity. • Because Bayesian inference does not rely on the same probabilistic theory, you can’t really have the same knife-edge inference

  34. Bayesian inference • What you see is how much of the posterior sample is larger (smaller) than zero. If this is a sizable proportion, then the effect is “significant.” If it isn’t, then it isn’t significant. • My experience is that readers still like the classical framework.

  35. Bayesian inference • What do you present? • For each of the effects: • Mean • Standard deviation • Confidence region • BUGS output gives us all of that • Maybe how much of the sample is greater or less than zero • A histogram of the posterior sample might be nice
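The summaries on this slide are all one-liners once you have the draws. A sketch in Python (the normal draws are a made-up stand-in for an MCMC sample of a coefficient):

```python
import numpy as np

def summarize(draws):
    """Posterior summaries: mean, sd, 95% percentile interval,
    and the share of draws above zero."""
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return {
        "mean": draws.mean(),
        "sd": draws.std(ddof=1),
        "2.5%": lo,
        "97.5%": hi,
        "Pr(>0)": (draws > 0).mean(),
    }

rng = np.random.default_rng(5)
beta = rng.normal(0.8, 0.4, size=4_000)   # stand-in for MCMC draws
s = summarize(beta)
```

Pr(>0) here is the "how much of the posterior sample is larger than zero" quantity from slide 34.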

  36. Bayesian Inference • See handout

  37. This is also why we don’t really care about the group-level effects. They aren’t zero.

  38. Complex hypothesis tests • Deviance tests • Note that BUGS output gives us the DIC • Smaller DIC is better (the BUGS output reminds us of that) • Can compare nested models this way
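The DIC arithmetic behind the comparison is simple. A sketch with made-up deviance numbers, using the standard definition DIC = Dbar + pD, where pD = Dbar - Dhat is the effective number of parameters:

```python
def dic(dbar, dhat):
    """DIC from the mean deviance (dbar) and the deviance at the
    posterior mean (dhat)."""
    pd = dbar - dhat          # effective number of parameters
    return dbar + pd

# Hypothetical nested models (numbers invented for illustration):
simple = dic(dbar=512.0, dhat=508.0)    # pD = 4
complex_ = dic(dbar=505.0, dhat=495.0)  # pD = 10
```

Here the richer model fits better by just enough to offset its extra effective parameters, so its DIC is (slightly) smaller.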

  39. So, do I need HLM? • The basic strategy is to start easy and work toward hard. • Run lm/glm • That is, assume that the group-level errors are all zero. • Relax this assumption and estimate a likelihood-based HLM. • See how much the results change and whether the deviance tests change • Start with a random intercept and work up

  40. So, do I need HLM? • Add parameters and complexity as needed. • Note the difference between a “random slope” model and an interactive one • “Random” implies a probability distribution • It is always a good idea to run as complex a model as you can, to satisfy yourself that it does not do better.

  41. MLE vs. Bayesian • When would I use lmer? • When the model is pretty easy • N and J are large • Not a lot of random terms • Not much covariance between them

  42. Other issues • There aren’t great fit statistics • There is an R-squared, but that strikes me as silly • Subscripting • It matters when you write out the equation • It doesn’t matter much to lmer when you are specifying the coefficients (you don’t need to specify it in the equation) • It does matter a lot to lmer when you are specifying the random terms. A random term (1|variable) lets lower-level terms vary across an upper-level grouping variable, and the order matters: the lower-level term goes before the | and the upper-level grouping variable goes after it.
