
Dummy Variables


landry

Presentation Transcript


  1. Dummy Variables • Dummy variable coding is the technique of using a dichotomous variable (coded 0 or 1) to represent each separate category of a nominal-level measure. • The term “dummy” appears to refer to the fact that the trait indicated by the code of 1 stands in for a factor, or collection of factors, that cannot be measured by any better means within the context of the analysis.

  2. Coding of Dummy Variables • Take, for instance, the race of the respondent in a study of voter preferences. • Race is coded white (0) or black (1). • A whole set of factors is possibly, or even likely, different between voters of different races: • income, socialization, experience of racial discrimination, attitudes toward a variety of social issues, feelings of political efficacy, etc. • Since we cannot measure all of those differences within the confines of the study, we use a dummy variable to capture their combined effect.

  3. Multiple Categories • Now picture race coded white (0), black (1), Hispanic (2), Asian (3) and Native American (4). • If we put this variable into a regression equation as is, the results will be nonsense: regression treats the codes as numbers, which assumes at least ordinal-level data with approximately equal differences between adjacent categories. • Regression on a nominal variable with 3 or more categories coded this way yields uninterpretable, meaningless results, because the coefficient depends on the arbitrary order of the codes.
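
The problem on slide 3 can be seen in a minimal Python sketch. The data, the three-group subset, and the helper `slope` are illustrative assumptions, not part of the presentation: the same groups are regressed on two equally arbitrary numeric codings, and the slope changes with the coding.

```python
def slope(x, y):
    """Least-squares slope of y on x (one-predictor regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical respondents: group label and outcome Y.
groups = ["White", "White", "Black", "Black", "Hispanic", "Hispanic"]
y = [10, 10, 20, 20, 12, 12]

# Two codings of the same nominal variable; neither order is "correct".
coding_a = {"White": 0, "Black": 1, "Hispanic": 2}
coding_b = {"White": 0, "Black": 2, "Hispanic": 1}

slope_a = slope([coding_a[g] for g in groups], y)
slope_b = slope([coding_b[g] for g in groups], y)
print(slope_a, slope_b)  # the two slopes disagree
```

Nothing about the respondents changed between the two fits; only the labels did, which is why the coefficient carries no meaning.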

  4. Creating Dummy Variables • The simple case of race is already coded correctly: • Black: coded 0 for whites and 1 for blacks. • Note that the coding can be reversed; this changes only the sign of the dummy's coefficient and which group the intercept describes. • The complex nominal version turns into 5 variables: • White: coded 1 for whites and 0 for non-whites • Black: coded 1 for blacks and 0 for non-blacks • Hispanic: coded 1 for Hispanics and 0 for non-Hispanics • Asian: coded 1 for Asians and 0 for non-Asians • AmInd: coded 1 for Native Americans and 0 for non-Native Americans
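
The recoding on slide 4 can be sketched in a few lines of Python. The function name `make_dummies`, the `RACE_LABELS` mapping, and the sample codes are assumptions made for illustration.

```python
# Numeric codes from slide 3 mapped to the dummy names from slide 4.
RACE_LABELS = {0: "White", 1: "Black", 2: "Hispanic", 3: "Asian", 4: "AmInd"}

def make_dummies(codes, labels):
    """Return {name: [0/1, ...]} with one indicator column per category."""
    return {name: [1 if c == code else 0 for c in codes]
            for code, name in labels.items()}

race = [0, 1, 2, 3, 4, 1, 0]  # made-up respondents
dummies = make_dummies(race, RACE_LABELS)

for name, column in dummies.items():
    print(name, column)
```

Each respondent has a 1 in exactly one of the five columns, which is the property that later makes one of them redundant.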

  5. Regression with Dummy Variables • The dummy variable is then added to the regression model. • Interpretation of the dummy variable is usually quite straightforward. • The intercept term represents the intercept for the omitted category (the one coded 0). • The slope coefficient for the dummy variable represents the change in the intercept for the category coded 1 (blacks).

  6. Regression with Only a Dummy • When we regress a variable on only the dummy variable, Y = a + B1(Black) + e, the estimates reproduce the group means of the dependent variable. • a is the mean of Y for Whites and a + B1 is the mean of Y for Blacks.
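
A small check of the claim on slide 6, using made-up numbers and a hypothetical helper `simple_ols` that applies the textbook one-predictor least-squares formulas:

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

black = [0, 0, 0, 1, 1, 1]          # dummy: 0 = White, 1 = Black
y = [50, 55, 45, 30, 35, 40]        # hypothetical outcome

a, b1 = simple_ols(black, y)

mean_white = sum(v for d, v in zip(black, y) if d == 0) / black.count(0)
mean_black = sum(v for d, v in zip(black, y) if d == 1) / black.count(1)
print(a, mean_white)        # intercept vs. mean of Y for Whites
print(a + b1, mean_black)   # intercept + slope vs. mean of Y for Blacks
```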

  7. Omitting a Category • When we have a single dummy variable, the model already carries information for both categories. • Also note that White = 1 - Black. • Thus having both a dummy for Whites and one for Blacks is redundant. • As a result, in a model with an intercept we always omit one category, whose intercept is the model’s intercept. • This omitted category is called the reference category. • In the dichotomous case, the reference category is simply the category coded 0. • With a series of dummies, the reference category is the one whose dummy variable is left out of the model.
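
The redundancy on slide 7 can be made concrete. In this sketch (made-up respondents, hypothetical helper `det3`), a design matrix holding an intercept plus both dummies has a singular X'X, so the normal equations have no unique solution; dropping one dummy fixes that.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

black = [0, 0, 0, 1, 1]             # hypothetical respondents
white = [1 - d for d in black]      # White = 1 - Black

# Design matrix with an intercept and BOTH dummies.
X = [[1, w, b] for w, b in zip(white, black)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
det_full = det3(XtX)

# Intercept + Black only: X'X is [[n, n1], [n1, n1]].
n, n1 = len(black), sum(black)
det_reduced = n * n1 - n1 * n1

print(det_full, det_reduced)
```

A zero determinant for the full matrix is the algebraic form of "White and Black carry the same information once the intercept is present".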

  8. Suggestions for Selecting the Reference Category • Make it a well-defined group; ‘other’ or an obscure category (low n) is usually a poor choice. • If there is some underlying ordinality in the categories (e.g. blue-collar, white-collar, professional), select the highest or lowest category as the reference. • It should have an ample number of cases; the modal category is often a good choice.

  9. Multiple Dummy Variables • The model for the full dummy variable scheme for race is: Y = a + B1(Black) + B2(Hispanic) + B3(Asian) + B4(AmInd) + e • Note that the dummy for White has been omitted, and the intercept a is the intercept for Whites.
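
As a rough illustration of the scheme on slide 9, the sketch below fits a model with four dummies and White as the reference category. The data are invented, and `ols` is a small hand-written normal-equations solver assumed here for self-containment; in practice a statistics package would be used instead.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: race codes 0..4 as on slide 3, and an outcome Y.
race = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y = [50, 54, 40, 44, 45, 47, 60, 62, 30, 34]

# Columns: intercept, Black, Hispanic, Asian, AmInd (White omitted).
X = [[1] + [1 if code == cat else 0 for cat in (1, 2, 3, 4)] for code in race]
a, b1, b2, b3, b4 = ols(X, y)
print(a, b1, b2, b3, b4)
```

With no other predictors in the model, a is the mean of Y for Whites and each coefficient is that group's mean minus the White mean.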

  10. Tests of Significance • With dummy variables, the t test on a coefficient tests whether that category differs from the reference category, not whether the category’s own level of Y differs from 0. • Thus if a = 50 and B1 = -45, B1 may be clearly significant (Blacks differ from Whites), while the implied value for Blacks, a + B1 = 5, might not be significantly different from 0, even though the value for Whites, a = 50, is.
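
The distinction on slide 10 can be worked through numerically. The sketch below uses invented data chosen so that a = 50 and B1 = -45, and the standard formulas for a regression on a single dummy (pooled residual variance with n - 2 degrees of freedom). All variable names are assumptions.

```python
import math

white = [48, 52, 50, 46, 54]    # hypothetical Y values, reference group
black = [-5, 15, 5, -10, 20]    # hypothetical Y values, group coded 1

n0, n1 = len(white), len(black)
a = sum(white) / n0             # intercept = mean of the reference group
b1 = sum(black) / n1 - a        # dummy coefficient = difference in means

# Residual variance of the dummy regression (n - 2 degrees of freedom).
ssr = (sum((v - a) ** 2 for v in white)
       + sum((v - (a + b1)) ** 2 for v in black))
s2 = ssr / (n0 + n1 - 2)

# t for B1: do Blacks differ from the reference category (Whites)?
t_b1 = b1 / math.sqrt(s2 * (1 / n0 + 1 / n1))
# t for a + B1: does the Black mean itself differ from 0?
t_black_mean = (a + b1) / math.sqrt(s2 / n1)

print(round(t_b1, 2), round(t_black_mean, 2))
```

With 8 degrees of freedom the two-sided 5% critical value is about 2.306, so in this made-up sample B1 is significant while the Black mean is not distinguishable from 0.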

  11. Interaction Terms • When the research hypotheses state that different categories may respond differently to other independent variables, we need to use interaction terms. • For example, race and income may interact, so that the relationship between income and ideology is different (stronger or weaker) for Whites than for Blacks.

  12. Creating Interaction Terms • Creating an interaction term is easy: • multiply the category dummy by the independent variable. • The full model is thus: Y = a + B1(Black) + B2(Income) + B3(Black × Income) + e • a is the intercept for Whites; • (a + B1) is the intercept for Blacks; • B2 is the slope for Whites; and • (B2 + B3) is the slope for Blacks. • The t-tests for B1 and B3 test whether the intercept and slope for Blacks differ from those for Whites (a and B2).
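
A sketch of the interaction model on slide 12, under stated assumptions: the income and ideology values are invented, and `ols` and `simple_fit` are small hand-written helpers rather than a library API. The interaction column is the product of the dummy and income, and the fitted coefficients are compared with separate regressions run within each group.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def simple_fit(x, y):
    """Return (intercept, slope) of a one-predictor regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical data: first four respondents White, last four Black.
income = [1, 2, 3, 4, 1, 2, 3, 4]
black = [0, 0, 0, 0, 1, 1, 1, 1]
ideology = [12, 14.5, 15.5, 18, 7, 10.5, 12.5, 16]

# Interaction term: category dummy multiplied by the independent variable.
interaction = [d * x for d, x in zip(black, income)]
X = [[1, d, x, dx] for d, x, dx in zip(black, income, interaction)]
a, b1, b2, b3 = ols(X, ideology)

# Separate regressions within each group, for comparison.
white_int, white_slope = simple_fit(income[:4], ideology[:4])
black_int, black_slope = simple_fit(income[4:], ideology[4:])
print(a, a + b1, b2, b2 + b3)
```

Because the model lets both the intercept and the slope vary by group, a and B2 match the Whites-only regression, while a + B1 and B2 + B3 match the Blacks-only regression.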

  13. Separating Effects • The literature is unclear on how to fully interpret interaction effects. • There is multicollinearity between a dummy and its interaction terms, and also with the original independent variable. • It is suggested that you do not use a model with interaction terms and no intercept!
