1. Multiple Regression Involves the use of more than one independent variable.
Multivariate analysis involves more than one dependent variable - OMS 633
Adding more variables helps us explain more variance - the trick becomes: are the additional variables significant, and do they improve the overall model? Additionally, the added independent variables should not be too highly correlated with each other!
2. Multiple Regression A sample data set:
Sales= hundreds of gallons
Price = price per gallon
Advertising = hundreds of dollars
3. Analyzing the output Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R2
Interpret Syx
Are the independent variables significant?
Is the model significant?
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
4. Correlation Matrix Simple correlation for each combination of variables (independents vs. independents; independents vs. dependent)
5. Multicollinearity It's possible that the independent variables are related to one another. If they are highly related, this condition is called multicollinearity. Problems:
A regression coefficient that is positive in sign in a two-variable model may change to a negative sign
Estimates of the regression coefficient change greatly from sample to sample because the standard error of the regression coefficient is large.
Highly interrelated independent variables can explain some of the same variance in the dependent variable - so there is no added benefit, even though the R-square has increased.
We would throw one variable out when two independent variables are highly correlated (roughly .7 or above).
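The correlation-matrix screening above can be sketched in pure Python. This is a minimal illustration, not the slide's dataset - the `pearson` helper and the sample columns are made up for demonstration:

```python
import math

def pearson(x, y):
    """Simple (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical columns: two independents and the dependent.
price = [1.00, 1.10, 1.20, 0.90, 0.95]
adv   = [10, 9, 8, 12, 11]
sales = [14.0, 13.1, 12.2, 15.2, 14.6]

cols = {"price": price, "adv": adv, "sales": sales}
names = list(cols)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson(cols[names[i]], cols[names[j]])
        # Flag independent-vs-independent pairs above the .7 rule of thumb.
        flag = ""
        if names[i] != "sales" and names[j] != "sales" and abs(r) > 0.7:
            flag = "  <- possible multicollinearity"
        print(f"r({names[i]}, {names[j]}) = {r:+.3f}{flag}")
```

With these made-up columns, price and advertising correlate strongly with each other, so one of them would be a candidate to drop.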
6. Multiple Regression Equation
Gallon Sales = 16.4 - 8.2476 (Price) + .59 (Adv)
7. Regression Coefficients b0 is the Y-intercept - the value of sales when X1 and X2 are 0.
b1 and b2 are net regression coefficients. The change in Y per unit change in the relevant independent variable, holding the other independent variables constant.
8. Regression Coefficients For each unit increase ($1.00) in price, sales will decrease 8.25 hundred gallons, holding advertising constant.
For each unit increase ($100, represented as 1) in Advertising, sales will increase .59 hundred gallons, holding price constant.
Be very careful about the units! An advertising value of 10 indicates $1,000 because advertising is in hundreds.
Gallons = 16.4 - 8.2476 (1.00) + .59 (10)
= 14.05 or 1,405 Gallons
9. Regression Coefficients How does a one cent increase in price affect sales (holding advertising at $1,000)?
16.4-8.25(1.01)+.59(10) = 13.9675
If price stays at $1.00 and advertising increases $100, from $1,000 to $1,100:
16.4-8.25(1.00)+.59(11) = 14.64
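The worked examples above can be sketched as a small function. The coefficients come from the slides; note it uses the unrounded price coefficient (-8.2476), so the results differ slightly from the rounded in-slide arithmetic:

```python
def forecast_sales(price, adv_hundreds, b0=16.4, b1=-8.2476, b2=0.59):
    """Gallon sales (in hundreds) from the fitted equation on the slides."""
    return b0 + b1 * price + b2 * adv_hundreds

base     = forecast_sales(1.00, 10)  # price $1.00, advertising $1,000
penny    = forecast_sales(1.01, 10)  # one-cent price increase
more_adv = forecast_sales(1.00, 11)  # $100 more advertising

print(round(base, 4))      # 14.0524 -> about 1,405 gallons
print(round(penny, 4))     # lower: price increase hurts sales
print(round(more_adv, 4))  # higher: advertising helps sales
```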
10. Regression Statistics Standard error of the estimate
R2 and Adjusted R2
11. R2 and Adjusted R2 Same formulas as Simple Regression
SSR/SST (this is an UNADJUSTED R2 )
Adjusted R2 from ANOVA = 1 - MSE/(SST/(n-1))
91% of the variance in gallons sold is explained by price per gallon and advertising.
12. Standard Error of the Estimate Measures the standard amount that the actual values (Y) differ from the estimated values (Ŷ).
No change in formula, except that in this example k = 3 (the intercept plus two slope coefficients).
Can still use square root of MSE
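The R2, adjusted R2, and Syx calculations above can be sketched from the ANOVA sums of squares. The ANOVA numbers below are hypothetical, chosen only to illustrate the formulas:

```python
import math

def regression_stats(sst, ssr, n, k):
    """R-square, adjusted R-square, and standard error from ANOVA sums.

    n = number of observations; k = number of estimated parameters
    (intercept plus slopes; k = 3 in the two-predictor example).
    """
    sse = sst - ssr
    r2 = ssr / sst                                  # unadjusted R-square
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))  # penalizes extra variables
    syx = math.sqrt(sse / (n - k))                  # square root of MSE
    return r2, adj_r2, syx

# Hypothetical ANOVA numbers for illustration only.
r2, adj_r2, syx = regression_stats(sst=100.0, ssr=92.0, n=10, k=3)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, Syx = {syx:.3f}")
```

Adjusted R2 is always at or below the unadjusted R2, since it charges the model for each extra parameter.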
13. Evaluate the Independent Variables Ho: The regression coefficient is not significantly different from zero
HA: The regression coefficient is significantly different from zero
Use the t-stat and the p-value to evaluate EACH independent variable. If an independent variable is NOT significant, we remove it from the model and re-run!
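The per-variable test above can be sketched as follows. The critical value of about 2 is a rough large-sample rule of thumb (an assumption here, not from the slides); in practice use the t-table value for n - k degrees of freedom:

```python
def coefficient_is_significant(b, se_b, t_critical=2.0):
    """Two-tailed t-test sketch: reject H0 (coefficient = 0) if |t| > critical.

    t_critical ~ 2 is a rough rule of thumb; look up the exact value
    for n - k degrees of freedom in a t-table.
    """
    t_stat = b / se_b
    return abs(t_stat) > t_critical, t_stat

# Hypothetical standard error for the price coefficient from the slides.
keep, t = coefficient_is_significant(b=-8.2476, se_b=2.0)
print(keep, round(t, 3))  # keep the variable when |t| exceeds the critical value
```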
14. Evaluate the Model Ho: The model is NOT valid and there is NOT a statistical relationship between the dependent and independent variables
HA: The model is valid. There is a statistical relationship between the dependent and independent variables.
If F from the ANOVA is greater than the F from the F-table, reject Ho: the model is valid. We can also look at the p-values: if the p-value is less than our set α level, we REJECT Ho.
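The overall F-test decision can be sketched the same way. The mean squares and the F-table value below are hypothetical, chosen only to show the comparison:

```python
def model_is_significant(msr, mse, f_critical):
    """Overall F-test sketch: F = MSR / MSE; reject H0 when F > F-table value."""
    f_stat = msr / mse
    return f_stat > f_critical, f_stat

# Hypothetical ANOVA mean squares; f_critical comes from the F-table
# for (k - 1, n - k) degrees of freedom at the chosen α level.
valid, f = model_is_significant(msr=46.0, mse=1.14, f_critical=4.74)
print(valid, round(f, 2))  # a large F relative to the table value -> model is valid
```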
15. Forecast and Prediction Interval Same as simple regression - however, many times we will not have the correction factor (formula under the square root). It is acceptable to use the Standard error of the estimate provided in the computer output.
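The simplified interval the slide describes - point forecast plus or minus a t multiple of the standard error, skipping the correction factor - can be sketched as:

```python
def approx_prediction_interval(y_hat, syx, t=2.0):
    """Rough prediction interval: y_hat +/- t * Syx.

    Ignores the correction factor under the square root, which the slide
    says is acceptable when using the computer output's Syx; t ~ 2 is a
    rough large-sample multiplier (an assumption, not from the slides).
    """
    return y_hat - t * syx, y_hat + t * syx

# Hypothetical forecast (hundreds of gallons) and standard error.
lo, hi = approx_prediction_interval(14.05, 0.5)
print(f"({lo:.2f}, {hi:.2f}) hundred gallons")
```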
16. Examining the Errors Heteroscedasticity exists when the residuals do not have a constant variance across an entire range of values.
Run an autocorrelation on the error terms to determine if the errors are random. If the errors are not random, the model needs to be re-evaluated. More on this in Chapter 9.
Evaluate with MAD, MAPE, MPE, MSE
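The four error measures above can be computed directly from paired actual and fitted values. The sample numbers are made up for illustration:

```python
def error_measures(actual, forecast):
    """MAD, MSE, MAPE, and MPE over paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad  = sum(abs(e) for e in errors) / n                            # mean absolute deviation
    mse  = sum(e * e for e in errors) / n                             # mean squared error
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n * 100  # mean absolute % error
    mpe  = sum(e / a for e, a in zip(errors, actual)) / n * 100       # mean % error (bias)
    return mad, mse, mape, mpe

# Hypothetical actuals and fitted values.
actual   = [14.0, 13.1, 12.2, 15.2]
forecast = [14.1, 13.0, 12.5, 15.0]
mad, mse, mape, mpe = error_measures(actual, forecast)
print(f"MAD={mad:.3f} MSE={mse:.4f} MAPE={mape:.2f}% MPE={mpe:.2f}%")
```

An MPE near zero suggests the errors are unbiased; a large positive or negative MPE suggests the model systematically under- or over-forecasts.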
17. Dummy Variables Used to determine the relationship between qualitative independent variables and a dependent variable.
Differences based on gender
Effect of training/no-training on performance
Seasonal data- quarters
We use 0 and 1 to indicate off or on. For example, code males as 1 and females as 0.
18. Dummy Variables The data indicate job performance rating based on achievement test score and gender, coded female (0) and male (1).
How do males and females differ in their job performance?
19. Dummy Variables The regression equation:
Job performance = -1.96 +.12 (test score) -2.18 (gender)
Holding gender constant, a one unit increase in test score increases job performance rating by .12 points.
Holding test score constant, males experience a 2.18 point lower performance rating than females. Or, stated differently, females have a 2.18 point higher job performance rating than males, holding test scores constant.
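The dummy-variable equation above can be sketched as a function; evaluating it at both dummy values makes the gender gap visible (the test score of 80 is a hypothetical input):

```python
def job_performance(test_score, male):
    """Fitted equation from the slides; male is the 0/1 dummy (1 = male)."""
    return -1.96 + 0.12 * test_score - 2.18 * male

# Same hypothetical test score, both dummy values.
female_rating = job_performance(80, male=0)
male_rating   = job_performance(80, male=1)
print(round(female_rating - male_rating, 2))  # 2.18: exactly the dummy's coefficient
```

Because the dummy only shifts the intercept, the gap is the same 2.18 points at every test score.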
20. Dummy Variable Analysis Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R2
Interpret Syx
Are the independent variables significant?
Is the model significant?
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
21. Model Evaluation If the variables indicate multicollinearity, run the model, interpret, but then re-run the best model (i.e., throw out one of the highly correlated variables).
If one of the independent variables is NOT significant (whether a dummy variable or other), throw it out and re-run the model.
If the overall model is not significant - back to the drawing board - we need to gather better predictor variables (maybe an elective course!)
22. Stepwise Regression Sometimes, we will have a great number of variables - running a correlation matrix will help determine if any variables should NOT be in the model (low correlation with the dependent variable).
Can also run different types of regression, such as stepwise regression
23. Stepwise regression Adds one variable at a time - one step at a time. Based on explained variance (and highest correlation with the dependent variable). The independent variable that explains the most variance in the dependent variable is entered into the model first.
A partial F-test determines whether a new variable stays or is eliminated.
24. Start with the correlation Matrix
25. Stepwise Regression
26. Stepwise Regression The equation at Step 1:
Sales = -100.85 + 6.97 (age)
The equation at Step 2:
Sales = -86.79 + 5.93 (age) + .200 (test score)
No other variables are significant; the model stops.
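The two stepwise equations can be compared directly in code. The input values (an age of 40 and a test score of 70) are hypothetical; the coefficients come from the slides:

```python
def sales_step1(age):
    """Step 1: age alone (the variable most correlated with sales)."""
    return -100.85 + 6.97 * age

def sales_step2(age, test_score):
    """Step 2: age plus test score, after test score passed the partial F-test."""
    return -86.79 + 5.93 * age + 0.200 * test_score

# Hypothetical salesperson: age 40, test score 70.
print(round(sales_step1(40), 2))       # 177.95
print(round(sales_step2(40, 70), 2))   # 164.41
```

Note how age's coefficient shrinks from 6.97 to 5.93 once test score enters - test score absorbs some variance that age alone was credited with at Step 1.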