
  1. Chapter 12 Multiple Regression and Model Building

  2. Multiple Regression and Model Building Part 1 Basic Multiple Regression Part 2 Using Squared and Interaction Terms Part 3 Dummy Variables and Advanced Statistical Inferences (Optional)

  3. Part 1 Basic Multiple Regression 12.1 The Multiple Regression Model 12.2 Model Assumptions and the Standard Error 12.3 The Least Squares Estimates and Point Estimation and Prediction 12.4 R2 and Adjusted R2 12.5 The Overall F Test 12.6 Testing the Significance of an Independent Variable 12.7 Confidence and Prediction Intervals

  4. Part 2 Using Squared and Interaction Terms 12.8 The Quadratic Regression Model (Optional) 12.9 Interaction (Optional)

  5. Part 3 Dummy Variables and Advanced Statistical Inferences 12.10 Using Dummy Variables to Model Qualitative Independent Variables 12.11 The Partial F Test: Testing the Significance of a Portion of a Regression Model

  6. Part 1 Basic Multiple Regression

  7. The Multiple Regression Model • Simple linear regression uses one independent variable to explain the dependent variable • Some relationships are too complex to be described using a single independent variable • Multiple regression models use two or more independent variables to describe the dependent variable • This allows multiple regression models to handle more complex situations • There is no limit to the number of independent variables a model can use • Like simple regression, multiple regression has only one dependent variable

  8. The Multiple Regression Model • The linear regression model relating y to x1, x2,…, xk is y = μy|x1,x2,…,xk + ɛ = β0 + β1x1 + β2x2 + … + βkxk + ɛ where • μy|x1,x2,…,xk = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk • β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2,…, xk • ɛ is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk

  9. Example: Multiple Regression • Consider the following data table that relates two independent variables x1 and x2 to the dependent variable y (Table 12.1)

  10. Plotting y versus x1

  11. Scatter Plot Analysis • The plot shows that y tends to decrease in a straight-line fashion as x1 increases • This suggests that if we wish to predict y on the basis of x1 only, the simple linear regression model y = β0 + β1x1 + ɛ relates y to x1

  12. Plotting y versus x2

  13. Scatter Plot Analysis • This plot shows that y tends to increase in a straight-line fashion as x2 increases • This suggests that if we wish to predict y on the basis of x2 only, the simple linear regression model y = β0 + β1x2 + ɛ relates y to x2

  14. Geometric Interpretation L01 • The experimental region is defined to be the range of the combinations of the observed values of x1 and x2

  15. Plane of Means L01 • The mean value of y when IV1 (independent variable one) is x1 and IV2 is x2 is μy|x1, x2 (read “mu of y given x1 and x2”) • Consider the equation μy|x1, x2 = β0 + β1x1 + β2x2, which relates mean y values to x1 and x2 • This is a linear equation in two variables; geometrically, it is the equation of a plane in three-dimensional space

  16. Plane of Means L01

  17. Model Assumptions and The Standard Error L02 • We need to make certain assumptions about the error term ɛ • At any given combination of values of x1, x2, . . . , xk, there is a population of error term values that could occur

  18. Model Assumptions and The Standard Error L02 • The model is y = μy|x1,x2,…,xk + ɛ = β0 + β1x1 + β2x2 + … + βkxk + ɛ • Assumptions for multiple regression are stated about the model error terms, the ɛ’s

  19. The Regression Model Assumptions L02 • Mean of Zero Assumption The mean of the error terms is equal to 0 • Constant Variance Assumption The variance of the error terms, σ2, is the same for every combination of values of x1, x2,…, xk • Normality Assumption The error terms follow a normal distribution for every combination of values of x1, x2,…, xk • Independence Assumption The values of the error terms are statistically independent of each other
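These four assumptions are often written in a single line; a standard compact restatement (an editorial addition, not part of the original slide) is:

$$\varepsilon_i \overset{\text{iid}}{\sim} N(0,\, \sigma^2), \qquad i = 1, 2, \ldots, n$$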

  20. Sum of Squared Errors • SSE = Σ(yi - ŷi)2, the sum of the squared residuals, measures the total discrepancy between the observed y values and the values predicted by the model

  21. Mean Square Error • MSE = SSE / (n-(k+1)) is the point estimate of the residual variance σ2 • This formula differs slightly from simple regression, where the divisor is n-2 (the case k = 1)

  22. Standard Error • s = √MSE is the point estimate of the residual standard deviation σ • MSE is from the last slide • This formula too differs slightly from simple regression • n-(k+1) is the number of degrees of freedom associated with the SSE
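To make the last three slides concrete, here is a minimal Python sketch (an illustration, not from the textbook; the function name is my own) that computes SSE, MSE, and s from observed and predicted y values:

```python
import numpy as np

def residual_summary(y, y_hat, k):
    """SSE, MSE, and standard error s for a multiple regression
    with k independent variables fit on n observations."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)   # sum of squared residuals
    mse = sse / (n - (k + 1))        # point estimate of sigma^2
    return sse, mse, np.sqrt(mse)    # sqrt(MSE) estimates sigma
```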

  23. From Our Previous Data • Using Table 12.6 • We compute the SSE by summing the squared residuals

  24. The Least Squares Estimates and Point Estimation and Prediction L03 • Estimation/prediction equation: ŷ = b0 + b1x01 + b2x02 + … + bkx0k • ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02,…, x0k • It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02,…, x0k • b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk • x01, x02,…, x0k are specified values of the independent predictor variables x1, x2,…, xk

  25. Calculating the Model • A formula exists for computing the least squares model for multiple regression • This formula is written using matrix algebra and is presented in Appendix F available on Connect • In practice, the model can be easily computed using Excel, MegaStat or many other computer packages
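For the curious, the matrix formula mentioned above is b = (X′X)⁻¹X′y. Here is a minimal NumPy sketch of that standard computation (my own illustration, not the Appendix F presentation):

```python
import numpy as np

def least_squares(X_raw, y):
    """Least squares estimates b0, b1, ..., bk.
    X_raw: (n, k) array of independent-variable values (no intercept column)."""
    n = X_raw.shape[0]
    X = np.column_stack([np.ones(n), X_raw])  # prepend a column of 1s for b0
    # Solve the normal equations (X'X) b = X'y; solving this system is more
    # numerically stable than explicitly inverting X'X
    return np.linalg.solve(X.T @ X, X.T @ y)
```

In practice you would let Excel, MegaStat, or a routine such as numpy.linalg.lstsq do this for you, as the slide says.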

  26. Table 12.1 Excel Regression Analysis Output

  27. Residual Calculation Table 12.1

  28. R2 and Adjusted R2

  29. What Does R2 Mean? L04 • The multiple coefficient of determination, R2, is the proportion of the total variation in the n observed values of the dependent variable that is explained by the multiple regression model
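In symbols, this definition (standard notation, consistent with the ANOVA output shown later) is:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$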

  30. Multiple Correlation Coefficient R • The multiple correlation coefficient R is just the square root of R2 • With simple linear regression, r would take on the sign of b1 • There are multiple bi’s in a multiple regression model • For this reason, R is always positive • To interpret the direction of the relationship between the x’s and y, you must look to the sign of the appropriate bi coefficient

  31. The Adjusted R2 • Adding an independent variable to multiple regression will always raise R2 • R2 will rise slightly even if the new variable has no relationship to y • The adjusted R2 corrects for this tendency in R2 • As a result, it gives a better estimate of the importance of the independent variables • The bar notation indicates adjusted R2
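For reference, one common, equivalent form of the corrected statistic (a standard formula; the slide itself shows only the output) is:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-(k+1)}$$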

  32. Calculating R2 and Adjusted R2 • Excel Multiple Regression Output from Table 12.1 (the output reports n, k, the explained variation, and the total variation used in the calculation)

  33. The Overall F Test • Hypothesis • H0: β1 = β2 = … = βk = 0 versus • Ha: At least one of β1, β2,…, βk ≠ 0 • Test Statistic: F(model) = (Explained variation/k) / (SSE/(n-(k+1))) • Reject H0 in favor of Ha if: • F(model) > Fα* or • p-value < α *Fα is based on k numerator and n-(k+1) denominator degrees of freedom
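As a sketch of how this test plays out numerically (the function name and inputs are my own illustration; the variation values would come from an ANOVA table such as the one on the next slide):

```python
from scipy import stats

def overall_f_test(explained_variation, sse, n, k, alpha=0.05):
    """Overall F test of H0: beta_1 = ... = beta_k = 0."""
    f_model = (explained_variation / k) / (sse / (n - (k + 1)))
    p_value = stats.f.sf(f_model, k, n - (k + 1))  # upper-tail area beyond F(model)
    return f_model, p_value, p_value < alpha       # last value True -> reject H0
```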

  34. EXCEL ANOVA: Table 12.1 Data • Test Statistic • F-test at α = 0.05 level of significance • Fα is based on 2 numerator and 5 denominator degrees of freedom • Reject H0 at α = 0.05 level of significance

  35. What Next? • The F test tells us that at least one independent variable is significant • The natural question is which one(s)? • That question will be addressed in the next section

  36. Testing the Significance of an Independent Variable • A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y • Significance Test Hypothesis • H0: βj = 0 versus • Ha: βj ≠ 0

  37. Testing Significance of an Independent Variable • If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds • Or, equivalently, if the corresponding p-value is less than α

  38. Rejection Rules
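The rejection-rule table is an image in the original deck and is not reproduced here; for the two-sided hypotheses above, the standard rule (using the t statistic defined on the next slide) is:

$$\text{Reject } H_0\!: \beta_j = 0 \text{ in favor of } H_a\!: \beta_j \neq 0 \text{ if } |t| > t_{\alpha/2}$$

where tα/2 is based on n-(k+1) degrees of freedom.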

  39. Testing Significance of an Independent Variable • Test Statistic: t = bj / sbj, where sbj is the standard error of the estimate bj • A 100(1-α)% confidence interval for βj is [bj ± tα/2 sbj] • t, tα/2 and p-values are based on n-(k+1) degrees of freedom

  40. Testing Significance of an Independent Variable • It is customary to test the significance of every independent variable in a regression model • If we can reject H0: βj = 0 at the 0.05 level of significance, then we have strong evidence that the independent variable xj is significantly related to y • If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that the independent variable xj is significantly related to y • The smaller the significance level α at which H0 can be rejected, the stronger is the evidence that xj is significantly related to y

  41. A Note on Significance Testing • Whether the independent variable xj is significantly related to y in a particular regression model is dependent on what other independent variables are included in the model • That is, changing independent variables can cause a significant variable to become insignificant or cause an insignificant variable to become significant • This issue is addressed in a later section on multicollinearity

  42. Example 12.4 The Sales Territory Performance Case • A sales manager evaluates the performance of sales representatives by using a multiple regression model that predicts sales performance on the basis of five independent variables • x1 = number of months the representative has been employed by the company • x2 = sales of the company’s product and competing products in the sales territory (market potential) • x3 = dollar advertising expenditure in the territory • x4 = weighted average of the company’s market share in the territory for the previous four years • x5 = change in the company’s market share in the territory over the previous four years • y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ɛ

  43. Example 12.4 The Sales Territory Performance Case • Using MegaStat, a regression model was computed from the collected data • The p-values associated with Time, MktPoten, Adver, and MktShare are all less than 0.01, so we have very strong evidence that these variables are significantly related to y and, thus, are important in this model • The p-value associated with Change is 0.0530, suggesting weaker evidence that this variable is important

  44. Confidence and Prediction Intervals L06 • The point on the regression line corresponding to particular values x01, x02,…, x0k of the independent variables is ŷ = b0 + b1x01 + b2x02 + … + bkx0k • It is unlikely that this value will equal the mean value of y for these x values • Therefore, we need to place bounds on how far the predicted value might be from the actual value • We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

  45. Distance Value L06 • Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value • With simple regression, we were able to calculate the distance value fairly easily • However, for multiple regression, calculating the distance value requires matrix algebra • See Appendix F on Connect for more details
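For reference, the standard matrix expression for this quantity (the derivation the slide defers to Appendix F) is:

$$\text{Distance value} = \mathbf{x}_0'\,(\mathbf{X}'\mathbf{X})^{-1}\,\mathbf{x}_0$$

where x0′ = [1, x01, x02, …, x0k] holds the specified values of the independent variables and X is the matrix of observed values, with a leading column of 1s.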

  46. A Confidence Interval for a Mean Value of y L06 • Assume that the regression assumptions hold • The formula for a 100(1-α)% confidence interval for the mean value of y is [ŷ ± tα/2 s √(distance value)] • This is based on n-(k+1) degrees of freedom

  47. A Prediction Interval for an Individual Value of y • Assume that the regression assumptions hold • The formula for a 100(1-α)% prediction interval for an individual value of y is [ŷ ± tα/2 s √(1 + distance value)] • This is based on n-(k+1) degrees of freedom
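A minimal Python sketch of both interval formulas (my own illustration; ŷ, s, and the distance value would come from regression output such as MegaStat’s):

```python
from scipy import stats
import numpy as np

def intervals(y_hat, s, distance_value, n, k, alpha=0.05):
    """100(1-alpha)% confidence interval for the mean value of y and
    prediction interval for an individual value of y at the same x values."""
    t = stats.t.ppf(1 - alpha / 2, n - (k + 1))    # t_{alpha/2}
    ci_half = t * s * np.sqrt(distance_value)       # half-width, mean value
    pi_half = t * s * np.sqrt(1 + distance_value)   # half-width, individual value
    return ((y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))
```

The prediction interval is always wider, because the extra 1 under the square root accounts for the variability of an individual y value around its mean.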

  48. Sales Territory Performance Case Data

  49. Confidence & Prediction Intervals • Using The Sales Territory Performance Case • The point prediction of the sales corresponding to; • TIME = 85.42 • MktPoten = 35182.73 • Adver = 7281.65 • Mothered = 9.64 • Change = 0.28 • Using the regression model from before; • ŷ = -1,113.7879 + 3.6121(85.42) + 0.0421(35,182.73) + 0.1289(7,281.65) + 256.9555(9.64) + 324.5334(0.28) = 4,181.74 (that is, 418,174 units) • This point prediction is given at the bottom of the MegaStat output in Figure 12.7, which we repeat here:

  50. MegaStat Output
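To make the arithmetic on slide 49 easy to check, here is a small Python sketch that reproduces the point prediction from the coefficients shown there:

```python
import numpy as np

# Least squares estimates from slide 49
b = np.array([-1113.7879, 3.6121, 0.0421, 0.1289, 256.9555, 324.5334])
# Specified values: intercept, TIME, MktPoten, Adver, MktShare, Change
x0 = np.array([1, 85.42, 35182.73, 7281.65, 9.64, 0.28])

y_hat = b @ x0
print(round(y_hat, 2))  # about 4,182; matches the slide's 4,181.74
                        # up to rounding of the printed coefficients
```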
