Simple Linear Regression

Presentation Transcript


  1. Simple Linear Regression

  2. Start by exploring the data • Construct a scatterplot • Does a linear relationship between variables exist? • Is the relationship strong? • How much variation can be explained by a linear relationship with the independent or explanatory variable?

  3. Beers and BAC

  4. Variance “Candy Bar” (figure: a bar split into an Explained part and an Unexplained part) • The R-sq value estimates the percentage of variation in the dependent variable that is explained by a linear relationship with the independent (explanatory) variable. Unless this estimate is 100% (or very near), it is not sufficient on its own to judge the model. • The amounts of explained and unexplained variation under the model are measured by sums of squares.

  5. Decomposition of information into explained and unexplained parts

  6. Residuals • A residual is the difference between an observed value of the dependent variable and the value predicted by the regression line. • Residual = (observed y) - (predicted y) = y - ŷ • Residuals help us assess the fit of a regression line.
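
As a minimal sketch of the residual definition above, the following fits a least-squares line to a small made-up data set and lists each residual. The values in `x` and `y` are hypothetical illustration data, not the data behind the slides.

```python
# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi}  predicted={pi:.2f}  residual={ri:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), so it is their pattern and size, not their total, that says something about fit.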

  7. Variance “Candy Bar” (figure: Total SS split into an Explained part, the SS explained by the model, and an Unexplained part, the SS Error) • Systematic SS + Random SS = Total SS
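
The decomposition can be checked numerically. This is a sketch on hypothetical data (the `x` and `y` values are made up for illustration); the variable names `ss_total`, `ss_model`, and `ss_error` are labels chosen here for Total SS, the SS explained by the model, and SS Error.

```python
# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * xi for xi in x]

# Total variation of y around its mean.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
# Variation of the fitted values around the mean (explained, systematic).
ss_model = sum((pi - y_bar) ** 2 for pi in predicted)
# Variation of the observations around the line (unexplained, random).
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

print(f"SS Total = {ss_total:.2f}")
print(f"SS Model = {ss_model:.2f}")
print(f"SS Error = {ss_error:.2f}")
print(f"SS Model + SS Error = {ss_model + ss_error:.2f}")
```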

  8. Model assumptions about the random errors (ε), which the residuals estimate • The distribution is NORMAL • The mean is ZERO • The variance (σ²) is CONSTANT for all values of x • Errors associated with any two observations are independent

  9. Assessing the utility of the model: model variance • The model variance σ² is the variability of the random error • The higher the variability of the random error, the greater the error of prediction • σ² is estimated with s² (often called the mean square for error, MSE) • Variance: s² = SSE / (n - 2), where n - 2 is the degrees of freedom • Standard error: s = √(s²) = √(SSE / (n - 2)) • This is like a standard deviation; with the standard error, we are looking at deviation from the line • Approximately 95% of observed y values will lie within 2s of their respective predicted values
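
A short sketch of these estimates, again on made-up data: it computes SSE, divides by n - 2 to get the MSE, takes the square root for the standard error s, and counts how many observations fall within 2s of their predicted values. With only five hypothetical points the "approximately 95%" rule is just illustrated, not verified.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# s^2 = SSE / (n - 2): two degrees of freedom are used by slope and intercept.
sse = sum(r ** 2 for r in residuals)
mse = sse / (n - 2)
s = math.sqrt(mse)

# Count observations lying within 2s of the fitted line.
within_2s = sum(1 for r in residuals if abs(r) <= 2 * s)

print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, s = {s:.4f}")
print(f"{within_2s} of {n} observations lie within 2s of the line")
```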

  10. Assessing the utility of the model: Slope • Does y change as x changes? Does x contribute information for the prediction of y? • Test the null hypothesis that the slope is zero using the t-statistic and its p-value (a common threshold is p < .05); both values are included in software output
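
The slope test can be sketched by hand as follows. The data are hypothetical, and the critical value 3.182 is the standard two-sided 5% t-table entry for 3 degrees of freedom, hard-coded here to keep the example free of external libraries; statistical software would report the p-value directly.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Standard error of the regression, s = sqrt(SSE / (n - 2)).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t-statistic for H0: slope = 0.
se_slope = s / math.sqrt(sxx)
t_stat = slope / se_slope

# Two-sided 5% critical value for n - 2 = 3 degrees of freedom (t table).
T_CRIT_DF3 = 3.182
significant = abs(t_stat) > T_CRIT_DF3

print(f"slope = {slope:.3f}, SE(slope) = {se_slope:.4f}, t = {t_stat:.3f}")
print("slope differs from zero at the 5% level:", significant)
```

In this small made-up sample the t-statistic does not exceed the critical value, so the data would not show that x contributes information for predicting y at the 5% level.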

  11. Assessing the utility of the model: Correlation Coefficient r • Measure of the strength and direction of the linear relationship between x and y • Always between -1 and +1 • High correlation does not imply causality
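
As an illustration, r can be computed directly from the sums of squares and cross-products. The data below are hypothetical, and the names `sxx`, `syy`, and `sxy` are shorthand chosen here.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient: sign gives direction, size gives strength.
r = sxy / math.sqrt(sxx * syy)

print(f"r = {r:.4f}")
```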

  12. Assessing the utility of the model: Coefficient of Determination (r²) • The R squared value is the % of the variation in y explained by the model. • For linear regression, the higher the value, the more closely the line fits the data.
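
A sketch, on hypothetical data, of the coefficient of determination computed as 1 - SSE / SS Total. In simple linear regression this equals the square of the correlation coefficient r, which the last lines check.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Share of the total variation in y that the line accounts for.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

# Correlation coefficient, for comparison with its square.
r = sxy / math.sqrt(sxx * syy)

print(f"R-sq = {r_squared:.1%}")
print(f"r^2  = {r ** 2:.1%}")
```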

  13. Using the model for estimation and prediction: Confidence interval for mean response • For any specific value of x, the fitted line gives an estimate ŷ of the mean response. • A confidence interval for the mean response adds to this estimate a margin of error based on the standard error. • Confidence intervals widen as the value of x moves further from its mean.
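
The widening can be seen in a small sketch that uses the usual textbook margin of error for the mean response, t* · s · √(1/n + (x0 - x̄)² / Sxx). The data are hypothetical, `mean_response_margin` is a helper name invented for this example, and 3.182 is the two-sided 95% t-table value for 3 degrees of freedom.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit and standard error of the regression.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Two-sided 95% critical value for n - 2 = 3 degrees of freedom (t table).
T_CRIT_DF3 = 3.182


def mean_response_margin(x0):
    """Margin of error of the 95% confidence interval for the mean of y at x0."""
    return T_CRIT_DF3 * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


for x0 in (3, 4, 5):
    estimate = intercept + slope * x0
    margin = mean_response_margin(x0)
    print(f"x0={x0}: {estimate:.2f} +/- {margin:.2f}")
```

The margin is smallest at the mean of x (here x0 = 3) and grows as x0 moves away from it.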

  14. Confidence interval for mean response

  15. Prediction interval for a future observation • Similar to the confidence interval for the mean response • The standard error used in a prediction interval includes two sources of variability: (1) variability due to the fact that the least-squares line is not exactly equal to the true regression line; (2) variability of the future response variable y around the subpopulation mean.
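
To make the two sources of variability concrete, this sketch compares the usual textbook margins for the two intervals on hypothetical data. The prediction margin adds a "1 +" term under the square root for the scatter of a single future y around its mean. The helper names `ci_margin` and `pi_margin` are invented here, and 3.182 is the two-sided 95% t-table value for 3 degrees of freedom.

```python
import math

# Hypothetical illustration data (not the data behind the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit and standard error of the regression.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Two-sided 95% critical value for n - 2 = 3 degrees of freedom (t table).
T_CRIT_DF3 = 3.182


def ci_margin(x0):
    """95% margin for the mean response: uncertainty in the fitted line only."""
    return T_CRIT_DF3 * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


def pi_margin(x0):
    """95% margin for one future y: line uncertainty plus scatter around the mean."""
    return T_CRIT_DF3 * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


x0 = 3
print(f"confidence interval margin at x0={x0}: {ci_margin(x0):.2f}")
print(f"prediction interval margin at x0={x0}: {pi_margin(x0):.2f}")
```

The prediction interval is always the wider of the two at the same x0.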

  16. Prediction interval for a future observation

  17. In the MINITAB regression window, you might want to… • Set confidence levels in Options • Enter a value for prediction in Options • Store Residuals and Fits in Storage • Display full table of fits and residuals in Results (select last bullet)

  18. Beware of Extrapolation • Extrapolation is the use of a regression line for prediction far outside the range of values of the independent variable x that you used to obtain the line. Such predictions cannot be trusted, because the data give no evidence that the linear relationship holds there.
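
One simple safeguard, sketched here as an assumption rather than anything from the slides, is to have the prediction code warn whenever the requested x lies outside the observed range. The function `predict` and its parameters are hypothetical, the data are made up, and the check flags any value outside the range, not only values far outside it.

```python
import warnings


def predict(x0, x_observed, intercept, slope):
    """Return the fitted value at x0, warning if x0 is outside the observed x range."""
    low, high = min(x_observed), max(x_observed)
    if not low <= x0 <= high:
        warnings.warn(
            f"x0={x0} is outside the observed range [{low}, {high}]; "
            "this prediction is an extrapolation."
        )
    return intercept + slope * x0


# Hypothetical x values and a hypothetical fitted line.
x_observed = [1, 2, 3, 4, 5]
inside = predict(4, x_observed, intercept=2.2, slope=0.6)
outside = predict(20, x_observed, intercept=2.2, slope=0.6)  # emits a warning
print(f"prediction at x0=4: {inside:.2f}")
print(f"prediction at x0=20: {outside:.2f}")
```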

  19. Example from book: p. 138 • How can we tell if it is reasonable to fit a linear regression model? • Let’s run the analysis and interpret the results
