REGRESSION ANALYSIS

REGRESSION ANALYSIS: you need to know what this means. In the OUTCOME that you will commence in the first week back, you might be given data and asked to perform a REGRESSION ANALYSIS. REGRESSION ANALYSIS is the process of fitting a linear model to a data set.

keanu

Presentation Transcript


  1. REGRESSION ANALYSIS: YOU NEED TO KNOW WHAT THIS MEANS. In the OUTCOME that you will commence in the first week back, you might be given data and asked to perform a REGRESSION ANALYSIS.

  2. REGRESSION ANALYSIS is the process of fitting a linear model to a data set. The aim is to determine the best linear model possible and to use it to make predictions.
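As a rough illustration of what "fitting a linear model" involves, the sketch below computes the least squares line y = a + bx by hand in Python. The function name `fit_line` and the sample numbers are hypothetical, chosen only for illustration; they are not the GDP data used in this presentation, and a calculator's regression command does this work for you.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical, made-up data that lies exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Once a and b are known, a prediction is just a + b × x for the chosen x value.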

  3. What do we mean by the “best possible linear model”? The best possible linear model is the one in which: a. the data is linear or has been linearized;

  4. and we also want: b. the linear model which has the greatest possible value of r². REMEMBER: the value of the coefficient of determination (r²) measures the PREDICTIVE POWER of our regression model.
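For readers who want to see where the coefficient of determination comes from, here is a minimal sketch that computes it as the square of Pearson's correlation coefficient r. The function name `r_squared` and the three data points are hypothetical; the presentation itself reads this value off the calculator.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination: the square of Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r ** 2


# Hypothetical, made-up points with a weak linear relationship.
print(r_squared([1, 2, 3], [1, 3, 2]))
```

A value near 1 (100%) means the line explains most of the variation in y; a value near 0 means it explains very little.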

  5. STEP 1: Construct a scatterplot of the RAW (original) data and note: its shape; the value of the coefficient of determination. FIRST: we must decide which is the INDEPENDENT (x) variable and which is the DEPENDENT (y) variable. We are predicting LIFE EXPECTANCY from GDP, so: x = GDP, y = LIFE EXPECTANCY.

  6. List A = gdp, List B = le. [Scatterplot: life expectancy (y) against GDP (x).] CONCLUSION: Data is NON-LINEAR.

  7. From the Home screen determine the value of r². Value of r² = 0.3665.

  8. STEP 2: We seek a transformation to linearize the data. CHECK THE CIRCLE OF TRANSFORMATIONS! Our scatterplot most closely resembles Quadrant 2. POTENTIALLY SUITABLE TRANSFORMATIONS are: y², log x, 1/x. [Diagram: circle of transformations, Quadrants 1 to 4.]

  9. STEP 3: Try each of these transformations to determine which one effectively linearizes the data and gives the highest value for r². In each case, obtain a RESIDUAL PLOT to confirm that the transformed data is linear.
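The "try each transformation and compare r²" step can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `r_squared` and `compare_transformations` are hypothetical, log is taken as base 10, and the data is made up so that it follows a log x relationship exactly. It is not the GDP and life expectancy data from the slides.

```python
import math


def r_squared(xs, ys):
    """Square of Pearson's correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def compare_transformations(xs, ys):
    """Return r² for each of the three candidate transformations."""
    return {
        "y^2": r_squared(xs, [y ** 2 for y in ys]),
        "log x": r_squared([math.log10(x) for x in xs], ys),
        "1/x": r_squared([1 / x for x in xs], ys),
    }


# Hypothetical data built to follow y = 20 + 15*log10(x) exactly.
xs = [100, 1000, 10000, 100000]
ys = [20 + 15 * math.log10(x) for x in xs]

results = compare_transformations(xs, ys)
best = max(results, key=results.get)
print(best, results)
```

The highest r² alone is not enough to accept a transformation; a residual plot is still needed to confirm that the transformed data is linear.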

  10. Y SQUARED TRANSFORMATION. List A = gdp (x variable), List B = le (y variable), List C = lesqu (y² transformed variable). r² = 38.3%. TRANSFORMED DATA STILL APPEARS NON-LINEAR.

  11. Establish the value of r² on the Home screen.

  12. CONFIRM WITH RESIDUAL PLOT. Remember: to get the correct residual plot, use the split screen view. Make sure that the scatterplot at the top has the correct transformed variable. CONCLUSION: The residual plot shows a definite curved pattern, indicating that the transformed data is still not linear. The y² transformation has NOT succeeded in producing an effective linear model.
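To show what a residual plot is built from, the sketch below computes residuals (actual y minus predicted y) from a least squares line. The function name `residuals` is hypothetical and the data is a made-up curved set (y = x²), used only to demonstrate what a "definite curved pattern" looks like in numbers.

```python
def residuals(xs, ys):
    """Residual = actual y minus the y predicted by the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x^2), not the presentation's data.
res = residuals([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(res)
```

Here the residuals run positive, then negative, then positive again as x increases. That systematic curve is the signal that a straight line does not suit the data; residuals scattered randomly around zero would indicate a linear relationship.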

  13. NEXT STEP… You guessed it! Now we try the next potential candidate transformation: the log x transformation.

  14. (Delete the y² column, as we have discarded this transformation.) List A = gdp, List B = le, List C = loggdp. r² = 66.0%. CONCLUSION: It appears that the log(GDP) transformation has successfully linearized the data! The scatterplot appears linear, and r² has increased.

  15. NOTE THE VARIABLES ARE LISTED HERE SO YOU CAN CHECK

  16. Now confirm this by creating a RESIDUAL PLOT for the log x transformation. CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the transformed data is linear. The value of r² has now increased to 66.0%. The log x transformation has succeeded in producing an effective linear model for the data with significant predictive power.

  17. And now… Yes, you guessed it! We need to check out the reciprocal x transformation, because maybe it will give a higher coefficient of determination than log x! (Here we go again.)

  18. Don’t delete the log x column, because we think this model was effective! List A = gdp, List B = le, List C = loggdp, List D = recgdp. [Scatterplot: life expectancy (y) against 1/GDP (x).] r² = 51.5%. CONCLUSION: The transformed data appears to be linear, but the value of the coefficient of determination is 51.5%, lower than for the loggdp transformation.

  19. Coefficient of determination

  20. CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the 1/x transformation has made the data linear.

  21. OVERALL CONCLUSIONS. We have tested three transformations. y² transformation: ineffective (did not linearize the data). log x transformation: effective in linearizing the data, with r² = 66.0%. 1/x transformation: effective in linearizing the data, with r² = 51.5%. Based on this regression analysis, we conclude that the log(GDP) transformation provides the best model for making predictions from this data.

  22. MAKING A PREDICTION. Use your linear regression model to predict the life expectancy in a country where the GDP is $8000. Find the equation of the LEAST SQUARES REGRESSION line for the log transformation: Regression (a + bx), Xlist = log(GDP), Ylist = le, giving a = 14.3 and b = 14.5. Life Expectancy = 14.3 + 14.5 × log(GDP), so Life Expectancy = 14.3 + 14.5 × log(8000) = 70.9. Predicted Life Expectancy = 70.9 years.
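The prediction arithmetic can be checked with a few lines of Python. This sketch assumes that "log" means the base-10 logarithm, which is consistent with the slide's answer of 70.9; the function name `predict_life_expectancy` and its parameter `gdp` are hypothetical, while a = 14.3 and b = 14.5 are the coefficients quoted on the slide.

```python
import math


def predict_life_expectancy(gdp, a=14.3, b=14.5):
    """Life expectancy = a + b * log10(x), with a and b taken from the slide."""
    return a + b * math.log10(gdp)


# Prediction for the slide's example value of $8000.
print(round(predict_life_expectancy(8000), 1))
```

Rounded to one decimal place this gives 70.9 years, matching the slide. Note that the untransformed value (8000) goes into the function; the log is applied inside the model.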
