
Lecture 4: Non-Linear Patterns





Presentation Transcript


  1. Lecture 4: Non-Linear Patterns January 22, 2014

  2. Question In my opinion the first Quiz was: • Very Easy • Somewhat Easy • Neither easy nor hard • Somewhat Hard • Very Hard

  3. Administrative • Problem Set 2 due Monday. • Quiz 2 next Wednesday • Exam 1 two weeks from this coming Monday • Questions?

  4. Last time • Regression “by hand” • Interpreting the slope and intercept. • Properties of Residuals

  5. Properties of Residuals Residual Plots: • If the least squares line captures the association between x and y, then a plot of residuals versus x should stretch out horizontally with consistent vertical scatter. • Can use a visual test for association to check for the absence of a pattern. • Don’t look too long: if you look long enough, you’ll see a pattern. You want to check if there is an obvious and immediate one. • Is there a pattern? • Subtle; increasing as x increases
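The residual properties above can be sketched numerically. A minimal example, using small made-up data (not the course data): least squares residuals always sum to zero, so any visible pattern in a residual plot reflects structure the line failed to capture.

```python
# Minimal sketch: least squares fit "by hand" and its residuals,
# on small made-up data (not the course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residual = observed y minus fitted y; for least squares they sum to ~0,
# so a residual plot should show only scatter, no leftover pattern.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(abs(round(sum(residuals), 10)))  # 0.0
```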

  6. Properties of Residuals Standard Deviation of the Residuals (se) • Measures how much the residuals vary around the fitted line. • Also known as the standard error of the regression or the root mean squared error (RMSE). • For the diamond example, se = $170.21. • Since the residuals are approximately normal, the empirical rule implies that about 95% of the prices fall within ±$340 (about 2se) of the regression line.
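A sketch of how se is computed, assuming the usual definition (square root of SSE divided by n − 2); the data here are made up, not the diamond data.

```python
import math

# Made-up data; the $170.21 on the slide comes from the diamond data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.3, 9.7]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))  # n - 2: the fit uses up two parameters

# Empirical rule: if residuals are roughly normal, about 95% of observations
# lie within +/- 2 * se of the fitted line (for diamonds, 2 * $170.21 ~ $340).
```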

  7. Explaining Variation R-squared (r2) • Is the square of the correlation between x and y • 0 ≤ r2 ≤ 1 • Is the fraction of the variation accounted for by the least squares regression line. • Higher is obviously better • For the diamond example, r2 = 0.4297 (i.e., the fitted line explains 42.97% of the variation in price). • But I see r-squared and “adjusted r-squared” reported. What’s the difference? • We’ll get there… Always report both r2 and se so others can judge how well the regression equation describes the data.
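The two descriptions of r2 above (squared correlation, and fraction of variation explained) are the same number for a least squares line; a quick check on made-up data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.3, 9.7]
mx, my = sum(x) / len(x), sum(y) / len(y)

sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)                     # total variation
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)          # correlation between x and y
b, a = sxy / sxx, my - (sxy / sxx) * mx
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained

# r^2 equals the fraction of variation in y explained by the fitted line
assert abs(r ** 2 - (1 - sse / syy)) < 1e-9
```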

  8. Conditions for Simple Regression • Linear: Look at a scatterplot. Does pattern resemble a straight line? • Random residual variation: Look at the residual plot to make sure no pattern exists. • No obvious lurking variable: need to think about whether other explanatory variables might better explain the linear association between x and y. • Pay attention to the substantive context of the model • Be very very cautious of making predictions outside the range of observed conditions. • Look at the plots; look at the data!

  9. Example 2: Gas Consumption Data: gas_consumption.csv • Use a simple regression model to Predict gas consumption – Gas (CCF) – by Average Temp • Are the conditions for simple regression met? • Yes • No • What is simple regression? • I have no idea what language you’re speaking

  10. Example 2: Gas Consumption Data: gas_consumption.csv • Use a simple regression model to Predict gas consumption – Gas (CCF) – by Average Temp • Using a simple regression model, what is your estimate of the intercept? • 338.76 • -4.33 • 287.46 • 12.25 • None of the above.

  11. Non-linear Patterns • When is a linear model appropriate? • Ask yourself: will changes in the explanatory variable result in equal sized changes in the estimated response, regardless of the value of x? • For example: Does trimming 200 pounds from a large SUV have the same effect on mileage as trimming 200 pounds from a small compact? • Many times the data we’re interested in estimating are not linearly related. Do we give up? No.

  12. Estimating the Model • Data: 20_cars.csv • Cars data: MPG by Weight (1000’s of lbs) • The fitted line: Estimated MPG City = 43.3 – 5.17 Weight • r2 = 0.702 and se = 2.95 • The equation estimates that mileage would increase by how much, on average, by reducing car weight by 200 lbs: • 1.0 MPG • 4.5 MPG • 2.9 MPG • I have no idea
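The arithmetic behind the quiz question, using the fitted line from the slide: weight is measured in 1000s of lbs, so trimming 200 lbs is a change of −0.2 in the explanatory variable.

```python
# Fitted line from the slide: Estimated MPG City = 43.3 - 5.17 * Weight
slope = -5.17          # change in MPG per additional 1000 lbs
delta_weight = -0.2    # trimming 200 lbs = -0.2 thousand lbs

# Predicted change in MPG = slope * change in weight
print(round(slope * delta_weight, 2))  # 1.03 -> about 1.0 MPG gained
```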

  13. Estimating the Model Cars data: MPG by Weight (1000’s of lbs) • It’s very easy to estimate an OLS regression model, but often a simple linear model isn’t appropriate. • Sometimes we can detect non-linearity with scatterplots. • In practice, it’s often hard to determine; especially if we start to consider outliers

  14. Look at Plots of the Residuals • Nonlinear patterns are often easier to spot when looking at the residuals (residuals by x-values):

  15. What to do? Transformations. • Create a new variable in the data set by applying a function to each observation • Two nonlinear transformations useful in many business applications: reciprocal and logarithms • Transformations allow the use of linear regression analysis to describe a curved pattern (sometimes) • How to decide? • Use theory for insights. Often thinking about the data will tell you what you should do. • Try different ones. Iterate. • Among the possible choices, select the one that captures the curvature of the data and produces an interpretable equation

  16. Choosing a Transformation • There are several suggested ones, depending on the curvature of your data: (but don’t forget to use context of the problem) • What was the shape of our MPG and Weight data?

  17. Reciprocal Transformation • Reciprocal transformation is useful when dealing with variables that are already in the form of a ratio, such as miles per gallon • In the context of our car data, a reciprocal transformation makes sense: • Instead of miles per gallon, use gallons per mile. • But there aren’t many cars that burn more than one gallon per mile. So… • Transform the response variable (MPG → GPM) and multiply by 100. The resulting response is the number of gallons it takes to go 100 miles

  18. Reciprocal Transformation • Estimating the model with a transformed dependent variable: • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • r2 = 0.713, se = 0.667
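A sketch of the transformation and of evaluating the fitted equation from the slide; `mpg_to_gp100m` and `fitted_gp100m` are hypothetical helper names, not from the course materials.

```python
def mpg_to_gp100m(mpg):
    """Reciprocal transform of the response: MPG -> gallons per 100 miles."""
    return 100.0 / mpg

def fitted_gp100m(weight):
    """Transformed-model prediction; coefficients from the slide,
    weight in 1000s of lbs."""
    return -0.112 + 1.204 * weight

print(mpg_to_gp100m(25.0))           # 4.0 gallons to go 100 miles
print(round(fitted_gp100m(3.0), 2))  # 3.5 for a 3,000 lb car
```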

  19. Residual Plot: • Outliers clear(er) – sports cars.

  20. Comparing Models • Original linear model: • Estimated MPG City = 43.3 – 5.17 Weight • r2 = 0.703 and se = 2.95 • Model with transformed dependent variable: • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • r2 = 0.7132, se = 0.667 • Can be tempting to say that model 1 is about the same as 2 because it has a similar r2 (70.3% of variation explained vs 71.3%). • Not a valid comparison. Don’t compare r2 between models that have different data! (i.e., observations or response variables)

  21. Reciprocal Transformation • Visually, what we did:

  22. Reciprocal Transformation • But what if we really wanted to predict MPG? • We do not stop with just fitting the linear regression model. • We can transform back to MPG • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • Given the above model, what is the MPG of a car that weighs 3,000lbs? • 28.7 • 27.76 • 3.5 • 35
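The back-transformation step, sketched with the rounded coefficients shown on the slide; these give roughly 28.6 MPG, close to the 28.7 answer choice (the small gap comes from coefficient rounding).

```python
# Predict gallons per 100 miles, then invert to recover MPG.
weight = 3.0                         # 3,000 lbs, in 1000s of lbs
gp100m = -0.112 + 1.204 * weight     # coefficients from the slide
mpg = 100.0 / gp100m                 # back-transform: MPG = 100 / (gal per 100 mi)

print(round(gp100m, 2), round(mpg, 1))  # 3.5 28.6
```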

  23. Next time • More on transformations • E.g., log transformations (very useful and common) • Exam 1 two weeks from this coming Monday!

  24. Comparing Models • Original linear model: • Estimated MPG City = 35.6 – 4.52 Weight • r2 = .57 and se = 2.9 • Model with transformed dependent variable: • Estimated Gallons/100 Miles = 1.111 + 1.21 Weight • r2 = 0.412 se = 1.04 • A Hummer H2 (weight = 6400 lbs) is predicted to get 6.7 MPG from model 1. What is the MPG it’s predicted to get in model 2? • 8.8 • 11.3 • 7.7 • Not possible.
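The Hummer comparison, worked step by step with this slide's coefficients: model 1 predicts MPG directly, while model 2 predicts gallons per 100 miles and must be back-transformed.

```python
w = 6.4  # Hummer H2 weight: 6,400 lbs, in 1000s of lbs

mpg_linear = 35.6 - 4.52 * w    # model 1 predicts MPG directly
gp100m = 1.111 + 1.21 * w       # model 2 predicts gallons per 100 miles
mpg_model2 = 100.0 / gp100m     # back-transform model 2 to MPG

print(round(mpg_linear, 1), round(mpg_model2, 1))  # 6.7 11.3
```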

  25. Substantive Comparison • The reciprocal equation treats weights differently than the linear equation • In the reciprocal eq, differences in weight matter less as cars get heavier • Diminishing effect of changes in weight makes more sense than a constant decrease • Substantive knowledge / theory important! • Knowledge of market forces (economics) very important

  26. Log Transformations • Another very useful transformation: logarithms • Useful for distributions with positive skew (long right tail) • Useful when the association between variables is more meaningful on a percentage scale. • Price Elasticity of Demand • Percentage change in quantity demanded given a 1% change in price • Key to figuring out the optimal price to charge.

  27. Log Transformations Estimated Sales Volume = 190,480 – 125,190*Price r2 = 0.83

  28. Log Transformations • Residual Plot: systematic patterns easier to spot

  29. Log Transformations • If we take the log of both independent and dependent variables (also called a log-log regression): Estimated log(Sales Volume) = 11.05 – 2.442 * log(Price) r2 = 0.955 Fitted line Residual Plot
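The elasticity interpretation of the log-log slope, sketched with the fitted equation above; `predicted_sales` is a hypothetical helper, and the $5 base price is arbitrary (the percentage effect is the same at any price).

```python
import math

def predicted_sales(price):
    # Log-log fit from the slide: log(Sales Volume) = 11.05 - 2.442 * log(Price)
    return math.exp(11.05 - 2.442 * math.log(price))

# The slope is the price elasticity of demand: raising price by 1%
# multiplies predicted sales by 1.01 ** -2.442, about a 2.4% drop.
ratio = predicted_sales(1.01 * 5.0) / predicted_sales(5.0)
print(round((ratio - 1) * 100, 2))  # -2.4
```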
