
Transformations



    1. Transformations STA 671 410,411 First Summer 2006

    2. Review Recall that polynomial regression is a means of handling curvature in a regression. Polynomial regression expands the model Yi = β0 + β1 Xi + ei with a quadratic term, Yi = β0 + β1 Xi + β2 Xi² + ei, or perhaps a cubic term, Yi = β0 + β1 Xi + β2 Xi² + β3 Xi³ + ei. Recall that you fit the lowest-order model that fits the data.

    3. Note higher order models ALWAYS fit better Note that Yi = β0 + β1 Xi + ei (the linear model) is a special case of Yi = β0 + β1 Xi + β2 Xi² + ei with β2 = 0. When you fit the quadratic, you have the option of picking a BETTER value of β2 than 0, so the fit for the quadratic must be at least as good. Similarly, the fit for the cubic must be at least as good as the fit for the quadratic.
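This nesting argument can be checked numerically. A minimal sketch in Python with NumPy, using made-up data (the coefficients and noise level here are illustrative choices, not from the slides):

```python
import numpy as np

# Illustrative data only: a noisy quadratic trend
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 40)
y = 1.0 + 0.5 * x - 0.3 * x**2 + rng.normal(0, 0.2, x.size)

def sse(degree):
    """Residual sum of squares for a least-squares polynomial of this degree."""
    coefs = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

# A higher-order model can never fit worse: SSE is non-increasing in the
# degree, because each lower-order model is a special case of the next one up.
print(sse(1), sse(2), sse(3))
```

Whatever data you substitute, the printed SSE values never increase as the degree goes up, which is exactly the point of this slide.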

    4. Higher order models fit better – but how much better? Note the fact that a quadratic fits better than a linear model, or that a cubic fits better than a quadratic, is a mathematical fact. It holds no matter what the TRUE model is. Thus, knowing the cubic model fits better than the quadratic model does NOT by itself mean the true model is more likely to be a cubic model. The p-value indicates whether the improvement in fit is LARGE ENOUGH to justify increasing the order of the model.
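One standard way to measure whether the improvement is "large enough" is the partial F-statistic comparing the nested models. A sketch with simulated data (the data, coefficients, and degrees of freedom bookkeeping are illustrative; the final p-value step is left to an F table or software):

```python
import numpy as np

# Illustrative data whose true trend really is quadratic
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
y = 2.0 - 1.0 * x + 0.4 * x**2 + rng.normal(0, 0.3, x.size)

def sse(degree):
    coefs = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

n = x.size
sse_lin, sse_quad = sse(1), sse(2)

# Partial F-statistic for adding the quadratic term:
# F = [(SSE_reduced - SSE_full) / (# extra parameters)] / [SSE_full / (n - p_full)]
# The quadratic model has 3 parameters, so the error df is n - 3.
F = ((sse_lin - sse_quad) / 1) / (sse_quad / (n - 3))
print(F)  # compare to an F(1, n - 3) distribution to get the p-value
```

A large F (improvement big relative to the remaining noise) gives a small p-value, justifying the higher-order model; a small F says the automatic improvement is no more than chance would produce.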

    5. Polynomial regression may not be enough. Polynomial regression is not a panacea; some data are not fit well by any polynomial. Recall that in our lab we had the engine exhaust data: NOx is a measure of the exhaust from the engine, while E is a measure of the fuel/air mixture (high values are almost all fuel, low values are almost all air). A cubic model does not fit the data, and a quadratic or linear model would do worse.

    6. Scatterplot with cubic model and residuals

    7. ???? Remember how a p-value is computed: you need the sampling distribution of the statistic of interest. This sampling distribution is derived under certain assumptions, and if those assumptions are not satisfied, the p-value is meaningless. Thus, your decision to accept or reject H0: β1 = 0, for example, hinges on knowing the real distribution of β̂1.

    8. Transformations Instead of fitting a regression involving Y and X directly, we can transform either variable to produce a large number of models. Instead of Yi = β0 + β1 Xi + ei, you can fit log(Yi) = β0 + β1 Xi + ei, or sqrt(Yi) = β0 + β1 Xi + ei, or Yi = β0 + β1 log(Xi) + ei, or log(Yi) = β0 + β1 sqrt(Xi) + β2 [sqrt(Xi)]² + ei. This greatly expands the set of models you can fit, and thus the kinds of data you can handle.
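Fitting a transformed model is nothing special computationally: transform the variable first, then run ordinary least squares on the transformed data. A sketch for the log(Y) case, on simulated data where the true coefficients 0.5 and 0.8 are values chosen for the example:

```python
import numpy as np

# Simulated data in which log(Yi) = 0.5 + 0.8*Xi + ei holds exactly
rng = np.random.default_rng(2)
x = np.linspace(1, 5, 60)
log_y = 0.5 + 0.8 * x + rng.normal(0, 0.1, x.size)
y = np.exp(log_y)  # what you would actually observe

# Fitting log(Yi) = b0 + b1*Xi + ei is just least squares on (Xi, log(Yi))
b1, b0 = np.polyfit(x, np.log(y), 1)
print(b0, b1)  # estimates should land near 0.5 and 0.8
```

The other transformed models on this slide work the same way: replace `np.log` with `np.sqrt`, transform X instead of Y, or feed the transformed X into a polynomial fit.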

    9. Transformations allow different error structures. A quadratic regression looks like Yi = β0 + β1 Xi + β2 Xi² + ei; at any particular X, the variance is the same. Taking the square root transformation sqrt(Yi) = β0 + β1 Xi + ei means that Yi = [β0 + β1 Xi + ei]² = β0² + β1² Xi² + ei² + 2 β0 β1 Xi + 2 β0 ei + 2 β1 Xi ei. There is a quadratic relationship between X and Y. Note the multiplication between Xi and ei; this allows the variance to change with Xi. Thus, in addition to handling curvature, transformations allow you to address changing variance.
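A small simulation, with made-up parameter values, makes the changing-variance point concrete: even though the errors ei have the same variance at every X, the spread of Y on the original scale grows with X:

```python
import numpy as np

# Assumed parameter values, chosen only for illustration
b0, b1, sigma = 1.0, 0.5, 0.2
rng = np.random.default_rng(3)

var_at = {}
for xval in (1.0, 5.0):
    e = rng.normal(0, sigma, 5000)       # ei has the SAME variance at every X
    yvals = (b0 + b1 * xval + e) ** 2    # but Yi = (b0 + b1*Xi + ei)^2
    var_at[xval] = float(np.var(yvals))

print(var_at)  # Var(Y) comes out larger at X = 5 than at X = 1
```

The cross term 2 β1 Xi ei from the expansion on this slide is what drives this: its variance grows with Xi², so larger X values produce a wider spread in Y.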

    10. Prototypical Data requiring transformation

    11. Which transformation? There are no hard-and-fast rules about which transformation to try and no guaranteed method for finding a good one (for some data, you may never find a great fit). Usually you have to proceed by trial and error. You can combine three things: transforming Y, transforming X, and performing polynomial regression on the results.

    12. After square root transformation

    13. Some “typical” transformations If you have area data, a square root transform is often useful (it converts an area to something proportional to a radius or length). Similarly, with volume data a cube root transformation may be appropriate. With financial data (incomes, etc.), a log transform may be appropriate. Logs change percentage increases into constant additive increases: if a unit increase in X results in a 10% increase in Y, it results in a constant increase of log(1.10) ≈ 0.0953 in log(Y).
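The 0.0953 figure is just the natural log of 1.10, and a quick check shows the increase in log(Y) is the same no matter how large Y is:

```python
import math

# Logs turn percentage changes into constant additive changes:
# multiplying Y by 1.10 adds log(1.10) to log(Y), whatever Y is.
for y in (10.0, 250.0, 1e6):
    print(round(math.log(1.10 * y) - math.log(y), 4))  # 0.0953 every time
```

This is why the log scale is natural for quantities like incomes that tend to grow by percentages rather than by fixed amounts.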

    14. A general strategy Fit the raw data (X and Y) with a least squares line and see if you get a good residual plot. If so, stop and be happy. If not, try a polynomial regression (quadratic or cubic). If one of these fits, stop and be happy (remember, fit the smallest model possible). If a polynomial regression does not work, try transforming Y to log, sqrt, and cube root (i.e., perform three more regressions) and see if those work. If none of those work, then regression might not be effective (there are more advanced techniques), or you may have to start transforming X as well. This becomes true trial and error.
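The Y-transformation step of this strategy can be sketched as a small comparison loop. Everything here is illustrative: the simulated data are built so that sqrt(Y) is linear in X, and R² is used as a rough stand-in for the residual-plot check the slide actually recommends:

```python
import numpy as np

# Illustrative data: sqrt(Y) is exactly linear in X, so the sqrt transform
# should fit best among the candidates tried below.
rng = np.random.default_rng(4)
x = np.linspace(1, 9, 80)
y = (0.3 + 1.0 * x + rng.normal(0, 0.05, x.size)) ** 2

def r_squared(t_y):
    """R^2 of a straight-line fit to the (possibly transformed) response."""
    coefs = np.polyfit(x, t_y, 1)
    resid = t_y - np.polyval(coefs, x)
    return 1.0 - float(np.sum(resid**2) / np.sum((t_y - t_y.mean()) ** 2))

candidates = {"raw": y, "log": np.log(y), "sqrt": np.sqrt(y), "cube root": np.cbrt(y)}
for name, t_y in candidates.items():
    print(name, round(r_squared(t_y), 4))
```

In practice you should still look at the residual plots rather than trusting a single summary number, since a high R² can hide systematic curvature.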
