Standardization of variables

Standardization of variables. Maarten Buis 5-12-2005. Recap. Central tendency Dispersion SPSS. Standardization. Is used to improve interpretability of variables. Some variables have a natural interpretable metric: e.g. income, age, gender, country.



  1. Standardization of variables Maarten Buis 5-12-2005

  2. Recap • Central tendency • Dispersion • SPSS

  3. Standardization • Is used to improve interpretability of variables. • Some variables have a natural interpretable metric: e.g. income, age, gender, country. • Others, primarily ordinal variables, do not: e.g. education, attitude items, intelligence. • Standardizing these variables makes them more interpretable.

  4. Standardization • Transforming the variable to a comparable metric • known unit • known mean • known standard deviation • known range • Three ways of standardizing: • P-standardization (percentile scores) • Z-standardization (z-scores) • D-standardization (dichotomize a variable)

  5. When you should always standardize • When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education. • When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
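
A minimal sketch of why averaging needs standardization first. The income and education figures are made up, the names `z_scores`, `raw_average`, and `ses` are hypothetical, and z-scores with the population SD are just one possible choice of standardization:

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / SD for every value (population SD assumed)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical data: monthly income in guilders and years of education.
income = [1000, 2000, 3000, 6000]
education = [10, 16, 12, 14]

# Averaging the raw values: income dominates because its unit is much larger.
raw_average = [(i + e) / 2 for i, e in zip(income, education)]

# Averaging the z-scores: both variables contribute on the same scale.
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]

print(raw_average)
print([round(s, 2) for s in ses])
```

In the raw average the second person ranks below the third purely because of income; after standardizing, the second person's much higher education moves them above the third.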

  6. P-Standardization • Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it. • Can be read from the cumulative distribution • In case of ties: assign midpoints • The median, quartiles, quintiles, and deciles are special cases of P-scores.

  7. P-standardization • Turns the variable into a ranking, i.e. it turns the variable into an ordinal variable. • It is a non-linear transformation: relative distances change • Results in a fixed mean, range, and standard deviation: M=50, SD≈28.9 (100/√12, the SD of a uniform distribution on 0 to 100). This can change slightly due to ties • A histogram of a P-standardized variable approximates a uniform distribution
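
P-standardization as described on the two slides above can be sketched as follows. This assumes that tied observations get the midpoint of the percentile range they span; the function name `p_scores` and the income values are hypothetical:

```python
from statistics import mean

def p_scores(values):
    """Percentile score per observation: the percentage of observations
    below it, plus half the percentage tied with it (midpoint rule)."""
    n = len(values)
    scores = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)  # includes v itself
        scores.append(100 * (below + 0.5 * tied) / n)
    return scores

# Made-up incomes with one tie and one extreme value.
incomes = [1200, 1500, 1500, 2100, 9000]
p = p_scores(incomes)

print(p)        # the two tied incomes share one score
print(mean(p))  # the midpoint rule keeps the mean at 50
```

The extreme income of 9000 ends up only one rank step above 2100, which illustrates the non-linearity: relative distances in the original variable are not preserved.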

  8. Linear transformation • Say you want income in thousands of guilders instead of guilders. • You divide INCMID by f1000,-

  9. Linear transformation • Say you want to know the deviation from the mean • Subtract the mean (f2543,-) from INCMID

  10. Recap: multiplication and addition and the number line

  11. Linear transformation • Adding a constant (X’ = X+c) • M(X’) = M(X)+c • SD(X’) = SD(X) • Multiply with a constant (X’ = X*c) • M(X’) = M(X)*c • SD(X’) = SD(X) * |c|
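
The two rules above can be checked numerically. This is a small sketch on made-up data, using the population SD (`statistics.pstdev`); the same rules hold for the sample SD:

```python
from math import isclose
from statistics import mean, pstdev

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
c_add = 10.0   # constant to add
c_mul = -3.0   # constant to multiply with (negative on purpose)

shifted = [v + c_add for v in x]
scaled = [v * c_mul for v in x]

# Adding a constant shifts the mean and leaves the SD unchanged.
print(mean(shifted), mean(x) + c_add)
print(pstdev(shifted), pstdev(x))

# Multiplying scales the mean by c and the SD by |c|.
print(mean(scaled), mean(x) * c_mul)
print(pstdev(scaled), pstdev(x) * abs(c_mul))
```

The negative multiplier shows why the absolute value is needed: the mean changes sign, but a standard deviation can never be negative.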

  12. Z-standardization • Z = (X-M)/SD • two steps: • center the variable (mean becomes zero) • divide by the standard deviation (the unit becomes standard deviation) • Results in fixed mean and standard deviation: M=0, SD=1 • Not in a fixed range! • Z-standardization is a linear transformation: relative distances remain intact.

  13. Z-standardization • Step 1: subtract the mean • c = -M(X) • M(X’) = M(X)+c • M(X’) = M(X)-M(X)=0 • SD(X’)=SD(X)

  14. Z-standardization • Step 2: divide by the standard deviation • c is 1/SD(X) • M(Z) = M(X’) * c • M(Z) = 0 * 1/SD(X) = 0 • SD(Z) = SD(X’) * c • SD(Z) = SD(X) * 1/SD(X) = 1
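
The two steps of Z-standardization can be sketched in a few lines. The data are made up, the variable names are hypothetical, and the population SD is assumed:

```python
from math import isclose
from statistics import mean, pstdev

x = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

# Step 1: center the variable by subtracting the mean.
m = mean(x)
centered = [v - m for v in x]

# Step 2: divide by the standard deviation of the original variable.
sd = pstdev(x)
z = [v / sd for v in centered]

print(z)
print(mean(z), pstdev(z))  # mean 0 and SD 1, up to rounding
print(min(z), max(z))      # the range depends on the data, it is not fixed
```

Because both steps are linear transformations, the ordering and the relative distances between observations are the same in `z` as in `x`.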

  15. Normal distribution • Normal distribution = Gauss curve = Bell curve • Formula (McCall p. 120) • Note the (x-m)² part • apart from that all you have to remember is that the formula is complicated • Normal distribution occurs when a large number of small random events cause the outcome: e.g. measurement error

  16. Normal distribution • Other examples: the height of individuals, intelligence, attitude • But: the variables Education, Income and Age in Eenzaam98 are not normally distributed

  17. Z-scores and the normal distribution • Z-standardization will not result in a normally distributed variable • Standardization is NOT the same as normalization • We will not discuss normalization (but it does exist) • But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.

  18. Standard normal distribution • Normal distribution with M=0 and SD=1. • Table A in Appendix 2 of McCall • Important numbers (to be remembered): • 68% of the observations lie between ± 1 SD • 90% of the observations lie between ± 1.64 SD • 95% of the observations lie between ± 1.96 SD • 99% of the observations lie between ± 2.58 SD
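
The percentages above can be reproduced without a printed table. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `coverage` is hypothetical:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Share of a standard normal distribution between -k and +k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1.0, 1.64, 1.96, 2.58):
    print(f"within ±{k} SD: {100 * coverage(k):.1f}%")
```

Rounded to whole percentages, the four results match the numbers to be remembered.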

  19. Why bother? • If you know: • That a variable is normally distributed • the mean and standard deviation • Then you know the percentage of observations above or below an observation • These numbers are a good approximation, even if the variable is not exactly normally distributed
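
As an illustration, assume a test score that is normally distributed with mean 100 and SD 15 (made-up numbers, not taken from the course data). Knowing only those two numbers is enough to place a single observation:

```python
from statistics import NormalDist

# Assumed distribution of the variable; both numbers are hypothetical.
scores = NormalDist(mu=100, sigma=15)

observation = 130
z = (observation - scores.mean) / scores.stdev  # z-score of the observation

pct_below = 100 * scores.cdf(observation)
pct_above = 100 - pct_below

print(f"z = {z:.2f}")
print(f"{pct_below:.1f}% of observations lie below, {pct_above:.1f}% above")
```

The same percentage follows from looking up the z-score in a standard normal table, which is what makes z-scores convenient to interpret.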

  20. P & Z standardization • Both give a distribution with fixed mean, standard deviation, and unit • P-standardization also gives a fixed range • Both are relative to the sample: if you take observations out, then you have to re-compute the standardized variables

  21. P & Z-standardization • When interpreting Z-standardized variables one uses percentiles • With P-standardization one decreases the scale of measurement to ordinal, BUT this improves interpretability.

  22. Student recap

  23. Do before Wednesday • Read McCall chapter 5 • Understand Appendix 2, table A • Do exercises 5.7-5.28