
Correlation and Simple Linear Regression

Correlation and Simple Linear Regression. PSY440 June 10, 2008. A few points of clarification. For the chi-squared test, the results are unreliable if the expected frequency in too many of your cells is too low.

annawright

Presentation Transcript


  1. Correlation and Simple Linear Regression PSY440 June 10, 2008

  2. A few points of clarification • For the chi-squared test, the results are unreliable if the expected frequency in too many of your cells is too low. • A rule of thumb is that the minimum expected frequency should be 5 (i.e., no cells with expected counts less than 5). A more conservative rule recommended by some is a minimum expected frequency of 10. If your minimum is too low, you need a larger sample! The more categories you have the larger your sample must be. • SPSS will warn you if you have any cells with expected frequency less than 5.
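
As a rough illustration of the rule of thumb above: the expected frequency of a cell is (row total × column total) / N. The sketch below computes expected frequencies for a hypothetical 2 × 3 table and lists the cells that fall under 5. The counts and the function names are made up for illustration; they are not from the lecture or from SPSS.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(expected, minimum=5):
    """Return (row, column, expected) for every cell under the minimum."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Hypothetical observed counts (illustrative only, not from the lecture).
observed = [[12, 3, 10],
            [8, 2, 15]]
expected = expected_counts(observed)
low = cells_below(expected)
print(expected)  # expected frequency for each cell
print(low)       # cells that violate the "at least 5" rule of thumb
```

Here the middle column has too few observations overall, so both of its cells have an expected frequency of 2.5; under the rule of thumb, that calls for a larger sample or fewer categories.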

  3. Regarding threats to internal validity • One of the strengths of well-designed single-subject research is the use of repeated observations during each phase. • Repeated observations during baseline and intervention (in an AB study, for example) help rule out testing, instrumentation (to some extent), and regression effects. These effects would be unlikely to produce a marked change between experimental phases that is not also apparent in the repeated observations before and after the phase change.

  4. Regarding histograms The difference between a histogram and a bar graph is that the variable on the x axis (which represents the score on the variable being graphed, as opposed to the frequency of observations) is conceptualized as being continuous in a histogram, whereas a bar graph represents discrete categories along the x axis.

  5. About the exam…. Exam on Thursday will cover material from the first three weeks of class (lectures 1-6, or everything through Chi-Squared tests). Emphasis of exam will be on generating results with computers (calculations by hand will not be emphasized), and interpreting the results. Exam questions will be based mainly on lecture material and modeled on previous active learning experiences (homework and in-class demonstrations and exercises). Knowledge of material on qualitative methods and experimental & single-subject design is expected.

  6. Before we move on….. Any questions?

  7. Today’s lecture and next homework Today’s lecture will cover correlation and simple (bivariate) regression. Homework based on today’s lecture will be distributed on Thursday and due on Tuesday (June 17).

  8. Correlation • A correlation is the association between scores on two variables • Age and coordination skills in children: as kids get older, their motor coordination tends to improve • Price and quality: generally, the more expensive something is, the higher its quality

  9. Correlation and Causality Correlational research • Correlation as a statistical procedure is generally used to measure the association between two (or more) continuous variables • Correlation as a kind of research design refers to observational studies in which there is no experimental manipulation.

  10. Correlation and Causality Correlational research • Not all “correlational” (i.e., observational) research designs use correlation as the statistical procedure for analyzing the data (example: a comparison of verbal abilities between boys and girls is an observational study, since gender is not manipulated, but the data would probably be analyzed as mean differences with t-tests). • But: Virtually all of the inferential statistical methods (including t-tests, ANOVA, ANCOVA) covered in 440 can be represented in terms of correlational/regression models (general linear model - we’ll talk more about this later). • Bottom line: Don’t confuse design with analytic strategy.
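
One way to see the point about t-tests and correlational models: if group membership is coded 0/1 and correlated with the outcome, the resulting r converts to exactly the pooled-variance t statistic via t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below checks this on hypothetical scores; the data and function names are invented for illustration and are not from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed as SP / sqrt(SSX * SSY)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)


def pooled_t(g0, g1):
    """Independent-samples t with pooled variance (group 1 minus group 0)."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = sum(g0) / n0, sum(g1) / n1
    ss0 = sum((v - m0) ** 2 for v in g0)
    ss1 = sum((v - m1) ** 2 for v in g1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


# Hypothetical verbal scores for two groups (made up for illustration).
group0 = [10, 12, 9, 11, 13]
group1 = [14, 12, 15, 13, 16]

# Code group membership as 0/1 and correlate it with the scores.
codes = [0] * len(group0) + [1] * len(group1)
scores = group0 + group1
r = pearson_r(codes, scores)
n = len(scores)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_direct = pooled_t(group0, group1)
print(round(t_from_r, 4), round(t_direct, 4))  # the two routes agree
```

The design here would still be observational; only the analysis changes form, which is the distinction the slide draws.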

  11. Correlation and Causality • Correlations (like other linear statistical models) describe relationships between variables, but DO NOT explain why the variables are related. • Suppose that Dr. Steward finds that rates of spilled coffee and severity of plane turbulence are strongly positively correlated. • One might argue that turbulence causes coffee spills. • One might argue that spilling coffee causes turbulence.

  12. Correlation and Causation • Suppose that Dr. Cranium finds a positive correlation between head size and digit span (roughly the number of digits you can remember). • One might argue that the bigger your head, the larger your digit span. • One might argue that head size and digit span both increase with age (but head size and digit span aren’t directly related). [Figure: digit sequence 1 24 37 21 15; diagram labeled AGE]

  13. Correlation and Causation • One might argue that the bigger your head, the larger your digit span. • One might argue that head size and digit span both increase with age (but head size and digit span aren’t directly related). • Observational research and correlational statistical methods (including regression and path analysis) can be used to compare competing models of causation, to see which model fits the data best. [Figure: digit sequence 1 24 37 21 15; diagram labeled AGE]

  14. Relationships between variables • Properties of a statistical correlation • Form (linear or non-linear) • Direction (positive or negative) • Strength (none, weak, strong, perfect) • To examine this relationship you should: • Make a scatterplot - a picture of the relationship • Compute the Correlation Coefficient - a numerical description of the relationship

  15. Graphing Correlations • Steps for making a scatterplot (scatter diagram) • Draw axes and assign variables to them • Determine range of values for each variable and mark on axes • Mark a dot for each person’s pair of scores

  16. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual [Figure: X and Y axes, each marked 1 to 6; first point plotted: A = (6, 6)]

  17. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual [Figure: point B = (1, 2) added]

  18. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual [Figure: point C = (5, 6) added]

  19. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual [Figure: point D = (3, 4) added]

  20. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual [Figure: point E = (3, 2) added]

  21. Scatterplot • Plots one variable against the other • Each point corresponds to a different individual • Imagine a line through the data points • Useful for “seeing” the relationship • Form, Direction, and Strength [Figure: all five points plotted: A = (6, 6), B = (1, 2), C = (5, 6), D = (3, 4), E = (3, 2)]
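
Since the figure itself is not reproduced in this transcript, here is a minimal sketch that redraws the five points A to E as a text grid. It uses plain Python rather than SPSS or Excel, and the helper name `text_scatter` is our own.

```python
# The five (X, Y) pairs from the slides, labeled A-E.
points = {"A": (6, 6), "B": (1, 2), "C": (5, 6), "D": (3, 4), "E": (3, 2)}


def text_scatter(points, size=6):
    """Return a text grid with Y increasing upward and X increasing rightward."""
    grid = [["."] * size for _ in range(size)]
    for label, (x, y) in points.items():
        grid[size - y][x - 1] = label  # row 0 is the top of the plot (Y = size)
    lines = [f"{size - i} " + " ".join(row) for i, row in enumerate(grid)]
    lines.append("  " + " ".join(str(x) for x in range(1, size + 1)))
    return "\n".join(lines)


plot = text_scatter(points)
print(plot)
```

Reading the grid from lower left (B, E) to upper right (C, A) shows the rising pattern that the imagined line would follow.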

  22. Scatterplots with Excel and SPSS • In SPSS: Graphs menu => Legacy Dialogs => Scatter/Dot => Simple Scatter. Click on Define, and select which variable you want on the x axis and which on the y axis. • In Excel: Insert menu => Chart => XY (Scatter). Specify whether the variables are arranged in rows or columns and select the cells with the relevant data.

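  23. Form [Figure: two scatterplot panels, labeled Linear and Non-linear]

  24. Direction • Positive: X & Y vary in the same direction; as X goes up, Y goes up; positive Pearson’s r • Negative: X & Y vary in opposite directions; as X goes up, Y goes down; negative Pearson’s r [Figure: two scatterplot panels, labeled Positive and Negative, each with X and Y axes]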

  25. Strength • The strength of the relationship • Spread around the line (note the axis scales) • Correlation coefficient will range from -1 to +1 • Zero means “no (linear) relationship”. • The farther r is from zero, the stronger the relationship • In general when we talk about correlation coefficients: Correlation coefficient = Pearson’s product moment coefficient = Pearson’s r = r.

  26. Strength • r = 1.0: “perfect positive corr.”, $r^2$ = 100% • r = -1.0: “perfect negative corr.”, $r^2$ = 100% • r = 0.0: “no relationship”, $r^2$ = 0.0 • The farther from zero, the stronger the relationship [Figure: number line from -1.0 through 0.0 to +1.0]

  27. The Correlation Coefficient • Formulas for the correlation coefficient: Conceptual Formula Common Alternative

  28. The Correlation Coefficient • Formulas for the correlation coefficient: Conceptual Formula Common alternative
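
The formulas on these two slides were images and did not survive in the transcript. The two standard forms that match the worked examples in this lecture (one using SP, one using z-scores) are given below as a reconstruction. Which of them the slide labels "conceptual" and which "common alternative" cannot be recovered from the text, so that pairing is left open.

```latex
% Form used in the "using SP" example
r = \frac{SP}{\sqrt{SS_X \, SS_Y}}, \qquad
SP = \sum (X - \bar{X})(Y - \bar{Y}), \qquad
SS_X = \sum (X - \bar{X})^2, \qquad
SS_Y = \sum (Y - \bar{Y})^2

% Form used in the "using z-scores" example (population z-scores, N pairs)
r = \frac{\sum z_X \, z_Y}{N}, \qquad
z_X = \frac{X - \bar{X}}{\sigma_X}, \qquad
z_Y = \frac{Y - \bar{Y}}{\sigma_Y}
```

The two forms are algebraically equivalent when the standard deviations are computed with N in the denominator.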

  29. Computing Pearson’s r (using SP) • Step 1: SP (Sum of the Products)

              X      Y
              6      6
              1      2
              5      6
              3      4
              3      2
      mean    3.6    4.0

  30. Computing Pearson’s r (using SP) • Step 1: SP (Sum of the Products) • Deviations of X from its mean (quick check: they sum to 0.0)

              X      Y      X - mean
              6      6      6 - 3.6 =  2.4
              1      2      1 - 3.6 = -2.6
              5      6      5 - 3.6 =  1.4
              3      4      3 - 3.6 = -0.6
              3      2      3 - 3.6 = -0.6
      mean    3.6    4.0
      sum                              0.0

  31. Computing Pearson’s r (using SP) • Step 1: SP (Sum of the Products) • Deviations of Y from its mean (quick check: they sum to 0.0)

              X      Y      X - mean    Y - mean
              6      6       2.4        6 - 4.0 =  2.0
              1      2      -2.6        2 - 4.0 = -2.0
              5      6       1.4        6 - 4.0 =  2.0
              3      4      -0.6        4 - 4.0 =  0.0
              3      2      -0.6        2 - 4.0 = -2.0
      mean    3.6    4.0
      sum                    0.0                   0.0

  32. Computing Pearson’s r (using SP) • Step 1: SP (Sum of the Products) • Multiply each pair of deviations and add up the products

              X      Y      X - mean    Y - mean    product
              6      6       2.4         2.0         2.4 *  2.0 = 4.8
              1      2      -2.6        -2.0        -2.6 * -2.0 = 5.2
              5      6       1.4         2.0         1.4 *  2.0 = 2.8
              3      4      -0.6         0.0        -0.6 *  0.0 = 0.0
              3      2      -0.6        -2.0        -0.6 * -2.0 = 1.2
      mean    3.6    4.0
      sum                    0.0         0.0        SP = 14.0
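
Step 1 can be reproduced in a few lines. This is a minimal sketch in plain Python (the lecture itself uses SPSS and Excel); the variable names are our own.

```python
# The five (X, Y) pairs from the slides.
X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]

mean_x = sum(X) / len(X)  # 3.6
mean_y = sum(Y) / len(Y)  # 4.0

# Deviation of each score from its own mean.
dev_x = [x - mean_x for x in X]
dev_y = [y - mean_y for y in Y]

# Quick check from the slides: deviations sum to zero (up to float rounding).
check_x = sum(dev_x)
check_y = sum(dev_y)

# SP: sum of the products of paired deviations.
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
sp = sum(products)
print(round(sp, 2))  # 14.0
```

Because the means are not whole numbers, the intermediate values carry tiny floating-point error, which is why the result is rounded before printing.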

  33. Computing Pearson’s r (using SP) • Step 2: SSX & SSY

  34. Computing Pearson’s r (using SP) • Step 2: SSX & SSY • SSX: square each X deviation and add up the squares

              X      X - mean    (X - mean)^2
              6       2.4         2.4^2 = 5.76
              1      -2.6        -2.6^2 = 6.76
              5       1.4         1.4^2 = 1.96
              3      -0.6        -0.6^2 = 0.36
              3      -0.6        -0.6^2 = 0.36
      mean    3.6
      sum             0.0        SSX = 15.20

  35. Computing Pearson’s r (using SP) • Step 2: SSX & SSY • SSY: square each Y deviation and add up the squares

              Y      Y - mean    (Y - mean)^2
              6       2.0         2.0^2 = 4.0
              2      -2.0        -2.0^2 = 4.0
              6       2.0         2.0^2 = 4.0
              4       0.0         0.0^2 = 0.0
              2      -2.0        -2.0^2 = 4.0
      mean    4.0
      sum             0.0        SSY = 16.0
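
Step 2 is the same calculation applied to each variable separately, so a single helper covers both. A minimal sketch, with `sum_of_squares` as our own name for the helper:

```python
def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]

ss_x = sum_of_squares(X)
ss_y = sum_of_squares(Y)
print(round(ss_x, 2), round(ss_y, 2))  # 15.2 16.0
```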

  36. Computing Pearson’s r (using SP) • Step 3: compute r

  37. Computing Pearson’s r (using SP) • Step 3: compute r

              X      Y      X - mean    (X - mean)^2    Y - mean    (Y - mean)^2    product
              6      6       2.4        5.76             2.0        4.0             4.8
              1      2      -2.6        6.76            -2.0        4.0             5.2
              5      6       1.4        1.96             2.0        4.0             2.8
              3      4      -0.6        0.36             0.0        0.0             0.0
              3      2      -0.6        0.36            -2.0        4.0             1.2
      mean    3.6    4.0
      sum                    0.0        SSX = 15.20      0.0        SSY = 16.0      SP = 14.0

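  38. Computing Pearson’s r • Step 3: compute r • $r = \frac{SP}{\sqrt{SS_X \times SS_Y}}$ with SP = 14.0, SSX = 15.20, SSY = 16.0

  39. Computing Pearson’s r • Step 3: compute r • $r = \frac{14.0}{\sqrt{SS_X \times SS_Y}}$ with SSX = 15.20, SSY = 16.0

  40. Computing Pearson’s r • Step 3: compute r • $r = \frac{14.0}{\sqrt{SS_X \times 16.0}}$ with SSX = 15.20

  41. Computing Pearson’s r • Step 3: compute r • $r = \frac{14.0}{\sqrt{15.20 \times 16.0}}$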

  42. Computing Pearson’s r • Step 3: compute r • Appears linear • Positive relationship • Fairly strong relationship • .89 is far from 0, near +1 [Figure: scatterplot of the five data points, X and Y axes each marked 1 to 6]
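
Putting the three steps together gives a short function for r. This is a sketch of the SP-based calculation walked through above, not the routine SPSS or Excel uses internally; `pearson_r` is our own name.

```python
import math


def pearson_r(x, y):
    """Pearson's r via SP / sqrt(SSX * SSY)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]
r = pearson_r(X, Y)
print(round(r, 3))  # 14.0 / sqrt(15.20 * 16.0), about 0.898
```

To three decimals the value is 0.898; the slides report it as .89, which is the same number cut off after two decimals.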

  43. The Correlation Coefficient • Formulas for the correlation coefficient: Conceptual Formula Common alternative

  44. Computing Pearson’s r (using z-scores) • Step 1: compute standard deviation for X and Y (note: keep track of sample or population) • For this example we will assume the data is from a population

              X      Y
              6      6
              1      2
              5      6
              3      4
              3      2

  45. Computing Pearson’s r (using z-scores) • Step 1: compute standard deviation for X and Y (note: keep track of sample or population) • For this example we will assume the data is from a population

              X      X - mean    (X - mean)^2
              6       2.4        5.76
              1      -2.6        6.76
              5       1.4        1.96
              3      -0.6        0.36
              3      -0.6        0.36
      Mean    3.6
      Sum             0.0        SSX = 15.20
      Std dev 1.74

  46. Computing Pearson’s r (using z-scores) • Step 1: compute standard deviation for X and Y (note: keep track of sample or population) • For this example we will assume the data is from a population

              X      Y      X - mean    (X - mean)^2    Y - mean    (Y - mean)^2
              6      6       2.4        5.76             2.0        4.0
              1      2      -2.6        6.76            -2.0        4.0
              5      6       1.4        1.96             2.0        4.0
              3      4      -0.6        0.36             0.0        0.0
              3      2      -0.6        0.36            -2.0        4.0
      Mean    3.6    4.0
      Sum                               SSX = 15.20      0.0        SSY = 16.0
      Std dev 1.74   1.79
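
The "keep track of sample or population" note matters because the two standard deviations divide SS by different numbers (N for a population, N - 1 for a sample). A brief sketch using Python's standard `statistics` module, where `pstdev` is the population version and `stdev` the sample version:

```python
import statistics

X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]

# Population standard deviation: sqrt(SS / N), as assumed on the slides.
sd_x_pop = statistics.pstdev(X)
sd_y_pop = statistics.pstdev(Y)

# Sample standard deviation: sqrt(SS / (N - 1)), shown only for contrast.
sd_x_samp = statistics.stdev(X)
sd_y_samp = statistics.stdev(Y)

print(round(sd_x_pop, 2), round(sd_y_pop, 2))    # 1.74 1.79
print(round(sd_x_samp, 2), round(sd_y_samp, 2))  # 1.95 2.0
```

The population values match the 1.74 and 1.79 on the slides; the sample values are noticeably larger with only five scores.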

  47. Computing Pearson’s r (using z-scores) • Step 2: compute z-scores • First z-score for X: 2.4 / 1.74 = 1.38

  48. Computing Pearson’s r (using z-scores) • Step 2: compute z-scores • z-scores for X (quick check: they sum to 0.0)

              X      X - mean    z for X
              6       2.4         1.38
              1      -2.6        -1.49
              5       1.4         0.8
              3      -0.6        -0.34
              3      -0.6        -0.34
      Mean    3.6
      Sum                         0.0
      Std dev 1.74

  49. Computing Pearson’s r (using z-scores) • Step 2: compute z-scores • First z-score for Y: 2.0 / 1.79 = 1.1

  50. Computing Pearson’s r (using z-scores) • Step 2: compute z-scores • z-scores for Y (quick check: they sum to 0.0)

              X      Y      X - mean    (X - mean)^2    Y - mean    (Y - mean)^2    z for X    z for Y
              6      6       2.4        5.76             2.0        4.0              1.38       1.1
              1      2      -2.6        6.76            -2.0        4.0             -1.49      -1.1
              5      6       1.4        1.96             2.0        4.0              0.8        1.1
              3      4      -0.6        0.36             0.0        0.0             -0.34       0.0
              3      2      -0.6        0.36            -2.0        4.0             -0.34      -1.1
      Mean    3.6    4.0
      Sum                               15.20                       16.0                        0.0
      Std dev 1.74   1.79
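
The z-score route can be sketched the same way. The code below computes population z-scores for both variables and then averages the products of paired z-scores, which is the standard z-score form of Pearson's r. That last step is not shown in this part of the transcript, so treat it as our assumption about where the example is heading; the function name `z_scores` is also our own.

```python
import math
import statistics


def z_scores(values):
    """Population z-scores: (score - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]

z_x = z_scores(X)
z_y = z_scores(Y)

# Quick check from the slides: each set of z-scores sums to (about) zero.
check_x, check_y = sum(z_x), sum(z_y)

# Assumed final step: r is the mean of the products of paired z-scores.
r_from_z = sum(a * b for a, b in zip(z_x, z_y)) / len(X)

print([round(z, 2) for z in z_x])  # [1.38, -1.49, 0.8, -0.34, -0.34]
print([round(z, 1) for z in z_y])  # [1.1, -1.1, 1.1, 0.0, -1.1]
print(round(r_from_z, 3))          # about 0.898, same as the SP method
```

Working from unrounded z-scores gives the same r as the SP method; rounding the z-scores first, as on the slides, can shift the last digit slightly.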
