Chapter 3 – Examining Relationships

Chapter 3 – Examining Relationships. Scatterplots and Correlation (3.1). Scatterplots: show a relationship between two variables. Response variable: the variable on the y-axis; it responds to the explanatory variable. Explanatory variable: the variable on the x-axis; it influences the response.

rhoda

Presentation Transcript


  1. Chapter 3 – Examining Relationships

  2. Scatterplots and Correlation - 3.1

  3. Scatterplots: Show a relationship between two variables. Response Variable: The variable on the y-axis. It responds to the explanatory variable. Explanatory Variable: The variable on the x-axis. It influences the response.

  4. Looking at Scatterplots: • Direction: Positive: as x increases, y increases. Negative: as x increases, y decreases. • Form: Is there a linear relationship between the two variables? • Strength: Do the points follow a single stream that is tight to the line, or is there considerable spread (or variability) around the line?

  5. Calculator Tip: Scatterplots L1: Explanatory Variable L2: Response Variable Use statplot to graph

  6. Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship. 1. T-shirts at a store: Price of each, Number Sold. Answer: Explanatory (x): price of shirt. Response (y): number sold. D: negative. S: strong. [Sketch: x-axis, price of shirt, $5 to $50; y-axis, # sold, 1 to 100.]

  7. Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship. 2. Drivers: Reaction Time, Blood Alcohol Level. Answer: Explanatory (x): blood alcohol level (BAC). Response (y): reaction time. D: positive. S: strong. [Sketch: x-axis, BAC, .01 to .5; y-axis, time, 1 to 10.]

  8. Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship. 3. Cars: Age of Owner, Weight of the Car. Answer: Makes no sense!!!

  9. Example #2: In a study of whether a relationship exists between a child's aptitude and the age at which he/she first speaks, researchers recorded the age (in months) of a child's first speech and the child's score on an aptitude test. These data for these 21 children follow: Make a scatterplot and describe the relationship in the context of the problem.

  10. D: positive S: moderate F: curved

  11. Correlation (“r”): Measures the direction and strength of the linear relationship between two variables. Both variables must be quantitative.

  12. Attributes of the Correlation 1. The correlation coefficient is a unitless measurement, denoted with the letter r, and has values between -1 and 1. 2. When r = 1, all the data points form a perfect straight-line relationship with a positive slope. 3. When r = -1, all the data points form a perfect straight-line relationship with a negative slope.

  13. Attributes of the Correlation 4. Values of r close to 0 mean that the linear relationship is weak. There may be a general linear trend, but there is a lot of variability around that trend. 5. When r = 0 there is no linear relationship between the two variables. In other words, the best-fitting line has a slope of zero.

  14. Attributes of the Correlation 6. Outliers have a large influence on the correlation coefficient. The correlation is NOT resistant to outliers. 7. Correlation does not describe curved relationships! (ONLY LINEAR)

  15. Types of Correlation: [Scatterplot panels: r = 0, r = -0.3, r = 0.5, r = -0.7, r = 0.9, r = -0.99]

  16. Example #3: What is wrong with the following statements? 1. There is a strong correlation between the gender of American workers and their income. Answer: Gender is categorical.

  17. Example #3: What is wrong with the following statements? 2. We found a high correlation (r = 1.09) between students’ rating of faculty teaching and ratings made by other faculty members. Answer: r can’t be bigger than 1.

  18. Example #3: What is wrong with the following statements? 3. We found a very weak correlation (r = -0.95) which suggests little relationship between income and hours spent at casinos. Answer: r = -0.95 is a strong negative relationship.

  19. Example #3: What is wrong with the following statements? 4. We found a very weak correlation (r = 0.01) which suggests little relationship between age and death rate. Answer: Should be a very strong relationship!

  20. Guidelines: How strong is the linear relationship? • 0 < r < 0.3: weak positive • -0.3 < r < 0: weak negative • 0.4 < r < 0.7: moderate positive • -0.7 < r < -0.4: moderate negative • 0.8 < r < 1: strong positive • -1 < r < -0.8: strong negative

  21. HOW TO CALCULATE THE CORRELATION COEFFICIENT Remember how to calculate the z-score? We used this calculation to determine how many standard deviations an observation was from the mean. RECALL: $z = \dfrac{x - \bar{x}}{s}$

  22. In this case, we were only concerned with one variable. Now, we are considering two variables and each must be standardized.

  23. Notation:

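  24. FORMULA: $r = \dfrac{1}{n-1}\sum \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$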
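
The z-score approach described on the preceding slides can be sketched in a few lines of Python. This is only an illustration, assuming the usual definition of r as the sum of the products of the standardized x and y values divided by n - 1. The function name `correlation` and the data are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]  # standardize the explanatory variable
    z_y = [(y - y_bar) / s_y for y in ys]  # standardize the response variable
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Because both variables are standardized first, the result has no units, which matches the first attribute of the correlation listed earlier.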

  25. Calculator Tip: Correlation L1: Explanatory Variable L2: Response Variable Stat-calc-LinReg(a+bx), L1, L2 (make sure your diagnostic is on!!!)

  26. Example #4: Step #1: Find the following summary statistics: 3 30 10 35 10

  27. Step #2: Calculate z-scores 1 0 1

  28. Step #3: Calculate the Correlation

  29. 3.2 – Least-Squares Regression

  30. Regression line: A straight line that describes the linear relationship between an explanatory variable and a response variable.

  31. LEAST SQUARES REGRESSION LINE: • This is the best-fitting line to the data. • The goal is to minimize the (vertical) distances of your observations (data) from your line. • Again, we must square the distances (like the calculation of the variance) because some data points lie above the line (positive distance) and some lie below it (negative distance), and they would cancel each other out. So to compensate, they are squared.

  32. We can use this line to predict a response, y, from a given explanatory variable, x.

  33. Remember graphing?? Slope-Intercept formula for a line: y = mx + b, where m = slope and b = y-intercept. In statistics, we write it $\hat{y} = a + bx$, where a is the y-intercept and b is the slope. Do you remember the SLOPE?

  34. Example #1: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators. Predicted Weight = –393 + 5.9(length) 1. What is the slope of the line? What does it mean? Answer: m = 5.9. For every additional inch in length, the predicted weight increases by 5.9 pounds.

  35. Example #1: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators. Predicted Weight = –393 + 5.9(length) 2. What is the y-intercept of the line? What does it mean? Answer: b = -393. If an alligator is 0 inches long, then it is predicted to weigh -393 lbs. This makes no sense!!!

  36. Example #1: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators. Predicted Weight = –393 + 5.9(length) 3. Describe the relationship between weight and length of alligators. Answer: As the length increases, the weight increases.

  37. Example #1: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators. Predicted Weight = –393 + 5.9(length) 4. What is the predicted weight for an alligator 90 inches long? Answer: Predicted Weight = -393 + 5.9(90) = -393 + 531 = 138 lbs
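
The alligator example can be checked with a short Python sketch. It uses only the regression line stated on the slide. The function name `predicted_weight` is an illustrative choice, not something from the source.

```python
def predicted_weight(length_in):
    """Predicted weight (pounds) from length (inches), using the slide's line."""
    intercept = -393  # y-intercept from the slide
    slope = 5.9       # pounds of predicted weight per additional inch
    return intercept + slope * length_in

# Prediction for a 90-inch alligator (the slide works this out as 138 lbs)
print(predicted_weight(90))

# The slope is the change in predicted weight for one extra inch
print(predicted_weight(91) - predicted_weight(90))

# The intercept is the (meaningless) prediction at a length of 0 inches
print(predicted_weight(0))
```

The last line shows why the intercept has no practical interpretation here: a length of 0 inches is far outside the range of the data.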

  38. CALCULATION: $b = r\,\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$, giving $\hat{y} = a + bx$

  39. Facts about Least Squares Regression: 1. The distinction between explanatory and response variables is essential (which variable is used to predict which?). 2. It always passes through the point $(\bar{x}, \bar{y})$. 3. Correlation ‘r’ describes the direction and strength of the straight-line relationship, but doesn’t tell us any more about the slope than whether it is positive, negative, or zero.
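
The second fact, that the least-squares line passes through the point of means, can be checked numerically. The sketch below is a minimal illustration on hypothetical data. It computes the slope as the sum of cross-deviations over the sum of squared x-deviations, which is algebraically the same as r times s_y / s_x. The function name `least_squares_line` is an assumption for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept, chosen so the line hits the means
    return a, b

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)

# Predicted value at the mean of x, compared with the mean of y
print(a + b * mean(xs), mean(ys))
```

Swapping `xs` and `ys` generally gives a different line, which is why the first fact stresses the choice of explanatory and response variables.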

  40. Extrapolation: Predicting outside the range of the x values

  41. Calculator Tip: LSRL L1: Explanatory Variable L2: Response Variable Stat-calc-LinReg(a+bx), L1, L2, vars/y-vars/ Function/ Y1

  42. Example #2: Is there a relationship between wine consumption (in liters) and yearly deaths from heart disease (deaths per 100,000)? Here are the summary statistics: Mean wine consumption: 3,026 SD of wine consumption: 2,510 Mean deaths from heart disease: 191,053 SD of heart disease deaths: 68,396 Correlation coefficient between wine consumption and yearly deaths from heart disease = -.0843 a. Interpret the value of the correlation coefficient in the context of the problem. Answer: The correlation is negative: as wine consumption increases, deaths from heart disease tend to decrease.

  43. Example #2: Is there a relationship between wine consumption (in liters) and yearly deaths from heart disease (deaths per 100,000)? Here are the summary statistics: Mean wine consumption: 3,026 SD of wine consumption: 2,510 Mean deaths from heart disease: 191,053 SD of heart disease deaths: 68,396 Correlation coefficient between wine consumption and yearly deaths from heart disease = -.0843 b. Calculate the least-squares regression line predicting death rate from wine consumption. Answer: $b = r\,\dfrac{s_y}{s_x}$ = -0.0843(68,396/2,510) = -2.2971. $a = \bar{y} - b\bar{x}$ = 191,053 – (-2.2971 × 3,026) = 198,004.0991. $\hat{y}$ = 198,004.0991 – 2.2971x

  44. Example #2: Is there a relationship between wine consumption (in liters) and yearly deaths from heart disease (deaths per 100,000)? Here are the summary statistics: Mean wine consumption: 3,026 SD of wine consumption: 2,510 Mean deaths from heart disease: 191,053 SD of heart disease deaths: 68,396 Correlation coefficient between wine consumption and yearly deaths from heart disease = -.0843 c. Use your line to predict death rate for an average adult who consumes 4 liters of wine. Answer: $\hat{y}$ = 198,004.0991 – 2.2971x = 198,004.0991 – 2.2971(4) = 197,994.9107
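
The arithmetic in parts b and c can be reproduced from the summary statistics alone. The sketch below takes the numbers exactly as printed on the slides, reading the commas as thousands separators (an assumption about the source). The function name `lsrl_from_summary` is hypothetical.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Least-squares line from summary statistics: returns (a, b)."""
    b = r * s_y / s_x        # slope: correlation times ratio of SDs
    a = y_bar - b * x_bar    # intercept: line passes through the means
    return a, b

# Values as printed on the slides (commas read as thousands separators)
a, b = lsrl_from_summary(x_bar=3026, s_x=2510, y_bar=191053, s_y=68396, r=-0.0843)
print(round(b, 4), round(a, 4))

# Part c: predicted death rate at x = 4
prediction = a + b * 4
print(round(prediction, 2))
```

Small differences in the last decimal place compared with the slide come from the slide rounding the slope to -2.2971 before multiplying.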

  45. Example #3: The following data describe the relationship between a tree trunk's diameter and its height. Make a scatterplot of the data and find the LSRL. Define any variables used in this equation. How strong an association is there?

  46. $\hat{y}$ = -1.31467 + 4.54133x, where x = trunk diameter and $\hat{y}$ = predicted tree height. Strong correlation, r = 0.88

  47. Residual: How close are the data to the line? Residual = observed y – predicted y, that is, $y - \hat{y}$

  48. [Figure: a residual marked on a scatterplot]

  49. Residual Plot: A plot that shows the residuals for all the data. A good line leaves no pattern in the residuals. Calculator Tip: Residual Plot. Calculate the LSRL. L3: vars/y-vars/function/Y1(L1). L4: L2 – L3. Scatterplot: L1, L4
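
The calculator steps for a residual plot (predicted values in L3, observed minus predicted in L4) map directly onto a short Python sketch. The data and the line below are hypothetical. The coefficients a = 2.2 and b = 0.6 are the least-squares line for this made-up data set, not values from the slides.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus predicted y-hat = a + b*x."""
    predicted = [a + b * x for x in xs]          # plays the role of L3
    return [y - p for y, p in zip(ys, predicted)]  # plays the role of L4

# Hypothetical data and its least-squares line (assumed, not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals(xs, ys, a, b)
for x, e in zip(xs, res):
    print(x, round(e, 2))  # the points (x, residual) form the residual plot
```

For a least-squares line the residuals sum to zero, so a residual plot is always centered on the horizontal axis. What matters is whether the points show a pattern around it.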

  50. Example of random residual plots
