Learn about the concept of correlation, its strength and direction, and how to interpret it for predicting variables.
BNAD 276: Statistical Inference in Management Spring 2016 Welcome Green sheets
Schedule of readings Before our fourth and final exam (April 28th) OpenStax Chapters 1 – 13 (Chapter 12 is emphasized) Plous Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions
Homework • On class website: • Please complete homework worksheet #17 • Hypothesis testing with Correlations Worksheet • Due: Thursday, April 14th
By the end of lecture today 4/12/16 • Logic of hypothesis testing with Correlations • Interpreting the Correlations and scatterplots • Simple Regression • Using correlation for predictions
Exam 3 It went really well! Thanks for your patience and cooperation. We should have the grades up by Tuesday (takes about a week)
Correlation Correlation: Measure of how two variables co-occur; it can also be used for prediction • Ranges between -1 and +1 • The closer to zero, the weaker the relationship and the worse the prediction • Can be positive or negative Remember, we’ll call the correlation “r” Revisit this slide
Remember, Correlation = “r” Positive correlation • as values on one variable go up, so do values for the other variable • pairs of observations tend to occupy similar relative positions • higher scores on one variable tend to co-occur with higher scores on the second variable • lower scores on one variable tend to co-occur with lower scores on the second variable • scatterplot shows a cluster of points running from lower left to upper right Revisit this slide
Remember, Correlation = “r” Negative correlation • as values on one variable go up, values for the other variable go down • pairs of observations tend to occupy dissimilar relative positions • higher scores on one variable tend to co-occur with lower scores on the second variable • lower scores on one variable tend to co-occur with higher scores on the second variable • scatterplot shows a cluster of points running from upper left to lower right Revisit this slide
Zero correlation • as values on one variable go up, values for the other variable go... anywhere • pairs of observations tend to occupy seemingly random relative positions • scatterplot shows no apparent slope Revisit this slide
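To make the three cases concrete, here is a minimal sketch that computes r for small, made-up data sets. The function name `pearson_r` and all of the numbers are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical toy data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
up = [2, 4, 5, 4, 6]      # tends to rise with x: positive r
down = [6, 4, 5, 4, 2]    # tends to fall with x: negative r
flat = [3, 5, 1, 5, 3]    # no consistent slope: r at zero

r_up = pearson_r(x, up)
r_down = pearson_r(x, down)
r_flat = pearson_r(x, flat)
print(round(r_up, 2), round(r_down, 2), round(r_flat, 2))
```

The sign of the result gives the direction of the relationship and its distance from zero gives the strength, matching the descriptions on the slides above.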
Correlation does not imply causation • Is it possible that they are causally related? Yes, but the correlational analysis does not answer that question • What if it’s a perfect correlation – isn’t that causal? No, it feels more compelling, but is neutral about causality Remember the birthday cakes! [Scatterplot with axes: Number of Birthdays, Number of Birthday Cakes] Revisit this slide
Correlation - How do numerical values change? r = +0.97 r = -0.48 r = 0.61 r = -0.91 Revisit this slide
• Variable name is listed clearly on each axis • Both axes have real numbers listed • Both axes and values are labeled • Description includes: both variables, strength (weak, moderate, strong), direction (positive, negative), estimated value (actual number) Example: This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches). [Scatterplot with axes: Height of Mothers (inches), Height of Daughters (inches)] Revisit this slide
This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches). • Statistically significant, p < 0.05 • Reject the null hypothesis Revisit this slide
Finding a statistically significant correlation • The result is “statistically significant” if: • the observed correlation is larger than the critical correlation: we want our r to be big if we want it to be significantly different from zero!! (either negative or positive but just far away from zero) • the p value is less than 0.05 (which is our alpha): we want our “p” to be small!! • In that case we reject the null hypothesis • Then we have support for our alternative hypothesis
Five steps to hypothesis testing Step 1: Identify the research problem (hypothesis) Describe the null and alternative hypotheses For correlation null is that r = 0 (no relationship) Step 2: Decision rule • Alpha level? (α= .05 or .01)? • Critical statistic (e.g. critical r) value from table? • Degrees of Freedom = (n – 2) df = # pairs - 2 Step 3: Calculations Step 4: Make decision whether or not to reject null hypothesis If observed r is bigger than critical r then reject null Step 5: Conclusion - tie findings back in to research problem
Five steps to hypothesis testing Problem 1 • Is there a relationship between the: • Price • Square Feet • We measured 150 homes recently sold
Five steps to hypothesis testing Step 1: Identify the research problem (hypothesis) Is there a relationship between the cost of a home and the size of the home? Describe the null and alternative hypotheses • null is that there is no relationship (r = 0.0) • alternative is that there is a relationship (r ≠ 0.0) Step 2: Decision rule – find critical r (from table) • Alpha level? (α = .05) • Degrees of Freedom = (n – 2), that is, df = # pairs - 2 • 150 pairs – 2 = 148 degrees of freedom
Critical r value from table α = .05 df = # pairs - 2 = 148 Critical value r(148) = 0.195
Five steps to hypothesis testing Step 3: Calculations Observed correlation r(148) = 0.726965 Critical value r(148) = 0.195 Step 4: Make decision whether or not to reject null hypothesis If observed r is bigger than critical r then reject null Yes we reject the null: 0.727 > 0.195
Conclusion: Yes we reject the null. The observed r is bigger than critical r (0.727 > 0.195) Yes, this is significantly different than zero – something going on These data suggest a strong positive correlation between home prices and home size. This correlation was large enough to reach significance, r(148) = 0.73; p < 0.05
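The decision rule in Steps 2 through 4 can be sketched in a few lines. The observed r, the number of pairs, and the critical value are taken from the slides; the function name `is_significant` is hypothetical. The t statistic at the end uses the standard conversion t = r·sqrt(df / (1 − r²)), which the slides do not show and is included only as a cross-check.

```python
import math

def is_significant(observed_r, critical_r):
    """Reject the null (r = 0) when |observed r| exceeds the critical r."""
    return abs(observed_r) > critical_r

n_pairs = 150            # homes measured (from the slides)
df = n_pairs - 2         # degrees of freedom = # pairs - 2
observed_r = 0.726965    # observed correlation (from the slides)
critical_r = 0.195       # critical value the slides read from the table

reject_null = is_significant(observed_r, critical_r)

# Equivalent t statistic for the same test (assumption: standard formula,
# not shown on the slides): t = r * sqrt(df / (1 - r^2)).
t_stat = observed_r * math.sqrt(df / (1 - observed_r ** 2))

print(f"df = {df}, reject null: {reject_null}, t = {t_stat:.2f}")
```

Taking the absolute value matters: a strongly negative r is just as far from zero as a strongly positive one, so it should also lead to rejecting the null.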
Correlation matrices Correlation matrix: Table showing correlations for all possible pairs of variables Remember, Correlation = “r”

            Education   Age       IQ        Income
Education   1.0**       0.41*     0.38*     0.65**
Age         0.41*       1.0**     -0.02     0.52*
IQ          0.38*       -0.02     1.0**     0.27*
Income      0.65**      0.52*     0.27*     1.0**

* p < 0.05 ** p < 0.01 Revisit this slide
Correlation matrices Correlation matrix: Table showing correlations for all possible pairs of variables

            Education   Age       IQ        Income
Education
Age         0.41*
IQ          0.38*       -0.02
Income      0.65**      0.52*     0.27*

* p < 0.05 ** p < 0.01
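As a rough sketch of how such a matrix is read, the snippet below stores the six unique pairs from the slide and picks out the significant ones. The pairing of values to variables assumes the usual row and column layout of the slide's table, and the names `correlations` and `lookup` are hypothetical.

```python
# Unique pairs from the slide's correlation matrix, keyed by variable pair.
# Each value is (r, stars), following the slide's legend:
# "*" means p < 0.05, "**" means p < 0.01, "" means not significant.
# Assumption: values are matched to pairs using the usual matrix layout.
correlations = {
    ("Education", "Age"): (0.41, "*"),
    ("Education", "IQ"): (0.38, "*"),
    ("Education", "Income"): (0.65, "**"),
    ("Age", "IQ"): (-0.02, ""),
    ("Age", "Income"): (0.52, "*"),
    ("IQ", "Income"): (0.27, "*"),
}

def lookup(var_a, var_b):
    """Return (r, stars) for a pair in either order; the diagonal is 1.0**."""
    if var_a == var_b:
        return (1.0, "**")
    if (var_a, var_b) in correlations:
        return correlations[(var_a, var_b)]
    return correlations[(var_b, var_a)]

# Pairs whose correlation reaches significance at the 0.05 level or better.
significant_pairs = [pair for pair, (r, stars) in correlations.items() if stars]

print(lookup("Income", "Education"))
print(significant_pairs)
```

Because the matrix is symmetric, only one triangle needs to be stored; `lookup` handles either ordering of the two variable names.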
Correlation matrices • Variable names: make up any name that means something to you • VARX = “Variable X” • VARY = “Variable Y” • VARZ = “Variable Z” • The diagonal holds the correlation of X with X, of Y with Y, and of Z with Z
Correlation matrices Does this correlation reach statistical significance? • For each pair of variables (X with Y, X with Z, Y with Z) the output lists the correlation and the p value for that correlation • Each pair appears twice in the matrix
Correlation matrices What do we care about?
Correlation matrices What do we care about? • We measured the following characteristics of 150 homes recently sold • Price • Square Feet • Number of Bathrooms • Lot Size • Median Income of Buyers
Correlation matrices What do we care about? Critical value from table r(148) = 0.195
Correlation: Independent and dependent variables • When used for prediction we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable What are we predicting? [Two example plots, each labeled with its Dependent Variable and Independent Variable]
Correlation - What do we need to define a line? • Y-intercept = “a” (also “b0”): where the line crosses the Y axis • Slope = “b” (also “b1”): how steep the line is • The predicted variable goes on the “Y” axis and is called the dependent variable • The predictor variable goes on the “X” axis and is called the independent variable [Plot: Expenses per year (X axis) and Yearly Income (Y axis): if you spend this much, you probably make this much]
[Plot: Expenses per year (X axis) and Yearly Income (Y axis)] • Angelina Jolie Buys Brad Pitt a $24 million Heart-Shaped Island for his 50th Birthday: Angelina spent this much, so Angelina probably makes this much • Dustin spends $12 for his Birthday: Dustin spent this much, so Dustin probably makes this much Revisit this slide
Assumptions Underlying Linear Regression • For each value of X, there is a group of Y values • These Y values are normally distributed. • The means of these normal distributions of Y values all lie on the straight line of regression. • The standard deviations of these normal distributions are equal. Revisit this slide
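These four assumptions can be written compactly in one line. The notation below is a conventional summary rather than something shown on the slide: a and b are the intercept and slope of the regression line, and σ (a symbol introduced here) is the common standard deviation.

```latex
% For every value x of X, the Y values are normal, centered on the
% regression line, with the same standard deviation sigma at each x.
Y \mid (X = x) \;\sim\; \mathcal{N}\!\left(a + b\,x,\; \sigma^{2}\right)
```

The mean a + bx captures the requirement that the group means lie on the straight line, and the fact that σ does not depend on x captures the equal standard deviations.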
Correlation - the prediction line - what is it good for? Prediction line • makes the relationship easier to see (even if specific observations - dots - are removed) • identifies the center of the cluster of (paired) observations • identifies the central tendency of the relationship (kind of like a mean) • can be used for prediction • should be drawn to provide a “best fit” for the data • should be drawn to provide maximum predictive power for the data • should be drawn to provide minimum predictive error
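One common way to make “best fit” and “minimum predictive error” precise is the least-squares criterion: choose the intercept and slope that minimize the sum of squared prediction errors. The slide does not name a criterion, so treat this as an assumption. The sketch below uses hypothetical data and function names (`fit_line`, `sum_squared_error`).

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x              # slope
    a = mean_y - b * mean_x    # Y-intercept
    return a, b

def sum_squared_error(a, b, xs, ys):
    """Total squared distance between observed Y and predicted Y'."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

a, b = fit_line(xs, ys)
best = sum_squared_error(a, b, xs, ys)
shifted = sum_squared_error(a + 0.5, b, xs, ys)   # move the line up
tilted = sum_squared_error(a, b + 0.1, xs, ys)    # make the line steeper

print(f"a = {a:.2f}, b = {b:.2f}, best error = {best:.2f}")
```

Any other line, such as the shifted or tilted variants, gives a larger squared error on the same data, which is the sense in which the fitted line has minimum predictive error.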
Prediction line Y’ = a + b1X1 Predicting Restaurant Bill
Cost = 15.22 + 19.96 Persons (Y-intercept = 15.22, Slope = 19.96)
If “Persons” = 4, what is the prediction for “Cost”? Cost = 15.22 + 19.96 (4) Cost = 15.22 + 79.84 = 95.06 The expected cost for dinner for two couples (4 people) would be $95.06
If “Persons” = 1, what is the prediction for “Cost”? Cost = 15.22 + 19.96 (1) Cost = 15.22 + 19.96 = 35.18
[Plot: People (X axis) and Cost (Y axis), with the prediction at People = 4 marked at about 95.06]
Prediction line Y’ = a + b1X1 Predicting Rent
Rent = 150 + 1.05 SqFt (Y-intercept = 150, Slope = 1.05)
If “SqFt” = 800, what is the prediction for “Rent”? Rent = 150 + 1.05 (800) Rent = 150 + 840 = 990 The expected cost for rent on an 800 square foot apartment is $990
If “SqFt” = 2500, what is the prediction for “Rent”? Rent = 150 + 1.05 (2500) Rent = 150 + 2,625 = 2,775
[Plot: Square Feet (X axis) and Rent (Y axis), with the prediction at SqFt = 800 marked at about 990]
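The restaurant and rent predictions follow the same pattern, Y’ = a + bX, so one small function covers both. The intercepts and slopes below are the ones given on the slides; the function name `predict` is hypothetical.

```python
def predict(intercept, slope, x):
    """Predicted Y' on the line Y' = intercept + slope * x."""
    return intercept + slope * x

# Restaurant bill: Cost = 15.22 + 19.96 * Persons
cost_for_4 = predict(15.22, 19.96, 4)
cost_for_1 = predict(15.22, 19.96, 1)

# Rent: Rent = 150 + 1.05 * SqFt
rent_800 = predict(150, 1.05, 800)
rent_2500 = predict(150, 1.05, 2500)

print(f"{cost_for_4:.2f} {cost_for_1:.2f} {rent_800:.2f} {rent_2500:.2f}")
```

Rounded to two decimals, the four results match the hand calculations on the slides (95.06, 35.18, 990 and 2,775).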
Regression Example Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.
Regression Example Do more sales calls result in more sales made? Step 1: Draw scatterplot Step 2: Estimate r Dependent Variable: Number of systems sold Independent Variable: Number of sales calls made [Scatterplot: Number of sales calls made (X axis) and Number of systems sold (Y axis); labeled points include Ava, Emily, Isabella, Emma, Ethan, Joshua, Jacob]