DESCRIPTIVE STATISTICS

almira

Presentation Transcript


  1. DESCRIPTIVE STATISTICS UNIT 5: Bivariate Data TouchText • Introducing pairwise bivariate data • The covariance and the correlation coefficient • Multivariate Data • Problems and Exercises

  2. Bivariate Data: Bivariate data consist of two quantitative variables per observation. The data are paired together to determine the nature of the relationship, if any, between the two variables. Example: 20 students are sampled, comparing their study hours per week with their grade point averages (GPAs). Bivariate data can be used for descriptive purposes and possibly for predictive purposes.

  3. Bivariate Statistic: The Covariance There is a statistic for bivariate data that is analogous to the variance for univariate data. The covariance of two variables is the average product of their paired deviations from their respective means. The (population and sample) variances, introduced in Unit 2, can be thought of as the covariance of a variable with itself. In this regard, it is instructive to compare the formulas for variance and covariance.
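The slide's formula image did not survive in this transcript. The standard textbook definitions, which are assumed here to match what the slide showed, can be written as follows, with N the population size, n the sample size, μ the population means, and x̄, ȳ the sample means:

```latex
\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu_X\right)\left(y_i - \mu_Y\right)
\qquad\text{(population covariance)}

s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)
\qquad\text{(sample covariance)}
```

Setting Y = X in either formula gives the corresponding variance.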

  4. Population Variance vs. Population Covariance The population variance is calculated as the mean of all squared deviations away from the population mean. Compare with the population covariance, which is the mean of all products of the paired deviations of X and Y from their respective population means. Viewed this way, a variance is simply the covariance of a variable with itself.
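A minimal Python sketch of this comparison follows. The function name `population_covariance` and the four data values are illustrative assumptions; they are not the 20-student data set from the slides.

```python
from statistics import pvariance

def population_covariance(xs, ys):
    """Mean of the products of paired deviations from each variable's mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical study-hours values, invented for illustration only.
hours = [2.0, 4.0, 6.0, 8.0]

cov_with_itself = population_covariance(hours, hours)
var_hours = pvariance(hours)  # population variance from the standard library

print(cov_with_itself, var_hours)  # the two values agree
```

Passing the same list as both arguments turns each product of deviations into a squared deviation, which is why the covariance collapses to the variance.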

  5. Population Variance and Covariance (Alternative Derivation) The population variance can also be calculated as $\sigma_X^2 = E[X^2] - (E[X])^2$, where E[ ] stands for “expectation”. So E[X] is simply the mean of X, $\mu_X$, and $E[X^2]$ is the average value of $X^2$. The population covariance can be calculated in similar fashion: $\sigma_{XY} = E[XY] - E[X]\,E[Y]$.
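The expectation form can be checked numerically against the deviation form. This is a sketch on invented data; the helper `mean` and the variable names are assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean, playing the role of E[ ] for a finite population."""
    return sum(values) / len(values)

# Hypothetical paired data, invented for illustration only.
hours = [2.0, 4.0, 6.0, 8.0]
gpa = [2.0, 3.0, 2.5, 3.5]

# Expectation ("shortcut") form: E[X^2] - (E[X])^2 and E[XY] - E[X]E[Y].
var_shortcut = mean([x * x for x in hours]) - mean(hours) ** 2
cov_shortcut = mean([x * y for x, y in zip(hours, gpa)]) - mean(hours) * mean(gpa)

# Deviation form: mean of squared deviations / products of deviations.
mx, my = mean(hours), mean(gpa)
var_deviation = mean([(x - mx) ** 2 for x in hours])
cov_deviation = mean([(x - mx) * (y - my) for x, y in zip(hours, gpa)])

print(var_shortcut, var_deviation)
print(cov_shortcut, cov_deviation)
```

Both forms give the same population values; the shortcut form is often easier by hand, while the deviation form tends to be numerically safer when the mean is large relative to the spread.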

  6. Correlation Coefficient (R) and the Coefficient of Determination (R²) A sample or population covariance is often standardized by dividing it by the product of the two standard deviations. This is known as the Correlation Coefficient (R). Squaring the correlation coefficient produces the Coefficient of Determination (R²). R = R² = 0 implies no linear correlation. R = R² = 1 implies perfect positive correlation. R = −1 (so R² = 1) implies perfect negative correlation.
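The standardization can be sketched in a few lines of Python. The data and the function name `correlation` are hypothetical; population formulas are used here, and the sample formulas give the same R because the divisors cancel.

```python
from math import sqrt

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical paired data, invented for illustration only.
hours = [2.0, 4.0, 6.0, 8.0]
gpa = [2.0, 3.0, 2.5, 3.5]

r = correlation(hours, gpa)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))  # 0.8 0.64
```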

  7. Population Variance and Covariance: Example When there are two variables, it is common to call them X and Y. When there are more than two variables, it is common to call them X1, X2, X3, etc.

  8. Converting Population to Sample Population variances and covariances divide the sum Σ( ) by N, while sample variances and covariances divide it by (n − 1). Therefore, for the same n observations, the sample statistic equals the population-formula result multiplied by n/(n − 1). So, if you have to solve for sample variances, covariances, etc., you can solve using the alternate method (as above) and then just multiply by n/(n − 1).
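A short sketch of the conversion, assuming the same observations are fed to both formulas. The data values and variable names are invented for illustration.

```python
# Hypothetical paired data, invented for illustration only.
hours = [2.0, 4.0, 6.0, 8.0]
gpa = [2.0, 3.0, 2.5, 3.5]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(gpa) / n
sum_of_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))

population_cov = sum_of_products / n          # divide by n
sample_cov_direct = sum_of_products / (n - 1)  # divide by n - 1

# Conversion: population-formula result times n / (n - 1).
sample_cov_converted = population_cov * n / (n - 1)

print(population_cov, sample_cov_direct, sample_cov_converted)
```

The factor n/(n − 1) approaches 1 as n grows, so the distinction matters most for small samples.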

  9. Sample Variance The sample variance could be calculated as the average of all squared deviations away from the mean, but because it is a sample, the total squared deviations must be divided by the number of observations minus one (n − 1). Dividing by n − 1 makes the sample variance an unbiased estimator of the population variance: E[s²] = σ². Compare with the population variance, which divides by N.
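The claim E[s²] = σ² can be illustrated by brute force on a tiny population. The sketch below assumes sampling with replacement and enumerates every possible sample of size 2; the three population values are invented for illustration.

```python
from itertools import product
from statistics import pvariance, variance

# Hypothetical three-value population, invented for illustration only.
population = [1.0, 2.0, 3.0]
sigma_squared = pvariance(population)  # population variance, divides by N

# Every ordered sample of size 2 drawn with replacement (9 samples in total).
samples = list(product(population, repeat=2))

# Average of the (n - 1)-divisor sample variance over all possible samples.
mean_s_squared = sum(variance(s) for s in samples) / len(samples)

# Average of the n-divisor version over the same samples, for contrast.
mean_divide_by_n = sum(pvariance(s) for s in samples) / len(samples)

print(sigma_squared, mean_s_squared, mean_divide_by_n)
```

Averaged over all samples, the n − 1 version recovers σ², while the divide-by-n version comes out too small.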

  10. Calculating the Sample Covariance on MS Excel Just as when calculating variances, a table can be constructed in MS Excel in which the covariance can be calculated. The data are from the example presented above. Here, the (sample) covariance is calculated.

  11. Calculating the Population Covariance on MS Excel If the 20 students represented the entire population (rather than a sample), the covariance between study hours and GPA would have been calculated accordingly. The data are from the example presented above. Here, the (population) covariance is calculated.

  12. Calculating the Covariance on MS Excel: Using Excel Formulas MS Excel has ready-made functions for calculating the covariance for either a sample or a population. Sample: COVARIANCE.S( ). Population: COVARIANCE.P( ). The covariance functions each take a cell range reference for both the X variable (in this example, I7:I26) and the Y variable (in this example, J7:J26). * The reader can confirm that the sample and population covariances calculated here are the same as those calculated manually in the tables above.

  13. The Correlation Coefficient “r” The covariance between two variables will tell you if there is a positive or negative relationship between the two of them. But to measure the strength of the relationship between X and Y in a way that is independent of the units of measurement, one needs to standardize the statistic by dividing by the product of the two standard deviations: $r = s_{XY} / (s_X s_Y)$.

  14. Calculating the Correlation Coefficient on MS Excel Use the CORREL( ) function, or calculate “r” manually.

  15. Multivariate Correlations Often, more than two variables are involved in a statistical analysis. Correlations will exist pairwise between each pair of variables. For example, if we have three variables X1, X2, and X3, we will have three correlations: r(X1,X2), r(X1,X3), and r(X2,X3). With “n” variables, there will always be n variances (one for each variable) and n(n − 1)/2 covariances (one for each pair of variables). It is usual in statistics to create a variance-covariance matrix, as displayed in the following pages.
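The pair count can be confirmed by enumerating the pairs directly. The helper name `count_pairs` is an assumption made for this sketch.

```python
from itertools import combinations

def count_pairs(n):
    """Number of distinct variable pairs among n variables: n(n - 1)/2."""
    return n * (n - 1) // 2

variables = ["X1", "X2", "X3"]
pairs = list(combinations(variables, 2))  # each unordered pair exactly once

print(pairs)                         # [('X1', 'X2'), ('X1', 'X3'), ('X2', 'X3')]
print(len(pairs), count_pairs(3))    # 3 3
print(count_pairs(5))                # 10
```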

  16. Multivariate Correlations This is the covariance matrix created using MS Excel’s data analysis add-in. The variances Var(X1), Var(X2), and Var(X3) are on the diagonal of the matrix, and the covariances Cov(X1,X2), Cov(X1,X3), and Cov(X2,X3) are off-diagonal.
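The layout of a variance-covariance matrix can be sketched in plain Python. The three data columns below are invented for illustration and are not the values shown in the slide's Excel output; sample (n − 1) formulas are assumed.

```python
from statistics import variance

def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data for three variables, invented for illustration only.
columns = {
    "X1": [2.0, 4.0, 6.0, 8.0],
    "X2": [2.0, 3.0, 2.5, 3.5],
    "X3": [8.0, 7.0, 7.0, 6.0],
}
names = list(columns)

# Entry (i, j) is the covariance of variable i with variable j.
cov_matrix = [
    [sample_covariance(columns[a], columns[b]) for b in names]
    for a in names
]

for name, row in zip(names, cov_matrix):
    print(name, [round(value, 4) for value in row])
```

The matrix is symmetric because Cov(Xi, Xj) = Cov(Xj, Xi), so only the diagonal and one triangle carry distinct information.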

  17. Multivariate Correlations This is the correlation matrix created using MS Excel’s data analysis add-in. Every variable is perfectly correlated with itself, so the diagonal shows all 1’s.
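A correlation matrix can be derived from a covariance matrix by dividing each entry by the two corresponding standard deviations. The covariance values below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

# Hypothetical variance-covariance matrix, invented for illustration only.
cov_matrix = [
    [4.0, 2.0, 0.0],
    [2.0, 9.0, -1.5],
    [0.0, -1.5, 1.0],
]
size = len(cov_matrix)

# Standard deviations are the square roots of the diagonal variances.
std_devs = [sqrt(cov_matrix[i][i]) for i in range(size)]

# r_ij = cov_ij / (sd_i * sd_j)
corr_matrix = [
    [cov_matrix[i][j] / (std_devs[i] * std_devs[j]) for j in range(size)]
    for i in range(size)
]

for row in corr_matrix:
    print([round(value, 4) for value in row])
```

Each diagonal entry is a variance divided by itself, which is why the diagonal is all 1's.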

  18. Calculating the Coefficient of Determination (R²) In the example, the covariance is s_XY = 0.334 and R² = 0.15 = 15%. R² is very important. When an independent variable (study hours) has an effect on a second, dependent variable (GPA), R² tells us what percentage of the variation in the dependent variable is explained by the independent variable. R² ranges from 0 (none) to 1 (perfect): 0 ≤ R² ≤ 1.
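The two figures quoted on the slide can be related to each other with a little arithmetic. The slide's standard deviations did not survive in the transcript, so the last line is only what the quoted (presumably rounded) values imply, not a number taken from the source:

```latex
r = +\sqrt{R^2} = \sqrt{0.15} \approx 0.39
\quad\text{(positive root, because } s_{XY} = 0.334 > 0\text{)}

r = \frac{s_{XY}}{s_X\, s_Y}
\;\Longrightarrow\;
s_X\, s_Y = \frac{s_{XY}}{r} \approx \frac{0.334}{0.387} \approx 0.86
```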

  19. Visualizing Bivariate Data: The Scatter Diagram Bivariate data are often displayed in scatter diagrams, which show data points in X,Y space, with each data point representing an observed pair of the two variables.

  20. No Relationship There is usually a presumed relationship of some kind between the dependent and independent variables; otherwise, why analyze them? However, it may turn out that there is in fact no observable relationship between the two variables: no relationship between X and Y is discernible in the scatter diagram.

  21. Non-Linear Relationship It is possible that there is a relationship between the X and Y variables, but that the relationship is complex and non-linear. In this example, Y seems at first to increase as X increases, but then falls as X increases further.
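A rise-then-fall pattern like this can leave the correlation coefficient at zero even though Y depends entirely on X. The sketch below uses an invented symmetric curve, not the data plotted on the slide.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical inverted-U relationship: y = 4 - (x - 2)^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [4.0 - (x - 2.0) ** 2 for x in xs]  # [0, 3, 4, 3, 0]

r = correlation(xs, ys)
print(r)  # 0.0: the rising and falling halves cancel exactly
```

This is why a scatter diagram is worth inspecting before relying on r: the coefficient measures only the linear part of a relationship.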

  22. Linear Relationships: The Direction of the Relationship As portrayed in a scatter diagram, bivariate data will almost never plot out along a straight line. Often, however, the relationship can be roughly characterized as linear. A positive linear relationship pairs low values of Y with low values of X, and high values of Y with high values of X; it has a positive covariance and a positive correlation coefficient. A negative linear relationship relates X and Y in opposite fashion: high X implies low Y, and conversely; it has a negative covariance and a negative correlation coefficient.

  23. The Strength of Relationship In addition to describing the direction of the relationship (i.e. positive or negative), one can also examine the strength of the relationship between the two variables. A strong linear relationship will show values of Y and X lying closely along a line. A weak linear relationship will reveal a (positive or negative) relationship between X and Y, but the data points do not come close to lying along a straight line. * The direction of the relationship (positive or negative) is independent of the strength of the relationship.
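Direction and strength can be read off the sign and the magnitude of r, respectively. The three small data sets below are invented for illustration and do not come from the slide's plots.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical Y values, invented for illustration only.
strong_positive = [1.1, 1.9, 3.2, 3.9, 5.0]  # points hug a rising line
weak_positive = [3.0, 1.0, 4.0, 2.0, 5.0]    # rising trend, widely scattered
strong_negative = [5.0, 3.9, 3.2, 1.9, 1.1]  # points hug a falling line

r_strong = correlation(xs, strong_positive)
r_weak = correlation(xs, weak_positive)
r_negative = correlation(xs, strong_negative)

print(round(r_strong, 3), round(r_weak, 3), round(r_negative, 3))
```

The strong positive and strong negative sets have the same magnitude of r with opposite signs, which illustrates that direction and strength are separate properties.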

  24. Relationships: Examples What do you think? Describe these relationships. For each scatter diagram: Linear or non-linear? Positive or negative? Strong or weak?
