
Introduction to Statistics

Cambodian Mekong University. MB102. Introduction to Statistics. Chapter 4 Measures of Dispersion. Learning Objectives. Calculate common measures of variation (including the range, interquartile range, mean deviation and standard deviation) from grouped and ungrouped data

karan


Presentation Transcript


  1. Cambodian Mekong University MB102 Introduction to Statistics Chapter 4 Measures of Dispersion

  2. Learning Objectives • Calculate common measures of variation (including the range, interquartile range, mean deviation and standard deviation) from grouped and ungrouped data • Calculate and interpret the coefficient of variation

  3. 1. Introduction • A measure of central tendency in itself is not sufficient to describe a set of data adequately • A measure of dispersion (or spread) of the data is usually required • This measure gives an indication of the internal variation of the data—that is, the extent to which data items vary from one another or from a central point • Some reasons for requiring a measure of dispersion of a set of data: • As an indication of the reliability of the average value • To assist in controlling unwanted variation

  4. 2. The range • The simplest measure of dispersion is the range • It is the difference between the largest and smallest values in a set of data Range = largest observation – smallest observation • Examples of uses of range include • Temperature fluctuations on a given day • Movement of share prices…

  5. 2. The range • Range is considered primitive as it considers only the extreme values, which may not be useful indicators of the bulk of the population • Extreme values, called outliers, may often result from errors of measurement • Outliers are defined as values that are inconsistent with the rest of the data • Although the range is the quickest and easiest measure of dispersion to calculate, it should be interpreted with some caution

  6. 3. The interquartile range (midspread) • Measures the range of the middle 50% of the values only • Is defined as the difference between the upper and lower quartiles Interquartile range = upper quartile – lower quartile = Q3 – Q1 • May be calculated from grouped frequency distributions that contain open-ended class intervals • It is usually only used with a large number of observations
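
As a rough illustration of the two measures above, the following Python sketch computes the range and the interquartile range for a small made-up list of daily temperatures. The data and the variable names are assumptions, not taken from the slides. Textbooks and software use several slightly different conventions for locating quartiles, so `statistics.quantiles` with its default "exclusive" method may give values that differ a little from a hand calculation.

```python
import statistics

# Hypothetical temperatures in degrees Celsius (not from the slides).
temperatures = [12, 15, 17, 19, 21, 24, 30]

# Range = largest observation - smallest observation
data_range = max(temperatures) - min(temperatures)

# With n=4, quantiles() returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(temperatures, n=4)

# Interquartile range = upper quartile - lower quartile
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The range uses only the two extreme temperatures, whereas the interquartile range ignores the lowest and highest quarters of the data.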

  7. 4. The mean deviation • The mean deviation takes into account the actual value of each observation • It measures the ‘average’ distance of each observation away from the mean of the data • It gives an equal weight to each observation • It is generally more sensitive than the range or interquartile range, since a change in any value will affect it

  8. 4. The mean deviation • The residual measures the actual deviation (or distance) of each observation from the mean • A set of x values has a mean of $\bar{x}$ • The residual of a particular x-value is: $x - \bar{x}$ Example If the mean for a set of data is 3.22, find the residual for an observation of 4.38 Solution The residual of 4.38 is 4.38 – 3.22 = 1.16 Note: Residuals can be negative. A negative residual shows that the observation is below the mean

  9. 4. The mean deviation • The mean deviation is defined as the mean of these absolute deviations: $\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$ • To calculate the mean deviation Step 1: Calculate the mean of the data Step 2: Subtract the mean from each observation and record the resulting differences Step 3: Write down the absolute value of each of the differences found in Step 2 (ignore their signs) Step 4: Calculate the mean of the absolute values of the differences found in Step 3

  10. 4. The mean deviation Example The batting scores of a cricketer were recorded over 10 completed innings to date. His scores were: 32, 27, 38, 25, 20, 32, 34, 28, 40, 29 Calculate the mean deviation of the cricketer's scores Solution Step 1 The cricketer's average number of runs is 305 / 10 = 30.5

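  11. 4. The mean deviation • Steps 2 and 3: the residuals (x – 30.5) are 1.5, –3.5, 7.5, –5.5, –10.5, 1.5, 3.5, –2.5, 9.5, –1.5, so the absolute values are 1.5, 3.5, 7.5, 5.5, 10.5, 1.5, 3.5, 2.5, 9.5, 1.5, which sum to 47 • Step 4: Mean deviation = 47 / 10 = 4.7 runs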
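
The four steps of the mean deviation calculation can be sketched in Python using the batting scores from the example. The function name `mean_deviation` is chosen here for illustration only.

```python
def mean_deviation(values):
    """Return the mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: mean of the data
    residuals = [x - mean for x in values]        # Step 2: subtract the mean
    abs_residuals = [abs(r) for r in residuals]   # Step 3: ignore the signs
    return sum(abs_residuals) / n                 # Step 4: mean of absolute values

# Batting scores from the example (10 completed innings).
scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
md = mean_deviation(scores)
print("Mean deviation:", md)
```

For these scores the mean is 30.5 and the absolute deviations sum to 47, so the mean deviation is about 4.7 runs.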

  12. 4. The mean deviation • Calculation of the mean deviation from a frequency distribution • If the data are in the form of a frequency distribution, the mean deviation can be calculated using: $\text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$ Where f = the frequency of an observation x, and $\sum f$ = the sum of the frequencies = n
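
A minimal sketch of the frequency-distribution version follows, assuming the distribution is stored as a dictionary that maps each observed value x to its frequency f. The data (goals scored per match) and the function name are hypothetical.

```python
def mean_deviation_from_frequencies(freq):
    """freq maps each observed value x to its frequency f."""
    n = sum(freq.values())                            # sum of f = n
    mean = sum(f * x for x, f in freq.items()) / n    # weighted mean
    # Each absolute deviation is counted f times.
    total_abs_dev = sum(f * abs(x - mean) for x, f in freq.items())
    return total_abs_dev / n

# Hypothetical data: goals per match -> number of matches.
goals = {0: 2, 1: 3, 2: 4, 3: 1}
md_freq = mean_deviation_from_frequencies(goals)
print("Mean deviation:", round(md_freq, 2))
```

Here n = 10 and the mean is 1.4 goals, so the weighted absolute deviations sum to 8 and the mean deviation is about 0.8 goals.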

  13. 5. The standard deviation • The most commonly used measure of dispersion is the standard deviation • It takes into account every observation and measures the ‘average deviation’ of observations from the mean • It works with squares of residuals, not absolute values, and is therefore easier to use in further calculations • The values of the mean deviation and standard deviation should be reasonably close, since they are both measuring the variation of the observations from their mean

  14. 5. The standard deviation • Population standard deviation • Uses squares of the residuals, which will eliminate the effect of the signs, since squares of numbers cannot be negative Step 1: find the sum of the squares of the residuals Step 2: find their mean. Step 3: take the square root of this mean. $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ Where $\mu$ = the population mean and N = the size of the population The square of the population standard deviation is called the variance.
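
The three steps for the population standard deviation can be sketched as follows. As an assumption made purely for illustration, the ten batting scores from the mean deviation example are treated as a complete population. The result is cross-checked against `statistics.pstdev` from the Python standard library.

```python
import math
import statistics

# Batting scores from the earlier example, treated here as a whole population.
scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]

N = len(scores)
mu = sum(scores) / N

sum_sq = sum((x - mu) ** 2 for x in scores)   # Step 1: sum of squared residuals
variance = sum_sq / N                         # Step 2: their mean (the variance)
sigma = math.sqrt(variance)                   # Step 3: square root of the mean

print("Variance:", round(variance, 2))
print("Population standard deviation:", round(sigma, 2))
print("Library check:", round(statistics.pstdev(scores), 2))
```

The squared residuals sum to 324.5, giving a variance of 32.45 and a standard deviation of roughly 5.7 runs, reasonably close to the mean deviation of 4.7.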

  15. 5. The standard deviation • Sample standard deviation • It is rare to calculate the value of $\sigma$ since populations are usually very large • It is far more likely that the sample standard deviation (denoted by s) will be needed: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ • Where: n is the number of observations in the sample

  16. 5. The standard deviation • A note on the use of (n − 1) in formulae • If the value of n is large, it will only make a slight difference to the answer whether you divide by n or (n − 1) • To calculate the value of s from a sample, the calculator button will usually be indicated by one of $\sigma_{n-1}$ or $x\sigma_{n-1}$ or $s_x$ or $s$ written either on it or near it • To calculate the value of $\sigma$ from a population, the calculator key will usually be indicated by one of $\sigma_n$ or $x\sigma_n$ or $\sigma_x$ or $\sigma$ written either on it or near it

  17. 5. The standard deviation • Important points about the standard deviation • The standard deviation cannot be negative • The standard deviation of a set of data is zero if, and only if, the observations are of equal value • The standard deviation can never exceed the range of the data • The more scattered the data, the greater the standard deviation • The square of the standard deviation is called the variance

  18. 5. The standard deviation • Calculation of the sample standard deviation Step 1: Calculate the mean Step 2: For each x-value, find the value of the residual Step 3: Square the residuals Step 4: Calculate the sum of the squares of the residuals Step 5: Divide the sum found in step 4 by (n – 1) Step 6: Take the square root of the quantity found in step 5: this is the sample standard deviation
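
The six steps above can be sketched in Python as shown below. The function name `sample_std` is an illustrative choice, and the batting scores from the mean deviation example are reused, this time treated as a sample. The result is compared with `statistics.stdev`, which also divides by (n – 1).

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation, dividing by (n - 1)."""
    n = len(values)
    mean = sum(values) / n                      # Step 1: mean
    residuals = [x - mean for x in values]      # Step 2: residuals
    squares = [r ** 2 for r in residuals]       # Step 3: square them
    total = sum(squares)                        # Step 4: sum of squares
    return math.sqrt(total / (n - 1))           # Steps 5 and 6

# Batting scores from the earlier example, treated here as a sample.
scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
s = sample_std(scores)

print("Sample standard deviation:", round(s, 2))
print("Library check:", round(statistics.stdev(scores), 2))
```

For these ten scores s is roughly 6.0, slightly larger than the value of roughly 5.7 obtained when dividing by n. The gap narrows as n grows.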

  19. 5. The standard deviation • Calculation of the standard deviation from a frequency distribution • If the data are in the form of a frequency distribution, calculate the standard deviation using: $s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$

  20. 5. The standard deviation • Calculation of the standard deviation from a grouped frequency distribution • When calculating s from a grouped frequency distribution, we should assume that the observations in each class interval are concentrated at the midpoint of the interval: $s = \sqrt{\frac{\sum f (m - \bar{x})^2}{\sum f - 1}}$ • Where $\bar{x}$ = the estimated mean of the sample, m = the midpoint of the class interval, f = the frequency of the class interval
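
A small sketch of the grouped calculation follows, assuming each class is stored as a (lower limit, upper limit, frequency) tuple. The class intervals and frequencies are made up for illustration. Because every observation is assumed to sit at its class midpoint, the result is only an estimate of s.

```python
import math

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                              # sum of f
est_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
# Squared deviations of the midpoints, weighted by class frequency.
sum_sq = sum(f * (m - est_mean) ** 2 for f, m in zip(freqs, midpoints))
s_grouped = math.sqrt(sum_sq / (n - 1))

print("Estimated mean:", est_mean)
print("Estimated sample standard deviation:", round(s_grouped, 2))
```

With midpoints 5, 15 and 25 the estimated mean is 14, the weighted squared deviations sum to 490, and s is the square root of 490 / 9, roughly 7.4.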

  21. 6. The coefficient of variation • This is a measure of relative variability used to: • measure changes that have occurred in a population over time • compare variability of two populations that are expressed in different units of measurement • It is expressed as a percentage rather than in terms of the units of the particular data

  22. 6. The coefficient of variation • The formula for the coefficient of variation (V) is: $V = \frac{s}{\bar{x}} \times 100\%$ • Where $\bar{x}$ = the mean of the sample • s = the standard deviation of the sample

  23. 6. The coefficient of variation Example Calculate the coefficient of variation for the price of 400 g cans of pet food, given that the mean is 81 cents and s = 6.77 cents. Interpret the results. Solution $V = \frac{6.77}{81} \times 100\% = 8.36\%$ This means that the standard deviation of the price of a 400 g can of pet food is 8.36% of the mean price.
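
The pet food example can be checked with a few lines of Python. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage of the mean."""
    return std_dev / mean * 100

# Values from the example: mean price 81 cents, s = 6.77 cents.
v = coefficient_of_variation(6.77, 81)
print(f"V = {v:.2f}%")
```

Because V is a ratio, the units (cents) cancel, which is what allows variability to be compared across data sets measured in different units.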

  24. 7. Remarks • Among the more important characteristics of the standard deviation are: • It is the most frequently used measure of dispersion, and because of its mathematical properties it has widespread use in problems involving statistical inference • If the mean cannot be calculated, neither can the standard deviation • Its value is affected by the value of every observation in the data • If the data have a number of extreme values, the value of the standard deviation may be distorted so as not to be a good ‘representative’ measure of dispersion

  25. Summary • Among the more important characteristics of the standard deviation are: • It is the most frequently used measure of dispersion, and because of its mathematical properties it has widespread use in problems involving statistical inference. • If the mean cannot be calculated, neither can the standard deviation. • Its value is affected by the value of every observation in the data. • If the data have a number of extreme values, the value of the standard deviation may be distorted so as not to be a good ‘representative’ measure of dispersion.
