STA 291 Spring 2010 Lecture 5 Dustin Lueker
Measures of Central Tendency
• Mean - Arithmetic average
• Median - Midpoint of the observations when they are arranged in increasing order
• Mode - Most frequent value
Notation: Subscripted variables
• n = # of units in the sample
• N = # of units in the population
• x = Variable to be measured
• xi = Measurement of the ith unit
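As a minimal sketch of the mean and median in the notation above, the following Python snippet treats a list `x` as the measurements x1, ..., xn. The data values are an illustrative assumption, not part of the lecture.

```python
from statistics import median

# Illustrative sample (assumed values); x[i-1] plays the role of x_i.
x = [3, 7, 12, 13, 15, 19, 24]
n = len(x)  # number of units in the sample

# Mean: add up all measurements and divide by n.
sample_mean = sum(x) / n

# Median: middle value once the observations are sorted.
sample_median = median(x)

print("mean   =", round(sample_mean, 3))
print("median =", sample_median)
```

With an even number of observations, `statistics.median` averages the two middle values.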
Symbols
Variance and Standard Deviation
• Sample
  • Variance
  • Standard Deviation
• Population
  • Variance
  • Standard Deviation
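For reference, the standard textbook definitions can be written as follows. The symbols s², s, σ², σ, x̄ (sample mean), and μ (population mean) are the conventional ones and are assumed here; n, N, and xi are as defined in the notation earlier.

```latex
% Sample variance and standard deviation (divide by n - 1)
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad s = \sqrt{s^2}

% Population variance and standard deviation (divide by N)
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad \sigma = \sqrt{\sigma^2}
```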
Variance Step By Step
• Calculate the mean
• For each observation, calculate the deviation (the observation minus the mean)
• For each observation, calculate the squared deviation
• Add up all the squared deviations
• Divide the result by (n-1), or by N if you are finding the population variance
(To get the standard deviation, take the square root of the result)
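The steps above can be sketched in Python roughly as follows. The function name `variance_step_by_step` and the data values are assumptions made for illustration.

```python
import math

def variance_step_by_step(data, population=False):
    """Variance following the listed steps; hypothetical helper name."""
    n = len(data)
    mean = sum(data) / n                    # 1. calculate the mean
    deviations = [x - mean for x in data]   # 2. deviation of each observation
    squared = [d ** 2 for d in deviations]  # 3. squared deviations
    total = sum(squared)                    # 4. add them up
    divisor = n if population else n - 1    # 5. divide by N or by (n-1)
    return total / divisor

# Illustrative data (assumed values): mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = variance_step_by_step(data)
population_var = variance_step_by_step(data, population=True)

print("sample variance     =", round(sample_var, 4))
print("sample std dev      =", round(math.sqrt(sample_var), 4))
print("population variance =", population_var)
print("population std dev  =", math.sqrt(population_var))
```

The only difference between the two versions is the divisor in the last step.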
Empirical Rule
• If the data are approximately symmetric and bell-shaped, then
  • About 68% of the observations are within one standard deviation of the mean
  • About 95% of the observations are within two standard deviations of the mean
  • About 99.7% of the observations are within three standard deviations of the mean
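The three percentages can be checked against an exactly normal distribution, which is an idealized stand-in (an assumption here) for "approximately symmetric and bell-shaped" data. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

Real data that are only roughly bell-shaped will match these proportions only approximately.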
Percentiles
• The pth percentile (Xp) is a number such that p% of the observations take values below it, and (100-p)% take values above it
  • 50th percentile = median
  • 25th percentile = lower quartile
  • 75th percentile = upper quartile
• The index of the pth percentile in the ordered data is
  • (n+1)p/100
Quartiles
• 25th percentile
  • lower quartile
  • Q1
  • (approximately) median of the observations below the median
• 75th percentile
  • upper quartile
  • Q3
  • (approximately) median of the observations above the median
Example
• Find the 25th percentile of this data set
  • {3, 7, 12, 13, 15, 19, 24}
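One way to work this example, assuming the index formula (n+1)p/100 from the Percentiles slide, with n = 7 and p = 25:

```latex
\text{index} = \frac{(n+1)\,p}{100} = \frac{(7+1)\cdot 25}{100} = 2
\quad\Longrightarrow\quad
\text{25th percentile} = \text{2nd smallest value} = 7
```

The index is a whole number here, so no interpolation is needed.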
Interpolation
• Use when the index is not a whole number
• Start at the closest whole index below the number found, then go the decimal fraction of the distance towards the next value
  • If the index is found to be 5.4, go to the 5th value, then add 0.4 of the distance between the 5th value and the 6th value
  • In essence we are going to the 5.4th value
Example
• Find the 40th percentile of the same data set
  • {3, 7, 12, 13, 15, 19, 24}
• Must use interpolation
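A minimal Python sketch of the index-plus-interpolation procedure is shown below. The function name `percentile` and the clamping of indices that fall outside the data are assumptions of this sketch, not something the slides specify.

```python
def percentile(data, p):
    """pth percentile using index (n+1)p/100 and linear interpolation.

    Hypothetical helper; indices outside 1..n are clamped to the ends.
    """
    xs = sorted(data)
    n = len(xs)
    index = (n + 1) * p / 100   # 1-based index, possibly fractional
    lower = int(index)          # closest whole index below
    fraction = index - lower    # decimal part of the index
    if lower < 1:
        return xs[0]
    if lower >= n:
        return xs[-1]
    # Go to the lower value, then move `fraction` of the way to the next one.
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])

data = [3, 7, 12, 13, 15, 19, 24]

# Index = 8 * 40 / 100 = 3.2: start at the 3rd value (12) and add
# 0.2 of the distance to the 4th value (13).
p40 = percentile(data, 40)
print("40th percentile =", round(p40, 2))
```

For this data set the 40th percentile works out to 12 + 0.2 × (13 - 12) = 12.2.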
Data Summary
• Five Number Summary
  • Minimum
  • Lower Quartile
  • Median
  • Upper Quartile
  • Maximum
• Example
  • minimum = 4
  • Q1 = 256
  • median = 530
  • Q3 = 1105
  • maximum = 320,000
• What does this suggest about the shape of the distribution?
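A five number summary can be sketched with Python's standard library. This example assumes `statistics.quantiles` with `method="exclusive"`, which, as far as I can tell, places the quartiles at index (n+1)p/100 with linear interpolation. The function name `five_number_summary` is made up for illustration, and the data set is the one used in the percentile examples.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); hypothetical helper name."""
    q1, med, q3 = quantiles(data, n=4, method="exclusive")
    return min(data), q1, med, q3, max(data)

data = [3, 7, 12, 13, 15, 19, 24]
summary = five_number_summary(data)
print(summary)
```

Other software may use slightly different quartile conventions, so small discrepancies between packages are expected.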
Interquartile Range (IQR)
• The Interquartile Range (IQR) is the difference between the upper and lower quartiles
  • IQR = Q3 - Q1
• IQR = Range of values that contains the middle 50% of the data
• IQR increases as variability increases
• Murder Rate Data
  • Q1 = 3.9
  • Q3 = 10.3
  • IQR =
Box Plot
• Displays the five number summary (and more) graphically
• Consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile),
• A line within the box that marks the median,
• And whiskers that extend to the maximum and minimum values
  • This is assuming there are no outliers in the data set
Outliers
• An observation is an outlier if it falls
  • more than 1.5 IQR above the upper quartile or
  • more than 1.5 IQR below the lower quartile
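The outlier rule can be sketched as a pair of cutoffs ("fences"). The function names and the list of observations below are hypothetical. Only the quartiles Q1 = 3.9 and Q3 = 10.3 come from the murder rate example.

```python
def outlier_fences(q1, q3):
    """Cutoffs beyond which an observation counts as an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Observations more than 1.5 IQR beyond the quartiles."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

q1, q3 = 3.9, 10.3                       # quartiles from the murder rate data
low, high = outlier_fences(q1, q3)
print("IQR    =", round(q3 - q1, 1))
print("fences =", round(low, 1), round(high, 1))

# Hypothetical observations, assumed only to exercise the rule.
rates = [2.0, 3.9, 6.8, 10.3, 20.3, 78.5]
print("outliers:", find_outliers(rates, q1, q3))
```

A negative lower fence, as here, simply means no observation can be a low outlier when the values cannot go below zero.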
Box Plot
• Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
• If an observation is an outlier, it is marked by an x, +, or some other identifier
Example
• Values
  • Min = 148
  • Q1 = 158
  • Median = Q2 = 162
  • Q3 = 182
  • Max = 204
• Create a box plot
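Before drawing the plot, it helps to check whether the minimum and maximum fall inside the 1.5 IQR cutoffs, since that decides where the whiskers end. A rough sketch using the values above (variable names are assumptions):

```python
# Five number summary from the example.
minimum, q1, median, q3, maximum = 148, 158, 162, 182, 204

iqr = q3 - q1                   # width of the box
lower_fence = q1 - 1.5 * iqr    # anything below this is an outlier
upper_fence = q3 + 1.5 * iqr    # anything above this is an outlier

# The summary only tells us about the two most extreme observations.
no_outliers = lower_fence <= minimum and maximum <= upper_fence

print("box from", q1, "to", q3, "with median line at", median)
print("fences:", lower_fence, upper_fence)
if no_outliers:
    print("whiskers extend to", minimum, "and", maximum)
else:
    print("raw data needed to place whiskers and mark outliers")
```

Here both extremes lie inside the fences, so the whiskers run all the way to the minimum and maximum.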
5 Number Summary/Box Plot
• On right-skewed distributions, minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away.
• For left-skewed distributions, the “mirror” is true: the maximum, Q3, and the median will be relatively close compared to the corresponding distances to Q1 and the minimum.
• Symmetric distributions?
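The comparison of distances described above can be turned into a crude rule of thumb. The function `describe_shape` and its strict "both distances agree" criterion are assumptions of this sketch, not a standard test. The first input is the five number summary from the Data Summary example.

```python
def describe_shape(minimum, q1, median, q3, maximum):
    """Rough shape label from a five number summary (heuristic only)."""
    lower_tail = median - minimum   # distance from median down to minimum
    upper_tail = maximum - median   # distance from median up to maximum
    lower_box = median - q1         # distance from median down to Q1
    upper_box = q3 - median         # distance from median up to Q3
    if upper_tail > lower_tail and upper_box > lower_box:
        return "right-skewed"
    if upper_tail < lower_tail and upper_box < lower_box:
        return "left-skewed"
    return "no clear skew"

print(describe_shape(4, 256, 530, 1105, 320000))
print(describe_shape(0, 25, 50, 75, 100))
```

For a symmetric distribution the distances on either side of the median are about equal, which is the case the second call illustrates with made-up values.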
Mode
• Value that occurs most frequently
• Does not need to be near the center of the distribution
  • Not really a measure of central tendency
• Can be used for all types of data (nominal, ordinal, interval)
• Special Cases
  • Data Set
    • {2, 2, 4, 5, 5, 6, 10, 11}
    • Mode =
  • Data Set
    • {2, 6, 7, 10, 13}
    • Mode =
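The two special cases can be explored with `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The nominal example at the end uses made-up category labels.

```python
from statistics import multimode

# Two values tie for most frequent, so the mode is not unique.
two_modes = multimode([2, 2, 4, 5, 5, 6, 10, 11])

# Every value occurs exactly once, so all of them tie; many textbooks
# would describe this data set as having no mode.
all_tied = multimode([2, 6, 7, 10, 13])

# The mode also works for nominal data (hypothetical labels).
colors = multimode(["red", "blue", "red", "green"])

print(two_modes)
print(all_tied)
print(colors)
```

How to report the second case is a matter of convention, so the code only shows the tie rather than settling it.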
Mean vs. Median vs. Mode
• Mean
  • Interval data with an approximately symmetric distribution
• Median
  • Interval or ordinal data
• Mode
  • All types of data
Mean vs. Median vs. Mode
• Mean is sensitive to outliers
  • Median and mode are not
  • Why?
• In general, the median is more appropriate for skewed data than the mean
  • Why?
• In some situations, the median may be too insensitive to changes in the data
• The mode may not be unique
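A small experiment hints at the answer to the "Why?" questions: replace the largest observation with an extreme value and compare how the mean and the median respond. The data values are illustrative assumptions.

```python
from statistics import mean, median

original = [10, 12, 13, 15, 20]
with_outlier = [10, 12, 13, 15, 200]   # largest value replaced by an outlier

# The mean uses every value's magnitude, so one extreme value drags it along.
print("mean:  ", mean(original), "->", mean(with_outlier))

# The median depends only on the middle position, so it does not move.
print("median:", median(original), "->", median(with_outlier))
```

The same behaviour explains why the median is usually preferred for skewed data, where a long tail acts like a cluster of extreme values.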
Example
• “How often do you read the newspaper?”
• Identify the mode
• Identify the median response
Measures of Variation
• Statistics that describe variability
  • Two distributions may have the same mean and/or median but different variability
  • Mean and median only describe a typical value, but not the spread of the data
• Range
• Variance
• Standard Deviation
• Interquartile Range
• All of these can be computed for the sample or population
Range
• Difference between the largest and smallest observation
• Very much affected by outliers
  • A misrecorded observation may lead to an outlier, and affect the range
• The range does not always reveal differences in variation about the mean
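To illustrate the last point, here is a sketch with two made-up samples that share the same range and the same mean but differ in how tightly the observations cluster around that mean.

```python
from statistics import mean, variance

def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

clustered = [1, 5, 5, 5, 9]   # most values sit at the mean
spread = [1, 1, 5, 9, 9]      # most values sit at the extremes

for name, sample in (("clustered", clustered), ("spread", spread)):
    print(name, "range =", value_range(sample),
          "mean =", mean(sample),
          "sample variance =", variance(sample))
```

Both samples have a range of 8, yet the sample variance of the second is twice that of the first.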
Example
• Sample 1
  • Smallest Observation: 112
  • Largest Observation: 797
  • Range =
• Sample 2
  • Smallest Observation: 15033
  • Largest Observation: 16125
  • Range =