Probability and Statistics • 1.3 The Normal Distributions
Density Curve • A density curve is a smooth function meant to approximate a histogram. • The area under a density curve is one. • Since the density curve represents the entire distribution, the area under the curve on any interval represents the proportion of observations in that interval.
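The two area facts above can be checked numerically. The sketch below is illustrative only: it assumes a normal density curve with mean 0 and standard deviation 1 (the slide does not name a particular curve) and uses a hypothetical helper, `area_under_curve`, built on the trapezoidal rule.

```python
from statistics import NormalDist


def area_under_curve(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


# Assumed example curve: normal density with mean 0 and standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

# Area over (almost) the whole curve: should be very close to 1.
total_area = area_under_curve(curve.pdf, -10, 10)

# Area over the interval [0, 1]: the proportion of observations in that interval.
share_0_to_1 = area_under_curve(curve.pdf, 0, 1)

print(f"total area: {total_area:.4f}")
print(f"proportion between 0 and 1: {share_0_to_1:.4f}")
```

Any other density curve could be passed in as `pdf`; the total area should still come out to one.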
Density Curves • The mean of a density curve is the point at which the curve would balance. • The median of a density curve is the equal-areas point. In other words, the areas under the curve on either side of the median are equal. • For symmetric density curves, the balance point (mean) and the equal-areas point (median) are the same.
Definitions • Symmetric: Data is symmetric if the left half of its histogram (or density curve) is roughly a mirror image of its right half. • Skewed: Data is skewed if its histogram (or density curve) is not symmetric and extends more to one side than the other.
Skewness • [Figure: three density curves with the mean, median, and mode marked: symmetric (mode = mean = median), skewed left (negatively), and skewed right (positively).]
Characterization • A normal distribution is bell-shaped and symmetric. • The distribution is determined by the mean, μ (mu), and the standard deviation, σ (sigma). • The mean controls the center and the standard deviation controls the spread. Note: These two density curves have the same mean but different standard deviations.
68-95-99.7 Rule • For any normal curve with mean μ and standard deviation σ: • About 68 percent of the observations fall within 1 standard deviation of the mean. (μ - 1σ < x < μ + 1σ) • About 95 percent of the observations fall within 2 standard deviations of the mean. (μ - 2σ < x < μ + 2σ) • About 99.7 percent of the observations fall within 3 standard deviations of the mean. (μ - 3σ < x < μ + 3σ)
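The three percentages in the rule are rounded values of areas under the normal curve. A minimal sketch that recomputes them with Python's standard `statistics.NormalDist`; the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The result does not depend on mu or sigma, so the defaults are enough.
shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

The exact areas are slightly different from 68, 95, and 99.7 percent, which is why the rule is best read as an approximation.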
Waiting Times of Bank Customers at Different Banks (in minutes) • Jefferson Valley Bank: 6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7 • Bank of Providence: 4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0 • Both banks: mean 7.15, median 7.20, mode 7.7, midrange 7.10 • What is the standard deviation of the data from Jefferson Valley Bank? From Bank of Providence?
Dotplots of Waiting Times • Visually, which one has the greater spread?
Measures of Variation • Range = highest value - lowest value
Measures of Variation • Standard Deviation: a measure of variation of the scores about the mean (roughly, the typical distance of a score from the mean)
Sample Standard Deviation Formula • $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ • Calculators can compute the sample standard deviation of data.
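As a worked check of the formula, the sketch below applies it to the bank waiting times. It assumes the interleaved listing on the waiting-times slide alternates Jefferson Valley Bank and Bank of Providence values, which is consistent with the stated mean of 7.15 for both banks. The function name `sample_std` is chosen for this example only.

```python
from math import sqrt
from statistics import mean, stdev

# Waiting times in minutes (assumed split of the interleaved slide data).
jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def sample_std(values):
    """Sample standard deviation: sqrt of summed squared deviations over n - 1."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))


s_jv = sample_std(jefferson_valley)
s_bp = sample_std(providence)

# Cross-check against the standard library's own sample standard deviation.
library_s_jv = stdev(jefferson_valley)

print(f"Jefferson Valley Bank: s = {s_jv:.2f} minutes")
print(f"Bank of Providence:    s = {s_bp:.2f} minutes")
```

Under that assumption the standard deviations come out to about 0.48 minutes for Jefferson Valley Bank and about 1.82 minutes for Bank of Providence, even though the two banks share the same mean, median, mode, and midrange.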
Symbols for Standard Deviation • Sample: s (textbook), Sx (some graphics calculators), xσn-1 (some non-graphics calculators) • Population: σ (textbook), σx (some graphics calculators), xσn (some non-graphics calculators) • Articles in professional journals and reports often use SD for standard deviation and VAR for variance.
Understanding Standard Deviation • Spot the Jack Russell weighs 19 pounds. The mean weight for a Jack Russell Terrier is 16 pounds with a standard deviation of 1.5 pounds. • Desdi the Maine Coon cat also weighs 19 pounds and frequently kicks Spot’s butt around the house. The mean weight for a Maine Coon is 17 pounds with a standard deviation of 0.75 pounds. • Which animal is more in need of a diet?
Understanding Standard Deviation • To compare values that come from different distributions (or are measured in different units), we standardize the deviations from the means. In other words, we first convert each value into the same kind of unit: standard deviations from its own mean. Then we can compare the values directly. • This is done with a z-score: $z = \dfrac{y - \bar{y}}{s}$, where $y$ is the value of interest, $\bar{y}$ is the mean of the data, and $s$ is the standard deviation of the data.
Understanding Standard Deviation • z-score • is unit-less: the deviation from the mean and the standard deviation are in the same units, so the units cancel • represents the number of standard deviations a given value in the data is from the mean
Understanding Standard Deviation • Spot the Jack Russell weighs 19 pounds. The mean weight for a Jack Russell Terrier is 16 pounds with a standard deviation of 1.5 pounds. • Desdi the Maine Coon cat also weighs 19 pounds and frequently kicks Spot’s butt around the house. The mean weight for a Maine Coon is 17 pounds with a standard deviation of 0.75 pounds. • Which animal is more in need of a diet? • z-score for Spot: $z = \dfrac{19 - 16}{1.5} = 2$ • z-score for Desdi: $z = \dfrac{19 - 17}{0.75} \approx 2.67$ • Measured in standard deviations, Desdi is farther from the mean weight for her breed than Spot is from the mean for his. • What can you say about the spread of weights for the two breeds? • Can you think of any extraneous factor, other than being overweight, that could explain Desdi’s weight?
Understanding Standard Deviation • Desdi: z = 2.67 • Spot: z = 2 • What percent of Jack Russell terriers weigh less than Spot? More? • What percent of Maine Coon cats weigh less than Desdi? More?
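One way to answer these questions is sketched below. It assumes that weights within each breed are approximately normal, which the slides imply but do not state outright; the helper `z_score` is a name chosen for this example.

```python
from statistics import NormalDist


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


z_spot = z_score(19, mean=16, std_dev=1.5)    # Jack Russell Terrier
z_desdi = z_score(19, mean=17, std_dev=0.75)  # Maine Coon

# Assumption: breed weights are roughly normal, so the standard normal
# cumulative distribution gives the proportion of animals below each z-score.
standard_normal = NormalDist()
below_spot = standard_normal.cdf(z_spot)
below_desdi = standard_normal.cdf(z_desdi)

print(f"Spot:  z = {z_spot:.2f}, lighter: {below_spot:.1%}, heavier: {1 - below_spot:.1%}")
print(f"Desdi: z = {z_desdi:.2f}, lighter: {below_desdi:.1%}, heavier: {1 - below_desdi:.1%}")
```

Under the normality assumption, roughly 97.7% of Jack Russell terriers weigh less than Spot and roughly 99.6% of Maine Coons weigh less than Desdi.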
Using z-scores and the normal distribution • Suppose your drive to school takes 20 minutes on average, with a standard deviation of 2 minutes, and that drive times are approximately normal. • How often will you arrive at school in less than 22 minutes? • How often will it take you more than 24 minutes? • 75% of the time you will arrive in x minutes or less. Solve for x. • 43% of the time you will arrive in y minutes or more. Solve for y.
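These four questions can be worked with `statistics.NormalDist`, as in the sketch below. It assumes drive times follow a normal distribution with mean 20 and standard deviation 2; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: drive time in minutes is normal with mean 20 and std dev 2.
drive = NormalDist(mu=20, sigma=2)

# P(time < 22): 22 minutes is one standard deviation above the mean.
p_under_22 = drive.cdf(22)

# P(time > 24): 24 minutes is two standard deviations above the mean.
p_over_24 = 1 - drive.cdf(24)

# x such that 75% of drives take x minutes or less (the 75th percentile).
x_75 = drive.inv_cdf(0.75)

# y such that 43% of drives take y minutes or more,
# i.e. 57% take less than y (the 57th percentile).
y_43 = drive.inv_cdf(1 - 0.43)

print(f"P(time < 22) = {p_under_22:.4f}")
print(f"P(time > 24) = {p_over_24:.4f}")
print(f"x = {x_75:.2f} minutes")
print(f"y = {y_43:.2f} minutes")
```

With this model, the drive takes under 22 minutes about 84% of the time and over 24 minutes about 2.3% of the time; x is about 21.3 minutes and y is about 20.4 minutes.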
Measures of Variation • Variance: the standard deviation squared • Notation: $s^2$ (sample) or $\sigma^2$ (population)
Variance • Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ • Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
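The only difference between the two formulas is the divisor. The sketch below computes both for ten waiting times (assumed here to be the Jefferson Valley Bank values from the earlier slide); treating the same ten values as a whole population is purely for illustration.

```python
from statistics import mean, pvariance, variance

# Assumed data: the ten Jefferson Valley Bank waiting times, in minutes.
waits = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]


def sample_variance(values):
    """Summed squared deviations from the sample mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


def population_variance(values):
    """Summed squared deviations from the population mean, divided by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


s2 = sample_variance(waits)
sigma2 = population_variance(waits)

# The standard library provides both versions directly.
library_s2 = variance(waits)
library_sigma2 = pvariance(waits)

print(f"sample variance:     {s2:.4f}")
print(f"population variance: {sigma2:.4f}")
```

Dividing by n - 1 rather than N always gives the larger value, and the gap shrinks as the number of observations grows.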
Round-off Rule for Measures of Variation • Carry at least one more decimal place than is present in the original set of values. • Round only the final answer, never in the middle of a calculation.
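Estimation of Standard Deviation: Range Rule of Thumb • Range ≈ 4s, or $s \approx \dfrac{\text{range}}{4} = \dfrac{\text{highest value} - \text{lowest value}}{4}$ • [Figure: number line running from $\bar{x} - 2s$ (minimum usual value) through $\bar{x}$ to $\bar{x} + 2s$ (maximum usual value).]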
Usual Sample Values • minimum ‘usual’ value ≈ (mean) - 2 × (standard deviation) • minimum ≈ $\bar{x} - 2s$ • maximum ‘usual’ value ≈ (mean) + 2 × (standard deviation) • maximum ≈ $\bar{x} + 2s$
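A short sketch of both rules of thumb, using ten waiting times (assumed here to be the Bank of Providence values from the earlier slide). The helper names `range_rule_estimate` and `usual_values` are invented for this example.

```python
from statistics import mean, stdev

# Assumed data: the ten Bank of Providence waiting times, in minutes.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def range_rule_estimate(values):
    """Rough estimate of the standard deviation: range divided by 4."""
    return (max(values) - min(values)) / 4


def usual_values(x_bar, s):
    """Minimum and maximum 'usual' values: mean minus/plus 2 standard deviations."""
    return x_bar - 2 * s, x_bar + 2 * s


estimated_s = range_rule_estimate(providence)
actual_s = stdev(providence)
low, high = usual_values(mean(providence), actual_s)

print(f"range rule estimate of s: {estimated_s:.2f}")
print(f"computed sample s:        {actual_s:.2f}")
print(f"usual values run from {low:.2f} to {high:.2f} minutes")
```

For these values the estimate (about 1.45) is in the same ballpark as the computed standard deviation (about 1.82), which is all the rule of thumb promises.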
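The Empirical Rule (applies to bell-shaped distributions) • 68% of data are within 1 standard deviation of the mean • 95% are within 2 standard deviations • 99.7% are within 3 standard deviations • [Figure: bell curve centered at $\bar{x}$ with tick marks at $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$; moving outward from the mean, the bands on each side hold 34%, 13.5%, 2.4%, and 0.1% of the data.]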
Chebyshev’s Theorem • applies to distributions of any shape. • the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any number greater than 1. • at least 3/4 (75%) of all values lie within 2 standard deviations of the mean. • at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
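The bound can be compared with what actually happens in a data set. The sketch below uses ten waiting times (assumed here to be the Bank of Providence values from the earlier slide); `chebyshev_bound` and `share_within` are hypothetical helper names.

```python
from statistics import mean, stdev

# Assumed data: the ten Bank of Providence waiting times, in minutes.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def share_within(values, k):
    """Observed proportion of values within k sample standard deviations of the mean."""
    x_bar = mean(values)
    s = stdev(values)
    inside = [x for x in values if abs(x - x_bar) <= k * s]
    return len(inside) / len(values)


for k in (2, 3):
    bound = chebyshev_bound(k)
    observed = share_within(providence, k)
    print(f"K = {k}: at least {bound:.1%} guaranteed, {observed:.1%} observed")
```

The observed proportion can be much higher than the guarantee, because the theorem has to hold for distributions of any shape.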
Measures of Variation Summary • For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
Assignment • Read Section 1.3 • pp. 64-66: 1.62, 1.64-1.69, 1.71