Chapter 7: Normal Probability Distributions
In Chapter 7: 7.1 Normal Distributions 7.2 Determining Normal Probabilities 7.3 Finding Values That Correspond to Normal Probabilities 7.4 Assessing Departures from Normality
§7.1: Normal Distributions • Normal random variables are the most common type of continuous random variable • First described by de Moivre in 1733 • Laplace elaborated the mathematics in 1812 • Describe some (not all) natural phenomena • More importantly, describe the behavior of means
Normal Probability Density Function • Recall that continuous random variables are described with smooth probability density functions (pdfs) (Ch 5) • Normal pdfs are recognized by their familiar bell shape • Figure: the age distribution of a pediatric population; the overlying curve represents its Normal pdf model
Area Under the Curve • The darker bars of the histogram correspond to ages less than or equal to 9 (~40% of observations) • The darker area under the curve also corresponds to ages less than or equal to 9 (~40% of the total area)
Parameters μ and σ • Normal pdfs are a family of distributions • Family members are identified by parameters μ (mean) and σ (standard deviation) • μ controls location • σ controls spread
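As a rough illustration of how μ and σ pick out a family member, the sketch below uses Python's standard-library `statistics.NormalDist`. The three parameter pairs are hypothetical, chosen only to show that changing μ shifts the curve without changing its shape, while doubling σ halves the peak height.

```python
from statistics import NormalDist

# Hypothetical family members (parameter values are illustrative assumptions).
a = NormalDist(mu=0, sigma=1)   # reference curve
b = NormalDist(mu=3, sigma=1)   # same spread, shifted location
c = NormalDist(mu=0, sigma=2)   # same location, wider spread

# Height of each density at its own mean (the peak of the bell).
peak_a = a.pdf(a.mean)
peak_b = b.pdf(b.mean)
peak_c = c.pdf(c.mean)

print(f"peak heights: a={peak_a:.4f}, b={peak_b:.4f}, c={peak_c:.4f}")
```

Shifting μ leaves the peak height unchanged; doubling σ flattens the curve so that the total area still equals 1.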
Mean and Standard Deviation of Normal Density (figure: Normal curve with μ and σ marked)
Standard Deviation σ • Points of inflection (where the slope of the curve begins to level) occur one σ below and above μ • Practice sketching Normal curves to feel inflection points • Practice labeling the horizontal axis of curves with standard deviation markers (figure)
68-95-99.7 Rule for Normal Distributions • 68% of the AUC falls within ±1σ of μ • 95% of the AUC falls within ±2σ of μ • 99.7% of the AUC falls within ±3σ of μ
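The rule can be checked numerically. This is a minimal sketch using the standard-library `statistics.NormalDist`; the helper name `area_within` is an assumption introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within ±{k}σ: {area:.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.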
Example: 68-95-99.7 Rule Wechsler adult intelligence scores are Normally distributed with μ = 100 and σ = 15; X ~ N(100, 15). Using the 68-95-99.7 rule: • 68% of scores fall in μ ± σ = 100 ± 15 = 85 to 115 • 95% of scores fall in μ ± 2σ = 100 ± (2)(15) = 70 to 130 • 99.7% of scores fall in μ ± 3σ = 100 ± (3)(15) = 55 to 145
Symmetry in the Tails Because the Normal curve is symmetrical and the total AUC adds to 1, we can determine the AUC in the tails. For example, because 95% of the curve is in μ ± 2σ, 2.5% is in each tail beyond μ ± 2σ.
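The Wechsler arithmetic can be reproduced in a few lines. This sketch only restates the calculation from the example; the variable names are assumptions.

```python
mu, sigma = 100, 15  # Wechsler parameters from the example

# Interval mu ± k*sigma for k = 1, 2, 3 standard deviations.
ranges = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

for k, (low, high) in ranges.items():
    print(f"mu ± {k} sigma: {low} to {high}")
```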
Example: Male Height • Male height is approximately Normal with μ = 70.0˝ and σ = 2.8˝ • Because of the 68-95-99.7 rule, 68% of the population is in the range 70.0˝ ± 2.8˝ = 67.2˝ to 72.8˝ • Because the total AUC adds to 100%, 32% are in the tails below 67.2˝ and above 72.8˝ • Because of symmetry, half of this 32% (i.e., 16%) is below 67.2˝ and 16% is above 72.8˝
Example: Male Height (figure: 68% of the area between 67.2˝ and 72.8˝, centered at 70˝, with 16% in each tail)
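The tail areas in the male height example can be checked against the exact Normal model. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are assumptions.

```python
from statistics import NormalDist

height = NormalDist(mu=70.0, sigma=2.8)  # male height model from the example

below = height.cdf(67.2)                      # left tail, below mu - sigma
above = 1 - height.cdf(72.8)                  # right tail, above mu + sigma
middle = height.cdf(72.8) - height.cdf(67.2)  # central area

print(f"below 67.2: {below:.4f}, middle: {middle:.4f}, above 72.8: {above:.4f}")
```

The exact tails are each about 15.9%, which the 68-95-99.7 rule rounds to 16%.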
Reexpression of Non-Normal Variables • Many biostatistical variables are not Normal • We can reexpress non-Normal variables with a mathematical transformation to make them more Normal • Examples of mathematical transforms include logarithms, exponents, square roots, and so on • Let us review the logarithmic transformation
Logarithms • Logarithms are exponents of their base • There are two main logarithmic bases • common log10 (base 10) • natural ln (base e) • Landmarks: • log10(1) = 0 (because 10^0 = 1) • log10(10) = 1 (because 10^1 = 10)
Example: Logarithmic Re-expression • Prostate specific antigen (PSA) is not Normal in 60-year-olds, but ln(PSA) is approximately Normal with μ = −0.3 and σ = 0.8 • 95% of ln(PSA) falls in μ ± 2σ = −0.3 ± (2)(0.8) = −1.9 to 1.3 • Thus, 2.5% are above ln(PSA) = 1.3; take the anti-log of 1.3: e^1.3 ≈ 3.67 • Since only 2.5% of the population has values greater than 3.67, use this as the cut-point for suspiciously high results
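The logarithm landmarks are easy to confirm with Python's `math` module. The sketch below also shows that the exponential undoes the natural log; the value 5.0 is an arbitrary illustration.

```python
import math

# Landmarks for the common (base 10) logarithm.
landmarks = {
    "log10(1)": math.log10(1),    # 10**0 == 1
    "log10(10)": math.log10(10),  # 10**1 == 10
}

# The natural log uses base e; exp() is its inverse (the "anti-log").
ln_e = math.log(math.e)
round_trip = math.exp(math.log(5.0))  # arbitrary value, returns about 5.0

print(landmarks, ln_e, round_trip)
```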
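The PSA cut-point can be reproduced step by step. This sketch follows the slide's μ ± 2σ approximation on the log scale and then back-transforms with the exponential; the variable names are assumptions.

```python
import math

mu, sigma = -0.3, 0.8  # parameters of ln(PSA) from the example

# Approximate central 95% range on the log scale (mu ± 2 sigma).
low = mu - 2 * sigma
high = mu + 2 * sigma

# Back-transform the upper limit to the original PSA scale.
cut_point = math.exp(high)

print(f"ln(PSA) range: {low:.1f} to {high:.1f}; PSA cut-point: {cut_point:.2f}")
```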
§7.2: Determining Normal Probabilities To determine a Normal probability when the value does not fall directly on a ±1σ, ±2σ, or ±3σ landmark, follow this procedure: 1. State the problem 2. Standardize the value (z score) 3. Sketch and shade the curve 4. Use Table B to determine the probability
Example: Normal Probability, Step 1. Statement of Problem • We want to determine the percentage of human gestations that are less than 40 weeks in length • We know that uncomplicated human pregnancy from conception to birth is approximately Normally distributed with μ = 39 weeks and σ = 2 weeks. [Note: clinicians measure gestation from last menstrual period to birth, which adds 2 weeks to the μ.] • Let X represent human gestation: X ~ N(39, 2) • Statement of the problem: Pr(X ≤ 40) = ?
Standard Normal (Z) Variable • Standard Normal variable ≡ a Normal random variable with μ = 0 and σ = 1 • Called “Z variables” • Notation: Z ~ N(0,1) • Use Table B to look up cumulative probabilities • Part of Table B shown on next slide…
Example: A Standard Normal (Z) variable with a value of 1.96 has a cumulative probability of .9750.
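A Table B lookup can be mimicked in software. In this sketch the `cdf` method of the standard-library `statistics.NormalDist` plays the role of the table; it is a stand-in for Table B, not the table itself.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

# Cumulative probability Pr(Z <= 1.96), the quantity Table B tabulates.
p = Z.cdf(1.96)

print(f"Pr(Z <= 1.96) = {p:.4f}")
```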
Normal Probability, Step 2. Standardize To standardize, subtract μ and divide by σ: z = (x − μ) / σ. The z-score tells you the number of σ-units the value falls above or below μ. For the gestation example, z = (40 − 39) / 2 = 0.5.
Steps 3 & 4. Sketch and Use Table B 3. Sketch and label axes 4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915
Probabilities Between Two Points Let a represent the lower boundary and b represent the upper boundary of a range: Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a) Use of this concept will be demonstrated in class and on HW exercises.
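The standardize-then-look-up procedure for the gestation example can be sketched as follows. The standard-library `statistics.NormalDist` stands in for Table B; the variable names are assumptions.

```python
from statistics import NormalDist

mu, sigma = 39, 2  # gestation model, X ~ N(39, 2)
x = 40             # value of interest, in weeks

# Step 2: standardize.
z = (x - mu) / sigma

# Step 4: cumulative probability of the z score (software stand-in for Table B).
p = NormalDist().cdf(z)

# Cross-check: the same probability without standardizing first.
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z}, Pr(X <= 40) = {p:.4f}")
```

Both routes give the same probability, which is why a single standard Normal table is enough for every member of the Normal family.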
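As a hedged illustration of the between-two-points rule, the sketch below finds the proportion of gestations between 38 and 42 weeks. These bounds are hypothetical and do not come from the slides; the model X ~ N(39, 2) is the one used in the gestation example.

```python
from statistics import NormalDist

mu, sigma = 39, 2          # gestation model from the earlier example
lower, upper = 38, 42      # hypothetical bounds, in weeks

# Standardize both boundaries.
a = (lower - mu) / sigma   # -0.5
b = (upper - mu) / sigma   # 1.5

# Pr(a <= Z <= b) = Pr(Z <= b) - Pr(Z <= a)
Z = NormalDist()
p_between = Z.cdf(b) - Z.cdf(a)

print(f"Pr({a} <= Z <= {b}) = {p_between:.4f}")
```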
§7.3: Finding Values Corresponding to Normal Probabilities 1. State the problem 2. Use Table B to look up the z-percentile value 3. Sketch 4. Unstandardize with this formula: x = μ + z_p·σ
Looking up the z percentile value • Use Table B to look up the z percentile value, i.e., the z score for the probability in question • Look inside the table for the entry closest to the associated cumulative probability • Then trace the z score to the row and column labels
Suppose you wanted the 97.5th percentile z score. Look inside the table for .9750. Then trace the z score to the margins. Notation: Let z_p represent the z score with cumulative probability p, e.g., z_.975 = 1.96
Finding Normal Values - Example Suppose we want to know what gestational length is shorter than 97.5% of all gestations. Step 1. State the problem! Let X represent gestation length. The prior problem established X ~ N(39, 2). We want the gestation length that is shorter than .975 of all gestations. This is equivalent to the gestation length that is longer than .025 of gestations.
Example, cont. Step 2. Use Table B to look up the z value. Table B lists only “left tails”. “Less than 97.5%” (right tail) = “greater than 2.5%” (left tail). The z lookup in the table shows z_.025 = −1.96
Example, cont. Step 3. Sketch Step 4. Unstandardize: x = μ + z_p·σ = 39 + (−1.96)(2) = 35.08 ≈ 35 “The 2.5th percentile gestation is 35 weeks.”
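The reverse lookup (probability to z score) can be sketched with the `inv_cdf` method of the standard-library `statistics.NormalDist`, used here as a software stand-in for reading Table B from the inside out.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

# z score with cumulative probability p (the p-th quantile).
z_975 = Z.inv_cdf(0.975)
z_025 = Z.inv_cdf(0.025)

print(f"z_.975 = {z_975:.2f}, z_.025 = {z_025:.2f}")
```

By symmetry the two values differ only in sign.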
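The unstandardizing step for the gestation example works out as below. The sketch uses the table value z = −1.96 and, as a cross-check, the exact quantile from the standard-library `statistics.NormalDist`; the variable names are assumptions.

```python
from statistics import NormalDist

mu, sigma = 39, 2   # gestation model, X ~ N(39, 2)
z_p = -1.96         # z score for cumulative probability .025 (table value)

# Unstandardize: x = mu + z_p * sigma
x = mu + z_p * sigma

# Cross-check with the exact 2.5th percentile of N(39, 2).
x_exact = NormalDist(mu, sigma).inv_cdf(0.025)

print(f"2.5th percentile: {x:.2f} weeks (exact: {x_exact:.2f})")
```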
§7.4: Assessing Departures from Normality The best way to assess Normality is graphically (figures: an approximately Normal histogram and the Normal “Q-Q” plot of the same distribution). A Normal distribution will adhere to a diagonal line on the Q-Q plot
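The coordinates behind a Normal Q-Q plot can be computed without plotting software. This is a minimal sketch under stated assumptions: the data are simulated (the slides use real data), the plotting positions (i − 0.5)/n are one common convention among several, and `pearson_r` is a helper written here for illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated, approximately Normal data (an assumption for illustration).
data = sorted(random.gauss(39, 2) for _ in range(200))
n = len(data)

# Theoretical standard Normal quantiles at plotting positions (i - 0.5) / n.
Z = NormalDist()
theoretical = [Z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation; near 1 when the Q-Q points hug a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(theoretical, data)
print(f"Q-Q correlation: {r:.3f}")
```

Plotting `theoretical` against `data` gives the Q-Q plot; a correlation close to 1 is a numeric hint that the points follow the diagonal, though the graph itself remains the better diagnostic.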
Negative Skew A negative skew will show an upward curve on the Q-Q plot
Positive Skew A positive skew will show a downward curve on the Q-Q plot
A mathematical transform can Normalize a skew (figure: same data as previous slide but with logarithmic transform)
Leptokurtic A leptokurtic distribution (sharp peak with heavy tails) will show an S-shape on the Q-Q plot
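The effect of a log transform on skew can be demonstrated with simulated data. In this sketch the sample is drawn from a lognormal distribution (an assumption; the slide's data are not available), and `skewness` is a simple moment-based helper written for illustration.

```python
import math
import random
from statistics import mean, pstdev

def skewness(xs):
    """Moment-based sample skewness: average cubed z score."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulated positively skewed data (lognormal; parameters are illustrative).
raw = [random.lognormvariate(-0.3, 0.8) for _ in range(2000)]
logged = [math.log(x) for x in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print(f"skew before: {skew_raw:.2f}, after log transform: {skew_logged:.2f}")
```

The raw values show a clear positive skew, while the logged values have skewness near zero, consistent with an approximately Normal shape.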