Review of Top 10 Concepts in Statistics

NOTE: This PowerPoint file is not an introduction, but rather a checklist of topics to review.

Top Ten 10. Qualitative vs. Quantitative Data 9. Population vs. Sample 8. Graphical Tools 7. Variation Creates Uncertainty 6. Which Distribution? 5. P-value 4. Linear Regression 3. Confidence Intervals 2. Descriptive Statistics 1. Hypothesis Testing

Top Ten #10 • Qualitative vs. Quantitative

Qualitative • Categorical data: success vs. failure, ethnicity, marital status, color, zip code, 4-star hotel in tour guide

Qualitative • If you need an “average”, do not calculate the mean • However, you can compute the mode (“average” person is married, buys a blue car made in America)
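As a minimal sketch of why the mode is the usable "average" for categorical data, the snippet below counts labels and returns the most frequent one. The survey responses are made up for illustration, and `mode_of` is a hypothetical helper name.

```python
from collections import Counter


def mode_of(categories):
    """Return the most frequent label (the mode) in a list of categories."""
    counts = Counter(categories)
    # most_common(1) returns a one-element list holding a (label, count) pair
    label, _count = counts.most_common(1)[0]
    return label


# Hypothetical responses, invented for this example only
marital_status = ["married", "single", "married", "divorced", "married"]
car_color = ["blue", "red", "blue", "white", "blue"]

typical_status = mode_of(marital_status)
typical_color = mode_of(car_color)
print(typical_status, typical_color)
```

A mean of these labels is undefined, which is why the slide recommends the mode instead.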

Quantitative • integer values (0,1,2,…) • number of brothers • number of cars arriving at gas station • Real numbers, such as decimal values ($22.22) • Examples: Z, t • Miles per gallon, distance, duration of time

Hypothesis Testing and Confidence Intervals • Quantitative: Mean • Qualitative: Proportion

Top Ten #9 • Population vs. Sample

Population • Collection of all items (all light bulbs made at factory) • Parameter: measure of population characteristic (1) population mean (average number of hours in life of all bulbs) (2) population proportion (% of all bulbs that are defective)

Sample • Part of population (bulbs tested by inspector) • Statistic: measure of sample = estimate of parameter (1) sample mean (average number of hours in life of bulbs tested by inspector) (2) sample proportion (% of bulbs in sample that are defective)

Top Ten #8: Graphical Tools • Pie chart or bar chart: qualitative • Joint frequency table: qualitative (relate marital status vs zip code) • Scatter diagram: quantitative (distance from ASU vs duration of time to reach ASU) • Histograms • Stem Plots

Graphical Tools • Line chart: trend over time • Scatter diagram: relationship between two variables • Bar chart: frequency for each category • Histogram: frequency for each class of measured data (graph of frequency distr.) • Box plot: graphical display based on quartiles, which divide data into 4 parts

Top Ten #7 • Variation Creates Uncertainty

No Variation • Certainty, exact prediction • Standard deviation = 0 • Variance = 0 • All data exactly same • Example: all workers in minimum wage job

High Variation • Uncertainty, unpredictable • High standard deviation • Ex #1: Workers in downtown L.A. have variation between CEOs and garment workers • Ex #2: New York temperatures in spring range from below freezing to very hot

Comparing Standard Deviations • Temperature Example • Beach city: small standard deviation (single temperature reading close to mean) • High Desert city: High standard deviation (hot days, cool nights in spring)

Standard Error of the Mean • Standard deviation of sample mean = standard deviation / square root of n • Ex: standard deviation = 10, n = 4, so standard error of the mean = 10/2 = 5 • Note that 5 < 10, so standard error < standard deviation • As n increases, standard error decreases
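The arithmetic on this slide can be checked with a short sketch. The function name `standard_error` is an illustrative choice, and the larger sample sizes in the loop are added only to show the trend.

```python
import math


def standard_error(std_dev, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return std_dev / math.sqrt(n)


se = standard_error(10, 4)
print(se)  # 5.0, matching the slide's 10/2

# The standard error shrinks as the sample size grows
for n in (4, 16, 64, 256):
    print(n, standard_error(10, n))
```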

Sampling Distribution • Expected value of sample mean = population mean, but an individual sample mean could be smaller or larger than the population mean • Population mean is a constant parameter, but sample mean is a random variable • Sampling distribution is distribution of sample means

Example • Mean age of all students in the building is population mean • Each classroom has a sample mean • Distribution of sample means from all classrooms is sampling distribution

Central Limit Theorem (CLT) • Sampling distribution of sample means is approximately normal if n > 30 (use Z table if population standard deviation is known) • CLT applies even if original population is skewed
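A small simulation can make the CLT claim concrete. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a skewed distribution chosen for illustration), each sample has n = 36, and 2000 samples are drawn with a fixed seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(draw, n, num_samples):
    """Draw num_samples samples of size n and return the list of sample means."""
    return [
        statistics.fmean([draw() for _ in range(n)])
        for _ in range(num_samples)
    ]


def draw_skewed():
    """Assumed population: exponential with mean 1 and standard deviation 1."""
    return random.expovariate(1.0)


means = sample_means(draw_skewed, n=36, num_samples=2000)

center = statistics.fmean(means)  # should be near the population mean, 1
spread = statistics.stdev(means)  # should be near 1 / sqrt(36)
print(round(center, 3), round(spread, 3))
```

Under these assumptions the sample means cluster around the population mean, with a spread close to the standard error, even though the underlying population is strongly skewed.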

Top Ten #6 • What Distribution to Use?

Normal Distribution • Continuous, bell-shaped, symmetric • Mean = median = mode • Measurement (dollars, inches, years) • Cumulative probability under normal curve: use Z table if you know population mean and population standard deviation • Sample mean: use Z table if you know population standard deviation and either normal population or n > 30

t Distribution • Continuous, mound-shaped, symmetric • Applications similar to normal • More spread out than normal • Use t if normal population but population standard deviation not known • Degrees of freedom = df = n-1 if estimating the mean of one population • t approaches z as df increases

Normal or t Distribution? • Use t table if normal population but population standard deviation (σ) is not known • If you are given the sample standard deviation (s), use t table, assuming normal population

Top Ten #5 • P-value

P-value • P-value = probability of getting a sample statistic as extreme as (or more extreme than) the one you got from your sample, given that the null hypothesis is true

P-value Example: one tail test • H0: µ = 40 • HA: µ > 40 • Sample mean = 43 • P-value = P(sample mean > 43, given H0 true) • Meaning: probability of observing a sample mean as large as 43 when the population mean is 40 • How to use it: Reject H0 if p-value < α (significance level)
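The slide does not give the population standard deviation or the sample size, so the sketch below assumes σ = 12 and n = 64 purely for illustration. With those hypothetical values, the one-tail p-value for a sample mean of 43 can be computed from the standard normal distribution.

```python
import math
from statistics import NormalDist


def one_tail_p_value(sample_mean, mu0, sigma, n):
    """Upper-tail p-value for H0: mu = mu0 vs HA: mu > mu0, using a Z test."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # P(Z > z) under the standard normal distribution
    return 1.0 - NormalDist().cdf(z)


# sigma=12 and n=64 are ASSUMED values, not taken from the slides
p_value = one_tail_p_value(sample_mean=43, mu0=40, sigma=12, n=64)
alpha = 0.05
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```

With these assumed inputs the test statistic is z = 2, so the p-value falls below α = .05 and H0 would be rejected.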

Two Cases • Suppose α = .05 • Case 1: suppose p-value = .02, then reject H0 (a sample mean this large would be unlikely if H0 were true; you conclude population mean > 40) • Case 2: suppose p-value = .08, then do not reject H0 (H0 may be true; you do not have enough evidence to conclude that the population mean is greater than 40)

P-value Example: two tail test • H0: µ = 70 • HA: µ ≠ 70 • Sample mean = 72 • If two-tails, then P-value = 2 × P(sample mean > 72) = 2(.04) = .08 • If α = .05, p-value > α, so do not reject H0
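The two-tail doubling and the decision rule can be written as a short sketch. The function names are illustrative; the inputs (.04, .02, .05) come from the slides.

```python
def two_tail_p_value(one_tail_probability):
    """Double the one-tail probability for a two-tail test."""
    return 2 * one_tail_probability


def decide(p_value, alpha):
    """Apply the rule: reject H0 only if the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p_value = two_tail_p_value(0.04)
decision = decide(p_value, alpha=0.05)
print(p_value, decision)

# The earlier one-tail case with p-value = .02 leads to the opposite decision
print(decide(0.02, alpha=0.05))
```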

Top Ten #4 • Linear Regression

Linear Regression • Regression equation: ŷ = b0 + b1x • ŷ = dependent variable = predicted value • x = independent variable • b0 = y-intercept = predicted value of y if x = 0 • b1 = slope = regression coefficient = change in y per unit change in x

Slope vs Correlation • Positive slope (b1 > 0): positive correlation between x and y (y increases as x increases) • Negative slope (b1 < 0): negative correlation (y decreases as x increases) • Zero slope (b1 = 0): no correlation (predicted value for y is mean of y), no linear relationship between x and y

Simple Linear Regression • Simple: one independent variable, one dependent variable • Linear: graph of regression equation is straight line

Example • y = salary (female manager, in thousands of dollars) • x = number of children • n = number of observations

Given Data

Totals

Slope (b1) = -6.5 • Method of Least Squares formulas not on BUS 302 exam • b1 = -6.5 given • Interpretation: If one female manager has 1 more child than another, her salary is expected to be $6,500 lower; that is, salary of female managers is expected to decrease by 6.5 (in thousands of dollars) per child

Intercept (b0) • b0 = 44.33 – (-6.5)(2.33) = 59.5 • If number of children is zero, expected salary is $59,500

Regression Equation • ŷ = 59.5 – 6.5x

Forecast Salary If 3 Children • ŷ = 59.5 – 6.5(3) = 40 • $40,000 = expected salary
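The intercept and forecast steps can be reproduced with a few lines. This sketch assumes that 44.33 and 2.33 in the intercept calculation are the sample means of salary and number of children, which is the usual least-squares relationship b0 = (mean of y) – b1 × (mean of x); the function names are illustrative.

```python
def intercept(mean_y, slope, mean_x):
    """Least-squares intercept: b0 = mean of y minus slope times mean of x."""
    return mean_y - slope * mean_x


def predict(b0, b1, x):
    """Predicted value from the regression equation: y-hat = b0 + b1 * x."""
    return b0 + b1 * x


b1 = -6.5  # slope given on the slide

# Assumed: 44.33 = mean salary (thousands), 2.33 = mean number of children
b0 = intercept(mean_y=44.33, slope=b1, mean_x=2.33)
print(round(b0, 1))  # about 59.5 after rounding

# Forecast for 3 children, using the rounded intercept from the slide
forecast = predict(59.5, b1, 3)
print(forecast)  # 40.0, i.e. $40,000
```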

Standard Error of Estimate

Standard Error of Estimate Actual salary typically $1,900 away from expected salary
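The formula for this slide did not survive, so the sketch below uses the standard definition, square root of SSE / (n – 2). Two inputs are assumptions: SSE = 200.5 – 197 = 3.5 is derived from the total and regression sums of squares quoted elsewhere in this deck, and n = 3 is inferred from the means used in the intercept calculation rather than stated on any slide.

```python
import math


def standard_error_of_estimate(sse, n):
    """Standard error of estimate for simple linear regression.

    Uses n - 2 degrees of freedom because two coefficients (b0, b1) are estimated.
    """
    if n <= 2:
        raise ValueError("need more than 2 observations")
    return math.sqrt(sse / (n - 2))


sse = 200.5 - 197  # unexplained variation = SST - SSR
n = 3              # ASSUMPTION: sample size inferred, not given in the slides

s_e = standard_error_of_estimate(sse, n)
print(round(s_e, 1))  # in thousands of dollars
```

Under these assumptions the result rounds to 1.9, consistent with the "$1,900 away" statement.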

Coefficient of Determination • R2 = % of total variation in y that can be explained by variation in x • Measure of how closely the linear regression line fits the points in a scatter diagram • R2 = 1: max. possible value: perfect linear relationship between y and x (straight line) • R2 = 0: min. value: no linear relationship

Sources of Variation (V) • Total V = Explained V + Unexplained V • SS = Sum of Squares = V • Total SS = Regression SS + Error SS • SST = SSR + SSE • SSR = Explained V, SSE = Unexplained

Coefficient of Determination • R2 = SSR / SST • R2 = 197 / 200.5 = .98 • Interpretation: 98% of total variation in salary can be explained by variation in number of children
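A short sketch of the R2 arithmetic, using the sums of squares from the slide. It also checks the equivalent form 1 – SSE / SST, which follows from SST = SSR + SSE; the function name is illustrative.

```python
def r_squared(ssr, sst):
    """Coefficient of determination: explained variation over total variation."""
    return ssr / sst


sst = 200.5      # total sum of squares
ssr = 197        # regression (explained) sum of squares
sse = sst - ssr  # error (unexplained) sum of squares

r2 = r_squared(ssr, sst)
r2_from_sse = 1 - sse / sst  # same quantity via the unexplained share

print(round(r2, 2), round(r2_from_sse, 2))
```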

0 ≤ R2 ≤ 1 • 0: No linear relationship since SSR = 0 (explained variation = 0) • 1: Perfect relationship since SSR = SST (unexplained variation = SSE = 0), but does not prove cause and effect

R=Correlation Coefficient • Case 1: slope (b1) < 0 • R < 0 • R is negative square root of coefficient of determination

Our Example • Slope = b1 = -6.5 • R2 = .98 • R = -.99

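Case 2: Slope > 0 • R is positive square root of coefficient of determination • Ex: R2 = .49 • R = .70 • Unlike R2, R has no interpretation as a percentage of variation explained • R overstates relationship (.70 looks stronger than the 49% of variation actually explained)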
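Both cases reduce to one rule: take the square root of R2 and give it the sign of the slope. A minimal sketch follows; the positive slope of 1.0 in the second call is a placeholder, since the slides give no slope for the R2 = .49 example.

```python
import math


def correlation_from_r2(r2, slope):
    """Correlation coefficient R: square root of R2, carrying the slope's sign."""
    r = math.sqrt(r2)
    return -r if slope < 0 else r


# Case 1: the salary example, slope = -6.5 and R2 = .98
r_salary = correlation_from_r2(0.98, slope=-6.5)
print(round(r_salary, 2))

# Case 2: R2 = .49 with a positive slope (1.0 is a hypothetical placeholder)
r_positive = correlation_from_r2(0.49, slope=1.0)
print(round(r_positive, 2))
```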
