Exploring Data Describing Distributions with Numbers
Ways to Measure Center Mean Add the values and divide by the number of observations (n) Not a resistant measure of center (sensitive to outliers) Used mostly with symmetric distributions
Ways to Measure Center Median (M) – midpoint of the distribution, the number such that half the observations are smaller and the other half are larger Resistant measure of center (best to use when outliers are present) Can be used with symmetric or skewed distributions Q2 = 50th percentile, 50% of the data falls below it
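The difference between a resistant and a non-resistant measure of center can be seen in a small sketch. The numbers below are made up for illustration and are not from the slides: one large value is appended to a short list, and the mean and median are compared before and after.

```python
from statistics import mean, median

# Hypothetical data: five values, then the same values plus one large outlier.
values = [2, 3, 4, 5, 6]
with_outlier = values + [40]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled far toward the outlier; the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this made-up case the single outlier more than doubles the mean while the median shifts by only half a unit.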
Ways to Measure Spread • Range • Highest value – Lowest value • Can be inflated by a single outlier, so interpret it with care • Quartiles (for use with the median) • pth percentile – the value such that p percent of the observations fall at or below it • Q1 (quartile 1): 25th percentile = 25% of the data values fall at or below it, also known as the median of the lower half of the data • Q3 (quartile 3): 75th percentile = 75% of the data values fall at or below it, also known as the median of the upper half of the data
Ways to Measure Spread • 5 Number Summary • Used for distributions which are skewed, or which have strong outliers • Minimum value, Q1, Median, Q3, Maximum Value • Used in a Boxplot- graph of 5 number summary • Central box spans the quartiles Q1 and Q3 • Line in the box marks the median (M) • Lines extend from the box out to the smallest and largest observations
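A minimal sketch of the 5 number summary, assuming the "median of each half" rule for quartiles described above (the middle value is left out of both halves when n is odd). The function name `five_number_summary` and the data are hypothetical. Software packages may use other quartile conventions and can give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations below the median position
    upper = xs[half + n % 2:]    # observations above it (skips the middle value when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set with nine observations.
scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]
summary = five_number_summary(scores)
print(summary)
```

These five values are the ones a boxplot draws: the ends of the lines, the edges of the box, and the line inside it.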
IQR Interquartile Range- measures the range of the middle 50% of the data Q3 – Q1 Used to find outliers
1.5 x IQR Rule for Outliers If an observation falls more than 1.5 x IQR above Q3 or below Q1, then we consider it an outlier See pg. 81-82 for an example on the TI-84
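The rule can be sketched as two small helper functions. The names `outlier_fences` and `is_outlier` are hypothetical. The quartiles used here (Q1 = 0.75, Q3 = 2.50) and the minimum and maximum (0.33 and 27.70) are the ones reported for the salary example later in this deck.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    """True when x lies more than 1.5 x IQR below Q1 or above Q3."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

# Quartiles from the salary example in this deck (millions of dollars).
q1, q3 = 0.75, 2.50
low, high = outlier_fences(q1, q3)
print(low, high)
print(is_outlier(27.70, q1, q3))  # the maximum salary
print(is_outlier(0.33, q1, q3))   # the minimum salary
```

With these quartiles the IQR is 1.75, so the cutoffs are -1.875 and 5.125. The maximum salary is flagged as an outlier and the minimum is not.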
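Ways to Measure Spread Standard Deviation (s or Sx) – measures the typical distance of the values in a distribution from the mean; it is the square root of the variance Variance (s² or Sx²) – the average squared deviation from the mean, found by dividing the sum of squared deviations by n – 1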
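A short worked sketch of these two definitions, using made-up data and the sample convention of dividing by n - 1 (the convention behind the calculator's Sx). The hand calculation is compared with Python's `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 6, 9]
n = len(data)

x_bar = sum(data) / n                            # mean = 5.0
squared_devs = [(x - x_bar) ** 2 for x in data]  # 9, 1, 1, 1, 16
var = sum(squared_devs) / (n - 1)                # 28 / 4 = 7.0
sd = sqrt(var)                                   # square root of the variance

print(var, variance(data))
print(sd, stdev(data))
```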
TI-84 Enter the numbers into list L1 (STAT, then Edit) Then run 1-Var Stats (STAT, then CALC, then 1-Var Stats)
Standard Deviation s gets larger as the data become more spread out. Only use the mean and standard deviation for symmetric distributions that are free of outliers. s measures spread about the mean and should be used only when the mean is chosen as the measure of center. s = 0 only when there is no spread/variability, which happens when every value is the same; otherwise s > 0. s is not resistant; outliers greatly affect it.
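These properties can be checked on three made-up lists: one with identical values, one tightly clustered, and the clustered list with a single outlier added. The data are hypothetical and chosen only to make the effect easy to see.

```python
from statistics import stdev

# Hypothetical data sets.
same = [5, 5, 5, 5]               # no variability at all
tight = [4, 5, 5, 6]              # small spread
with_outlier = [4, 5, 5, 6, 50]   # same values plus one outlier

s_same = stdev(same)
s_tight = stdev(tight)
s_outlier = stdev(with_outlier)

print(s_same)     # zero, because every value equals the mean
print(s_tight)    # positive but small
print(s_outlier)  # much larger, driven by the single outlier
```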
Choosing the Right Center and Spread Symmetric = use mean and standard deviation Skewed = use median and IQR IT ALL DEPENDS ON THE SHAPE OF THE DISTRIBUTION!
Linear Transformation of Data Xnew = a + bx Multiplying each observation by a positive number b multiplies both measures of center and measures of spread by b. Adding the same number a to each observation adds a to measures of center and to quartiles, but does not change measures of spread. Neither operation changes the overall shape of the distribution!
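The two rules can be illustrated with a small sketch. The data and the constants a = 3 and b = 2 are hypothetical. Each observation is transformed with Xnew = a + bx, and the summaries of the new list are compared with those of the original.

```python
from math import isclose
from statistics import mean, median, stdev

# Hypothetical data and transformation constants.
data = [1, 2, 4, 7, 11]
a, b = 3, 2

new = [a + b * x for x in data]   # Xnew = a + bx for every observation

# Measures of center are multiplied by b and then shifted by a.
print(mean(new), a + b * mean(data))
print(median(new), a + b * median(data))

# Measures of spread are multiplied by b and unaffected by a.
print(stdev(new), b * stdev(data))
```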
Linear Transformation Example The following list gives the approximate base salaries for the 15 members of the 2005 Miami Heat basketball team (in millions of dollars). Find the mean, median, standard deviation, 5 number summary, and IQR
Answers Mean = 3.859 Median = 1.13 Standard deviation = 7.338 5 number summary = 0.33, 0.75, 1.13, 2.50, 27.70 IQR = 1.75
Linear Transformation Suppose each player gets a $100,000 bonus for winning the NBA Championship. Recalculate the measurements. The salaries are recorded in millions of dollars, so $100,000 must be converted to 0.1 million. This is the a portion of our equation Xnew = a + bx, with a = 0.1 and b = 1.
Answers Mean = 3.959 Median = 1.23 Standard deviation = 7.338 5 number summary = .43, .85, 1.23, 2.6, 27.8 IQR = 1.75 What did you notice? Same spread, different center! The mean, median, Min, Q1, Q3, Max all went up 0.1 The standard deviation and IQR did not change!
Linear Transformation Suppose, instead, each player is offered a 10% increase in base salary. Recalculate the measurements. Each salary is multiplied by 1.10. This is the b portion of our equation Xnew = a + bx, with a = 0 and b = 1.10.
Answers Mean = 4.245 Median = 1.243 Standard deviation = 8.072 5 number summary = .363, .825, 1.243, 2.75, 30.47 IQR = 1.925 What did you notice? Changes Center and Spread! The mean, median, standard deviation, and IQR were multiplied by a factor of 1.10
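Both sets of answers can be checked from the original summary values alone, without the salary list. The sketch below applies Xnew = a + bx to the reported summaries, adding a to the centers only and multiplying both centers and spreads by b. The helper name `transform_summary` is hypothetical. Results are compared to the slide values within rounding.

```python
from math import isclose

# Summary values reported for the original salaries (millions of dollars).
original = {"mean": 3.859, "median": 1.13, "sd": 7.338, "iqr": 1.75}

def transform_summary(summary, a, b):
    """Apply Xnew = a + bx to summaries: a shifts centers only, b scales everything."""
    return {
        "mean": a + b * summary["mean"],
        "median": a + b * summary["median"],
        "sd": b * summary["sd"],    # spread ignores the shift a
        "iqr": b * summary["iqr"],
    }

bonus = transform_summary(original, a=0.1, b=1)      # $100,000 bonus
increase = transform_summary(original, a=0, b=1.10)  # 10% increase

for name, result in (("bonus", bonus), ("increase", increase)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```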
DATA ANALYSIS TOOLBOX Keys to a perfect free response answer! Data = Organize and examine the data. 1. Who are the individuals described in the data? 2. What are the variables? In what units is each variable recorded? 3. Why were the data gathered? 4. When, where, how, and by whom were the data produced? Graphs = Construct appropriate graphical displays. Numerical summaries = Calculate relevant summary statistics. Interpretation = Discuss what the data, graphs, and numerical summaries tell you in the context of the problem. Answer the question!!!!