AP Statistics Chapter 1 Exploring Data
WHAT IS STATISTICS? Statistics is the study of how we: -Collect Data -Organize Data -Analyze Data -Use data to make predictions Statistics is the tool we use to extract information from data!
Lesson Objectives Identify individuals and variables in a set of data. Classify a variable as being a quantitative or categorical variable. Identify the units of measurement for a quantitative value.
VARIABLES Individuals are the objects described by a set of data. Variables are characteristics that can take different values from individual to individual. A variable can be considered either categorical or quantitative.
EXAMPLE Suppose we observed a bag of M&M candies and were studying the different colors of the pieces. What would be the individuals in the study? What would be the variable?
Categorical vs. Quantitative Categorical variables place an individual into a group or category. Quantitative variables assign a numerical value to an individual, for which arithmetic such as adding or averaging makes sense. EXAMPLES: Which type of variable is each? A person’s height … Quantitative. A person’s eye color … Categorical. A person’s ZIP code … Categorical. While ZIP codes are numeric in form, you would not use arithmetic to combine them in any way.
Distribution The type of data collected can be a determining factor of the way the values are organized. Quantitative values can be very close together or very spread out. The pattern of variation between these values in a set of data is called the distribution. Distribution – a description of the values a variable takes and how often it takes these values.
DISTRIBUTION Both quantitative and categorical data will have differences from individual to individual. The pattern of variation of a variable is referred to as its distribution. In order to get a grasp of a variable’s distribution, we may use a graphical display of the data.
AP EXAM Tip You will often be asked to “describe the distribution” of a set of data. When you are asked to do this, make sure that you have your SOCkS on! S – Shape O – Outliers C – Center S– Spread When you describe these four characteristics of the data, you will be effectively describing the distribution!
ACTIVITY: Sexual Discrimination????? 25 airplane pilots have applied to fill 8 positions to be pilots with an airline company. 15 of them are males and 10 are females. To be fair, the managers select the 8 pilots to be employed by a lottery. A day later, the managers announce the 8 pilots to be hired. 5 of them are female and only 3 are males. Many of the males claimed that the lottery had to have been “rigged” since there was no way that so many females were selected.
ACTIVITY CONTINUED To simulate the situation, select ten red cards to represent the females and fifteen black cards to represent the males. Shuffle the cards within your group and conduct your own lottery by drawing 8 cards. Count the number of red cards (females), and record that number. Put the cards back, and shuffle the cards. Repeat the process four more times. Report your results to be recorded.
What do we see? Do you think that it is possible that the number of females hired in the problem was a coincidence???
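The card lottery can also be run many times on a computer. The sketch below is one possible way to do it in Python, assuming 10 females and 15 males as in the activity; the function names, the number of trials, and the seed are illustrative choices, not part of the lesson.

```python
import random
from math import comb

def simulate_lottery(trials=10_000, seed=1):
    """Estimate P(5 or more females among the 8 hires) by repeated random draws."""
    rng = random.Random(seed)
    deck = ["F"] * 10 + ["M"] * 15      # 10 red cards (females), 15 black cards (males)
    hits = 0
    for _ in range(trials):
        hired = rng.sample(deck, 8)     # draw 8 cards without replacement
        if hired.count("F") >= 5:
            hits += 1
    return hits / trials

def exact_probability():
    """Exact P(5 or more females), counting equally likely groups of 8."""
    favorable = sum(comb(10, k) * comb(15, 8 - k) for k in range(5, 9))
    return favorable / comb(25, 8)

print("simulated:", simulate_lottery())
print("exact:    ", exact_probability())
```

The exact value is about 0.128, so a fair lottery produces five or more females roughly one time in eight. That is uncommon, but not so rare that it rules out coincidence.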
HOMEWORK Complete the assignment listed in the packet. This assignment will be due at the beginning of the next class session.
Analyzing Categorical Data • In this section we will learn about: • Bar graphs/pie charts • Problems with graphs • Two-way Tables and Marginal Distribution • Conditional Distributions • Simpson’s Paradox
EXAMPLE The Radio Arbitron service places each of the country’s 13,838 stations in categories that describe the type of music they play. Here is the distribution of the data.
Continued Sometimes, we may wish to use a graph instead of a table to clarify relationships.
Be Careful!!! Because of their appeal to the eyes, graphical displays can sometimes be misleading. Always look for things like scaling and relevance. Pictographs can almost always be misleading.
Pictograph • What is the issue with this ad that was used by Apple Computers to show the people that were buying their new iMac Computer?
Activity Use the table below. A.) Make a well-labeled graph to display the data. B.) Would it be appropriate to make a pie chart here? Why?
Two-Way Tables A survey of 4826 randomly selected young adults (19-25 yrs old) asked, “What do you think are the chances you will have much more than a middle-class income at age 30?” This is an example of a two-way table.
Two-Way Table • A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. • Marginal Distribution – • The distribution of values of one of the categorical variables among all individuals described by the two-way table. • To examine a marginal distribution: • Use the table data to compute each row or column total as a percent of the table total. • Make a graph to display the marginal distribution.
Two-Way Tables and Marginal Distributions Examine the marginal distribution of chance of getting rich.
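As a rough illustration of the computation, the Python sketch below finds a marginal distribution and a conditional distribution from a small two-way table. The counts and category labels are hypothetical, invented for this example; they are not the survey results.

```python
# Hypothetical counts, invented for illustration only (not the survey data).
table = {
    "Female": {"Unlikely": 30, "50-50": 50, "Likely": 20},
    "Male":   {"Unlikely": 20, "50-50": 40, "Likely": 40},
}

def marginal_distribution(table):
    """Percent of ALL individuals in each column category."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {c: 100 * n / grand_total for c, n in totals.items()}

def conditional_distribution(table, row_name):
    """Percent in each column category among individuals in one row only."""
    row = table[row_name]
    row_total = sum(row.values())
    return {c: 100 * n / row_total for c, n in row.items()}

print(marginal_distribution(table))
print(conditional_distribution(table, "Female"))
```

The marginal distribution uses the column totals divided by the table total, while the conditional distribution divides by a single row total.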
Simpson’s Paradox Accident victims are often transported by helicopter to a medical facility. Does this act help save lives? What is the percentage of deaths for each of the two categories? …not too positive, huh?!
Continued… Let’s look at the data differently. Compute the percentages now. …is that right? This phenomenon is referred to as Simpson’s Paradox. It is caused by what is referred to as a lurking variable. What was the lurking variable here?
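The reversal can be checked numerically. The Python sketch below uses illustrative counts that are assumed for this example (the slide's own tables are not reproduced here), with two ways of transporting victims, labeled helicopter and road as an assumption, and with seriousness of the accident playing the role of the lurking variable.

```python
# Illustrative counts, assumed for this sketch. Each entry is (died, total transported).
data = {
    "serious":      {"helicopter": (48, 100), "road": (60, 100)},
    "less serious": {"helicopter": (16, 100), "road": (200, 1000)},
}

def death_rate(died, total):
    """Percent of transported victims who died."""
    return 100 * died / total

def rates_by_group(data):
    """Death rate for each transport method within each accident type."""
    return {
        group: {mode: death_rate(*counts) for mode, counts in modes.items()}
        for group, modes in data.items()
    }

def overall_rates(data):
    """Death rate for each transport method with accident types combined."""
    combined = {}
    for modes in data.values():
        for mode, (died, total) in modes.items():
            d, t = combined.get(mode, (0, 0))
            combined[mode] = (d + died, t + total)
    return {mode: death_rate(d, t) for mode, (d, t) in combined.items()}

print(overall_rates(data))
print(rates_by_group(data))
```

With these counts the helicopter has the higher death rate overall but the lower death rate within each accident type, because it carries a much larger share of the serious cases.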
HOMEWORK Complete the assignment listed in the packet. This assignment will be due at the beginning of the next class session.
1.2 – Displaying Quantitative Data Dotplots are a commonly used method of displaying quantitative data. To make a dotplot, DRAW a horizontal number line, labeled with the name of the variable. SCALE the number line, including the minimum and maximum values. MARK a dot above the corresponding location on the axis for each data value.
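The DRAW, SCALE, and MARK steps can be sketched as a small text dotplot in Python. This is only an illustration: it assumes whole-number data, and the sample values below are hypothetical.

```python
from collections import Counter

def dotplot(values, label):
    """Build a text dotplot for whole-number data and return it as a string."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))    # SCALE: min through max
    width = max(len(str(v)) for v in axis) + 1          # column width per axis value
    lines = []
    # MARK: one row per stacking level, from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("*" if counts[v] >= level else " ").rjust(width) for v in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(v).rjust(width) for v in axis))  # DRAW: the number line
    lines.append(label)                                       # name of the variable
    return "\n".join(lines)

# Hypothetical goals scored in 15 games (not the data from the lesson's table).
print(dotplot([3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1], "Goals scored"))
```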
EXAMPLE The table here displays goals scored by the US Women’s Soccer Team in 2004. Create a dotplot to represent the data.
EXAMPLE 2 The table and dotplot below display the Environmental Protection Agency’s estimates of highway gas mileage in miles per gallon (MPG) for a sample of 24 model year 2009 midsize cars. Use the dotplot to describe the distribution.
Describing the Shape We can describe the shape of a distribution by using the following terms. Symmetric - if the right and left sides of the graph are approximately mirror images of each other. Skewed Right - if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. Skewed Left – just the opposite of skewed right. Bimodal – if the graph has two clear peaks.
Identify Each Symmetric Skewed - left Bimodal Skewed - right
Applying the Concepts Complete the “Check Your Understanding” questions on pg. 31.
VIDEO #2 Decisions Through Data Stemplots
Stemplots Stemplots are often used as a means of representing quantitative values. The data is organized by separating each observation into a stem (all but the last digit) and a leaf (the last digit). The leaf values are then paired with their stem and ordered. Trends and patterns in the distribution can be seen here.
Caffeine content (mg) of an 8 oz. serving of many popular soft drinks: A&W Cream 20; Diet Sun Drop 47; Barq’s Root Beer 15; Diet Sunkist 28; Cherry Coke 23; Diet Cherry Pepsi 24; Cherry RC Cola 29; Dr. Nehi 28; Coke Classic 23; Dr. Pepper 28; Diet A&W Cream 15; IBC Cherry 16; Diet Cherry Coke 23; Kick 38; Diet Coke 31; KMX 36; Diet Dr. Pepper 28; Mello Yello 35; Diet Mello Yello 35; Mountain Dew 37; Diet Mtn Dew 37; Mr. Pibb 27; Diet Mr. Pibb 27; Nehi Wild Red 33; Diet Pepsi 24; Pepsi One 37; Diet Red Squirt 26; Pepsi 25
(Continued) RC Edge 47; Red Flash 27; Royal Crown 29; Red Squirt 26; Sun Drop Cherry 43; Sun Drop 43; Sunkist 28; Surge 35; Tab 31; Cherry Pepsi 25. Arrange all of these data into a stemplot, and observe the distribution.
Stemplot from the Example Caffeine Content (mg) per 8oz. Serving of Various Soft Drinks Key: 3|5 means 35 mg of caffeine per 8 oz. serving
An Alternative (better) Plot Caffeine Content (mg) per 8oz. Serving of Various Soft Drinks Key: 3|5 means 35 mg of caffeine per 8 oz. serving
Tips for Stemplots When you split stems, make sure each part is assigned an equal number of possible leaf digits. There is no set number of stems. Too few stems makes a “skyscraper” shape. Too many stems makes a “pancake” shape. As a rule, a minimum of five stems is a good guideline to follow. Always include a title and a key to show how the stems were formed.
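A stemplot of the caffeine data can be produced with a short Python sketch like the one below. The function name and the `split` option are illustrative choices; the code assumes non-negative whole-number data, with the tens digit as the stem and the ones digit as the leaf.

```python
# Caffeine content (mg per 8 oz. serving) for the 38 soft drinks listed in the example.
caffeine = [20, 47, 15, 28, 23, 24, 29, 28, 23, 28, 15, 16, 23, 38, 31, 36, 28, 35, 35,
            37, 37, 27, 27, 33, 24, 37, 26, 25, 47, 27, 29, 26, 43, 43, 28, 35, 31, 25]

def stemplot(values, split=False):
    """Return the rows of a stemplot as strings, one row per stem (or half stem)."""
    stems = range(min(values) // 10, max(values) // 10 + 1)
    # With split stems, leaves 0-4 go on the first row and leaves 5-9 on the second,
    # so each row is assigned the same number of possible leaf digits.
    halves = (False, True) if split else (None,)
    rows = []
    for stem in stems:
        for upper in halves:
            leaves = sorted(
                v % 10 for v in values
                if v // 10 == stem and (upper is None or (v % 10 >= 5) == upper)
            )
            rows.append((f"{stem} | " + " ".join(str(leaf) for leaf in leaves)).rstrip())
    return rows

print("Caffeine Content (mg) per 8 oz. Serving")
print("\n".join(stemplot(caffeine, split=True)))
print("Key: 3|5 means 35 mg of caffeine per 8 oz. serving")
```

Calling `stemplot(caffeine)` gives only four stems, which produces the "skyscraper" shape described above; `split=True` spreads the same data over eight rows.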
HOMEWORK Complete the assignment listed in the packet. This assignment will be due at the beginning of the next class session.
VIDEO #3 Decisions Through Data Histograms
HISTOGRAMS Histograms differ from bar graphs because the horizontal axis is a continuous scale of numerical values divided into classes. As with stemplots, there is no set number of classes to use. Five classes is a good minimum. Remember: the area of each bar is what matters. Make sure the class widths are equal so that only the heights vary.
Relative Frequency Histograms A relative frequency histogram is based on the relative frequency of each class. Relative Frequency = number of occurrences in the class / total number of occurrences. Cumulative relative frequency is often used to find percentiles, or the proportion of the data that is at or below a value.
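The relative frequency calculation can be sketched in Python as follows. The data values, class width, and function name are hypothetical and chosen only to keep the arithmetic easy to follow; each class includes its lower limit and excludes its upper limit.

```python
def class_frequencies(values, start, width, classes):
    """Count values in equal-width classes and return counts, relative and cumulative frequencies."""
    counts = [0] * classes
    for v in values:
        index = int((v - start) // width)   # class k covers start + k*width up to the next limit
        if not 0 <= index < classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    n = len(values)
    relative = [c / n for c in counts]
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running / n)      # proportion of data below this class's upper limit
    return counts, relative, cumulative

# Hypothetical data: ten values grouped into classes 10-20, 20-30, 30-40, 40-50.
scores = [12, 15, 21, 24, 25, 31, 33, 38, 42, 47]
counts, relative, cumulative = class_frequencies(scores, start=10, width=10, classes=4)
print(counts)
print(relative)
print(cumulative)
```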
Applying the Concepts Complete the “Check Your Understanding” questions on pg. 39.
HOMEWORK Complete the assignment listed in the packet. This assignment will be due at the beginning of the next class session.
VIDEO #4 Decisions Through Data Measures of Center
1.3 - Describing Distributions with Numbers To describe a distribution, we must identify its center. One measure of the center of a set of data is the mean. The mean is the sum of all observations in a set divided by the total number of observations.
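FORMULA for Mean The mean of a set of data with observations $x_1, x_2, \ldots, x_n$ is: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ or, in summation notation, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$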
Median The median of a set of data is the midpoint of the data. Arrange all of the numbers from least to greatest. If there is an odd number of observations, the median is the center of the list. If there is an even number of observations, the median is the mean of the two center observations.
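The steps for finding the median translate directly into a short Python function. This is a minimal sketch; the function name is an illustrative choice.

```python
def median(values):
    """Median found by sorting, then taking the middle value or the mean of the middle two."""
    ordered = sorted(values)                 # arrange from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:                           # odd count: the center of the list
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2   # even count: mean of the two centers

print(median([5, 1, 3]))       # odd number of observations
print(median([4, 1, 3, 2]))    # even number of observations
```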
Mean vs. Median The median is a more resistant measure than the mean. This means that the mean is more easily influenced by extreme values. Differences between the mean and median can indicate skewness in the data. Skewed Left data will typically have a mean that is less than the median. Skewed Right data will typically have a mean that is greater than the median. Mean/Median Applet
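Resistance can be seen by adding one extreme value to a small data set. The Python sketch below uses the standard library's `statistics` module; the salary figures are hypothetical.

```python
from statistics import mean, median

def summarize(values):
    """Return (mean, median) for a data set."""
    return mean(values), median(values)

salaries = [30, 32, 35, 36, 40]      # hypothetical salaries, in thousands of dollars
with_outlier = salaries + [400]      # the same data plus one extreme value

print(summarize(salaries))
print(summarize(with_outlier))
```

The single extreme value pulls the mean from 34.6 up to 95.5, while the median only moves from 35 to 35.5. The mean ending up well above the median is the pattern expected for right-skewed data.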