
Statistics


chynna

Presentation Transcript


  1. Statistics How can you best represent statistical information and draw conclusions from it?

  2. What is statistics? Statistics is the branch of mathematics that is concerned with the collection, organization, display and interpretation of data.

  3. Graphical Methods for Describing Data

  4. S.1 Organizing Data • How can data be shown on a table or in a graph and how can you read such data? • What is categorical data? • When should you use a pie chart and how are they made? • How do you organize a frequency distribution?

  5. Data types: categorical and numeric • Categorical: any non-numeric data • Use frequency distributions • Bar charts • Pie charts • Numeric: anything that can be measured and listed by number • Dotplots • Stem and leaf • Frequency distributions • Histograms

  6. Does this data mean anything to you and can you answer questions about it in its current form? • Example • Leisure time activities • W T A W G T W W • C W T W A T T W • G W W C A W A W • W W T W W T • W=walking T=weight training C=cycling • G= gardening A=aerobics

  7. Displaying Categorical Data • How can you display and interpret categorical data? • Categorical: anything that can’t be measured and listed by number • Frequency distributions • Bar Charts • Pie Charts

  8. Frequency Distribution • Displays all categories and a tally for each • Relative frequency: the proportion of the data that falls in this category, written as a decimal
     Leisure time activities: W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T
     Category          Frequency   Relative frequency
     Walking           15          .5
     Weight training   7           .233
     Cycling           2           .067
     Gardening         2           .067
     Aerobics          4           .133
     Total             30          1
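
The tally on this slide can be checked with a short script. This is only a sketch: it assumes the thirty activity codes are typed in exactly as they appear on the slide, and the variable names are illustrative.

```python
from collections import Counter

# Leisure-time activity codes copied from the slide
# (W=walking, T=weight training, C=cycling, G=gardening, A=aerobics).
data = "W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T".split()

# Frequency: how many times each category appears.
frequency = Counter(data)
total = sum(frequency.values())

# Relative frequency: each count divided by the total number of observations.
relative_frequency = {category: count / total for category, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category}: frequency={count}, relative frequency={relative_frequency[category]:.3f}")
```

The relative frequencies should always add up to 1, which is a quick way to catch a tally mistake.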

  9. Bar Chart • Graphs the frequency of categorical data • Bars DO NOT touch • Categories are on the x-axis • Frequencies are on the y-axis • Categories shown: Walking, Wt Training, Cycling, Gardening, Aerobics

  10. Pie Charts (circle graphs) • Used when there are not too many categories • Rule of thumb: 8 or fewer • Each “slice” is determined by the relative frequency • Degrees in slice = relative frequency × 360
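
As a rough illustration of the slice rule, the sketch below converts counts to degrees. The counts are the ones tallied from the leisure-time data earlier in the deck; the dictionary and variable names are assumptions made for this example.

```python
# Counts tallied from the leisure-time activities data set.
frequency = {
    "Walking": 15,
    "Weight training": 7,
    "Cycling": 2,
    "Gardening": 2,
    "Aerobics": 4,
}
total = sum(frequency.values())

# Degrees in slice = relative frequency x 360.
slice_degrees = {name: count / total * 360 for name, count in frequency.items()}

for name, degrees in slice_degrees.items():
    print(f"{name}: {degrees:.0f} degrees")
```

Because the relative frequencies sum to 1, the slices should sum to a full 360 degrees.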

  11. Homework • Worksheet 1

  12. S-2 Displaying Numeric Data EQ: How do you construct and read stem and leaf plots, dotplots, frequency distributions and histograms? • Numeric: anything that can be measured and listed by number • Dotplots • Stem and leaf • Frequency distributions • Histograms

  13. Dotplots • Simple way to represent small amounts of data • Each piece of data has its own dot • Dots stack vertically above the position on the x-axis • Depending on the data set, you may lose the exact value for each piece • Data: 615, 524, 632, 645, 592, 716, 618, 521, 675, 549, 523, 651 • [Dotplot figure: x-axis labeled 5, 6, 7]

  14. Stem Plot • Works for a small to moderate set of data • Stems go in a vertical column • Stems may be split low and high (0-4 and 5-9) • Comparative or double stemplot: shows multiple data sets
     Data set A: 51 61 52 63 64 57 59 71 61 52 68 67 54 52 65
     Data set B: 51 61 52 73 54 57 59 71 61 52 68 67 74 52 65
     Leaves (A)      Stem   Leaves (B)
     9 7 4 2 2 2 1 |  5   | 1 2 2 2 4 7 9
     8 7 5 4 3 1 1 |  6   | 1 1 5 7 8
                 1 |  7   | 1 3 4
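
One way to build the stems and leaves by machine is sketched below. It assumes two-digit values with the tens digit as the stem, and `stem_and_leaf` is a hypothetical helper written for this example, not a library function.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values into {stem: sorted leaves}, using the tens digit as the stem."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 57 -> stem 5, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


# First data set from the slide.
data = [51, 61, 52, 63, 64, 57, 59, 71, 61, 52, 68, 67, 54, 52, 65]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Running the same helper on a second data set and printing its leaves on the other side of the stem column gives a comparative stemplot.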

  15. Histograms • A bar chart for numeric data • Center the rectangle over the indicated value on the x-axis—the bars touch • Can be drawn off of the frequency or the relative frequency distribution

  16. Shapes of Histograms • Unimodal—has one peak • Bimodal—has two peaks • Multimodal—has more than two peaks

  17. Types of Unimodal Curves • Symmetric • Normal or bell shaped • Light tailed: having short tails, smaller standard deviation • Heavy tailed: having long tails, larger standard deviation

  18. Skewed Curves • Lower (left) tail • Upper (right) tail • When there is an outlier to the right, the curve is skewed right • When there is an outlier to the left, the curve is skewed left • Skewness is judged by the tail, not where the majority of the data lies.

  19. Frequency Distributions: Continuous and Discrete Data • Discrete Data • Individual data points • The values always come from the set of integers or whole numbers • Continuous Data • Data that may include decimals

  20. Frequency Distributions • There are no natural breaks for continuous data • We create our own • Ex. The fuel efficiency of a particular car ranges from 25.3 to 29.8 mpg; we decide to use an interval of .5 • Note: Always start at an even increment lower than the lowest piece of data and go to an even increment higher than the highest piece of data • In which interval would you place 27.5 mpg?
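
The sketch below shows one possible answer to the question on this slide. It assumes the common convention that each interval includes its lower boundary and excludes its upper one; the slide itself does not state which convention to use, and `interval_for` is a hypothetical helper.

```python
import math


def interval_for(value, start=25.0, width=0.5):
    """Return the (lower, upper) interval holding value, assuming lower <= value < upper."""
    # Rounding before floor guards against tiny floating-point errors.
    index = math.floor(round((value - start) / width, 9))
    lower = start + index * width
    return (lower, lower + width)


# Intervals run from 25.0 (just below the lowest value, 25.3)
# to 30.0 (just above the highest value, 29.8) in steps of .5.
edges = [25.0 + i * 0.5 for i in range(11)]

print(edges)
print(interval_for(27.5))
```

Under this convention a boundary value such as 27.5 goes into the interval that starts at 27.5. Whichever convention is chosen, it has to be applied the same way to every boundary value.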

  21. Homework Numeric Data • Worksheet 2

  22. Density Graphs • When data is unevenly distributed • You may want to use unequal groups or intervals • This may only be done if you graph the density

  23. Measures of Central Tendency

  24. S-3 Describing the Center of a Data Set EQ: What are the measures of central tendency and how can they be determined?

  25. Center and Spread • Two of the most critical descriptors of a data set • Graphical methods such as those in the last chapter give a general impression of both • Numerical methods give precise values that can be compared in detail

  26. The three M’s • Mean: also known as the average • Median: also called the middle • Mode: the most frequent value

  27. The Mean • Formula for the sample mean: $\bar{x} = \frac{\sum x_i}{n}$ • x = each piece of data • $x_i$: the subscript i indicates the position of the data value within the original data set • n = number of pieces of data in the data set • ∑ = Greek letter sigma, meaning to add what follows • Always use more accuracy (more decimals) than any one piece of data has • µ is used for the population mean • Greek letters are always used for population values

  28. The Median • The middle value in a list of ordered values • Median has no symbol but is often abbreviated Med • If n is odd then the median is the exact middle number • If n is even then the median is the mean of the two middle numbers
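
The odd and even cases can be written out as a small function. This is a minimal sketch; the first list is the ten-value example data set used later in the deck, and the second list is a made-up odd-length example.

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n is odd: the exact middle number.
        return ordered[middle]
    # n is even: the mean of the two middle numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2


even_example = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]  # n = 10
odd_example = [3, 9, 1, 7, 5]                            # n = 5, illustrative values only

print(median(even_example))
print(median(odd_example))
```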

  29. Comparison and Contrast of the Mean and Median • Median divides the data into two equal parts • 50% of the data is on either side of the median • Mean is where the fulcrum would cause the “data scale” to balance if the values had weight • It is very sensitive to outliers

  30. Balancing the “data scale” • [Figure: positions of the mean and median on a normal/bell curve, a skewed-right curve, and a skewed-left curve]

  31. Trimmed Mean • Makes the mean less susceptible to outliers • Order the data • Remove the same number of pieces of data from each end • Recalculate the mean • Trim % × n = number of pieces to be removed from EACH end • A small to moderate trim is 5% to 25%

  32. Trimmed Mean • Example: Find the 15% trimmed mean of: 3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12 • Order the numbers: 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 15, 15, 16, 17, 20, 36 • 20 items × .15 = 3 items to remove from each end • Remaining: 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 15, 15, 16 • Trimmed mean = 136/14 ≈ 9.71
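
The same example can be reproduced in code. The sketch below assumes that when trim % × n is not a whole number it is rounded to the nearest integer, a choice the slides do not specify; `trimmed_mean` is a hypothetical helper name.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction * n items from EACH end of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    k = round(trim_fraction * n)  # assumption: round when the product is not whole
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.15)

print(f"mean = {plain_mean:.2f}, 15% trimmed mean = {trimmed:.2f}")
```

The trimmed mean comes out lower than the ordinary mean here because the trim removes the large value 36 from the upper end.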

  33. Weighted Mean • Similar to an arithmetic mean (the most common type of average), except that instead of each data point contributing equally to the final average, some data points contribute more than others.

  34. Weighted Mean
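
The formula on this slide did not survive in the transcript. The usual definition divides the sum of each value times its weight by the sum of the weights, and the sketch below follows that definition. The scores and weights are made-up illustrative numbers, not taken from the presentation.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three scores counting 50%, 30% and 20% of a grade.
scores = [80, 90, 70]
weights = [0.5, 0.3, 0.2]

print(weighted_mean(scores, weights))
```

When every weight is equal, the result is the ordinary arithmetic mean.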

  35. Homework worksheet 3

  36. Measures of Variation

  37. S-4 Spread What are the quartiles, percentiles, and box plots?

  38. Range = High – Low

  39. Interquartile Range • IQR = upper quartile (Q3) – lower quartile (Q1) • Lower quartile: the median of the lower half • Upper quartile: the median of the upper half • If n is odd, the exact median is excluded from the quartiles • Used because it is resistant to outliers • There is no special name for the population IQR
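
The quartile rule described here (median of each half, leaving out the exact median when n is odd) can be sketched as follows. Statistical packages often use other quartile conventions, so their answers may differ slightly; the helper names are illustrative.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # If n is odd, skip the exact median so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


data = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]
q1, q3 = quartiles(data)
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```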

  40. Boxplot • Can be used for many types of summarizations • iqr = Q3 – Q1 • Outlier = data more than 1.5·iqr from the end of the box • Extreme = data more than 3·iqr from the end of the box • Each of the four sections of the plot holds 25% of the data
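
The outlier rules can be turned into a small check. In this sketch the quartiles are passed in directly; the values Q1 = 15 and Q3 = 18 are taken from the example data set near the end of the deck, and `classify` is a hypothetical helper.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5*iqr and 3*iqr rules measured from the ends of the box."""
    iqr = q3 - q1
    # Distance beyond the nearer end of the box (0 if the value is inside the box).
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "outlier"
    return "not an outlier"


q1, q3 = 15, 18  # quartiles of 12, 15, 15, 16, 17, 17, 18, 18, 20, 25

for value in (12, 20, 25):
    print(value, classify(value, q1, q3))
```

With an iqr of 3, anything more than 4.5 beyond the box is an outlier and anything more than 9 beyond it is extreme.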

  41. Modified Boxplot • Outlier (closed circle) • Extreme outlier (open circle)

  42. Percentages and percentiles • Percentage = (the score / total possible points) × 100 • Percentile = (the position of the score within an ordered list / the total number of items) × 100 • EX: 10 students took a 90 point test: 60, 65, 68, 74, 75, 80, 81, 81, 84, 90 (note: an ordered list, positions 1 through 10) • What is the percent and the percentile for a score of 81? • Percent: 81/90 × 100 = 90% • Percentile: 7/10 × 100 = 70th percentile
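
A minimal sketch of the two calculations follows. For tied scores it uses the position of the first occurrence, which matches the slide's use of position 7 for the score 81; other textbooks handle ties differently, so treat that choice as an assumption.

```python
def percent(score, total_points):
    """Score as a percentage of the total possible points."""
    return score / total_points * 100


def percentile(score, ordered_scores):
    """Position of the score in the ordered list, as a percentage of the list length."""
    position = ordered_scores.index(score) + 1  # 1-based; first occurrence for ties
    return position / len(ordered_scores) * 100


scores = [60, 65, 68, 74, 75, 80, 81, 81, 84, 90]  # already ordered

print(f"percent: {percent(81, 90):.0f}%")
print(f"percentile: {percentile(81, scores):.0f}")
```

A score's percent depends only on the test, while its percentile depends on how the rest of the group performed.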

  43. EXAMPLE: Given a stem and leaf plot, FIND: • the median • the first quartile • the third quartile • the interquartile range • the mode • the percentile for .271 • the value closest to the 60th percentile • [Stem and leaf plot figure; leaf rows as transcribed: 2 5 7 / 1 6 / 5 8 9 9 / 2 3 5 7 8 / 2 / 60 3 6]

  44. Homework worksheet 4

  45. S-5 Measures of Variability • How do the measures of variability help us to better understand what our data set might look like?

  46. S-5 Measures of Variability • Range = high – low • Deviation from the mean = $x_i - \bar{x}$ • if positive then $x_i$ is larger than the mean • if negative then $x_i$ is smaller than the mean • Mean deviation is the average of the absolute deviations (the signed deviations always sum to zero) • Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

  47. Sample Standard Deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ • Roughly the “average distance” the items fall from the mean • A small s or s² indicates low variability • A large s or s² indicates high variability

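  48. Population Variance (knowing all the data): $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where N is the number of items in the population • Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$ • Compute to the same accuracy as the population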
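
The difference between the sample and population versions is only the divisor, as the sketch below shows. It assumes the standard definitions (divide by n - 1 for a sample, by the number of items for a population), and the data values are made up for illustration.

```python
import math


def variance(values, population=False):
    """Sum of squared deviations from the mean, divided by n (population) or n - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n if population else n - 1
    return squared_deviations / divisor


# Illustrative data only; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_variance = variance(data)
sample_sd = math.sqrt(sample_variance)
population_variance = variance(data, population=True)
population_sd = math.sqrt(population_variance)

print(f"sample: s^2 = {sample_variance:.3f}, s = {sample_sd:.3f}")
print(f"population: variance = {population_variance:.3f}, sd = {population_sd:.3f}")
```

The sample values are always a little larger than the population values for the same numbers, because the divisor is smaller.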

  49. Uses of the IQR • For roughly normal data, the standard deviation can be approximated by SD ≈ IQR/1.35 • If SD > IQR/1.35, it suggests heavier or longer tails than the normal curve

  50. Example • 20, 15, 12, 18, 17, 15, 17, 16, 18, 25 • Reorder: 12, 15, 15, 16, 17, 17, 18, 18, 20, 25 • Median = 17 • Q1 = 15 • Q3 = 18 • range = 25 – 12 = 13 • iqr = 18 – 15 = 3 • sd ≈ 3.47 (sample standard deviation)
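
The quantities in this example can be checked with Python's standard `statistics` module. The sketch takes Q1 and Q3 as stated on the slide rather than recomputing them, and it assumes the sample (n - 1) standard deviation is the one intended.

```python
import statistics

data = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]

data_range = max(data) - min(data)
q1, q3 = 15, 18              # quartiles as stated on the slide
iqr = q3 - q1
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# Rule of thumb from the previous slide: for a normal curve, SD is about IQR / 1.35.
sd_estimate = iqr / 1.35
heavier_tails = sd > sd_estimate

print(f"range = {data_range}, iqr = {iqr}, sd = {sd:.2f}")
print(f"IQR/1.35 = {sd_estimate:.2f}, heavier tails than normal: {heavier_tails}")
```

Here the standard deviation is noticeably larger than IQR/1.35, which is consistent with the high value 25 stretching the upper tail.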
