Stat 501


napua

Presentation Transcript


  1. Stat 501 Experimental Statistics I

  2. Data, Data, Data, all around us! • We use data to answer research questions • What evidence does data provide? • How do I make sense of these numbers without some meaningful summary?

  3. Example 2 • Study to assess the effect of exercise on cholesterol levels. One group exercises and the other does not. Is cholesterol reduced in the exercise group? • People have naturally different levels • They respond differently to the same amount of exercise (e.g. genetics) • They may vary in adherence to the exercise regimen • Diet may have an effect • Exercise may affect other factors (e.g. appetite, energy, schedule)

  4. What is statistics? • Recognize the randomness: the variability in data. • …“the science of understanding data and making decisions in the face of variability” • Three steps to the process of statistics: • Design the study • Analyze the collected data • Discover what the data are telling you…

  5. Section 1.2 Displaying Distributions with Graphs

  6. Individuals and Variables • Individuals – objects described by a set of data • people, animals, things • also called Cases • called Subjects if they are human • Variable – characteristic of an individual; takes different values for different individuals. • The three questions to ask: • Why: Purpose of study? • Who: Members of the sample, how many? • What: What did we measure (the variables) and in what units?

  7. Key Characteristics of a Data Set • Every data set is accompanied by important background information. In a statistical study, always ask the following questions: • Who? What cases do the data describe? How many cases does a data set have? • What? How many variables does the data set have? How are these variables defined? What are the units of measurement for each variable? • Why? What purpose do the data have? Do the data contain the information needed to answer the questions of interest?

  8. Categorical and Quantitative Variables • A categorical variable places each case into one of several groups, or categories. • A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. • The distribution of a variable tells us the values that a variable takes and how often it takes each value.

  9. Some Student Data

  10. Distribution of a Variable • To examine a single variable, we graphically display its distribution. • The distribution of a variable tells us what values it takes and how often it takes these values. • Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable. • Categorical variable: pie chart, bar graph • Quantitative variable: histogram, stemplot

  11. Categorical Variables • The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category. • Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. A pie chart requires knowing the whole pie, i.e. every category that makes up the total. • Bar graphs represent categories as bars whose heights show the category counts or percents. They are more flexible than pie charts.
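As a minimal sketch of what "lists the categories and gives the count or percent" means in practice, the following Python tallies a categorical variable and prints a crude text bar graph. The `majors` list is hypothetical illustration data (the student data from slide 9 is not in this transcript), and `categorical_distribution` is a name made up for this example.

```python
from collections import Counter

# Hypothetical categorical data (NOT from the slides): one value per individual.
majors = ["Stat", "Math", "Stat", "Bio", "Math", "Stat", "Bio", "Stat"]

def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

dist = categorical_distribution(majors)
for cat, (n, pct) in sorted(dist.items()):
    # Text bar graph: the bar length is the category count.
    print(f"{cat:<5} {'#' * n} {n} ({pct:.1f}%)")
```

The percents sum to 100 only because every individual falls into exactly one listed category, which is the "whole pie" condition a pie chart needs; the bar lengths do not depend on that.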

  12. Bar Graph

  13. Pie Chart

  14. Quantitative Variables • The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values. • Histograms show the distribution of a quantitative variable by using bars. The height of a bar represents the number of individuals whose values fall within the corresponding class. • Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable. • Time plots plot each observation against the time at which it was measured.

  15. Stemplots • To construct a stemplot: • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number). • Write the stems in a vertical column; draw a vertical line to the right of the stems. • Write each leaf in the row to the right of its stem; order leaves if desired.

  16. Stemplots

  17. Stemplots • If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems. • Example: If all of the data values are between 150 and 179, then we may choose to use the following stems: 15, 15, 16, 16, 17, 17. Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).
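The splitting rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the `heights` values are hypothetical (the slide gives a range of 150 to 179 but no data), and `split_stemplot` is a name invented for this example.

```python
# Hypothetical values (NOT from the slides), all between 150 and 179.
heights = [152, 158, 161, 163, 164, 166, 167, 169, 170, 171, 174, 178]

def split_stemplot(values):
    """Stemplot with each stem split in two.

    Leaves 0-4 go on the upper row of a stem, leaves 5-9 on the lower row.
    Assumes non-negative integers; the leaf is the last digit.
    """
    rows = {}
    for v in sorted(values):              # sorting keeps leaves in ascending order
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1      # 0 = upper row, 1 = lower row
        rows.setdefault((stem, half), []).append(leaf)
    lo_stem = min(s for s, _ in rows)
    hi_stem = max(s for s, _ in rows)
    out = []
    for stem in range(lo_stem, hi_stem + 1):
        for half in (0, 1):
            leaves = " ".join(str(l) for l in rows.get((stem, half), []))
            out.append(f"{stem} | {leaves}".rstrip())
    return out

split_rows = split_stemplot(heights)
print("\n".join(split_rows))
```

With these values the three original stems become six rows, which makes the shape of the distribution easier to see than three long rows would.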

  18. Example: Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10

  19. Step 1: Identify all the stems: 1, 2, 3, 4 • Step 2: Write the stems in increasing order (usually from top to bottom): 1, 2, 3, 4

  20. Step 3: Draw a line next to the stems and write the leaves against each stem:
      1 | 3 2 0
      2 | 7 6 4 9 0
      3 | 0 9 4 2 9 8 4
      4 | 4 0 5 4 4 4 7 0

  21. Step 4: Rewrite the stemplot, rearranging the leaves in ascending order (this can be done simultaneously with step 3):
      1 | 0 2 3
      2 | 0 4 6 7 9
      3 | 0 2 4 4 8 9 9
      4 | 0 0 4 4 4 4 5 7
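Steps 1 to 4 can be sketched in a few lines of Python. The data are the home run counts from slide 18; the function name `stemplot` and the choice of the last digit as the leaf are assumptions made for this example, not part of the slides.

```python
# Hank Aaron's home run counts, as listed on slide 18.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

def stemplot(values):
    """Return stemplot rows for non-negative integers.

    Stem = all digits but the last, leaf = last digit.
    """
    leaves = {}
    for v in sorted(values):              # sorting first gives ordered leaves (step 4)
        stem, leaf = divmod(v, 10)        # split each observation into stem and leaf
        leaves.setdefault(stem, []).append(leaf)
    rows = []
    # List every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(l) for l in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

aaron_rows = stemplot(aaron)
print("\n".join(aaron_rows))
```

Sorting the values before splitting them merges steps 3 and 4, which is the shortcut the slide mentions.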

  22. Back-to-Back stemplot • Compare the numbers of Hank Aaron to Barry Bonds: 5 16 19 24 25 25 26 28 33 33 34 34 37 37 40 42 45 45 46 46 49 73
                Aaron |   | Bonds
                      | 0 | 5
                3 2 0 | 1 | 6 9
            9 7 6 4 0 | 2 | 4 5 5 6 8
        9 9 8 4 4 2 0 | 3 | 3 3 4 4 7 7
      7 5 4 4 4 4 0 0 | 4 | 0 2 5 5 6 6 9
                      | 5 |
                      | 6 |
                      | 7 | 3
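A back-to-back stemplot shares one column of stems, with one data set's leaves running leftward and the other's rightward. The sketch below builds it from the two lists of home run counts given on slides 18 and 22; the function name `back_to_back` and the layout details (column widths, separators) are choices made for this example.

```python
# Home run counts from slide 18 (Aaron) and slide 22 (Bonds).
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]
bonds = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34, 34,
         37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

def leaves_by_stem(values):
    """Map stem -> ascending list of leaves (leaf = last digit)."""
    table = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        table.setdefault(stem, []).append(leaf)
    return table

def back_to_back(left, right):
    """Rows of a back-to-back stemplot; left leaves increase away from the stem."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    # Reverse the left leaves so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(str(x) for x in reversed(l.get(s, []))) for s in stems}
    width = max(len(t) for t in left_text.values())
    rows = []
    for s in stems:
        right_text = " ".join(str(x) for x in r.get(s, []))
        rows.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return rows

b2b_rows = back_to_back(aaron, bonds)
print("\n".join(b2b_rows))
```

The empty stems 5 and 6 are printed on purpose: the gap is what makes the value 73 stand out from the rest of the Bonds data.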

  23. Examining distributions • Describe the pattern • Shape • How many modes (peaks)? • Symmetric or skewed in one direction? • Center – midpoints? • Mean/average; median • Spread • range between the smallest and the largest values, standard deviation, 5-number summary, quartiles • Look for outliers – individual values that do not match the overall pattern.
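The numerical measures of center and spread named above can be computed with Python's standard `statistics` module. This is a sketch using the home run counts from slide 18; the function name `describe` is made up here, and the quartiles follow the module's default ("exclusive") method, which is only one of several textbook conventions and may differ from the one used in the course.

```python
import statistics

# Hank Aaron's home run counts, as listed on slide 18.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

def describe(values):
    """Return common measures of center and spread for a quantitative variable."""
    data = sorted(values)
    # Quartiles by the module's default "exclusive" method (an assumption here).
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": data[-1] - data[0],
        "stdev": statistics.stdev(data),          # sample standard deviation
        "five_number": (data[0], q1, q2, q3, data[-1]),
    }

summary = describe(aaron)
for name, value in summary.items():
    print(name, value)
```

Shape (number of peaks, symmetry or skew) and outliers are still best judged from a graph such as a stemplot or histogram; these numbers only summarize center and spread.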

  24. Histograms

  25. Frequency Table

  26. What do you see? • Shape: Somewhat symmetric, unimodal • Center: about 110 or 115 • Spread: values between 80 and 150 • Remember! Histograms are only meaningful for quantitative data

  27. Quantitative Example • Breaking strength of connections for electronic components: • Need to discuss variation • How to group these items with so many different values?
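One common answer to the grouping question is to sort the values into classes of equal width and count how many fall in each, which is what a frequency table and a histogram display. The breaking-strength values are not included in this transcript, so the sketch below reuses the home run counts from slide 18 instead; the function name `frequency_table`, the starting point 10, and the class width 10 are all assumptions chosen for illustration.

```python
# Hank Aaron's home run counts from slide 18 (stand-in data; the
# breaking-strength values are not available in the transcript).
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

def frequency_table(values, start, width):
    """Count values in equal-width classes [lower, lower + width).

    Classes begin at `start`; assumes every value is >= start.
    Returns a list of (lower, upper, count), including empty classes.
    """
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width   # class containing v
        counts[lower] = counts.get(lower, 0) + 1
    return [(lower, lower + width, counts.get(lower, 0))
            for lower in range(start, max(counts) + 1, width)]

table = frequency_table(aaron, start=10, width=10)
for lower, upper, n in table:
    # Text histogram: one star per observation in the class.
    print(f"{lower:>3} to {upper - 1:<3} {'*' * n} ({n})")
```

The choice of class width matters: too few classes hide the shape, and too many leave most classes nearly empty.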

  28. Dealing with outliers

  29. Outliers • Check for recording errors • Violation of experimental conditions • Discard it only if there is a valid practical or statistical reason, not blindly!

  30. Time plots

  31. Time plots

  32. Time Series or Time plots • We care about two important parts • Trend – persistent, long-term rise or fall • Seasonal variation – a pattern that repeats itself at known regular intervals of time. • Mississippi data: • Increasing trend • Large seasonal variations – there is usually a large spike every few years

  33. Example: Gasoline Price Data

  34. Summary • Categorical and Quantitative variables • Graphical tools for categorical variables • Bar Chart • Pie Chart • Graphical tools for quantitative variables • Stem and leaf plot • Histogram • Time plot, if appropriate • Distributions • Describe: Shape, center, spread • Watch for patterns and/or deviations from patterns.
