Data observation and Descriptive Statistics

finn

Presentation Transcript


  1. Data observation and Descriptive Statistics

  2. Organizing Data • Frequency distribution • Table that contains all the scores along with the frequency (or number of times) the score occurs. • Relative frequency: proportion of the total observations included in each score.
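
A frequency distribution with relative frequencies can be tabulated in a few lines. The sketch below is illustrative only: the quiz scores and the helper name `frequency_table` are made up for the example and do not come from the slides.

```python
from collections import Counter


def frequency_table(scores):
    """Return rows of (score, frequency, relative frequency), sorted by score."""
    counts = Counter(scores)
    n = len(scores)
    return [(score, freq, freq / n) for score, freq in sorted(counts.items())]


# Hypothetical quiz scores, invented for illustration.
scores = [7, 8, 8, 9, 9, 9, 10, 10]
table = frequency_table(scores)

print("score  frequency  relative")
for score, freq, rel in table:
    print(f"{score:>5}  {freq:>9}  {rel:>8.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no score was dropped.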

  3. Frequency distribution

  4. Organizing data • Class interval frequency distribution • Scores are grouped into intervals and presented along with the frequency of scores in each interval. • Appears more organized, but does not show the exact scores within each interval. • To calculate the width of each interval: • (Highest score – lowest score) / number of intervals • Ex: (120 – 0) / 5 = 24
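
The interval-width calculation and the grouping step can be sketched as follows. The width uses the slide's numbers (highest 120, lowest 0, 5 intervals). The individual scores and the function names are hypothetical, and putting the highest score into the last interval is an assumption of this sketch.

```python
def interval_width(highest, lowest, n_intervals):
    """Width of each class interval: (highest - lowest) / number of intervals."""
    return (highest - lowest) / n_intervals


def grouped_frequencies(scores, lowest, width, n_intervals):
    """Count how many scores fall in each class interval."""
    counts = [0] * n_intervals
    for x in scores:
        # Assumption: the highest score is placed in the last interval.
        idx = min(int((x - lowest) // width), n_intervals - 1)
        counts[idx] += 1
    return counts


width = interval_width(120, 0, 5)  # 24.0, as on the slide

# Hypothetical scores, invented for illustration.
scores = [0, 12, 30, 47, 48, 75, 96, 120]
counts = grouped_frequencies(scores, 0, width, 5)

for i, count in enumerate(counts):
    low = i * width
    print(f"{low:>5.0f} to {low + width:>5.0f}: {count}")
```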

  5. Class interval frequency distribution

  6. Graphs • Bar graphs • Used for data collected on a nominal scale. • Qualitative or categorical variables. • Each bar represents a separate (discrete) category, so the bars do not touch. • The bars on the x-axis can be placed in any order.

  7. Bar Graph

  8. Graphs • Histograms • To illustrate quantitative variables • Scores represent changes in quantity. • Bars touch each other and represent a variable with increasing values. • The values of the variable being measured have a specific order and cannot be changed.

  9. Histogram

  10. Frequency polygon • Line graph for quantitative variables • Represents continuous data: (time, age, weight)

  11. Frequency Polygon • Variable: age • Plotted values: 22.06, 24.05, 25.04, 25.04, 25.07, 25.07, 26.03, 26.11, 27.03, 27.11, 29.03, 29.05, 29.05, 34, 37.1, 53

  12. Descriptive Statistics • Numerical measures that describe: • Central tendency of distribution • Width of distribution • Shape of distribution

  13. Central tendency • Describe the “middleness” of a data set • Mean • Median • Mode

  14. Mean • Arithmetic average • Used for interval and ratio data • Formula for population mean (µ, pronounced “mu”): $\mu = \frac{\sum X}{N}$ • Formula for sample mean: $\bar{X} = \frac{\sum X}{n}$

  15. Mean

  16. Mean • Not a good indicator of central tendency if distribution has extreme scores (high or low). • High scores pull the mean higher • Low scores pull the mean lower

  17. Median • Middle score of a distribution once the scores are arranged in increasing or decreasing order. • Used when the mean might not be a good indicator of central tendency. • Used with ratio, interval and ordinal data.
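
The effect of an extreme score on the mean, and the median's resistance to it, can be checked numerically. This sketch uses the scores from the standard deviation example in this deck (1, 2, 4, 4, 10). The second list, with 10 replaced by 100, is a made-up variation for illustration.

```python
import statistics

scores = [1, 2, 4, 4, 10]         # scores used later in the deck
with_outlier = [1, 2, 4, 4, 100]  # hypothetical: one extreme high score

mean_original = statistics.mean(scores)
median_original = statistics.median(scores)

mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print("original:     mean =", mean_original, " median =", median_original)
print("with outlier: mean =", mean_with_outlier, " median =", median_with_outlier)
```

The high score pulls the mean upward while the median stays at the middle score.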

  18. Median

  19. Mode • The score that occurs in the distribution with the greatest frequency. • Number of modes = 0: no mode • Number of modes = 1: unimodal distribution • Number of modes = 2: bimodal distribution • Number of modes = 3: trimodal distribution
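
Counting modes can be sketched as below. The helper name `modes` is hypothetical, and the code assumes the convention that a distribution in which every score occurs exactly once has no mode.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent score(s), or [] when there is no mode."""
    if not scores:
        return []
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        # Assumed convention: every score occurs once, so there is no mode.
        return []
    return sorted(score for score, count in counts.items() if count == top)


print(modes([1, 2, 4, 4, 10]))  # one mode: unimodal
print(modes([1, 1, 2, 2, 3]))   # two modes: bimodal
print(modes([1, 2, 3]))         # no mode
```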

  20. Mode

  21. Measures of Variability • Range • From the lowest to the highest score • Variance • Average squared deviation from the mean • Standard deviation • Typical deviation of scores from the mean • Square root of the variance

  22. Measures of Variability • Indicate the degree to which the scores are clustered or spread out in a distribution. • Ex: Two distributions of teacher to student ratio. Which college has more variation?

  23. Range • The difference between the highest and lowest scores. • Provides limited information about variation. • Influenced by high and low scores. • Does not inform about variations of scores not at the extremes. • Examples: • Range = X(highest) – X(lowest) • College A: range = 41 – 4 = 37 • College B: range = 22 – 16 = 6
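
The range calculation for the two colleges can be reproduced as follows. Only the highest and lowest values (41 and 4, 22 and 16) come from the slide. The scores in between are invented for illustration, which also shows that the range ignores them.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


# Hypothetical ratios: only the extremes come from the slide.
college_a = [4, 10, 18, 20, 41]
college_b = [16, 18, 19, 20, 22]

range_a = score_range(college_a)
range_b = score_range(college_b)

print("College A range:", range_a)
print("College B range:", range_b)
```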

  24. Variance • Limitations of the range require a more precise way to measure variability. • Deviation: the degree to which a score in a distribution varies from the mean. • Typical measure of variability: standard deviation (SD) • Variance: the first step in calculating the standard deviation

  25. Variance • X = number of therapy sessions each student attended. • M = 4.2 • Deviation = X – M • Sum of deviations = 0

  26. Variance • In order to eliminate negative signs, we square the deviations. • Sum of the squared deviations = sum of squares, or SS

  27. Variance • Take the average of the SS • Ex: SS = 48.80 • $SD^2 = \frac{\sum (X - M)^2}{N}$ • That is, the average of the squared deviations from the mean • $SD^2 = 9.76$

  28. Standard Deviation • Standard deviation • Typical amount that the scores vary or deviate from the sample mean • $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$ • That is, the square root of the variance • Since we take the square root, this value is now more representative of the distribution of the scores.

  29. Standard Deviation • X = 1, 2, 4, 4, 10 • M = 4.2 • SD = 3.12 (standard deviation) • SD2 = 9.76 (variance) • Always ask yourself: do these data (mean and SD) make sense based on the raw scores?
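
The slide's numbers can be reproduced step by step: deviations, sum of squares (SS), variance, then standard deviation. This is a minimal sketch using the scores X = 1, 2, 4, 4, 10 and dividing by N, as in the formulas above. The variable names are illustrative.

```python
import math

scores = [1, 2, 4, 4, 10]
n = len(scores)

mean = sum(scores) / n                   # M
deviations = [x - mean for x in scores]  # X - M for each score
ss = sum(d ** 2 for d in deviations)     # sum of squares (SS)
variance = ss / n                        # SD^2: average squared deviation
sd = math.sqrt(variance)                 # SD: square root of the variance

print(f"M = {mean:.1f}")
print(f"sum of deviations = {sum(deviations):.2f}")
print(f"SS = {ss:.2f}")
print(f"variance = {variance:.2f}")
print(f"SD = {sd:.2f}")
```

The raw deviations sum to zero (up to floating-point rounding), which is why they are squared before averaging.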

  30. Population Standard Deviation • The average amount that the scores in a distribution vary from the mean. • Population standard deviation (σ, pronounced “sigma”): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

  31. Sample Standard Deviation • A sample is a subset of the population. • Use the sample SD to estimate the population SD. • Because samples are smaller than populations, there may be less variability in a sample. • To correct for this, we divide the sum of squared deviations by N – 1 instead of N. • Increases the standard deviation of the sample. • Provides a better estimate of the population standard deviation. • Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ • Unbiased sample estimator of the standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
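
Python's standard `statistics` module provides both versions: `pstdev` divides by N and `stdev` divides by N – 1. The sketch below compares them on the scores 1, 2, 4, 4, 10 from the earlier example. Treating those scores as a sample is an assumption made only for this comparison.

```python
import statistics

scores = [1, 2, 4, 4, 10]

sd_population = statistics.pstdev(scores)  # divides the sum of squares by N
sd_sample = statistics.stdev(scores)       # divides the sum of squares by N - 1

print(f"dividing by N:     {sd_population:.2f}")
print(f"dividing by N - 1: {sd_sample:.2f}")
```

Dividing by the smaller number N – 1 always gives the larger value, and the gap shrinks as the sample grows.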

  32. Sample Standard Deviation

  33. Types of Distributions • Refers to the shape of the distribution. • 3 types: • Normal distribution • Positively skewed distribution • Negatively skewed distribution

  34. Normal Distribution • Normal distributions: a specific type of frequency distribution • Bell shaped • Symmetrical • Unimodal • Many variables found in nature are approximately normally distributed when samples are large.

  35. Normal Distribution • Mean, median and mode are equal and located in the center.

  36. Normal Distribution

  37. Skewed distributions • When our data are not symmetrical • Positively skewed distribution: the tail extends toward the higher scores • Negatively skewed distribution: the tail extends toward the lower scores • Memory hint: the skew is where the tail is; the tail also looks like a skewer, and it points to the skew (in either the positive or the negative direction).

  38. Skewed Distributions

  39. Kurtosis • Kurtosis - how flat or peaked a distribution is. • Tall and skinny versus short and wide • Mesokurtic: normal • Leptokurtic: tall and thin • Platykurtic: short and fat (squatty like a platypus!)

  40. Kurtosis (figure: leptokurtic, platykurtic and mesokurtic curves)

  41. Skewness, Number of Modes, and Kurtosis in Distribution of Housing Prices

  42. z-Scores • In which country (US vs. England) is Homer Simpson considered overweight? • How can we make this comparison? • Need to convert weight in pounds and kilograms to a standardized scale. • z-scores: allow scores from different distributions to be compared under standardized conditions. • The need for standardization • Putting two different variables on the same scale • z-score: transforming raw scores into standardized scores: $z = \frac{X - \mu}{\sigma}$ • Tells us the number of standard deviations a score is from the mean.

  43. z-Scores • Class 1: M = $46.53, SD = $41.87, X = $54.76 • Class 2: M = $53.67, SD = $18.23, X = $89.07 • In which class did I have more money in comparison to the distribution of the other students? • Sample z-score: $z = \frac{X - M}{s}$ • When we convert raw scores from different distributions to z-scores, these scores become part of the same z distribution and we can compare scores from different distributions.
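
The two-class comparison can be worked out by applying the sample z-score formula to the figures on the slide. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


z_class_1 = z_score(54.76, 46.53, 41.87)
z_class_2 = z_score(89.07, 53.67, 18.23)

print(f"Class 1: z = {z_class_1:.2f}")
print(f"Class 2: z = {z_class_2:.2f}")
```

The Class 2 amount sits almost two standard deviations above its class mean, while the Class 1 amount is only about a fifth of a standard deviation above its mean, so the Class 2 amount is the larger one relative to the other students.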

  44. z Distribution • Characteristics (regardless of the original distribution): • The mean equals 0 (a raw score at the mean has z = 0) • The standard deviation equals 1
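
These two characteristics can be verified by standardizing any set of scores. The sketch below reuses the scores 1, 2, 4, 4, 10 from the standard deviation example and divides by N when computing the SD. The helper name `standardize` is hypothetical.

```python
import statistics


def standardize(scores):
    """Convert raw scores to z-scores using the mean and SD of the scores."""
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # SD computed by dividing by N
    return [(x - m) / sd for x in scores]


scores = [1, 2, 4, 4, 10]
z_scores = standardize(scores)

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(f"mean of z-scores = {z_mean:.2f}")
print(f"SD of z-scores   = {z_sd:.2f}")
```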

  45. z distribution of exam scores (M = 70, s = 10)

  46. Standard normal distribution • If a z-distribution is normal, then we refer to it as a standard normal distribution. • Provides information about the proportion of scores that are higher or lower than any other score in the distribution.

  47. Standard Normal Curve Table • Standard normal curve table (Appendix A) • Gives the proportion of scores that fall between any two z-scores. • What is the percentile rank of a z-score of 1? • Percentile rank = proportion of scores at or below a given raw score. • Ex: SAT score = 1350, M = 1120, s = 340 • z = (1350 – 1120) / 340 ≈ 0.68 • 75th percentile
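
Instead of a printed table, the proportion at or below a z-score can be computed from the standard normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+) and assumes the scores are normally distributed. The function name `percentile_rank` is illustrative.

```python
from statistics import NormalDist


def percentile_rank(x, mean, sd):
    """Proportion of a normal distribution at or below the raw score x."""
    z = (x - mean) / sd
    return NormalDist().cdf(z)  # standard normal: mean 0, SD 1


rank_z1 = NormalDist().cdf(1.0)             # percentile rank of z = 1
rank_sat = percentile_rank(1350, 1120, 340)  # SAT example from the slide

print(f"z = 1:      {rank_z1 * 100:.1f}th percentile")
print(f"SAT = 1350: {rank_sat * 100:.1f}th percentile")
```

A z-score of 1 falls at roughly the 84th percentile, and the SAT example rounds to the 75th percentile given on the slide.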

  48. Percentile Rank • The percentage of scores that your score is higher than. • 89th percentile rank for height • You are taller than 89% of the students in the class (you are tall!). • Homer Simpson: 4th percentile rank for intelligence. He is smarter than 4% of the population (or 96% of the population is smarter than Homer). • GRE score: 88th percentile rank • Reading scores of grammar school: 18th percentile rank

  49. Review • Data organization • Frequency distribution, bar graph, histogram and frequency polygon. • Descriptive statistics • Central tendency = middleness of a distribution • Mean, median and mode • Measures of variation = the spread of a distribution • Range, variance, standard deviation • Distributions can be normal or skewed (positively or negatively). • z-scores • Method of transforming raw scores into standard scores for comparisons. • Standard normal distribution: mean z-score = 0 and standard deviation = 1 • Normal curve table: shows the proportion of scores under the curve that fall below a given z-score.
