Statistics. By: M. Yasir Ali Umer Saeed Luqman Bashir Waqas Hussain Ahsan Raza. Introduction. Meaning of Statistics Observations and Variable Collection of Data. Meaning of Statistics.
Statistics By: M. Yasir Ali Umer Saeed Luqman Bashir Waqas Hussain Ahsan Raza
Introduction • Meaning of Statistics • Observations and Variables • Collection of Data
Meaning of Statistics • Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this analysis. It is concerned with the systematic collection of numerical data and its interpretation. The word ‘statistics’ is used to refer to: • 1. Numerical facts, such as the number of people living in a particular area. • 2. The study of ways of collecting, analyzing and interpreting those facts.
Meaning of Statistics • The word “statistics” comes from the Latin word status, meaning a political state; it originally meant information useful to the state, for example information about the sizes of populations and armed forces. • The word statistics also refers to “numerical facts systematically arranged”. • Statistical information is used to inform the public, to explain things that have happened, to justify a claim, and to provide general comparisons.
Observations and Variables • In statistics, an observation means any sort of numerical recording of information. A classification such as head or tail is also an observation. • A characteristic that varies from one individual or object to another is called a variable. For example, age is a variable because it varies from person to person. • A quantitative variable may be classified as discrete or continuous. • A discrete variable is one that can take only a discrete set of integers or whole numbers; that is, its values are taken by jumps or breaks. • A variable is called continuous if it can take on any value (fractional or integer) within a given interval.
Discrete Data – US Crime Statistics; Counts of Occurrences.
Collection of Data • The collection of data is perhaps the most important part of statistical work. • Statistical data may be collected by a complete enumeration of the whole field, called a census, which in many cases would be too costly and too time consuming. • Data that have been originally collected (raw data) and have not undergone any sort of statistical treatment are called primary data. • Data that have undergone treatment by statistical methods at least once, i.e. data that have been collected, classified, tabulated or presented in some form for a certain purpose, are called secondary data. • Editing of data; uses and misuses of data.
Data Presentation Agenda • Data and Data Types • Representing Data: pie chart, bar chart. • Summarizing Data: box plot, histogram • Central tendency • Spread • Distribution (shape)
Data = A Set of Facts • A picture of some aspect of the world • Pizza Sales by Type • What do the data tell you? • How can you use the information? • What additional information would make these data more informative?
Data Types and Measurement • Quantitative • Discrete = count: Number of car accidents by city by time • Continuous = measurement: Housing prices • Qualitative • Categorical: Shopping mall, car brand, trip mode • Ordinal: Survey data on attitudes; “How do you feel about…?” Strongly disagree, Disagree, Neutral, Agree, Strongly agree • Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.
Ordered Qualitative Data • German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?
Unordered Qualitative Data • Travel Mode Between Sydney and Melbourne by 210 Travelers
Quantitative vs. Qualitative Data • Qualitative Data: No units of measurement. Arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train. • Quantitative Data: Units of measurement make sense. Arithmetic computations make sense.
General types of statistical studies • Statistics is concerned with the collection and analysis of data. • There are several different types of statistical studies that are used to collect data. • Let's take a look at surveys, experimental studies and observational studies.
Survey • 1. Survey - Statistical surveys are used to collect quantitative information from a specific population. A survey may focus on opinions or factual information depending upon the purpose of the study. Surveys may involve answering a questionnaire or being interviewed by a researcher. The census is a type of survey.
Experimental Study • 2. Experimental study - In an experimental study, the researcher takes measurements, or surveys, the sample population. The researcher then manipulates the sample population in some manner. After the manipulation, the researcher re-measures, or re-surveys, using the same procedures to determine if the manipulation possibly changed the measurements. • During a "controlled" experiment, the researcher will separate the sample population into groups with one group established as the control group. All groups will be manipulated in some manner, except for the control group which will remain the same.
Observational Study • 3. Observational study - In an observational study, the sample population being studied is measured, or surveyed, as it is. The researcher does not influence the population in any way or attempt to intervene in the study. There is no experimental manipulation. Instead, data is simply gathered and correlations are investigated.
Representing Data • In raw form • Transformed to a visual form • Summarized graphically • Summarized statistically
Pie Chart • Pizza Pies Sold, by Type
Data Representation • Bar Chart and Pie Chart • Same data. Which is easier to understand?
A Box Plot Describes the Distribution of Values in a Set of Data • Hawaii Box and Whisker Plot for House Price Listings
Making a Box Plot for Per Capita Income • Maximum = 31136 • 3rd Quartile = 24933 • Median = 22610 • 1st Quartile = 21677 • Minimum = 17043 • Interquartile Range = IQR = 24933 - 21677 = 3256
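The five numbers behind a box plot can be computed directly. The sketch below is only an illustration: the `incomes` list is invented (the slide does not give the underlying per capita income data), and the "inclusive" quartile rule is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical incomes, invented for illustration (not the slide's data).
incomes = [17000, 21000, 22000, 23000, 25000, 26000, 31000]
minimum, q1, median, q3, maximum = five_number_summary(incomes)
iqr = q3 - q1  # interquartile range = Q3 - Q1

# The same subtraction applied to the quartiles quoted on the slide.
slide_iqr = 24933 - 21677

print(minimum, q1, median, q3, maximum, iqr)
print(slide_iqr)
```

The box spans Q1 to Q3, the line inside the box marks the median, and the whiskers reach toward the minimum and maximum.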
Histogram for House Price Listings • A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings. (HOG, pp. 16-18)
Distribution of House Price Listings • Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.
A Caution About Graphical Data Summaries Graphical tools can be very badly behaved when: (1) The data have only a few observations. (2) There are wild observations in the data set. The box and whisker plot is distorted (and dominated) by one wildly errant observation.
Measures of location • Often it is not possible to list all the data or draw a histogram; it would be nice to have one number which best represents a data set • Often where the data lies is of interest, for which purpose a measure of location is useful.
Measure of location • Mean – arithmetic average = Σx/n (the sum of the values divided by the number of values) • Median – the halfway point of the sorted values • Mode – the most common value • If each value occurs exactly once, every value in the list is a mode, since all are equally “most common.”
Measure of location • Data: 1, 2, 2, 2, 3, 3, 4, 4, 27 • Mean = 5.3 • Median = 3 • Mode = 2
Measure of location • Data: 0, 1, 2, 3, 4, 5, 6, 27, 27 • Mean = 8.3 • Median = 4 • Mode = 27
Measure of location • Data: 1, 1, 1, 2, 2, 23, 24, 26, 27 • Mean = 11.9 (107/9 for the values as listed) • Median = 2 • Mode = 1
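The three measures of location can be checked with Python's standard `statistics` module. This is a minimal sketch using two of the data sets from the slides above; the short `all_distinct` list is invented to show the case where every value is a mode.

```python
import statistics

first = [1, 2, 2, 2, 3, 3, 4, 4, 27]
second = [0, 1, 2, 3, 4, 5, 6, 27, 27]

for data in (first, second):
    mean = statistics.mean(data)      # sum of values / number of values
    median = statistics.median(data)  # middle value of the sorted data
    mode = statistics.mode(data)      # most frequent value
    print(round(mean, 1), median, mode)

# When every value occurs exactly once, multimode returns all of them.
all_distinct = [3, 1, 4]  # invented example
modes = statistics.multimode(all_distinct)
print(modes)
```

In both data sets the single large value (27) pulls the mean well above the median, which is why the median is often preferred for skewed data.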
Measure of Variability • Range – Overall difference between the highest and lowest scores. • SET OF SCORES: 7, 2, 7, 6, 5, 6, 2; RANGE = 7 - 2 = 5 • Variance – Average squared difference from the mean. • CALCULATED BY SQUARING THE STANDARD DEVIATION (S²) • For example, if STANDARD DEVIATION = S = 4, then VARIANCE = S² = 4² = 16
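As a small worked check, the range and variance of the set of scores on this slide can be computed as follows. The sketch reports the population variance (dividing by n) and, for comparison, the sample variance (dividing by n - 1); the slide does not say which convention it intends, so both are shown. The slide's S = 4 figure is a separate illustration and does not come from these scores.

```python
import statistics

scores = [7, 2, 7, 6, 5, 6, 2]

score_range = max(scores) - min(scores)  # highest minus lowest score
mean = statistics.mean(scores)

# Population variance: average squared difference from the mean.
pop_variance = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)       # square root of the variance

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(scores)

print(score_range, mean, pop_variance, pop_sd, sample_variance)
```

For these scores the population standard deviation is 2 and the population variance is 2² = 4, which shows the "variance is the square of the standard deviation" relationship on actual data.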
Variability: Identical Range • Set A: 1, 9, 9, 9, 9, 9, 11, 11, 11, 11, 11, 19 • Set B: 1, 1, 1, 1, 1, 1, 19, 19, 19, 19, 19, 19
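Variability: Nearly Identical Variance • Set A: 1, 9, 9, 9, 9, 9, 11, 11, 11, 11, 11, 19 • Set B: 6, 6, 6, 6, 6, 7, 13, 14, 14, 14, 14, 14 • Both sets have mean 10; their variances (dividing by n) are about 14.3 and 14.8.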
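The two variability slides can be verified numerically. The sketch below assumes the values are read off the slides as three lists of twelve numbers each (the names `set_a`, `same_range` and `similar_variance` are chosen here for illustration) and uses the population variance.

```python
import statistics

# First list shown on both slides.
set_a = [1] + [9] * 5 + [11] * 5 + [19]
# Second list on the "range" slide.
same_range = [1] * 6 + [19] * 6
# Second list on the "variance" slide.
similar_variance = [6] * 5 + [7, 13] + [14] * 5

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

for name, values in [("set_a", set_a),
                     ("same_range", same_range),
                     ("similar_variance", similar_variance)]:
    print(name, value_range(values), round(statistics.pvariance(values), 2))
```

Under this reading, `set_a` and `same_range` share a range of 18 but have very different variances, while `set_a` and `similar_variance` have close (not exactly equal) variances despite very different ranges. Neither measure alone describes the spread completely.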
Conclusions • 1) Statistics are useful for separating random noise from real effects • 2) Numbers are not absolute, and they can be easily manipulated • 3) Always scrutinize data closely, and draw your own conclusions • 4) 85% of all statistics are made up on the spot; the rest are all wrong
Frequency Distribution • A frequency distribution is a table that organises data into classes • A class is a group of values describing ONE characteristic of the data • It shows the number of observations from the data that fall into each class • Frequency distribution can be constructed by determining how often ('with what frequency') values occur inside each class of a data set • Fewer classes mean more data compression
Relative Frequency Distribution • Frequency of each value can be expressed as a fraction or percentage of the total number of observations • This could help us compare data from samples that are of different sizes
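A relative frequency distribution can be sketched in a few lines. The two samples below are invented for illustration (they are not the travel-mode data referred to elsewhere in the slides); they differ in size but turn out to have the same relative frequencies, which is what makes them comparable.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each distinct value to its share of the total observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical samples of different sizes.
small_sample = ["Air", "Bus", "Air", "Train"]
large_sample = ["Air"] * 50 + ["Bus"] * 25 + ["Train"] * 25

small_rel = relative_frequencies(small_sample)
large_rel = relative_frequencies(large_sample)

print(small_rel)
print(large_rel)
```

Multiplying each fraction by 100 gives the percentage form mentioned on the slide.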
Discrete & Continuous Classes • DISCRETE : In this case, the data in a class can take ONE discrete value : • 0, 1, 2, ... • CONTINUOUS : In this case, the data in a class can take any value in a range • > 0; <= 1 • > 1; <= 2 • > 2; <= 3 • And so on
Qualitative & Continuous Classes • Discrete classes can also be used to model qualitative classes • Where the data does not take specific numerical values but falls into certain qualitative (that is, non-numeric) categories • Continuous classes cannot have qualitative data • Unless you want to prove a point!
Characteristics of Classes • All Inclusive • All the data must fall into one or other class • Relative frequencies must add up to 1 • Mutually Exclusive • Greater Than ( > ) Lower Class Boundary • Less Than OR Equal to ( <= ) Upper Class Boundary • First and Last Class open ended • 0 count for ratings <= 10 • 1 count for ratings > 90 and <= 100 • 0 count for ratings > 100
Constructing a Frequency Distribution • Decide on Type of Class • Quantitative or Qualitative measure? • Decide on Number of Classes • More classes: give more information • Fewer classes: easier to interpret • Rule of Thumb: Between 6 and 15 classes • Determine width of class interval: ([Largest Value] – [Unit Value before Smallest Value]) / [Total Number of Class Intervals] • Determine the number of points in each class • Illustrate the data in a chart
Using a spreadsheet to Construct a Frequency Distribution • Functions used • Max • Min • Roundup • Rounddown • Sum • Frequency
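The construction steps above can be sketched as follows. This is one possible reading, not a definitive procedure: it assumes integer data with a unit of 1, rounds the class width up to a whole number, and uses classes of the form "greater than lower boundary, less than or equal to upper boundary". The `ratings` list and the function names are invented for illustration.

```python
import math

def class_width(largest, smallest, n_classes, unit=1):
    """(largest - unit value before smallest) / number of classes, rounded up."""
    return math.ceil((largest - (smallest - unit)) / n_classes)

def frequency_distribution(values, n_classes, unit=1):
    """Return class boundaries and counts for classes (lower, upper]."""
    lower = min(values) - unit  # unit value before the smallest value
    width = class_width(max(values), min(values), n_classes, unit)
    counts = [0] * n_classes
    for v in values:
        # Class k covers lower + k*width < v <= lower + (k+1)*width.
        k = math.ceil((v - lower) / width) - 1
        counts[k] += 1
    bounds = [(lower + k * width, lower + (k + 1) * width)
              for k in range(n_classes)]
    return bounds, counts

# Hypothetical ratings between 11 and 100.
ratings = [11, 20, 21, 35, 47, 58, 60, 72, 85, 100]
bounds, counts = frequency_distribution(ratings, n_classes=9)

for (low, high), count in zip(bounds, counts):
    print(f"> {low}; <= {high}: {count}")
```

With nine classes the width is (100 - 10) / 9 = 10, so a rating of 20 falls in the class "> 10; <= 20" and a rating of 21 in "> 20; <= 30". Every value lands in exactly one class, so the counts add up to the number of observations.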
Grouped Data • Grouped data is data that has been organized into groups known as classes. Grouped data has been 'classified' and thus some level of data analysis has taken place, which means that the data is no longer raw. • A data class is a group of data related by some user-defined property. For example, if you were collecting the ages of the people you met as you walked down the street, you could group them into classes as those in their teens, twenties, thirties, forties and so on. Each of those groups is called a class.
Grouped Data • Each of those classes is of a certain width, and this is referred to as the Class Interval or Class Size. This class interval is very important when it comes to drawing histograms and frequency diagrams. All the classes may have the same class size, or they may have different class sizes, depending on how you group your data. The class interval is always a whole number.
Below is an example of grouped data where the classes have the same class interval.