Introduction to Basic Statistics

Mean The mean is the sum of the values of a set of data divided by the number of values in that data set. Symbol: $\bar{x}$ (pronounced “X-bar”). $\bar{x} = \frac{\sum x}{n}$

Mean $\bar{x} = \frac{\sum x}{n}$, where x = individual data value, n = number of data values in the data set, and $\sum$ = summation of a set of values

Mean Data Set: 3 7 12 17 21 21 23 27 32 36 44 Sum of the values = 243 Number of values = 11 Mean: $\bar{x} = \frac{\sum x}{n} = \frac{243}{11} = 22.09$
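The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the names `data`, `mean`, `total`, `n`, and `x_bar` are chosen here for illustration and do not come from the slides.

```python
# Data set from the slide above.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]


def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(data)   # sum of the values
n = len(data)       # number of values
x_bar = mean(data)  # the mean, "X-bar"

print(total, n, round(x_bar, 2))  # 243 11 22.09
```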

Mode The most frequently occurring value in a set of data is the mode. Symbol… M Data Set: 27 17 12 7 21 44 23 3 36 32 21

Mode The most frequently occurring value in a set of data is the mode. Data Set: 3 7 12 17 21 21 23 27 32 36 44 Mode = 21

Mode The most frequently occurring value in a set of data is the mode. Note: If two numbers of equal frequency stand out, then the data set is “bimodal.” If more than two numbers of equal frequency stand out, then the data set is “multi-modal.”
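One way to find the mode, including the bimodal case described in the note, is Python's standard `statistics.multimode` (available from Python 3.8). This is a sketch; the second list, `bimodal_example`, is a made-up data set used only to illustrate two modes.

```python
from statistics import multimode

# Data set from the Mode slides.
data = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]

# multimode returns every value that shares the highest frequency.
modes = multimode(data)
print(modes)  # [21]

# Hypothetical data set (not from the slides) with two modes.
bimodal_example = [1, 2, 2, 3, 4, 4]
bimodal_modes = multimode(bimodal_example)
print(bimodal_modes)  # [2, 4]
```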

Median The median is the value that occurs in the middle of a set of data that has been arranged in numerical order. Symbol: $\tilde{x}$, pronounced “X-tilde”

Median The median is the value that occurs in the middle of a set of data that has been arranged in numerical order. Data Set: 27 17 12 7 21 44 23 3 36 32 21 Arranged in order: 3 7 12 17 21 21 23 27 32 36 44 Median = 21

Median Note: A data set that contains an odd number of values always has a single middle value, which is the median. For an even number of values, the two middle values are averaged, and the result is the median. Data Set: 3 7 12 17 21 21 23 27 32 36 44 Median = 21
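The odd and even cases in the note can be sketched as a short function. The function name `median` and the four-value list `even_example` are illustrative choices, not part of the slides.

```python
def median(values):
    """Return the middle value of the data once sorted numerically."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Data set from the Median slides (unsorted, 11 values).
data = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
print(median(data))  # 21

# Hypothetical even-length data set (not from the slides).
even_example = [3, 7, 12, 17]
print(median(even_example))  # 9.5
```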

Range The range is the difference between the largest and smallest values that occur in a set of data. Symbol… R Data Set: 3 7 12 17 21 21 23 27 32 36 44 Range = 44-3 = 41
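The same calculation as a short Python sketch, using the built-in `max` and `min`. The variable name `data_range` is an illustrative choice.

```python
# Data set from the Range slide.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)
print(data_range)  # 41
```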

Standard Deviation

Two classes took a recent quiz. There were 10 students in each class, and each class had an average score of 81.5

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?

The answer is… No. The average (mean) does not tell us anything about the distribution or variation in the grades.

Here are Dot-Plots of the grades in each class:

[Figure: dot plots of the grades in each class, with the mean marked]

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Well, for example, let's say that for a set of data the average is 17.95 and the range is 23. But what if the data looked like this:

[Figure: the data plotted with the average and the range marked] Most of the numbers fall in one area and are not evenly distributed throughout the range.

The Standard Deviation is a number that measures how far the numbers in a set of data typically are from their mean.

If the Standard Deviation is large, it means the numbers are spread out from their mean. If the Standard Deviation is small, it means the numbers are close to their mean.

Here are the scores on the math quiz for Team A: Average: 81.5

The Standard Deviation measures how far the numbers in a set of data are from their mean. For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5? 72 - 81.5 = -9.5

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5? 89 - 81.5 = 7.5

So, the first step to finding the Standard Deviation is to find all the distances from the mean: subtract the mean from each score.

Next, you need to square each of the distances to turn them all into positive numbers.

Add up all of the squared distances. Sum: 214.5

Divide by (n - 1), where n represents how many numbers you have. 214.5 / (10 - 1) = 23.8

Finally, take the square root of this average squared distance. $\sqrt{23.8} = 4.88$

This is the Standard Deviation: 4.88
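The steps above can be sketched in Python as follows. The individual Team A scores did not survive in this transcript, so the sketch only checks the last two steps against the sum given on the slides (214.5, with n = 10). It then runs the full procedure on the 11-value data set from the Mean slides and compares the result with the standard library's `statistics.stdev`. The function name `sample_std_dev` is an illustrative choice.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation, following the steps on the slides."""
    n = len(values)
    mean = sum(values) / n
    distances = [x - mean for x in values]  # distances from the mean
    squared = [d ** 2 for d in distances]   # square each distance
    total = sum(squared)                    # add up the squared distances
    variance = total / (n - 1)              # divide by (n - 1)
    return math.sqrt(variance)              # take the square root


# Last two steps for Team A, using only the sum shown on the slides.
team_a_sd = math.sqrt(214.5 / (10 - 1))
print(round(team_a_sd, 2))  # 4.88

# Full procedure on the data set from the Mean slides (not the quiz scores).
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
matches_library = math.isclose(sample_std_dev(data), statistics.stdev(data))
print(matches_library)  # True
```

Dividing by (n - 1) rather than n gives the sample standard deviation, which is what `statistics.stdev` computes; `statistics.pstdev` divides by n instead.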

Now find the Standard Deviation for the other class grades. Sum of squared distances: 2280.5. 2280.5 / (10 - 1) = 253.4 and $\sqrt{253.4} = 15.92$

Now, let's compare the two classes again. Both classes have a mean of 81.5, but the standard deviations are 4.88 and 15.92.

Histogram A histogram is a common data distribution graph that is used to show the frequency with which specific values, or values within ranges, occur in a set of data. A forensic engineer might use a histogram to show the most common, or typical, dimension that exists among a group of identical manufactured parts.

Data Set: 0 3 -1 -3 3 2 1 0 -1 -1 2 1 0 1 -1 -2 1 2 1 0 -2 -4 0 0 [Figure: X-axis running from -6 to 6]

Histogram Specific values, called data elements, are plotted along the X-axis of the graph. [Figure: X-axis labeled “Data Elements”, running from -6 to 6]

Histogram Large sets of data are often divided into a limited number of groups. These groups are called class intervals. [Figure: X-axis labeled “Class Intervals”, with the intervals -16 to -6, -5 to 5, and 6 to 16]

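Histogram The number of times each data element, or each class interval, occurs is its frequency, which is shown along the Y-axis of the graph. [Figure: Y-axis labeled “Frequency”, marked 1, 3, 5, 7; X-axis with the class intervals -16 to -6, -5 to 5, and 6 to 16]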
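A text-only sketch of how the frequencies behind a histogram can be counted. It assumes the 24 values listed in the transcript before the axis labels are the example data set, and that the three class intervals are -16 to -6, -5 to 5, and 6 to 16. The names `frequency`, `class_intervals`, and `interval_counts` are illustrative.

```python
from collections import Counter

# Assumed example data set (24 values, as read from the transcript).
data = [0, 3, -1, -3, 3, 2, 1, 0, -1, -1, 2, 1,
        0, 1, -1, -2, 1, 2, 1, 0, -2, -4, 0, 0]

# Frequency of each data element (Y-axis) per value (X-axis).
frequency = Counter(data)
for value in range(-6, 7):
    print(f"{value:>3} | {'#' * frequency[value]}")

# Frequency per class interval; both ends of each interval are inclusive.
class_intervals = [(-16, -6), (-5, 5), (6, 16)]
interval_counts = {
    (low, high): sum(1 for x in data if low <= x <= high)
    for low, high in class_intervals
}
print(interval_counts)  # {(-16, -6): 0, (-5, 5): 24, (6, 16): 0}
```

With these intervals every value lands in the middle group, which shows why the choice of interval width matters: intervals that are too wide hide the shape of the data.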

Normal Distribution “Is the data normal?” Translation…does the greatest frequency of the data values occur about the mean value?

Normal Distribution [Figure: histogram with the mean value marked; X-axis: data elements from -6 to 6; Y-axis: frequency]

Normal Distribution “Is the process normal?” Further Translation…does the data form a bell-shaped curve when plotted on a histogram?

Normal Distribution [Figure: histogram; X-axis: data elements from -6 to 6; Y-axis: frequency]
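The first “translation” (does the greatest frequency occur about the mean?) can be checked roughly in code. This is only an informal sketch, not a formal normality test, and it assumes the 24 values listed earlier in the transcript are the histogram's data set. The variable names are illustrative.

```python
from collections import Counter

# Assumed example data set (24 values, as read from the transcript).
data = [0, 3, -1, -3, 3, 2, 1, 0, -1, -1, 2, 1,
        0, 1, -1, -2, 1, 2, 1, 0, -2, -4, 0, 0]

mean = sum(data) / len(data)

# The value with the greatest frequency, and how often it occurs.
peak_value, peak_count = Counter(data).most_common(1)[0]

print(peak_value, peak_count, round(mean, 2))  # 0 6 0.08
```

Here the most frequent value (0) sits right next to the mean (about 0.08), which is consistent with the bell shape the slides describe.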

Chapter 5: Probability Concepts

In Chapter 5:
5.1 What is Probability?
5.2 Types of Random Variables
5.3 Discrete Random Variables
5.4 Continuous Random Variables
5.5 More Rules and Properties of Probability

Definitions
• Random variable ≡ a numerical quantity that takes on different values depending on chance
• Population ≡ the set of all possible values for a random variable
• Event ≡ an outcome or set of outcomes
• Probability ≡ the relative frequency of an event in the population … alternatively… the proportion of times an event is expected to occur in the long run

Example
• In a given year: 42,636 traffic fatalities (events) in a population of N = 293,655,000
• Random sample population
• Probability of event = relative frequency in the population = 42,636 / 293,655,000 = .0001452 ≈ 1 in 6887
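The relative-frequency calculation in this example can be reproduced as follows. The variable names are illustrative; the two counts are the ones given on the slide.

```python
# Figures from the example slide.
fatalities = 42_636        # number of events
population = 293_655_000   # population size N

# Probability as relative frequency of the event in the population.
probability = fatalities / population

# The same probability expressed as "1 in ...".
one_in = population / fatalities

print(round(probability, 7))  # 0.0001452
print(round(one_in))          # 6887
```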
