Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses


Program for today • Statistical hypothesis; • Parametric (standard) tests; • Nonparametric (distribution free) tests.

Statistical hypothesis • A statistical hypothesis is an assumption or guess about a population, or a statement about its probability distribution, made in order to reach a decision • These assumptions are to be proved or disproved • A predictive statement, usually put in the form of a null hypothesis and an alternative hypothesis • Capable of being tested by scientific methods; it relates an independent variable to some dependent variable

Testing of statistical hypotheses • The researcher bets, in advance of the experiment, that the results will agree with the theory and cannot be accounted for by the chance variation involved in sampling • Testing procedures enable the researcher to decide whether to accept or reject a hypothesis, or whether observed samples differ significantly from expected results


Procedure for hypothesis testing • Plan and conduct the experiment so that, if the results cannot be explained by chance variation, the theory is confirmed • Collect data • Set the null hypothesis, i.e. assume that the results are due to chance alone • Use a theoretical sampling distribution • Obtain the probability of the sample data under the assumption that it is chance variation • If the probability obtained in the previous step is less than some predetermined small percentage (say 0.1%, 1% or 5%), reject the null hypothesis and accept the alternative hypothesis
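The steps of the procedure can be sketched on a toy problem. The example below is an illustration only and is not taken from the lecture: it assumes a hypothetical experiment of 20 coin flips with 16 heads, uses the binomial distribution as the theoretical sampling distribution under H0 (a fair coin), and the name `check_fair_coin` is made up for this sketch.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), the sampling distribution under H0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def check_fair_coin(n, heads, alpha=0.05):
    """Return (p_value, reject_h0) for H0: the coin is fair (p = 0.5)."""
    # Probability of a result at least as extreme as the observed one.
    # Doubling one tail is valid here because Binomial(n, 0.5) is symmetric.
    extreme = max(heads, n - heads)
    p_value = min(1.0, 2 * binomial_tail(n, extreme))
    # Reject H0 when that probability is below the predetermined level alpha.
    return p_value, p_value < alpha

# Hypothetical data: 16 heads in 20 flips.
p_value, reject = check_fair_coin(n=20, heads=16)
print(f"p = {p_value:.4f}, reject H0 at 5%: {reject}")
```

With the same data, a stricter level such as `alpha=0.01` leads to a different decision, which is why the level has to be chosen before looking at the results.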

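Type I and type II errors • A type I error is rejecting H0 when it is true; a type II error is failing to reject H0 when it is false • The probability of a type I error is fixed in advance as the level of significance for a given sample size • If we try to reduce the probability of a type I error, the probability of committing a type II error increases • For a fixed sample size, both types of error cannot be reduced simultaneously • The decision maker has to strike a balance (trade-off) by examining the costs and penalties of both types of error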
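The trade-off between the two errors can be made concrete with a small calculation. The sketch below is not from the lecture: it assumes a one-sided z-test with known standard deviation (simpler than the t test discussed later), and the effect size, sigma and sample size are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Beta for a one-sided z-test of H0: mu = mu0 when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold fixed by alpha
    shift = effect * sqrt(n) / sigma         # mean of the z statistic under the alternative
    # H0 is (wrongly) retained when the statistic stays below the threshold.
    return std_normal.cdf(z_crit - shift)

# Hypothetical setting: true effect 0.5, sigma 1, sample size 16.
alphas = [0.10, 0.05, 0.01]
betas = [type_ii_error(a, effect=0.5, sigma=1.0, n=16) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f} -> beta = {b:.3f}")
```

For a fixed sample size, each reduction of alpha in this setting raises beta; increasing `n` is the only way to lower both.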

Null (H0) & Alternative (Ha) hypotheses H0 - when comparing two methods, the assumption that both are equally good Ha - the set of alternatives to H0, accepted when H0 is rejected (what one wishes to prove)

The level of significance • Some percentage (usually 5%), chosen with great care, thought and reason, such that H0 will be rejected when the sampling result (observed evidence) has a probability of < 0.05 of occurring if H0 is true • The researcher is willing to take as much as a 5% risk of rejecting H0 when it is true • The significance level is the maximum value of the probability of rejecting H0 when it is true • It is usually determined in advance, i.e. the probability of a type I error (α) is assigned in advance and is therefore fixed for the test
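The meaning of the significance level as a risk of wrongly rejecting H0 can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the lecture: samples are drawn from a standard normal population for which H0 (mean 0) is true, a two-sided z-test with known sigma = 1 is used, and the function name `false_rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def false_rejection_rate(alpha=0.05, n=10, trials=20000, seed=0):
    """Fraction of experiments in which a true H0 (mean = 0) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true here
        z = fmean(sample) * sqrt(n)                        # sigma = 1 is assumed known
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(f"observed type I error rate: {rate:.3f}")
```

Over many repetitions the observed rate should stay close to the chosen alpha, which is what "a 5% risk of rejecting H0 when it is true" means in practice.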

Parametric tests Parametric tests allow you to draw conclusions about various statistical parameters. Studying phenomena by calculating parameters is a very effective way to learn about them, because the description is concise and accurate. Parametric tests, despite their diversity, do not answer all the important questions, mainly because they can be applied only if the tested quantity (the population) has a normal distribution or one very close to it. In addition, parametric tests, as the name suggests, describe a single property of the phenomenon under study (the test results), without giving sufficient grounds to formulate general conclusions.

Student's t test • It is based on the t-distribution and is used mainly in the case of small samples • Used for testing the difference between the means of two samples, coefficients of simple and partial correlation, etc. • Using this test, we can test the null hypothesis: • H0: m = m0 • while the alternative hypothesis is: • H1: m ≠ m0

Student's t test In practice the mean and standard deviation of the general population are rarely known, so we must settle for estimates obtained with the most frequently used estimators: the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and the sample standard deviation, calculated with the equation $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.

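Student's t test We must calculate the statistic $t = \frac{\bar{x} - m_0}{s}\sqrt{n}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation (computed with the $n - 1$ divisor). It has the Student's t distribution with n - 1 degrees of freedom (n is the number of observations in the sample), provided that the population is normal or very close to it.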
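As a minimal sketch of this calculation, the function below computes the t statistic from a sample. It assumes the standard deviation is estimated with the n - 1 divisor (which is what `statistics.stdev` does); the function name `t_statistic` and the sample values are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, m0):
    """One-sample t statistic for H0: population mean = m0; also returns degrees of freedom."""
    n = len(sample)
    x_bar = mean(sample)       # sample mean
    s = stdev(sample)          # sample standard deviation, n - 1 divisor
    t = (x_bar - m0) / s * sqrt(n)
    return t, n - 1

# Hypothetical measurements, not data from the lecture.
t, df = t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], m0=2.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```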

Student's t test So, to check the null hypothesis that the mean of the sample equals the mean of the population, we use the Student's t-distribution tables and, for the assumed level of confidence 1 - α, read the critical value $t_\alpha$ such that $P(|t| \ge t_\alpha) = \alpha$. Now compare the computed value t with the critical value $t_\alpha$: • if $|t| \ge t_\alpha$, then reject the null hypothesis; • if $|t| < t_\alpha$, then there is no reason to reject the null hypothesis.
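The comparison with the critical value can be written as a small helper. This is a sketch only: the dictionary holds a few standard two-sided critical values for α = 0.05 copied from a Student's t table, and the names `T_CRIT_TWO_SIDED_05` and `reject_h0` are made up for this illustration.

```python
# Two-sided critical values of Student's t for alpha = 0.05,
# keyed by degrees of freedom (a small excerpt of a standard table).
T_CRIT_TWO_SIDED_05 = {5: 2.571, 9: 2.262, 10: 2.228, 11: 2.201, 20: 2.086}

def reject_h0(t, df, table=T_CRIT_TWO_SIDED_05):
    """Two-sided decision rule: reject H0 when |t| reaches the critical value."""
    return abs(t) >= table[df]

# Hypothetical computed statistics for a sample of 10 observations (df = 9).
for t in (2.5, -2.5, 1.0):
    verdict = "reject H0" if reject_h0(t, df=9) else "no reason to reject H0"
    print(f"t = {t:+.2f}: {verdict}")
```

The absolute value makes the rule symmetric, so large negative and large positive values of t are treated alike; a one-sided alternative would compare t itself with a one-sided critical value instead.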

Student’s t test Example We know that the average lighting time of a light bulb is m0 = 1059 hours. After changes were made in the technology, it was decided to check whether these changes have shortened the lighting time. The null hypothesis therefore has the form H0: m1 = m0, i.e. the average lighting time of the bulbs has not changed. For testing, a random sample of 10 light bulbs was taken; the results are presented in the Table.

Student’s t test Example Table: lighting time of the 10 sampled light bulbs (values not reproduced here)

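Student’s t test Example From the tables, for a confidence level of 0.95, we read the critical value $t_\alpha$ = 1.833 (the one-sided value for n - 1 = 9 degrees of freedom). The statistic computed from the sample does not fall in the rejection region; therefore there is no reason to reject the null hypothesis.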

Student’s t test Exercise 12 farms were drawn independently in a village and the following oat yields were obtained for them: 23.3, 22.1, 21.8, 19.9, 23.7, 22.3, 22.6, 21.5, 21.9, 22.8, 23.0, 22.2. At the 5% level of significance, test the hypothesis that the average yield of oats in the whole village is 22.6 q/ha, against the alternative hypothesis that the average yield is higher.
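One possible way to work through the exercise is sketched below. It assumes the standard deviation is estimated with the n - 1 divisor and uses the one-sided table value 1.796 for α = 0.05 and 11 degrees of freedom; other conventions for the estimator lead to the same value of t as long as the matching formula is used.

```python
from math import sqrt
from statistics import mean, stdev

yields = [23.3, 22.1, 21.8, 19.9, 23.7, 22.3, 22.6, 21.5, 21.9, 22.8, 23.0, 22.2]
m0 = 22.6          # hypothesised average yield, q/ha
T_CRIT = 1.796     # table value: one-sided, alpha = 0.05, 11 degrees of freedom

n = len(yields)
x_bar = mean(yields)
s = stdev(yields)                      # n - 1 divisor
t = (x_bar - m0) / (s / sqrt(n))

# Right-sided alternative (the average yield is higher):
# reject H0 only for large positive values of t.
reject = t >= T_CRIT
print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.3f}, reject H0: {reject}")
```

Because the sample mean lies below 22.6, the statistic is negative and cannot reach the right-sided rejection region, so the data give no reason to reject H0 in favour of a higher average yield.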

To be continued … !
