00:00

Chi-Square Test of Independence in Contingency Tables

Contingency tables, Chi-Square Test of Independence, hypothesis testing, test statistic calculation, and requirements for conducting the test are discussed. The test determines if there is an association between categorical variables using observed and expected frequencies in a contingency table. It is important for analyzing relationships between variables but does not imply causation.

minet
Download Presentation

Chi-Square Test of Independence in Contingency Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chi-Square Test B.N.CHATHURANI RN, M . Sc in Bio-Statistics, B . Sc in Nursing, University of Peradeniya bnchathurani@gmail.com

  2. Contingency tables  Frequency tables of two variables presented simultaneously are called contingency tables. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variables as columns, then finding the joint or cell frequency for each cell. The cell frequencies are then summed across both rows and columns. The sums are placed in the margins, the values of which are called marginal frequencies. The lower right hand corner value contains the sum of either the row or column marginal frequencies, which both must be equal to N.

  3. Contingency tables

  4. Chi-Square Test of Independence  The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.  This test is also known as Chi-Square Test of Association.  This test utilizes a contingency table to analyze the data. A contingency table (also known as a cross- tabulation, crosstab, or two-way table) is an arrangement in which data is classified according to two categorical variables. The categories for one variable appear in the rows, and the categories for the other variable appear in columns. Each variable must have two or more categories. Each cell reflects the total count of cases for a specific pair of categories.

  5. The Chi-Square Test of Independence is commonly used to test the following:  Statistical independence or association between two or more categorical variables.  The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables or between categorical and continuous variables. Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables, and can not provide any inferences about causation.  If your categorical variables represent "pre-test" and "post-test" observations, then the chi-square test of independence is not appropriate. This is because the assumption of the independence of observations is violated. In this situation, McNemar's Test is appropria

  6. Your data must meet the following requirements:  Two categorical variables.  Two or more categories (groups) for each variable.  Independence of observations.  There is no relationship between the subjects in each group.  The categorical variables are not "paired" in any way (e.g. pre-test/post-test observations).  Relatively large sample size.  Expected frequencies for each cell are at least 1.  Expected frequencies should be at least 5 for the majority (80%) of the cells.

  7. Hypothesis  The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:  H0: "[Variable 1] is independent of [Variable 2]"  H1: "[Variable 1] is not independent of [Variable 2]"  OR  H0: "[Variable 1] is not associated with [Variable 2]"  H1: "[Variable 1] is associated with [Variable 2]"

  8. Test Statistic  The test statistic for the Chi-Square Test of Independence is denoted Χ2, and is computed as:  where  oij is the observed cell count in the ith row and jth column of the table  eij is the expected cell count in the ith row and jth column of the table, computed as  The quantity (oij - eij) is sometimes referred to as the residual of cell (i, j), denoted rij.  The calculated Χ2 value is then compared to the critical value from the Χ2 distribution table with degrees of freedom df = (R - 1)(C - 1) and chosen confidence level. If the calculated Χ2 value > critical Χ2 value, then we reject the null hypothesis

  9. How to calculate  For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test ofindependence the null and alternative hypotheses are;  Ho: The two categorical variables are independent.  Ha: The two categorical variables are related.

  10. Conti….  We can use the equation Chi Square = the sum of all the (fo - fe)2 / fe  Here fo denotes the frequency of the observed data and fe is the frequency of the expected values. The general table would look something like the one below Now we need to calculate the expected values for each cell in the table and we can do that using the the row total times the column total divided by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N. Once the expected values have been calculated for each cell, we can use the same procedure are before for a simple 2 x 2 table.

  11. To check whether there is association between Malaria and location

  12. We could now set up the following table Chi Square = 125.516 Degrees of Freedom = (c - 1)(r - 1) = 2(2) = 4

  13. Chi square table Reject Ho because 125.516 is greater than 9.488 (for alpha = 0.05) Thus, we would reject the null hypothesis that there is no relationship between location and type of malaria. Our data tell us there is a relationship between type of malaria and location

  14. PROBLEM STATEMENT  In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, current smoker, or past smoker) and gender (male or female) using a Chi-Square Test of Independence (we'll use α = 0.05

  15. HOW TO RUN CHI SQUARE TEST IN SPSS  Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).  Select Smoking as the row variable, and Gender as the column variable.  Click Statistics. Check Chi-square, then click Continue.  (Optional) Check the box for Display clustered bar charts.  Click OK.

  16. INTERPRETATION OF RESULTS  OUTPUT  TABLES  The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with no missing values for both smoking behavior and gender can be used in the test

  17. crosstabulation and chi-square test results

  18. CONTI…  The key result in the Chi-Square Tests table is the Pearson Chi-Square.  The value of the test statistic is 3.171.  The footnote for this statistic pertains to the expected cell count assumption (i.e., expected cell counts are all greater than 5): no cells had an expected count less than 5, so this assumption was met.  Because the test statistic is based on a 3x2 crosstabulation table, the degrees of freedom (df) for the test statistic is df=(R−1)∗(C−1)=(3−1)∗(2−1)=2∗1=2  The corresponding p-value of the test statistic is p = 0.205.

  19. DECISION AND CONCLUSIONS  Since the p-value is greater than our chosen significance level (α = 0.05), we do not reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.  Based on the results, we can state the following:  No association was found between gender and smoking behavior (Χ2(2)> = 3.171, p = 0.205).

  20. Summary  For all chi-square tests, the chi-square test statistic χ2 is the same. It measures how far the observed data are from the null hypothesis by comparing observed counts and expected counts. Expected counts are the counts we expect to see if the null hypothesis is true.  The chi-square model is a family of curves that depend on degrees of freedom. For a one-way table the degrees of freedom equals (r – 1). For a two-way table, the degrees of freedom equals (r – 1)(c – 1). All chi-square curves are skewed to the right with a mean equal to the degrees of freedom.  A chi-square model is a good fit for the distribution of the chi-square test statistic only if the following conditions are met:  The sample is randomly selected.  All expected counts are 5 or greater.

  21. Thank You

More Related