
Testing for a Relationship Between 2 Categorical Variables


allan

Presentation Transcript


  1. Testing for a Relationship Between 2 Categorical Variables: The Chi-Square Test …

  2. Rel’nship between owning a bike and having a significant other?

     Rows: Bike   Columns: SigOther

              No      Yes      All
     No       37       27       64
           57.81    42.19   100.00
     Yes      10       18       28
           35.71    64.29   100.00
     All      47       45       92
           51.09    48.91   100.00

     Cell Contents -- Count
                      % of Row

  3. Our Hypotheses • If there is no relationship, we’d expect the percentages (proportions) in each group to be equal. So: • H0: There is no relationship between owning a bike and having a significant other. Or, pN = pY. • HA: There is a relationship. Or, pN ≠ pY.

  4. What would the table look like if there was no relationship?

     Rows: Bike   Columns: SigOther

              No     Yes     All
     No       37      27      64
     Yes      10      18      28
     All      47      45      92

     Cell Contents -- Observed Counts

     45/92, or 48.9%, would have an SO regardless of owning a bike.
     So, 0.489(64), or 31.3, non-bikers would have an SO.
     And, 0.489(28), or 13.7, bikers would have an SO.

     Expected Counts

                      No              Yes
     No     64 - 31.3 = 32.7         31.3
     Yes    28 - 13.7 = 14.3         13.7
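The expected counts on this slide can be reproduced with a few lines of code. The sketch below is in Python, which the slides do not use, and the function name `expected_counts` is illustrative. It relies on the fact that 0.489(64) is the same calculation as (row total × column total) / grand total.

```python
def expected_counts(observed):
    """Expected counts if the row and column variables are unrelated:
    (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from the slide
# (rows: Bike No, Yes; columns: SigOther No, Yes).
observed = [[37, 27], [10, 18]]
expected = expected_counts(observed)
for row in expected:
    print([round(x, 1) for x in row])
# prints [32.7, 31.3] and then [14.3, 13.7]
```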

  5. Calculate (observed - expected)^2 / expected for each of the cells. Are observed counts very different from expected counts?
     For first cell: (37 - 32.7)^2 / 32.7 = 0.565
     For second cell: (27 - 31.3)^2 / 31.3 = 0.591
     For third cell: (10 - 14.3)^2 / 14.3 = 1.293
     For fourth cell: (18 - 13.7)^2 / 13.7 = 1.350

  6. Add up the resulting quantities to get the value of the “chi-square statistic” for the table. Are observed counts very different from expected counts?
     Chi-square statistic = 0.565 + 0.591 + 1.293 + 1.350 = 3.80
     If the chi-square statistic is large, then the observed counts are very different from the counts we’d expect to get if there is no relationship.
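As a check on the hand calculation, the per-cell terms can be summed in code. This is a minimal Python sketch with an illustrative function name, not Minitab's procedure. Because it keeps the expected counts unrounded, it gives 3.807 rather than the 3.80 obtained from the rounded values above.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells,
    with expected = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


# Bike / significant-other counts from the slides.
stat = chi_square_statistic([[37, 27], [10, 18]])
print(round(stat, 3))  # prints 3.807
```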

  7. The P-value • How likely is it that we’d get a chi-square statistic as large as we did if the proportions are equal? • The chi-square statistic follows the chi-square distribution with (r-1)(c-1) degrees of freedom, where r and c are the number of rows and columns, respectively, in the table. • We’ll let Minitab calculate the P-value.
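For a 2-by-2 table the P-value can also be checked without Minitab. The Python sketch below assumes 1 degree of freedom only: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(statistic / 2)). The function names are illustrative, and tables with more degrees of freedom would need a statistics package instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1) for an r-by-c table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_df1(stat):
    """Upper-tail P-value for a chi-square statistic with 1 degree of
    freedom. Valid for df = 1 only (assumption of this sketch)."""
    return math.erfc(math.sqrt(stat / 2))


df = degrees_of_freedom(2, 2)
p_value = chi_square_p_value_df1(3.807)  # statistic from the Minitab output
print(df, round(p_value, 3))  # prints 1 0.051
```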

  8. Rel’nship between owning a bike and having a significant other?

     Rows: Bike   Columns: SigOther

              No      Yes      All
     No       37       27       64
           32.70    31.30    64.00
     Yes      10       18       28
           14.30    13.70    28.00
     All      47       45       92
           47.00    45.00    92.00

     Chi-Square = 3.807, DF = 1, P-Value = 0.051
     DF = (2-1)(2-1) = 1

     Cell Contents -- Count
                      Exp Freq

  9. Chi-Square Test in Minitab when data are not summarized • Select Stat >> Tables >> Cross Tabulation • Select two Classification Variables. The first (second) variable you select will be the row (column) variable. • Under Display, select what you want shown--perhaps, counts and row percents. • Click on box labeled Chi-Square Analysis. • Select OK.

  10. Chi-Square Test in Minitab when data are summarized • Enter observed counts in table format. • Select Stat >> Tables >> Chi-Square Test • Specify the columns containing the table. • Select OK.

  11. Miscellaneous issues • Relationship of chi-square test to Z test • Significant relationships not necessarily true relationships. • Assumptions

  12. Rel’nship between owning a bike and having a significant other?

      Success = Having Significant Other

      Bike     X     N    Sample p
      No      27    64    0.421875
      Yes     18    28    0.642857

      Estimate for p(No) - p(Yes): -0.220982
      95% CI for p(No) - p(Yes): (-0.435780, -0.00618412)
      Test for p(No) - p(Yes) = 0 (vs not = 0): Z = -1.95  P-Value = 0.051

  13. Relationship between Z test and chi-square test • A two-tailed Z-test for two proportions (using a pooled estimate of p) and a chi-square test for a 2-by-2 table will give exactly the same P-value. • Use the Z-test for one-tailed tests (to see if one proportion is larger than the other). • Use the chi-square test for two-tailed tests and for tables larger than 2-by-2.
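The equivalence can be seen numerically: with a pooled estimate of p, the square of the Z statistic equals the chi-square statistic for the same 2-by-2 table. The following Python sketch uses the bike data from the slides. The function name is illustrative, and the two-tailed P-value uses the standard normal identity 2(1 - Φ(|z|)) = erfc(|z| / sqrt(2)).

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed Z-test for p1 - p2 = 0 using a pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed
    return z, p_value


# No bike: 27 of 64 have a significant other; bike: 18 of 28.
z, p_value = two_proportion_z_test(27, 64, 18, 28)
print(round(z, 2), round(p_value, 3))  # prints -1.95 0.051
print(round(z ** 2, 3))                # prints 3.807, the chi-square value
```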

  14. Rel’nship between owning bike and having a significant other?

      Rows: bike   Columns: steady

               No      Yes      All
      No       67       49      116
            57.76    42.24   100.00
      Yes      33       26       59
            55.93    44.07   100.00
      All     100       75      175
            57.14    42.86   100.00

      Chi-Square = 0.053, DF = 1, P-Value = 0.817

      Cell Contents -- Count
                       % of Row

      Using Fall 1998 data, conclude no relationship.

  15. If test suggests relationship exists... • Is there a reasonable explanation for a relationship? • If not, consider possibility of having made a Type I error. • If so, collect data on another random sample and see if new data suggest relationship. If so, start believing it … but still go collect more data …

  16. Ah, those darn assumptions ... • P-value will only be accurate if you have large enough sample. • “Large enough” here means: • no cells have an expected count less than 1 • no more than 20% of the cells have an expected count less than 5 (in a 2-by-2, means no cells). • Minitab will print a warning if assumptions are violated.
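The two "large enough" rules above can be written as a small check on the expected counts. This is a sketch in Python with an illustrative function name; it is not how Minitab implements its warning. The second table in the usage lines is hypothetical and serves only to show a failing case.

```python
def sample_is_large_enough(expected):
    """Apply the two rules from the slide to a table of expected counts:
    no cell below 1, and at most 20% of cells below 5."""
    cells = [x for row in expected for x in row]
    any_below_1 = any(x < 1 for x in cells)
    share_below_5 = sum(x < 5 for x in cells) / len(cells)
    return (not any_below_1) and share_below_5 <= 0.20


# Expected counts from the bike / significant-other table.
print(sample_is_large_enough([[32.7, 31.3], [14.3, 13.7]]))  # prints True

# Hypothetical 2-by-2 table with one expected count below 5
# (25% of the cells), so the rule fails.
print(sample_is_large_enough([[3.0, 20.0], [10.0, 30.0]]))   # prints False
```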
