Categorical Outcomes: Making Comparisons

lilith

Presentation Transcript


  1. Categorical Outcomes: Making Comparisons (Chapter 4)

  2. Outline • Describing: numerical summaries, graphical summaries • One-sample comparisons: historical controls • Multiple-sample comparisons: dichotomous outcome, categorical outcomes • Measures of association

  3. Categorical Outcomes • Gaps: only a limited number of values (categories) is possible, with nothing “in-between” • Examples: dichotomous (two categories), nominal (categories without order), ordinal (categories with order)

  4. Learning Objectives • How do I describe categorical data? • How do I make comparisons? • How do I investigate associations?

  5. Public Health Application • More than three-quarters of global malaria deaths occur in children under five living in malarious countries in sub-Saharan Africa. • 25% of all childhood mortality below the age of five is attributable to malaria. • About 30–40% of all fevers seen in health centers in Africa are due to malaria, with huge seasonal variability between the rainy and dry seasons.

  6. Data Description • Cross-sectional study conducted to investigate factors related to insecticide-treated net (ITN) use in 1876 households that own an ITN: • Demographic variables (age of the head of household, household wealth, miles to the nearest healthcare facility, rural/urban, family size, etc.) • Whether there are children under the age of five in the household • Whether an ITN was used the previous night (the outcome)

  7. Research Question What factors are associated with ITN use?

  8. DESCRIBING THE DATA

  9. Describing the Data • Numerical summaries: counts, proportions, and percentages • Graphical summaries: pie charts, bar graphs

  10. Most Important Step in Data Analysis • Describe the data: before drawing conclusions or making inferences, an investigator needs to fully understand what the data look like. • Numerical and graphical summaries cannot be skipped! This information is needed to choose the most appropriate statistical method and to make valid statistical inferences.

  11. Graphical Summaries

  12. Bar Graphs • Provide a visual comparison among groups. • Vertical axis represents the number of subjects: the higher the bar, the more subjects. • Horizontal axis represents categories: for ordinal variables, order matters; for nominal variables, order does not matter.

  13. Bar Graphs [Figure: example bar graphs for an ordinal variable and a nominal variable]

  14. Bar Graphs • Graphically compare groups for some categorical outcome.

  15. Pie Charts • Provide a visual description of how the parts compare to the whole

  16. Numerical Summaries

  17. Numerical Summaries • Categorical variables are described by reporting the number of subjects within each category: counts, proportions, percentages

  18. Proportion • The fraction of the subjects belonging to a particular category. • The proportion in the population is a parameter. • The proportion in the sample is a statistic (an estimate of that parameter).
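The count, proportion, and percentage summaries can be computed in a few lines of Python using only the standard library. This is a minimal sketch: `summarize` is a hypothetical helper, and the response list is an assumption built from the figures in this document (500 of the 1876 households used an ITN the previous night).

```python
from collections import Counter


def summarize(values):
    """Return {category: (count, proportion, percentage)} for a list of labels."""
    counts = Counter(values)  # number of subjects in each category
    n = len(values)
    return {cat: (k, k / n, 100 * k / n) for cat, k in counts.items()}


# Assumed data: 500 "yes" and 1376 "no" answers to
# "Was an ITN used the previous night?" (1876 households in total).
responses = ["yes"] * 500 + ["no"] * 1376
for category, (count, prop, pct) in summarize(responses).items():
    print(f"{category}: n = {count}, proportion = {prop:.3f}, {pct:.0f}%")
```

The percentages round to the 27% / 73% split between users and non-users.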

  19. Commonly Used Numerical Summaries

  20. One-Sample Comparisons

  21. Description of the Sample • A sample of 1876 households living in a tropical region where malaria is problematic: • The majority of households are more than 50 miles from a healthcare facility (51%) and live in a rural area (53%). • Almost half (44%) of the households have a child under the age of five. • The average age of the head of household is 48 years (SD = 7.4). • The median family size is 6, with a range of 1–12. • Most (73%) of the households did not use an ITN the previous night.

  22. Why a One-Sample Study? • Obtaining an additional group or sample for comparison may not be practical. • In that case, comparisons are made against historical controls.

  23. Historical Controls • Goal: compare what you found in the sample with an external benchmark. Do your results differ from what has been previously published or reported? • Historical controls: control data are not collected concurrently within the same study, but come from a • different time period • different region • different population • different kind of exposure • Seems economical: why not use historical controls all the time?

  24. One-Sample Study: ITN Utilization • Data for this study were collected during the rainy season. How do the results compare with those from the dry season? • Is the season (rainy or dry) associated with ITN utilization?

  25. Inference for the One-Sample Study • Hypothesis tests: • Assume the null parameter is the true parameter (in a historical control study, the null parameter is the historical value). • Decide whether the data support this assumption. • Confidence intervals: • Estimate the true parameter using an interval. • The interval estimate can be used to judge whether assumptions about the parameter are reasonable.

  26. Inference for the One-Sample Study: Historical Controls • Research hypothesis: the true proportion (p) of ITN use in the rainy season is not 0.20. • Null hypothesis: the true proportion (p) of ITN use in the rainy season is 0.20.
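One common way to carry out this test is the large-sample z-test for a single proportion, reported together with a Wald confidence interval. The sketch below assumes that 500 of the 1876 households used an ITN (the ITN-user count given in the two-sample description) and takes 0.20 as the historical value; `one_sample_prop_test` is a hypothetical name, and this is an illustration rather than the analysis behind the original slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_test(x, n, p0, conf=0.95):
    """Two-sided z-test of H0: p = p0, plus a Wald confidence interval for p."""
    p_hat = x / n
    se_null = sqrt(p0 * (1 - p0) / n)          # standard error assuming H0 is true
    z = (p_hat - p0) / se_null
    p_value = 2 * NormalDist().cdf(-abs(z))    # two-sided tail area
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se_hat = sqrt(p_hat * (1 - p_hat) / n)     # standard error from the sample (Wald CI)
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return p_hat, z, p_value, ci


# Assumed data: 500 ITN users out of 1876 households; historical value 0.20.
p_hat, z, p_value, ci = one_sample_prop_test(500, 1876, 0.20)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.2g}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these inputs the whole interval lies above 0.20, which agrees with the test rejecting the null hypothesis. The test uses the standard error implied by H0, whereas the interval uses the sample estimate, which is the usual convention.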

  27. Inference for the One-Sample Study: ITN Utilization

  28. Planning • Estimation: width of the interval, estimate of the proportion • Comparison of proportions: power, significance level, effect size
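These planning quantities translate into sample-size calculations. This sketch uses the standard normal-approximation formulas: n = z²p(1 − p)/E² for a confidence interval of half-width E, and the usual power formula for testing p = p0 when the true value is p1. The function names and the example proportions (a guess of 0.5, and 0.20 vs 0.25) are hypothetical planning values, not study results.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_for_ci(p_guess, half_width, conf=0.95):
    """Sample size so that a Wald CI for a proportion has the requested half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * p_guess * (1 - p_guess) / half_width**2)


def n_for_test(p0, p1, alpha=0.05, power=0.80):
    """Sample size for a two-sided one-sample test of p = p0 when the true value is p1."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # driven by the significance level
    z_b = NormalDist().inv_cdf(power)          # driven by the desired power
    num = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)        # effect size enters as p1 - p0


print(n_for_ci(0.5, 0.05))      # 385: most conservative guess, +/- 5 percentage points
print(n_for_test(0.20, 0.25))   # households needed to detect a shift from 0.20 to 0.25
```

Using p_guess = 0.5 maximizes p(1 − p), so it gives the largest (safest) sample size when no prior estimate of the proportion is available.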

  29. Exact Tests • When the sample size is large (and the proportion is not too close to 0 or 1), the normal approximation is used. What if this approximation is not reasonable? • Exact tests allow comparisons without using the normal distribution; they use the binomial distribution instead.
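A small sketch of an exact two-sided binomial test, using the standard library only. It follows one common convention: add up the probabilities of every outcome that is no more likely than the observed one under H0. Other two-sided conventions exist, so p-values from different software can differ slightly. The sketch is meant for small samples: `math.comb` produces very large integers for large n, and converting them to floating point can overflow. The name `exact_binom_test` and the example data are hypothetical.

```python
from math import comb


def exact_binom_test(x, n, p0):
    """Two-sided exact binomial test of H0: p = p0 (intended for small n)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    p_obs = pmf[x]
    # Sum P(K = k) over all outcomes no more likely than the observed one;
    # the tiny relative tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical small sample: 9 "successes" in 10 trials, H0: p = 0.5
print(exact_binom_test(9, 10, 0.5))  # 0.021484375 (= 22/1024)
```

Here the outcomes 0, 1, 9, and 10 are each at most as likely as the observed 9, so the p-value is (1 + 10 + 10 + 1)/1024.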

  30. Multiple-Sample Comparisons: Comparing a Dichotomous Outcome Between Two Groups

  31. Description of the Sample • Households with children under five (n = 833) and without (n = 1043): • Similar with respect to age of the household head and family size. • Households with children under five report more net use than those without (34% vs 21%).

  32. Description of the Sample • Households using an ITN (n = 500): • Report a higher percentage of children under five • Are more likely to live under a thatched roof • Have a higher percentage of households living within 15 miles of a healthcare facility • Are more likely to live in a rural area • Have, on average, younger household heads • Have larger families

  33. Why a Two-Sample Study? • Provides an independent comparator group: treatment vs control, exposed vs unexposed • Different outcomes between the groups may indicate that group membership is associated with the outcome.

  34. 2 x 2 Contingency Table

  35. Contingency Table

  36. Conditional Probabilities • Proportion of subjects in a category given that some other condition is true • Really a question of which total is used as the denominator • Affects the interpretation: row proportions vs column proportions

  37. Total, Row, and Column Proportions
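The three kinds of proportions differ only in their denominators. The sketch below computes them for a 2 × 2 table of child-under-five status by ITN use. The counts are an approximation reconstructed from the rounded percentages in the sample description (about 34% of 833 and 21% of 1043 households used an ITN), so they add up to 502 ITN users rather than the reported 500. They are illustrative only, not the study's table, and `proportions` is a hypothetical helper.

```python
# Rows: household has a child under five (yes / no).
# Columns: ITN used the previous night (yes / no).
# Approximate counts reconstructed from rounded percentages (illustrative only).
table = [[283, 550],
         [219, 824]]


def proportions(table):
    """Return (total, row, column) proportions for a table given as a list of rows."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = [[x / n for x in r] for r in table]                       # denominator: all subjects
    row = [[x / row_tot[i] for x in r] for i, r in enumerate(table)]  # denominator: row total
    col = [[x / col_tot[j] for j, x in enumerate(r)] for r in table]  # denominator: column total
    return total, row, col


total, row, col = proportions(table)
print(f"P(child under 5 and ITN used) = {total[0][0]:.3f}")
print(f"P(ITN used | child under 5)   = {row[0][0]:.3f}")
print(f"P(child under 5 | ITN used)   = {col[0][0]:.3f}")
```

The same cell gives three different proportions. The row proportion answers "how often do households with a young child use a net?", while the column proportion answers "what share of net users have a young child?".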

  38. Difference in Proportions • The statistical test gives the same result whether you compare row proportions or column proportions. • A nonzero difference in proportions means that the two categorical variables are dependent.
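A quick way to check the first point: the Pearson chi-square statistic (which, for a 2 × 2 table, equals the square of the pooled two-proportion z statistic) does not change when the table is transposed, that is, when the row and column variables swap roles. The counts are approximations reconstructed from the rounded percentages reported for households with (34% of 833) and without (21% of 1043) children under five. They are illustrative only, and `pearson_chi2` is a hypothetical helper.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: child under five (yes / no); columns: ITN used (yes / no). Approximate counts.
table = [[283, 550],
         [219, 824]]
transposed = [list(col) for col in zip(*table)]  # swap rows and columns

print(round(pearson_chi2(table), 4), round(pearson_chi2(transposed), 4))
```

Both orientations yield the same statistic, because the expected counts depend only on the row and column totals, and transposing simply swaps those totals.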

  39. Two-Sample Study • Does having a child under the age of five impact the utilization of ITN?

  40. Inference for the Two-Sample Study • Hypothesis tests: • Assume the null parameter is the true parameter: the groups have the same proportion; the true difference between proportions is 0; the two categorical variables are independent. • Decide whether the data support this assumption.

  41. Inference for the Two-Sample Study • pU5 = the true proportion of ITN use in households with children under five • pO5 = the true proportion of ITN use in households with no children under five • Null hypothesis: pU5 = pO5 (using an ITN and having children under the age of five are independent) • Research hypothesis: pU5 ≠ pO5 (using an ITN and having children under the age of five are dependent)
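These hypotheses are commonly tested with the two-sample z-test for proportions, which uses the pooled proportion under H0. A Wald interval for pU5 − pO5 uses the unpooled standard error instead. The sketch assumes approximate counts (283 of 833 and 219 of 1043 households using an ITN) reconstructed from the rounded 34% and 21% in the sample description, so its numbers are illustrative rather than the study's results. `two_prop_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_test(x1, n1, x2, n2, conf=0.95):
    """Two-sided z-test of H0: p1 = p2, plus a Wald CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion if H0 is true
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    p_value = 2 * NormalDist().cdf(-abs(z))
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE for the interval
    diff = p1 - p2
    return diff, z, p_value, (diff - z_crit * se, diff + z_crit * se)


# Assumed counts: under-five households first, then households without under-fives.
diff, z, p_value, ci = two_prop_test(283, 833, 219, 1043)
print(f"pU5 - pO5 = {diff:.3f}, z = {z:.2f}, p-value = {p_value:.2g}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For a 2 × 2 table, z² from this test equals the Pearson chi-square statistic (without continuity correction), so the two approaches give the same p-value.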

  42. Inference for the Two-Sample Study

  43. Planning • Balanced design? • Overall test or comparisons between specific groups? • Estimation: width of the interval, amount of variability • Comparison of proportions: power, significance level, effect size
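For a balanced design comparing two proportions, a widely used normal-approximation formula (without continuity correction) gives the number of subjects per group: n = [z₁₋α/₂·√(2p̄(1 − p̄)) + z₁₋β·√(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)², where p̄ is the average of p1 and p2. The function name and the planning values below (30% vs 20%) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to compare two proportions with a two-sided test (balanced design)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_b = NormalDist().inv_cdf(power)          # power
    p_bar = (p1 + p2) / 2                      # average proportion under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / (p1 - p2)) ** 2)        # effect size enters as p1 - p2


# Hypothetical planning values: 30% vs 20% ITN use, 5% significance, 80% power.
print(n_per_group(0.30, 0.20))
print(n_per_group(0.30, 0.20, power=0.90))  # higher power requires more subjects
```

Smaller effect sizes increase the required n quickly, because the difference p1 − p2 enters the formula squared in the denominator.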

  44. Multiple-Sample Comparisons: Comparing Categorical Outcomes Between Two or More Groups

  45. Categorical Variables • Different research questions result in different types of categorical variables. The outcome does not have to be dichotomous. There can be more than two groups to compare.

  46. R x C Contingency Table

  47. Contingency Table

  48. Conditional Probabilities • Proportion of subjects in a category given that some other condition is true • Really a question of which total is used as the denominator • Affects the interpretation: row proportions vs column proportions • Same as when there were only two groups and only two outcome categories

  49. Inference • Because categorical variables can be dichotomous, nominal, or ordinal, different hypotheses are possible, and they may require different tests. • Hypothesis tests: assume the null hypothesis is true, then decide whether the data support this assumption.

  50. Two Nominal Variables • Is there an association between the type of roof and the type of net used?
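For two nominal variables such as roof type and net type, the usual tool is the Pearson chi-square test of independence on the R × C table, with (R − 1)(C − 1) degrees of freedom. The sketch below computes the statistic and an upper-tail p-value using only the standard library (a series expansion of the regularized incomplete gamma function). The roof and net categories and all counts are made up for illustration, not study data, and the helper names are hypothetical. The chi-square approximation is usually considered reasonable when expected counts are not too small; a common rule of thumb is at least 5 per cell.

```python
from math import exp, lgamma, log


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, h = df / 2, x / 2
    # Series for the regularized lower incomplete gamma function P(a, h).
    # Adequate for moderate x; very large statistics would need another expansion.
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= h / (a + n)
        total += term
        n += 1
    lower = exp(a * log(h) - h - lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-square test of independence for an R x C table of counts."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts. Rows: thatched roof, other roof.
# Columns: three made-up net types (A, B, C).
table = [[60, 30, 10],
         [40, 50, 30]]
stat, df, p_value = chi2_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With a 2 × 2 table the same function reduces to the two-group comparison above (df = 1), which is why the chi-square test is the natural generalization to more groups or more outcome categories.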
