
The MHSIP: A Tale of Three Centers

Presentation Transcript


    1. The MHSIP: A Tale of Three Centers. P. Antonio Olmos-Gallo, Ph.D., and Kathryn DeRoche, M.A., Mental Health Center of Denver; Richard Swanson, Ph.D., J.D., Aurora Research Institute; John Mahalik, Ph.D., M.P.A., Jefferson Center for Mental Health. Presented at the Organization for Program Evaluation in Colorado Annual Meeting, May 15, 2008. 1

    2. Presentation Overview: Accountability in mental health; description and intended use of the MHSIP; review of constructs of measurement; purpose and methods; results of the psychometric investigation (reliabilities, measurement invariance, differential item functioning); discussion of results; future directions for accountability in mental health. 2

    3. Accountability in Mental Health 3

    4. Accountability in Mental Health 4

    5. How does accountability work in MH? Accountability has changed from formative-oriented to more summative-oriented. Grant funding (Federal, private) requires that outcomes be demonstrated (NOMS, GPRA). There are state-based requirements (CCAR, MHSIP, YSSF). Stakeholders are more in tune with accountability.

    6. Description and Intended Uses of the MHSIP What is the MHSIP? What is it used for? 6

    9. Domains of the MHSIP 9

    12. Measurement Constructs 12

    14. Reliability of the MHSIP 14

    15. What are we comparing? 15

    16. Rasch Modeling Perspective 16
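The transcript does not include the body of the Rasch slide, so the following is only a minimal sketch of the dichotomous Rasch model that this perspective is built on: the probability of endorsing an item depends on the gap between person ability and item difficulty, both on the logit scale. The function name and example values are illustrative, not taken from the presentation.

```python
import numpy as np

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(endorse) given person ability theta
    and item difficulty b, both in logits."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Illustrative values only: one person, three items of increasing difficulty.
theta = 0.5
for b in (-1.0, 0.0, 1.5):
    print(f"difficulty {b:+.1f}: P(endorse) = {rasch_prob(theta, b):.2f}")
```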

    17. Purpose and Methods Participants, Procedures, and Data Analysis 17

    18. Purpose of the Investigation 18

    19. Participants 19

    20. Procedures 20

    21. Psychometric Examination of the MHSIP: Reliability, Measurement Invariance, and Differential Item Functioning 21

    22. Comparing Subscales 22

    23. Reliability Estimates in 2007 among Subscales and Centers 23
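The table of 2007 reliability estimates is not reproduced in the transcript. As a hedged illustration of how per-center, per-subscale internal consistency could be computed, here is a minimal Cronbach's alpha sketch; the file name, column names, and item-to-domain assignments are hypothetical, not the actual MHSIP layout.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of item responses (rows = respondents)."""
    items = items.dropna()
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Hypothetical file and item-to-domain assignments; the real MHSIP items differ.
domains = {
    "Satisfaction": ["q1", "q2", "q3"],
    "Access": ["q4", "q5", "q6"],
    "Quality": ["q7", "q8", "q9"],
}
df = pd.read_csv("mhsip_2007.csv")
for center, grp in df.groupby("center"):
    for domain, cols in domains.items():
        print(center, domain, round(cronbach_alpha(grp[cols]), 2))
```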

    24. Reliability Summary 24

    25. Invariance Testing Across Centers 25

    26. Confirmatory Factor Analysis A model with all five domains could not be fit: some of the parameters could not be estimated (the variance-covariance matrix may not be identified). Exploratory analyses using only Outcomes and Participation showed that Outcomes was the major culprit.

    28. Invariance with 3 domains We tested invariance on three domains only: Satisfaction, Access, and Quality. We ran separate models for every center to get an up-front idea of their similarities and differences. Trouble can be expected based on the fit: Center 2 had the worst fit, Center 3 had a not-so-bad fit, and Center 1 was in between the other two centers.
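A minimal sketch of how the per-center, three-domain CFA described above might be specified, assuming the Python semopy package (lavaan-style model syntax) and hypothetical item names and fit-statistic column labels; the actual analysis may have used different software and items.

```python
import pandas as pd
import semopy  # assumed SEM package (lavaan-style model syntax)

# Hypothetical item-to-domain mapping for the three retained domains.
desc = """
Satisfaction =~ q1 + q2 + q3
Access       =~ q4 + q5 + q6
Quality      =~ q7 + q8 + q9
"""

df = pd.read_csv("mhsip_2007.csv")            # hypothetical file
for center, grp in df.groupby("center"):      # one model per center, as on the slide
    model = semopy.Model(desc)
    model.fit(grp)
    stats = semopy.calc_stats(model)          # fit indices (chi-square, RMSEA, GFI, CFI, ...)
    print(center)
    print(stats[["chi2", "RMSEA", "GFI", "CFI"]])
```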

    32. Measurement Invariance Whether or not we can assert that we measured the same attribute under different conditions. If there is evidence of variability, any findings reporting differences between individuals and groups cannot be interpreted: differences in average scores can just as easily be interpreted as indicating that different things were measured, and correlations between variables will reflect different attributes for different groups.

    33. Factorial Invariance One way to test measurement invariance is FACTORIAL INVARIANCE. The main question it addresses: do the items making up a particular measuring instrument work the same way across different populations (e.g., males and females)? In other words, is the measurement model group-invariant? Tests for factorial invariance (in order of difficulty):

    34. Steps in Factor Invariance testing Equivalent factor structure: same number of factors, with items associated with the same factors (structural model invariance). Equivalent factor loading paths: factor loadings are identical for every item and every factor.

    35. Steps in Factor Invariance testing (cont.) Equivalent factor variances/covariances: variances and covariances (correlations) among factors are the same across populations. Equivalent item reliabilities: residuals for every item are the same across populations.
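Each step above adds equality constraints to the previous model, so adjacent steps are usually compared with a chi-square difference (likelihood-ratio) test. A minimal sketch, with purely illustrative fit values rather than the study's actual statistics:

```python
from scipy.stats import chi2

def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference (likelihood-ratio) test between two nested
    invariance models; a significant result means the added equality
    constraints (e.g., equal loadings) worsen the fit."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2.sf(delta_chi2, delta_df)

# Illustrative numbers only.
print(chi2_difference_test(chi2_constrained=612.4, df_constrained=130,
                           chi2_free=575.9, df_free=120))
```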

    36. Results Factorial Invariance

    37. Conclusions Factorial Invariance The model does not provide a good fit for the different centers. Most of the discrepancy is centered on the loadings and on how the domains interact with each other (variance-covariance). Since the testing is incremental (later tests are more challenging than earlier ones), we did not run equivalent item reliabilities (the most stringent test).

    38. Differential Item Functioning (DIF) 38

    39. Differential Item Functioning 39
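The DIF result slides themselves are not transcribed. As one common way to screen survey items for DIF (not necessarily the method used in this study, which takes a Rasch perspective), here is a logistic-regression sketch that conditions on the total score and tests for a center effect; the file, column, and item names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf  # one common DIF screen; not necessarily the study's method

df = pd.read_csv("mhsip_2007.csv")      # hypothetical file and column names
items = ["q1", "q2", "q3", "q4", "q5", "q6"]
df["total"] = df[items].sum(axis=1)     # conditioning (matching) variable

for item in items:
    # Dichotomize the 5-point response (agree or strongly agree = 1) for a simple screen.
    df["endorsed"] = (df[item] >= 4).astype(int)
    fit = smf.logit("endorsed ~ total + C(center)", data=df).fit(disp=0)
    # A significant center effect, after conditioning on the total score, flags possible DIF.
    print(item, fit.pvalues.filter(like="center").round(3).to_dict())
```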

    45. Summary of DIF Analysis 45

    46. Discussion 46

    47. What did we learn about the MHSIP? Some items and subscales (domains) do not seem to measure equally across centers. Therefore, comparing centers using these items/domains may not reflect true differences in performance; it is more likely that apparent differences reflect differences in measurement (including error, difficulty, and reliability). 47

    48. Some domains are reliable, some are not Satisfaction was OK from all three perspectives. Quality had some good characteristics, but some items were bad. Participation is not very reliable (only two items, though the items themselves were good). Outcomes is, overall, a very poor domain (bad items, lots of cross-loading, correlated errors); employment/education may not be a desired outcome for all consumers.

    49. Discussion Although the samples may not be ideal (biases, sampling frameworks that could be improved), the data at hand suggest that there are some intrinsic problems with the MHSIP. But the analyses also suggest some very specific ways to improve it. 49

    50. Suggestions Revise the Outcomes scale (differentiate between recovery and resiliency). Add items to the Participation scale. Some items in Access need to be reviewed (Q4 and Q6). How do we deal with all these cross-loading factors? Is it one domain (satisfaction) that we artificially broke into many domains (outcomes, access, ...)? How does the factor structure for the entire sample (the EFA included in the annual report) hold up for individual centers? More research is needed in this area.

    51. More suggestions Sampling suggestions: attempt to stratify the sample by consumers' needs level; at MHCD, we have developed a measure of consumers' recovery needs level (RNL). Equating suggestions: use some form of equating procedure to equate scores across centers (a simple linear version is sketched below). Item Response Theory techniques: IRT could help us learn more about how the MHSIP measures satisfaction/performance within and among mental health centers.
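The slide leaves the equating method open; mean-sigma linear equating is one simple option, sketched below with illustrative numbers (an IRT-based equating, as also suggested on the slide, would fit better with the Rasch work).

```python
import numpy as np

def linear_equate(scores, ref_scores):
    """Mean-sigma (linear) equating: rescale one center's scores so their
    mean and standard deviation match a reference center's distribution."""
    scores = np.asarray(scores, dtype=float)
    ref = np.asarray(ref_scores, dtype=float)
    slope = ref.std(ddof=1) / scores.std(ddof=1)
    return ref.mean() + slope * (scores - scores.mean())

# Illustrative domain scores only.
center_2 = [3.1, 3.4, 3.8, 4.0, 4.5]
center_1 = [3.6, 3.9, 4.1, 4.4, 4.8]   # treated as the reference center
print(np.round(linear_equate(center_2, center_1), 2))
```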

    52. More suggestions Mixed-methods design: conducting focus groups at each center would provide cross-validation of the quantitative measurement and would also enhance the utilization of the results for quality improvement. Include the psychometrics (reliability) for every center in the annual reports; this helps us know how much confidence we should have in the scores.

    53. Questions??? 53

    54. χ² (Chi-Square): in this context, it tests the closeness of fit between the unrestricted sample covariance matrix and the restricted (model) covariance matrix. Very sensitive to sample size: the statistic will be significant when the model fits only approximately in the population and the sample size is large. RMSEA (Root Mean Square Error of Approximation): analyzes the discrepancies between the observed and implied covariance matrices. A lower bound of zero indicates perfect fit, with values increasing as the fit deteriorates; values below 0.1 are suggested to indicate a good fit to the data, and values below 0.05 a very good fit. It is recommended not to use models with RMSEA values larger than 0.1. GFI (Goodness of Fit Index): analogous to R² in that it indicates the proportion of variance explained by the model. Ranges between 0 and 1, with values exceeding 0.9 indicating a good fit to the data. CFI (Comparative Fit Index): indicates the proportion of improvement of the overall fit compared to a null (independence) model. Sample-size independent, and penalizes for model complexity. It uses a 0-1 norm, with 1 indicating perfect fit; values of about 0.9 or higher reflect a good fit.
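To make the cutoffs listed above concrete, here is a small helper that simply restates the slide's rules of thumb in code; the function name and wording of the labels are illustrative.

```python
def summarize_fit(rmsea: float, gfi: float, cfi: float) -> str:
    """Apply the rule-of-thumb cutoffs listed on the slide above."""
    notes = []
    if rmsea > 0.1:
        notes.append("RMSEA > 0.10: do not use the model")
    elif rmsea < 0.05:
        notes.append("RMSEA < 0.05: very good fit")
    else:
        notes.append("RMSEA < 0.10: good fit")
    notes.append("GFI >= 0.90: good fit" if gfi >= 0.9 else "GFI < 0.90: poor fit")
    notes.append("CFI >= 0.90: good fit" if cfi >= 0.9 else "CFI < 0.90: poor fit")
    return "; ".join(notes)

# Illustrative values only.
print(summarize_fit(rmsea=0.08, gfi=0.92, cfi=0.88))
```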
