
Do We Have To Choose Between Accountability and Program Improvement?



Presentation Transcript


  1. Do We Have To Choose Between Accountability and Program Improvement? NECTAC’s Measuring Child and Family Outcomes Conference 2006 Kristie Pretti-Frontczak Kent State University kprettif@kent.edu Jennifer Grisham-Brown University of Kentucky jgleat00@uky.edu

  2. Overview of Session • Discuss the need for measuring child outcomes as it relates to programming and accountability purposes • Discuss three issues and associated recommendations and related research • Discussion is encouraged throughout • Time will remain at the end for questions and further discussion of what was presented.

  3. Introductions and Setting a Context • Kristie Pretti-Frontczak • Kent State University • Jennifer Grisham-Brown • University of Kentucky • Belief/Bias/Recommended Practice • Authentic assessment is critical regardless of purpose

  4. CENTRAL QUESTION FOR TODAY’S PRESENTATION • Can instructional data be used for accountability purposes? • The Short Answer: Yes (IF) ….

  5. Linked System Approach • Four linked components: Assessment, Goal Development, Instruction, and Evaluation • Based upon children’s emerging skills • Will increase access and participation • Authentic • Involves families • Comprehensive and common • Developmentally and individually appropriate • Systematic • Ongoing • Guides decision-making

  6. If you…. • If you assess young children using a high quality authentic assessment… • Then you’ll be able to develop high quality individualized plans to meet children’s unique needs…. • If you identify the individual needs of children….

  7. You’ll want to use the information to guide curriculum development… If you have a curriculum framework that is designed around the individual needs of the children… Then you’ll want to document that children’s needs are being met…

  8. Then you’ll need to monitor children’s performance over time using your authentic assessment… And when you have done the authentic assessment for a second or third time, you’ll want to jump for joy because all of the children will have made progress!

  9. Three Issues • Selection • Implementation • Interpretation

  10. Questions around Selecting an Assessment • Which tools/processes? • Which characteristics should be considered? • What about alignment to state standards or Head Start Outcomes? • Use a single/common assessment or a list? • Allow for choice or be prescriptive? • Who should administer? • Where should the assessment(s) be administered?

  11. Recommendations • Use an assessment for its intended purpose • Avoid comparing assessments to one another; instead, compare them to stated/accepted criteria • Alignment to local/state/federal standards • Reliable and valid • Comprehensive and flexible • Link between assessment purposes • Link between assessment and intervention

  12. Recommendations Continued • Allow for state/local choice if possible • Increases likelihood of a match • Increases fidelity and use • Avoids a one-size-fits-all approach • If the assessment is flexible and comprehensive, one assessment might work • Authentic, authentic, authentic • People who are familiar • Settings that are familiar • Toys/materials that are familiar

  13. Generic Validation Process • Step 1 – Create a Master Alignment Matrix • Experts create a master matrix • Establish inclusion and exclusion criteria • Step 2 – Create Expert Alignment Matrices • Experts blind to the master matrix create their own alignment matrices • Step 3 – Validate the Master Alignment Matrix • Compare master and expert matrices • Ensure that all items that should be considered were placed on the final matrices • Examine the internal consistency of the final matrices (Allen, Bricker, Macy, & Pretti-Frontczak, 2006; Walker & Pretti-Frontczak, 2005). For more information on crosswalks visit: http://www.fpg.unc.edu/~ECO/crosswalks.cfm or http://aepslinkedsystem.com
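To make Step 3 concrete, here is a minimal Python sketch of one way the master-versus-expert comparison could be run: each matrix is represented as a mapping from assessment items to the standards they are judged to align with, and agreement is the share of items on which an expert matches the master. The data structure, item names, and standards are illustrative assumptions, not details from the cited studies.

```python
# A sketch of Step 3: comparing an expert's alignment matrix against the
# master. Each matrix maps an assessment item to the set of standards it
# is judged to align with. All names and data here are illustrative.

def percent_agreement(master: dict, expert: dict) -> float:
    """Share of items on which the expert's alignments match the master's."""
    items = set(master) | set(expert)
    if not items:
        return 0.0
    matches = sum(
        master.get(item, set()) == expert.get(item, set()) for item in items
    )
    return matches / len(items)

master = {"item_1": {"std_A"}, "item_2": {"std_B", "std_C"}, "item_3": set()}
experts = [
    {"item_1": {"std_A"}, "item_2": {"std_B", "std_C"}, "item_3": {"std_A"}},
    {"item_1": {"std_A"}, "item_2": {"std_B"}, "item_3": set()},
]

for i, expert in enumerate(experts, start=1):
    print(f"Expert {i} vs. master: {percent_agreement(master, expert):.0%}")
```

Computing agreement item by item, rather than as a single summary number, also makes it easy to flag the specific items driving disagreement before the matrix is finalized.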

  14. Concurrent Validity • Purpose: • To examine the concurrent validity between a traditional norm-referenced standardized test (BDI-2) and a curriculum-based assessment (AEPS®) • Subjects: • 31 Head Start children • Ranged in age from 48 months to 67 months (M = 60.68, SD = 4.65) • Methods: • Six trained graduate students administered the BDI-2 and six trained Head Start teachers administered the AEPS® during a two-week period. Seven bivariate, 2-tailed correlations (Pearson’s and Spearman’s) were conducted. • Results: • Five correlations suggested a moderate to good relationship between the BDI-2 and the AEPS • Two correlations suggested a fair relationship between the BDI-2 and the AEPS Hallam, Grisham-Brown, & Pretti-Frontczak, 2005
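As a rough illustration of the analysis described above (not a reproduction of the study), here is how such bivariate, 2-tailed Pearson and Spearman correlations could be computed in Python with scipy. The scores are randomly generated placeholders standing in for the two instruments, with an n of 31 to match the sample size.

```python
# A rough illustration of the analysis above: bivariate, 2-tailed Pearson
# and Spearman correlations between a norm-referenced test and a
# curriculum-based assessment. The scores are randomly generated
# placeholders, NOT the study's data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(seed=0)
n = 31
bdi = rng.normal(loc=66, scale=7, size=n)              # stand-in for BDI-2 scores
aeps = 0.8 * bdi + rng.normal(loc=0, scale=8, size=n)  # correlated stand-in for AEPS scores

r, p = pearsonr(bdi, aeps)         # parametric correlation (2-tailed p)
rho, p_rho = spearmanr(bdi, aeps)  # rank-based correlation (2-tailed p)

print(f"Pearson r = {r:.2f}, p = {p:.3f}")
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
```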

  15. Concurrent Validity Results • Adaptive • Self Care items from the BDI (M = 66.03, SD = 6.67) were moderately correlated with Adaptive items from the AEPS (M = 62.03, SD = 13.57), r = .57, n = 31, p = .01. • Social • Personal-Social items from the BDI (M = 175.15, SD = 22.74) had a fair correlation with Social items from the AEPS (M = 80.06, SD = 16.33), r = .50, n = 31, p = .01. • Communication • Communication items from the BDI (M = 121.06, SD = 16.22) were moderately correlated with Social Communication items from the AEPS (M = 88.61, SD = 14.20), r = .54, n = 31, p = .01.

  16. Concurrent Validity Results Continued • Motor • Gross Motor items from the BDI (M = 82.76, SD = 4.70) had a fair correlation with Gross Motor items from the AEPS (M = 30.10, SD = 6.62), r = .48, n = 31, p = .01. • Fine Motor items from the BDI (M = 52.45, SD = 5.30) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01. • Perceptual Motor items from the BDI (M = 27.73, SD = 3.63) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01. • Cognitive • Cognitive items from the BDI (M = 135.85, SD = 23.44) were moderately correlated with Cognitive items from the AEPS (M = 81.26, SD = 24.26), r = .71, n = 31, p = .01.

  17. Project LINK • Head Start/University Partnership grant (Jennifer Grisham-Brown/Rena Hallam) • Purpose: To build the capacity of Head Start programs to link child assessment and curriculum to support positive outcomes for preschool children • Focus on mandated Head Start Child Outcomes • Concepts of Print • Oral Language • Phonological Awareness • Concepts of Number Grisham-Brown, Hallam, & Brookshire, in press; Hallam, Grisham-Brown, Gao, & Brookshire, in press

  18. PRELIMINARY RESULTS FROM PROJECT LINK: Classroom Quality • No significant differences between control and intervention classrooms on global quality (ECERS-R) • The quality of the language and literacy environment (ELLCO) was superior in intervention classrooms; the difference was significant in pilot classrooms

  19. PRELIMINARY RESULTS FROM PROJECT LINK: Child Outcomes • Change scores in Intervention classrooms were significantly higher than in Control classrooms on the letter-word recognition subscale of the FACES battery • Mean change scores were higher (although not significantly so) on seven additional subscales (of 11 total) of the FACES battery, nearing significance on the PPVT • Results would likely have been stronger with a larger sample • The study will be replicated this year

  20. Questions Around Training, Implementation, and Use • Who will implement? • What level of training and support will staff need? • What topics will training cover? • Who will provide training and support? • How will you know if staff are collecting data reliably? • How will you know if staff are collecting data with procedural fidelity?

  21. Recommendations: • Training/Follow-up • Format • Topics • Classroom and administrative • Valid and reliable • Will require training and support • Will require seeing assessment as a critical part of intervention/curriculum planning

  22. What it takes! • Who? • All classroom staff • Administrators/consultants • What? • Instrument • Methods (e.g., observations, anecdotal notes, work samples) • Data entry/management • Relationship to everything else (i.e., the linked system)

  23. What it takes (cont.) • How? • Training that is “chunked” • Self-assessment • Follow-up, follow-up, follow-up • Mentoring • On-site technical assistance • Access to someone to call! • Involvement of administration

  24. Can preschool teachers (with appropriate training) collect reliable data with fidelity? • Reliability study • Fidelity study • Accuracy study Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002 ; Grisham-Brown, Hallam, & Pretti-Frontczak, in preparation

  25. Inter-Rater Reliability • Subjects: • 7 Head Start Teachers • 7 Head Start Teaching Assistants • Method: • Practiced scoring AEPS items from video • Scored AEPS items; Checked against master score provided by author • Results: • 7 of 7 teachers reached reliability at 80% or higher (range 85% - 93%) • 5 of 7 teaching assistants reached reliability at 80% or higher (range 75% - 90%)
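A minimal sketch of the reliability check described above: item-level agreement between a rater's scores and author-provided master scores, compared to the 80% criterion. The scores below are invented, and the sketch assumes AEPS-style item scoring of 0, 1, or 2.

```python
# Illustrative agreement check against a master-scored set of AEPS items.
# AEPS items are scored 0, 1, or 2; the scores here are made up.

def agreement(rater: list[int], master: list[int]) -> float:
    """Proportion of items on which the rater matches the master score."""
    assert len(rater) == len(master)
    return sum(r == m for r, m in zip(rater, master)) / len(master)

master_scores = [2, 1, 0, 2, 2, 1, 0, 1, 2, 2]
rater_scores  = [2, 1, 1, 2, 2, 1, 0, 1, 2, 0]

pct = agreement(rater_scores, master_scores)
print(f"Agreement: {pct:.0%} -> {'meets' if pct >= 0.80 else 'below'} 80% criterion")
```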

  26. Fidelity Study • Subjects: • Six Head Start teachers/teaching assistants who reached 80% or higher in the inter-rater reliability study • Method: • Used a fidelity measure to check teachers’ implementation of authentic assessment within seven planned activities • Six Authentic Assessment Variables • set up and preparation; decision making; materials; choice; embedding; and procedure • Procedures • Observed participants collecting AEPS® data during each of the seven small-group activities • Observed participants 7 times for up to 10 minutes per activity

  27. Average Ratings on Six Authentic Assessment Variables across Observations and Activities by Teacher [chart not reproduced in transcript]

  28. Average Ratings on Six Authentic Assessment Variables across Observations for Seven Different Activities [chart not reproduced in transcript]
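Since the charts themselves did not survive into the transcript, the following Python sketch shows how the averages they plotted could be computed: mean rating per teacher on each of the six authentic assessment variables across the seven observed activities. The rating values, and the rating scale itself, are invented placeholders.

```python
# Mean fidelity ratings per teacher per variable, across seven activities.
# Ratings and scale are illustrative; the variables come from slide 26.
from statistics import mean

VARIABLES = ["set up and preparation", "decision making", "materials",
             "choice", "embedding", "procedure"]

# ratings[teacher][variable] holds one rating per observed activity (7 total)
ratings = {
    "Teacher 1": {v: [3, 4, 4, 3, 4, 4, 3] for v in VARIABLES},
    "Teacher 2": {v: [4, 4, 3, 4, 2, 4, 4] for v in VARIABLES},
}

for teacher, by_variable in ratings.items():
    for variable, scores in by_variable.items():
        print(f"{teacher} | {variable}: mean rating = {mean(scores):.2f}")
```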

  29. Accuracy Study • Study designed to investigate the accuracy of teachers’ assessments of children’s skills and abilities using observational assessment • Examined the degree of agreement between assessments of children’s Language and Literacy and Early Math skills made by their teachers using an observational assessment instrument and assessments of the same skills made by researchers using a demand performance instrument. Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002

  30. Measures • Observational Measure - Galileo System’s Scales (Bergan, Bergan, Rattee, & Feld, 2001) • Language & Literacy-Revised Ages 3-5 (n=68 items full scale) • Early Math-Revised Ages 3-5 (n=68 items full scale) • Demand Performance Measure • Items that could be readily assessed in individual, one-session, performance-based interviews with children were selected from the Galileo System’s scales and converted into demand performance tasks to create two performance measures • Language & Literacy (n=21 items) • Early Math (n=23 items). • Items varied in difficulty and knowledge domain assessed. • Standardized sets of materials for administering tasks were also developed (e.g., index cards with printed objects, books, manipulatives, etc.). • The performance measures were piloted with preschoolers in two regions of the state and revised accordingly.

  31. Procedures • Trained research assistants visited sites across the state: • collected the data teachers had entered into the relevant observation scales of the Galileo System; and • administered the Performance Measures. • To ensure that the most up-to-date information was obtained from the Galileo System, data were collected during the two weeks prior to and following a state-mandated entry date. • Order of administration of the Performance Measures was counterbalanced across assessment domains.

  32. Participants • 122 children • ranged in age from 3 to 6 years (M=4 years, 11 months) • 100% in state-funded Head Start programs • 66 teachers • Areas in which children are served • 47% urban • 41% suburban/small town • 11% rural • Representation by use of the Galileo System • 38% first-year users • 32% second-year users • 23% third-year users

  33. Conclusions • Overall, levels of concordance were moderate. • In the domain in which teachers were most conservative in attributing abilities to children, Language & Literacy, agreement between the data teachers entered into the Galileo System and the Performance Measure was greatest (71%). • In the domain in which teachers were most generous in attributing abilities to children, Early Math, agreement between the data teachers entered into the Galileo System and the Performance Measure was least (66%). • Reliability • Teachers using the naturalistic observation instrument (the Galileo System) are not providing inflated estimates of children’s skills and abilities. • However, they may be underestimating children’s skills and abilities in the domain of Language & Literacy.

  34. Questions Around Interpreting the Evidence • What is evidence? • Where should the evidence come from? • What is considered “performing as same age peers”? • How should decisions be made? • Who should interpret the evidence? • How can the ECO child summary form be used?

  35. What is Evidence? • Information (observations, scores, permanent products) about a child’s performance across the three OSEP outcomes • Positive social-emotional skills (including social relationships) • Acquisition and use of knowledge and skills • Use of appropriate behaviors to meet their needs • The amount and type of evidence for each outcome will vary

  36. Where should the evidence come from? • Multiple time periods • Multiple settings • Multiple people • Parents • Providers • Those familiar with the child • Multiple measures (should be empirically aligned) • Observations • Interviews • Direct tests

  37. Required Decisions • Decision for Time 1 • Is the child performing as same age peers? • Yes • No • Decision for Time 2 • Did the child make progress? • YES – and performance is as you would expect of same age peers • YES – and performance is not as you would expect of same age peers • NO progress was made

  38. Things to Keep in Mind • “Typical/performing as same age peers” is NOT average • “Typical” includes a very broad range of skills/abilities • Child can be “typical” in one OSEP area and not another • Progress is any amount of change • Raw score changed by 1 point • A single new skill was reached • Child needs less assistance at time two • If using the Child Outcome Summary Form • Child’s rating score does NOT have to change from time 1 to time 2 to demonstrate progress • Progress can be continuing to develop at a typical rate (i.e., maintain typical status)

  39. How Should the Required Decisions be Made? • Some assessments will make the decision • Standard score • Residual Change Scores • Goal Attainment Scaling • Number of objectives achieved/Percent objectives achieved • Rate of Growth • Item Response Theory (cutoff score) • Proportional Change Index
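As one concrete example from the list above, here is a minimal Python sketch of the Proportional Change Index (PCI), which compares a child's rate of development during the interval between tests to the rate implied by the pretest (the formulation commonly attributed to Wolery, 1983). All ages and the example values are invented placeholders, in months.

```python
# A sketch of the Proportional Change Index (PCI), one metric from the
# list above. PCI > 1.0 suggests faster-than-expected development during
# the assessment interval. All values are invented, in months.

def proportional_change_index(pretest_da: float, posttest_da: float,
                              pretest_ca: float, months_between: float) -> float:
    """Rate of developmental gain between tests, relative to pretest rate."""
    rate_during = (posttest_da - pretest_da) / months_between
    rate_before = pretest_da / pretest_ca
    return rate_during / rate_before

# A child with developmental age 36 mo at chronological age 48 mo gains
# 9 months of development over 6 calendar months: PCI = 1.5 / 0.75 = 2.0
print(proportional_change_index(36, 45, 48, 6))
```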

  40. Making Decisions Continued • Regardless of method, team conclusions… • should be based on multiple sources • should be based on valid and reliable information • should be systematic • Can use the Child Outcome Summary Form • Will help with the required decision and provide more information for use at the local or state level

  41. Child Outcome Summary Form • Single rating scale that can be used to systematize information and make decisions • After reviewing the evidence rate the child’s performance on each of the 3 outcomes from 1 to 7 • Currently a score of 6 or 7 is considered to be performance that is similar to same age peers.

  42. Getting from 7 to 3 • The seven-point rating scale only summarizes the evidence • The required interpretation is still needed: • a) % of children who reach or maintain functioning at a level comparable to same-age peers • b) % of children who improve functioning but are not in “a” • c) % of children who did not improve functioning

  43. Example • During a play-based assessment, the IFSP/IEP team administered • a norm-referenced test • a curriculum-based assessment • an interview with relevant caregivers • The team then summarized the child’s performance using each method’s internal summary procedures • Calculated a standard score • Derived a cutoff score • Narratively summarized the interview • Lastly, the team rated the child’s overall performance on each of the 3 OSEP outcomes using ECO’s Child Outcome Summary Form • Two years later, as the child was being transitioned out of the program, the results from a comprehensive curriculum-based assessment were reviewed • The child’s performance was rated using ECO’s Child Outcome Summary Form • The team made a determination of progress

  44. Example Continued • Time One • Outcome One: Rating = 6; Interpretation = “Typical” • Outcome Two: Rating = 5; Interpretation = “Not typical” • Outcome Three: Rating = 3; Interpretation = “Not typical” • Time Two • Outcome One: Rating = 6; Interpretation = a • Outcome Two: Rating = 5; Interpretation = b* • Outcome Three: Rating = 5; Interpretation = b • *Remember: the Child Outcome Summary Form 7-point rating is a summary of performance, not of progress. At time two, teams are also prompted to consider progress.
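A minimal Python sketch of the interpretation step this example walks through, assuming (per slide 41) that a rating of 6 or 7 counts as functioning comparable to same-age peers, and that the team answers the time-two progress question separately. The helper function and its labels are illustrative, not part of the ECO Child Outcome Summary Form.

```python
# A sketch of the "7 to 3" interpretation from slide 42: map a time-two
# COSF rating plus the team's progress judgment to categories a/b/c.
# Assumes ratings of 6-7 count as comparable to same-age peers (slide 41).

def osep_category(time_two_rating: int, made_progress: bool) -> str:
    if time_two_rating >= 6:
        return "a: reached/maintained functioning comparable to same-age peers"
    if made_progress:
        return "b: improved functioning, not comparable to same-age peers"
    return "c: did not improve functioning"

# Outcome Three from the example: rating moved from 3 to 5, progress noted.
print(osep_category(5, made_progress=True))  # -> category b
```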

  45. Fact or Fiction • Someone has the answers, and if I look long enough I’ll have them too. • Everything has to be perfect the first time around. • Research doesn’t matter – just get the data submitted. • I really do believe that garbage in is garbage out, but at the end of the day I just want the data.

  46. Overall Synthesis and Recommendations • Rigorous implementation of curriculum-based assessments requires extensive professional development and support of instructional staff. • Findings suggest that CBAs, when implemented with rigor, have the potential to provide meaningful child progress data for program evaluation and accountability purposes.

  47. “And that’s our outcomes measurement system. Any questions?”

  48. References • Allen, D., Bricker, D., Macy, M., & Pretti-Frontczak, K. (2006, February). Providing Accountability Data Using Curriculum-Based Assessments. Poster presented at the Biannual Conference on Research Innovations in Early Intervention, San Diego, California. • Brown, R. D., Kowalski, K., Pretti-Frontczak, K., Uchida, C., & Sacks, D. (2002, April). The reliability of teachers’ assessment of early cognitive development using a naturalistic observation instrument. Paper presented at the 17th Annual Conference on Human Development, Charlotte, North Carolina. • Grisham-Brown, J., Hallam, R., & Brookshire, R. (in press). Using authentic assessment to evidence children’s progress towards early learning standards. Early Childhood Education Journal. • Grisham-Brown, J., Hallam, R., & Pretti-Frontczak, K. Measuring child outcomes using authentic assessment practices. Journal of Early Intervention (Innovative Practices). Manuscript in preparation. • Hallam, R., Grisham-Brown, J., Gao, X., & Brookshire, R. (in press). The effects of outcomes-driven authentic assessment on classroom quality. Early Childhood Research and Practice. • Hallam, R., Grisham-Brown, J., & Pretti-Frontczak, K. (2005, October). Meeting the demands of accountability through authentic assessment. Paper presented at the International Division of Early Childhood Annual Conference, Portland, OR. • Walker, D., & Pretti-Frontczak, K. (2005, December). Issues in selecting assessments for measuring outcomes for young children. Paper presented at the OSEP National Early Childhood Conference, Washington, D.C. (http://www.nectac.org/~meetings/nationalDec05/mtgPage1.asp?enter=no)
