Growth, Value-Added and Teacher Effectiveness Measures Philip R. Fletcher Senior Research Scientist Pearson
Teacher opinion • A recent international survey of teachers shows: • --That the vast majority of teachers welcome appraisal and feedback on their work. • --That it improves their job satisfaction and effectiveness as teachers. • --But too many teachers do not receive any feedback on their work at all. • --Moreover, evaluation is perceived to be an instrument of compliance rather than development.
Teacher ratings • Most school districts use pass-fail ratings where nearly all teachers pass. • 99% of teachers in districts using binary ratings are rated satisfactory. • 94% of teachers in districts using multiple points are in the top two categories. • As Arne Duncan noted, “Ninety-nine percent of our teachers are above average.”
Teacher salaries • Teacher compensation is very predictable. • Based on the teacher’s highest degree and years of seniority. • Almost completely unrelated to variations in teacher effectiveness.
Effectiveness varies • Anecdotal and empirical evidence suggests that teachers differ dramatically in effectiveness. • An effective teacher will raise student test scores by ten percentiles per year. • Three years of effective teaching raises test scores by thirty percentiles. • Traditional teacher evaluation systems fail to recognize these differences.
Teacher recognition • The need to recognize teachers who make magnificent contributions to student learning. • The need to motivate people to gain expertise. • And the need to leverage expert teachers and reward them for their efforts. • To ensure that students are taught successfully, there is a need to differentiate teachers' effectiveness in terms of their impact on student learning.
Status, growth and effectiveness • Student achievement is the status of accumulated subject matter knowledge at one point in time—a lagging indicator. • Student learning is growth in subject matter knowledge over time—a leading indicator. • It is student learning—not student achievement—that is most relevant in defining and assessing teaching effectiveness.
Status, growth and effectiveness • Achievement provides evidence of the status of student knowledge and understanding at one point in time. • Learning is demonstrated by growth in student achievement from one point in time to another, not by status at either point in time alone. • Effectiveness is demonstrated by above-average student learning and growth.
Status, growth and effectiveness • Schematically: • Status = Achievement • Growth = Learning • Relative Growth = Effectiveness
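Illustrative sketch • A minimal numeric example of the schematic above, using hypothetical fall and spring scores for five students; all numbers and names are made up for illustration only.

```python
# Hypothetical fall and spring scale scores for five students (illustrative only).
fall   = [420, 435, 450, 465, 480]   # status at time 1 (achievement)
spring = [455, 460, 470, 505, 500]   # status at time 2 (achievement)

# Growth = learning: the change in achievement between the two time points.
growth = [s - f for f, s in zip(fall, spring)]

# Relative growth = effectiveness: each student's growth compared with the
# average growth of the group.
mean_growth = sum(growth) / len(growth)
relative_growth = [g - mean_growth for g in growth]

for i, (g, rg) in enumerate(zip(growth, relative_growth), start=1):
    print(f"student {i}: growth = {g:+d}, relative growth = {rg:+.1f}")
```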
Why growth? • Growth reflects learning, and we care about student learning. • Because the principal role of teachers is to enhance student learning. • Teacher effectiveness should be reflected in how much their students learn.
Official incentives • Teacher Incentive Fund (TIF) grants require school districts to evaluate teachers. • Race to the Top (RttT) funds require a state commitment to measuring teacher effectiveness. • No Child Left Behind (NCLB) required testing of all students in reading and mathematics, leading to the development of longitudinal data systems linked to individual teachers.
Student testing • Most states have test data linked to specific schools and teachers that can be used to track student growth. • Many assessment systems are based on student test score growth over time: • Value-added models • Student growth percentiles • Both address effectiveness in terms of learning rather than status.
Value-added assessment • Value-added models are designed to assess school and teacher contributions to student growth. • A value-added assessment model is designed to demonstrate the impact of individual schools and teachers. • It is designed to distinguish between teacher effects and other outside influences.
Value-added assessment • Value-added captures the growth that classes of students achieve during a single year of schooling. • To estimate classroom effects, student data include only the students enrolled in a particular class.
Value-added assessment • Key idea is to statistically isolate the contribution of individual teachers from all other sources of influence. • Value-added analyses attempt to determine the amount of student growth that can be attributed to an individual teacher. • Value-added models quantify teacher effectiveness—the teacher’s contribution to student learning and growth.
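Illustrative sketch • One way to make the key idea concrete is a simple covariate-adjustment model: regress current scores on prior scores plus teacher indicators and read each teacher's coefficient as an estimated contribution to growth. This is only one of several value-added specifications in use; the data below are simulated and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): prior score, teacher assignment, current score.
n_students, n_teachers = 300, 10
prior = rng.normal(50, 10, n_students)
teacher = rng.integers(0, n_teachers, n_students)
true_effect = rng.normal(0, 2, n_teachers)                 # "true" teacher contributions
current = 5 + 0.9 * prior + true_effect[teacher] + rng.normal(0, 5, n_students)

# Covariate-adjustment sketch: regress current scores on prior scores plus
# teacher indicator variables (teacher 0 is the reference category).
X = np.column_stack([np.ones(n_students), prior, np.eye(n_teachers)[teacher][:, 1:]])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

# Each teacher's coefficient is an estimated contribution to student growth,
# centered so that the average teacher is zero.
teacher_effects = np.concatenate([[0.0], coef[2:]])
teacher_effects -= teacher_effects.mean()
print(np.round(teacher_effects, 2))
```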
Value-added assessment • Value-added attributes causality to the teacher. • Teachers are held responsible for the learning and growth of their students. • Under high-stakes accountability, student growth is framed in terms of cause and responsibility.
Value-added assessment • Some statisticians would argue that value-added is unsuited for drawing causal inferences that a given teacher is responsible for the increase in student test scores. • “We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions.” –Rubin, Stuart, and Zanutto (2004). • “…it does not appear possible to separate teacher and school effects using currently available accountability data.” –Raudenbush (2004).
Value-added assessment • Policymakers and school administrators generally express no such reservations and offer strong support for value-added measures. • “If quality instruction is essential for student learning, then student learning should tell us something about the quality of instruction.”
Descriptive accountability • Accountability system results may have value without making causal inferences. • From this perspective, accountability results should not be used to sanction teachers or schools. • Instead, they should be used to make sound judgments about quality and needed improvements. • They provide descriptive information and identify schools, teachers, and students that may require further attention.
Describing student growth • The Colorado Growth Model was designed to describe student growth and learning. • Quantile regression is used to model the complete distribution of student achievement over time. • The model quantifies the relationship distance = growth rate × time, probabilistically. • Growth percentiles describe the rarity of a student’s current growth, given their prior achievement.
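Illustrative sketch • A rough, empirical stand-in for a student growth percentile: rank a student's current score among peers with similar prior scores. Operational models such as the Colorado Growth Model use quantile regression rather than this nearest-peer shortcut; the data, scores, and function name below are simulated assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated prior and current scores for a large group of students (illustrative only).
prior = rng.normal(50, 10, 2000)
current = 5 + 0.9 * prior + rng.normal(0, 6, 2000)

def growth_percentile(student_prior, student_current, prior, current, k=200):
    """Percentile rank of a student's current score among the k peers whose
    prior scores are most similar (a rough stand-in for quantile regression)."""
    peers = np.argsort(np.abs(prior - student_prior))[:k]
    return 100.0 * np.mean(current[peers] <= student_current)

# A student who started at 45 and scored 52 this year:
print(round(growth_percentile(45, 52, prior, current)))
```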
Student growth percentiles • Examining growth together with achievement sheds new light on school performance. • Median growth above the 50th percentile identifies best practices and sources that can offer support. • Median growth below the 50th percentile identifies the greatest needs and targets that should receive support. • A gap-closing strategy is built around a consensus on school improvement.
Common yardstick • Most states have administrative data that can be used as a common yardstick to identify the 25% most effective teachers. • Supervisor ratings and classroom observations provide no such common yardstick. • Local implementation of these other measures varies across 1,600 school districts nationwide. • More importantly, they do not directly reflect student learning.
Value-added and growth limitations • Value-added and growth percentiles are only available for teachers in certain subject matter areas. • Value-added and growth percentiles are available for only a small subset of teachers. • Value-added and growth percentiles are limited by the test. • Growth metrics are too narrow to provide information about how teachers can improve.
Value-added and growth shortcomings • Value-added metrics and growth percentiles for individual teachers fluctuate from year to year. • They can be influenced by factors beyond the teacher’s control. • They are imperfect measures with a relatively large error component.
Concern • How well does value-added predict the top 25% from year-to-year? • How well do alternative measures of teacher effectiveness predict the same top 25% from year to year? • Classroom observations? • Principals’ ratings? • Student surveys?
Value-added and growth compare favorably • Value-added metrics and growth percentiles compare favorably with performance measures in other fields. • The correlation between SAT test scores and freshman success in college is 0.35. • The correlation in batting averages between years in professional baseball is 0.36. • The correlation between value-added estimates this year and next lies between 0.20 and 0.60. • Most value-added estimates correlate between 0.30 and 0.40 across years.
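Illustrative sketch • The year-to-year correlations quoted above are consistent with stable teacher effects observed with substantial estimation error. The simulation below makes that attenuation concrete; the noise level is an assumption chosen to land in the 0.30 to 0.40 range, not an estimate from real data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated sketch: stable "true" teacher effects observed with substantial
# estimation error in each year (all numbers are illustrative assumptions).
n_teachers = 1000
true_effect = rng.normal(0, 1, n_teachers)
year1 = true_effect + rng.normal(0, 1.4, n_teachers)   # noisy estimate, year 1
year2 = true_effect + rng.normal(0, 1.4, n_teachers)   # noisy estimate, year 2

# With this signal-to-noise ratio the year-to-year correlation lands near
# 1 / (1 + 1.4**2) ~= 0.34, inside the 0.30-0.40 range quoted above.
print(round(np.corrcoef(year1, year2)[0, 1], 2))

# How often does a teacher in the top 25% one year stay in the top 25% the next?
top1 = year1 >= np.quantile(year1, 0.75)
top2 = year2 >= np.quantile(year2, 0.75)
print(round(np.mean(top2[top1]), 2))
```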
Value-added and growth prognosis • Recommend the use of value-added measures and growth percentiles, principally because they are related to student learning and growth. • Are mindful of their limitations and imperfections. • Strive to continually improve these growth measures.
Suggestion • Use multiple measures—not only value-added metrics and growth percentiles. • Alternate measures should meaningfully supplement state test score data and increase prediction. • Alternate measures should be applicable to a broader range of teachers. • Provide direct information and feedback suggesting how teachers can improve teaching.
Suggestion • Use core and non-core measures to validate the full range of teacher effectiveness for a broader range of teachers. • Growth measures serve as a benchmark for the reliability of other teacher effectiveness measures. • The key idea is to predict the benchmark growth measures. • Weight different measures based on their power to predict student learning and growth.
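Illustrative sketch • One simple way to weight measures by their power to predict growth: standardize the measures and regress the benchmark growth measure on them, reading the coefficients as weights. The teacher-level data and measure names below are simulated assumptions, not real measures.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated teacher-level data (illustrative only): a benchmark growth measure
# plus three other effectiveness measures with differing predictive power.
n = 500
growth = rng.normal(0, 1, n)                                   # benchmark growth measure
observation = 0.6 * growth + rng.normal(0, 0.8, n)             # classroom observation score
principal = 0.3 * growth + rng.normal(0, 1.0, n)               # principal rating
survey = 0.5 * growth + rng.normal(0, 0.9, n)                  # student survey score

# Standardize, then regress the benchmark on the other measures; the resulting
# coefficients are one way to weight measures by their power to predict growth.
Z = np.column_stack([observation, principal, survey])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
y = (growth - growth.mean()) / growth.std()
weights, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), y, rcond=None)

for name, w in zip(["observation", "principal rating", "student survey"], weights[1:]):
    print(f"{name}: weight = {w:.2f}")
```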
Observational measures • What is needed is not so much an accounting of teacher time or a rating of teacher performance, but rather higher level inferences about the teacher’s ultimate purposes and effects. • Making holistic judgments requires higher levels of inference. • In short, we need a method to obtain holistic rankings reliably and validly. • Procedures must minimize rater effects and coding errors.
Classroom Interactions • A complex situation, difficult to characterize unassisted. • Teacher practice and student-teacher interactions—from the participants’ point of view. • How do students and teachers interact in a practical and personal sort of way? • How do they approach and solve problems together? • Are there different classroom profiles?
Concourse of meaning • The first challenge is to figure out what makes great teaching. • This is difficult and controversial from an educational perspective. • Yet relatively straightforward from a managerial perspective. • Find the best educators and give them an opportunity to debate and create the best pedagogy and teaching practice.
Danielson Framework • Charlotte Danielson’s Framework serves as a source of statements about teacher effectiveness. • The Framework is divided into: • --4 Domains • --23 Components • --76 Elements • --304 Items
Danielson Framework • The 4 Domains include: • --Planning and Preparation • --The Classroom Environment • --Instruction • --Professional Responsibilities
Danielson Framework • The 2 Domains that students actually see: • --The Classroom Environment • --Instruction
Danielson Framework • Scoring rubrics:
Danielson Framework • Items:
Danielson Framework • The Danielson Framework is prescriptive. • Unsatisfactory and basic performance are often just the negation of proficient and distinguished performance. • It offers no guide to what teachers do when under stress. • It assumes that good behavior follows rules, and it lacks insight from control theory and negative feedback. • “Students help set high standards.”
Danielson Framework • A good basis for a limited number of items. • These items can be readily supplemented with items from other sources, by other authors. • Use these sources and create new items to fully cover what students and teachers actually do.