The evaluation of teachers based on the contribution they make to the learning of their students, known as value-added, is an increasingly popular but controversial education reform policy. We highlight and try to clarify four areas of confusion about value-added. The first is between value-added information and the uses to which it can be put. One can, for example, favor an evaluation system that includes value-added information without endorsing the public release of value-added data on individual teachers. The second is between the consequences for teachers and those for students of classifying and misclassifying teachers as effective or ineffective; the interests of students are not always perfectly congruent with those of teachers. The third is between the reliability of value-added measures of teacher performance and the standards for evaluations in other fields: value-added scores for individual teachers turn out to be about as reliable as performance assessments used elsewhere for high-stakes decisions. The fourth is between the reliability of teacher evaluation systems that include value-added and those that do not: ignoring value-added typically lowers the reliability of personnel decisions about teachers. We conclude that value-added data has an important role to play in teacher evaluation systems, but that there is much to be learned about how best to use value-added information in human resource decisions.
Teacher evaluation at a crossroads
The vast majority of school districts presently employ teacher evaluation systems that result in all teachers receiving the same (top) rating. This is perhaps best exemplified by a recent report by the New Teacher Project covering thousands of teachers and administrators in twelve districts across four states.[1] The report revealed that even though all the districts employed some formal evaluation process for teachers, all failed to differentiate meaningfully among levels of teaching effectiveness. In districts that used binary ratings, more than 99 percent of teachers were rated satisfactory. In districts using a broader range of ratings, 94 percent received one of the top two ratings and fewer than 1 percent received an unsatisfactory rating. As Secretary of Education Arne Duncan put it, “Today in our country, 99 percent of our teachers are above average.”[2]
There is an obvious need for teacher evaluation systems that produce a spread of verifiable, comparable ratings that distinguish among levels of teacher effectiveness. We know from a large body of empirical research that teachers differ dramatically from one another in effectiveness. Evaluation systems could recognize these differences, but they generally don’t. As a consequence, the many low-stakes and high-stakes decisions made in the teacher labor market occur without the benefit of formal recognition of how effective (or ineffective) teachers are in the classroom. Is there any doubt that teacher policy decisions would be better informed by evaluation systems that meaningfully differentiate among teachers?
There is tremendous support at both the federal and state levels for the development and use of teacher evaluation systems that are more discerning.[3] And the two national teachers unions, the AFT and the NEA, support teacher evaluation systems that recognize and reward excellence and improve professional development. This is consistent with their long-term support of the National Board for Professional Teaching Standards, which is designed to identify excellent teachers and provide them a salary bonus.
The latest generation of teacher evaluation systems seeks to incorporate information on the value-added by individual teachers to the achievement of their students. The teacher’s contribution can be estimated in a variety of ways, but typically entails some variant of subtracting the achievement test score of a teacher’s students at the beginning of the year from their score at the end of the year, and making statistical adjustments to account for differences in student learning that might result from student background or school-wide factors outside the teacher’s control. These adjusted gains in student achievement are compared across teachers. Value-added scores can be expressed in a number of ways. One that is easy to grasp is a percentile score that indicates where a given teacher stands relative to other teachers. Thus a teacher who scored at the 75th percentile on value-added for mathematics achievement would have produced greater gains for her students than the gains produced by 75 percent of the other teachers being evaluated.
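The steps described above (compute each student's gain, adjust for factors outside the teacher's control, average by teacher, then convert to a percentile) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any particular district, state, or vendor: the student records are invented, the only adjustment is a simple linear regression of gain on a single hypothetical 0/1 background indicator, and the helper names `adjusted_gains`, `value_added`, and `percentile` are hypothetical. Real value-added models adjust for many more factors and often shrink noisy estimates toward the average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, fall_score, spring_score, background_flag).
# The 0/1 background flag stands in for the many student and school factors
# a real model would adjust for; it is an assumption of this sketch.
records = [
    ("A", 50, 60, 0), ("A", 40, 52, 1),
    ("B", 55, 61, 0), ("B", 45, 50, 1),
    ("C", 60, 68, 0), ("C", 30, 38, 1),
]


def adjusted_gains(records):
    """Return (teacher, residual_gain) pairs after a one-variable adjustment."""
    gains = [spring - fall for _, fall, spring, _ in records]
    xs = [float(rec[3]) for rec in records]
    x_bar, g_bar = mean(xs), mean(gains)
    # Ordinary least-squares slope and intercept for: gain ~ background flag.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (g - g_bar) for x, g in zip(xs, gains))
    slope = sxy / sxx if sxx else 0.0
    intercept = g_bar - slope * x_bar
    # Residual = actual gain minus the gain predicted from background alone.
    return [(rec[0], g - (intercept + slope * x))
            for rec, x, g in zip(records, xs, gains)]


def value_added(records):
    """Average each teacher's adjusted gains into a single score."""
    by_teacher = defaultdict(list)
    for teacher, resid in adjusted_gains(records):
        by_teacher[teacher].append(resid)
    return {t: mean(rs) for t, rs in by_teacher.items()}


def percentile(scores, teacher):
    """Percent of the other teachers whose score is strictly lower.

    Assumes at least two teachers are being compared.
    """
    others = [s for t, s in scores.items() if t != teacher]
    below = sum(1 for s in others if s < scores[teacher])
    return 100.0 * below / len(others)


scores = value_added(records)
for t in sorted(scores):
    print(t, round(scores[t], 2), percentile(scores, t))
```

In this toy data, Teacher A's students gain more than their background alone predicts, so A's adjusted gains exceed those of every other teacher in the sample and she lands at the 100th percentile, matching the article's reading of a percentile as the share of other teachers whose students gained less.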
Critics of value-added methods have raised concerns about the statistical validity, reliability, and corruptibility of value-added measures. We believe the correct response to these concerns is to improve value-added measures continually and to use them wisely, not to discard or ignore the data. With that goal in mind, we address four sources of concern about value-added evaluation of teachers.
Value-added information vs. what you do with it
There is considerable debate about how teacher evaluations should be used to improve schools, and uncertainty about how to implement proposed reforms. For example, even those who favor linking pay to performance face numerous design decisions with uncertain consequences. The design of a pay-for-performance system (salary incentives based on team vs. individual performance, incentives managed at the state or district level vs. the building level, or incentives structured as more rapid advancement through a system of ranks vs. annual bonuses) can make the difference between very effective and very ineffective policy.[4]
Similar uncertainty surrounds other possible uses of value-added information. For example, tying tenure to value-added evaluation scores will have immediate effects on school performance that have been well modeled, but these models cannot predict indirect effects such as those that might result from changes in the profiles of people interested in entering the teaching profession. Such effects on the general equilibrium of the teacher labor market are largely the subject of hypothesis and speculation. Research on these and related topics is burgeoning,[5] but right now much more is unknown than known.
However, uncertainties surrounding how best to design human resource policies that take advantage of meaningful teacher evaluation do not bear directly on the question of whether value-added information should be included as a component of teacher evaluation. There is considerable confusion between issues surrounding the inclusion of value-added scores in teacher evaluation systems and questions about how such information is used for human resource decisions. This is probably because the uses of teacher evaluation that have gained the most public attention or notoriety have been based exclusively on value-added. For example, in August 2010, the Los Angeles Times used several years of math and English test data to identify publicly the best and the worst third- to fifth-grade teachers in the Los Angeles Unified School District. The ensuing controversy focused as much on value-added evaluation as on the newspaper’s actions. But the question of whether these kinds of statistics should be published is separable from the question of whether such data should have a role in personnel decisions. It is routine for working professionals to receive consequential evaluations of their job performance, but that information is not typically broadcast to the public.
A place for value-added
Much of the controversy surrounding teacher performance measures that incorporate value-added information is based on fears about how the measures will be used. After all, once administrators have ready access to a quantitative performance measure, they can use it for sensitive human resources decisions including teacher pay, promotion, or layoffs. They may or may not do this wisely or well, and it is reasonable for those who will be affected to express concerns.
We believe that whenever human resource actions are based on evaluations of teachers, they will benefit from incorporating all the best available information, which includes value-added measures. Not only do teachers typically receive scant feedback on their past performance in raising test scores, but the information they usually receive on the average test scores or proficiency of their students can also be misleading or demoralizing. High test scores or a high proficiency rate may say more about who their students are than about how they were taught. Low test scores might mask the incredible progress the teachers made. Teachers, their mentors, and their principals would gain vast new insight if they could see each teacher’s performance placed in the context of other teachers with students just like their own, drawn from a much larger population than a single school. This is the promise of value-added analysis. It is not a perfect system of measurement, but it can complement observational measures, parent feedback, and personal reflections on teaching far better than any available alternative. It can be used to help guide resources to where they are needed most, to identify teachers’ strengths and weaknesses, and to put a spotlight on the critical role of teachers in learning.
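To make concrete why a proficiency rate can say more about who the students are than about how they were taught, the short Python sketch below compares two classrooms on spring proficiency and on average fall-to-spring gain. Every number is invented for illustration, the cut score of 60 is an assumption, and raw gains are only the starting ingredient of value-added (no background adjustment is applied here); the helper names `proficiency_rate` and `mean_gain` are hypothetical.

```python
from statistics import mean

PROFICIENT = 60  # hypothetical proficiency cut score

# Hypothetical (fall_score, spring_score) pairs for two classrooms.
classes = {
    "Teacher X": [(78, 80), (85, 86), (70, 71)],  # students entered scoring high
    "Teacher Y": [(35, 50), (42, 58), (30, 44)],  # students entered scoring low
}


def proficiency_rate(pairs):
    """Share of students at or above the cut score in the spring."""
    return sum(spring >= PROFICIENT for _, spring in pairs) / len(pairs)


def mean_gain(pairs):
    """Average fall-to-spring gain, the raw ingredient of value-added."""
    return mean(spring - fall for fall, spring in pairs)


for name, pairs in classes.items():
    print(name, proficiency_rate(pairs), round(mean_gain(pairs), 2))
```

Judged by proficiency alone, Teacher X looks far stronger (every student proficient vs. none); judged by growth, Teacher Y's students gained roughly ten times as much. A value-added comparison would then ask how those gains stack up against teachers whose students started from similar places.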
Evaluating Teachers: The Important Role of Value-Added | Brookings Institution.