Highlights
- Evaluating teachers to sort them into performance categories will inevitably produce mistakes, regardless of when in a career a teacher is evaluated, whether the rating carries high or low stakes, or how the evaluation is conducted.
- Mistakes occur because true teacher performance cannot be fully observed, so we have to rely on imperfect measures of it.
- While classification errors are an inescapable part of decisions that rely on grouping by teacher performance, better measurement of performance can reduce these errors.
- Classification errors fall into two types: “false positives,” which put a teacher into a group to which he does not belong, and “false negatives,” which fail to put him into a group to which he does belong.
- Classification errors associated with performance measures based on student test scores are quite high. But the error rates of these “value-added measures” may be lower than those of classifications based on traditional measures of teacher effectiveness, such as licensure status or years of experience.
- Current evaluation systems rarely classify teachers as ineffective or needing improvement. Thus, the “false negative” rate for these classifications is probably high.
- We have little research to draw upon for designing systems or for predicting the effects of emerging evaluation systems. This calls for caution and for a better understanding of new systems in action.
Introduction
Better teacher evaluation should lead to better instruction and improved outcomes for students. But classifying teachers more accurately requires better information than we now have. Because existing measures of performance are incomplete and imperfect, measured performance does not always reflect true performance. Some teachers who are truly high-performing will be classified as lower-performing, and vice versa.
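As a rough illustration of how an imperfect measure produces misclassification, and why a more precise measure reduces it, the sketch below simulates teachers whose true performance is standard normal and whose measured performance adds independent random noise. The normal distributions, the 20 percent “low-performing” cutoff, and the function names (`simulate`, `error_rates`, `classify_bottom`) are hypothetical assumptions chosen for illustration; they do not describe any actual evaluation system. Here the false positive rate is the share of teachers who are not truly low-performing but are flagged as such, and the false negative rate is the share of truly low-performing teachers who are not flagged.

```python
import random


def classify_bottom(scores, cutoff):
    """Flag a teacher as 'low-performing' when the score falls below the cutoff."""
    return [s < cutoff for s in scores]


def error_rates(true_flags, measured_flags):
    """Return (false positive rate, false negative rate) for one group.

    False positive: flagged by the measure but not truly in the group.
    False negative: truly in the group but not flagged by the measure.
    """
    pairs = list(zip(true_flags, measured_flags))
    fp = sum(1 for t, m in pairs if m and not t)
    fn = sum(1 for t, m in pairs if t and not m)
    positives = sum(1 for t in true_flags if t)
    negatives = len(true_flags) - positives
    return fp / negatives, fn / positives


def simulate(noise_sd, n=10_000, share_low=0.2, seed=0):
    """Simulate one evaluation round (hypothetical model, not real data).

    Assumption: true performance ~ N(0, 1) and the measure adds independent
    N(0, noise_sd) error, so a larger noise_sd means a less reliable measure.
    """
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    measured = [t + rng.gauss(0.0, noise_sd) for t in true]
    # Place the bottom share_low of teachers in the "low-performing" group,
    # once by true performance and once by measured performance.
    k = int(share_low * n)
    true_cut = sorted(true)[k]
    meas_cut = sorted(measured)[k]
    return error_rates(
        classify_bottom(true, true_cut),
        classify_bottom(measured, meas_cut),
    )


if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0):
        fp, fn = simulate(sd)
        print(f"noise sd={sd}: false positive rate={fp:.2f}, "
              f"false negative rate={fn:.2f}")
```

With no noise the two classifications coincide and both rates are zero; as the noise grows, both rates rise. Because both groupings flag the same number of teachers, the count of false positives equals the count of false negatives in this setup, and the rates differ only because the low-performing group is smaller than the rest.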
Teacher classification is, of course, a necessity: policymakers and practitioners must decide who is eligible to teach in which classrooms and how teachers are promoted or compensated. School leaders have commonly used such measures as academic degrees, teaching experience, classroom observation ratings, and assessments by principals to make these decisions. Less common is the use of “value-added” estimates, which gauge a teacher's contribution to student achievement by linking students' test scores to the teachers who taught them.
But value-added measures are being used more and more as data systems become better able to link students to their teachers. As their use grows, so do concerns about the implications of misclassification.
Classification errors are troubling in and of themselves, but they are particularly worrisome when classification is used to sanction teachers. Improperly sanctioning good teachers not only hurts those teachers but may also discourage capable people from entering the profession. On the flip side, poor teachers who are misclassified as higher performers will continue to teach ineffectively, often to their students' detriment.
...