“The Widget Effect,” a widely read 2009 report from The New Teacher Project, surveyed the teacher evaluation systems in 14 large American school districts and concluded that status quo systems provide little information on how performance differs from teacher to teacher. The memorable statistic from that report: 98 percent of teachers were evaluated as “satisfactory.” Based on such findings, many have characterized classroom observation as a hopelessly flawed approach to assessing teacher effectiveness.
The ubiquity of “satisfactory” ratings stands in contrast to a rapidly growing body of research that examines differences in teachers’ effectiveness at raising student achievement. In recent years, school districts and states have compiled datasets that make it possible to track the achievement of individual students from one year to the next, and to compare the progress made by similar students assigned to different teachers. Careful statistical analysis of these new datasets confirms the long-held intuition of most teachers, students, and parents: teachers vary substantially in their ability to promote student achievement growth.
The quantification of differences has generated a flurry of policy proposals to promote teacher quality over the past decade, and the Obama administration’s recent Race to the Top program only accelerated interest. Yet, so far, little has changed in the way that teachers are evaluated, in the content of pre-service training, or in the types of professional development offered. A primary stumbling block has been a lack of agreement on how best to identify and measure effective teaching.
A handful of school districts and states—including Dallas, Houston, Denver, New York, and Washington, D.C.—have begun using student achievement gains as indicated by annual test scores (adjusted for prior achievement and other student characteristics) as a direct measure of individual teacher performance. These student-test-based measures are often referred to as “value-added” measures. Yet even supporters of policies that make use of value-added measures recognize the limitations of those measures. Among the limitations are, first, that these performance measures can only be generated in the handful of grades and subjects in which there is mandated annual testing. Roughly one-quarter of K–12 teachers typically teach in grades and subjects where obtaining such measures is currently possible. Second, test-based measures by themselves offer little guidance for redesigning teacher training or targeting professional development; they allow one to identify particularly effective teachers, but not to determine the specific practices responsible for their success. Third, there is the danger that a reliance on test-based measures will lead teachers to focus narrowly on test-taking skills at the cost of more valuable academic content, especially if administrators do not provide them with clear and proven ways to improve their practice.
Student-test-based measures of teacher performance are receiving increasing attention in part because there are, as yet, few complementary or alternative measures that can provide reliable and valid information on the effectiveness of a teacher’s classroom practice. The approach most commonly in use is to evaluate effectiveness through direct observation of teachers in the act of teaching. But as “The Widget Effect” reports, such evaluations are a largely perfunctory exercise.
In this article, we report a few results from an ongoing study of teacher classroom observation in the Cincinnati Public Schools. The motivating research question was whether classroom observations—when performed by trained professionals external to the school, using an extensive set of standards—could identify teaching practices likely to raise achievement.
We find that evaluations based on well-executed classroom observations do identify effective teachers and teaching practices. Teachers’ scores on the classroom observation components of Cincinnati’s evaluation system reliably predict the achievement gains made by their students in both math and reading. These findings support the idea that teacher evaluation systems need not be based on test scores alone in order to provide useful information about which teachers are most effective in raising student achievement.