Don’t use student test scores to evaluate teachers


Up until now, most of the warnings about using student test scores to evaluate teachers—so-called “value-added” measures, or VAM—have come in the form of opinion. That’s not necessarily invalid, but peer-reviewed research has a much more trustworthy ring to it.


Enter two researchers who have found little or no correlation between quality teaching and the appraisals teachers received based on student standardized test scores. Their study, published today in the journal Educational Evaluation and Policy Analysis, casts further doubt on the possibility of using empirical data to identify good and bad teachers.

“Recent years have seen the convergence of two major policy streams in U.S. K–12 education: standards/accountability and teacher quality reforms,” write Morgan S Polikoff of the University of Southern California and Andrew C Porter of the University of Pennsylvania.

“Work in these areas has led to the creation of multiple measures of teacher quality, including measures of their instructional alignment to standards/assessments, observational and student survey measures of pedagogical quality, and measures of teachers’ contributions to student test scores. This article is the first to explore the extent to which teachers’ instructional alignment is associated with their contributions to student learning and their effectiveness on new composite evaluation measures using data from the Bill & Melinda Gates Foundation’s Measures of Effective Teaching study. Finding surprisingly weak associations, we discuss potential research and policy implications for both streams of policy.”

“The concern is that these state tests and these measures of evaluating teachers don’t really seem to be associated with the things we think of as defining good teaching,” the Washington Post quoted Dr Polikoff, an assistant professor of education at the Rossier School of Education, as saying.

Although the time frame for implementing new teacher evaluation models varies from state to state, 35 states and the District of Columbia require student achievement or growth to play a significant role in teacher evaluation scores. The list of states includes both Maryland and Illinois.

The study, which looked at more than 300 teachers in New York, Dallas, Denver, Charlotte-Mecklenburg, Memphis, and Hillsborough County in Florida, was funded by a grant from the Bill & Melinda Gates Foundation. Researchers found that some teachers held in high regard—based on student surveys, classroom observations by principals, and other indicators of quality—had students who scored poorly on tests, and vice versa.

Other reports have issued similar warnings against using VAM scores to make personnel decisions at the teacher level. For example, the American Statistical Association said last month, “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”

Editorial on using test scores and VAM to grade teachers

Standardized tests were designed to measure student progress and to diagnose the subject areas in which students are struggling, so that additional help could be directed to students in those areas.

The use of standardized test data for any other purpose, including teacher evaluation, has limited validity and should be severely restricted by law and policy. The Danielson Framework, which provides an extensive scheme for evaluating teachers, includes objective measures of student progress as just one part of a much larger domain. Public policies that require the use of VAM data should be replaced with policies that require teacher evaluation systems that take more of what teachers do into account.

The tests have very little to do with what makes a good teacher and much more to do with student-based parameters, such as family income, life in impoverished communities, home life, and so on. Statisticians say they “correct” for these out-of-school factors, but the amount of noise in the data is just too great for anyone, even the most talented statisticians, to find the signal.

Therefore, VAM data should be used with great care for teacher evaluations and in the most limited ways. Better to rely on the entire Danielson Framework than to focus 20, 30, or up to 50 percent of a teacher’s evaluation on scores achieved by her or his students on a one- to three-hour test.

Paul Katula
Paul Katula is the executive editor of the Voxitatis Research Foundation, which publishes this blog. For more information, see the About page.
