
Un-“MET” Goals


Gates Foundation’s MET Study Fails to Solve the Teacher Evaluation Challenge


William J. Mathis, (802) 383-0058
Jesse Rothstein, (510) 643-8561, rothstein@berkeley.edu
Jamie Horwitz, (202) 549-4921

URL for this press release:

BOULDER, CO (January 31, 2013) – A review by the National Education Policy Center (NEPC) of a newly released and long-awaited study on teacher evaluation strongly questions the spin that has been put on the findings.

The Measures of Effective Teaching (MET) project, funded by the Bill and Melinda Gates Foundation, released its final set of reports this month. Those reports are supposed to advise schools and districts about how to design teacher evaluations. 

However, a careful look at the MET research – an ambitious, multi-year study of thousands of teachers in six school districts – finds that the study’s results were inconclusive and provide little usable guidance. 

“The MET research does little to settle longstanding debates over how best to evaluate teachers comprehensively,” said Jesse Rothstein of the University of California, Berkeley.

Rothstein and William Mathis, NEPC’s managing director, conducted the review for the policy center’s Think Twice think tank review project.

The MET study compared three types of teacher performance measures: student test scores, classroom observations, and student surveys. The project concluded that the three should be given roughly equal weight in teacher evaluations.

Rothstein and Mathis found that the data do not support that conclusion. Instead, the data indicate that each measure reflects a distinct dimension of teaching. Rothstein said, “Any evaluation system needs to be founded on a judgment about what constitutes effective teaching, and that that judgment will drive the choice of measures. Nothing in the MET project’s results helps in forming that judgment.”

“While we commend The Bill and Melinda Gates Foundation for investing millions of dollars in tackling critical education issues, the conclusions in this case do not jibe with the data,” said Mathis.

Teacher evaluation has emerged as a prominent educational policy issue; it was, for instance, one of several contested points during the Chicago teachers’ strike in September 2012. And it is a key element of the Obama administration’s education policy. Debate over teacher compensation, hiring and firing, which once centered on traditional salary matrices and teacher observation systems, is increasingly focused on finding concrete outcome measures – particularly, student test score gains. But these measures are controversial, as critics claim that they miss important dimensions of teacher effectiveness.

Here are some of the issues NEPC’s reviewers found with the MET study:

Samples Were Not Representative of the Teaching Force

The centerpiece of the MET study was an experiment that randomly assigned students to teachers. This experimental approach was meant to determine once and for all whether value-added (VA) scores are biased by student assignments. That is, do teachers who are assigned more successful students benefit in terms of their VA scores? But the group of teachers who participated in the MET experiment turned out not to be representative of teachers as a whole, and many participating schools failed to comply with their experimental assignments. As a result, the experiment did little to resolve the question.

No Single “Quality” Factor

Each type of measure explored in the MET study (student test scores, classroom observations, and student surveys) captures a distinct dimension of teaching practice, and each provides only minimal information about the others. These results indicate that there is no single general teaching “quality” factor, or that if any such factor exists, it accounts for only a small share of the variation in each of the measures. Rather, there are a number of distinct factors, and policymakers must choose how to weight them in designing evaluations.

MET Results Will Not Lead to More Effective Teachers

None of the three types of performance measures captures much of the variation in teachers’ impacts on alternative, conceptually demanding tests. There is little reason to believe that an evaluation system based on any of the measures considered in the MET project will do a good job of identifying teachers who are effective (or ineffective) at raising students’ performance on these more conceptually demanding assessments.


The MET review is published by the National Education Policy Center, housed at the University of Colorado Boulder School of Education.

Find the full review at:

The Think Twice think tank review project, a project of the National Education Policy Center, provides the public, policy makers, and the press with timely, academically sound reviews of selected publications. The project is made possible in part by the support of the Great Lakes Center for Education Research and Practice.

The mission of the National Education Policy Center is to produce and disseminate high-quality, peer-reviewed research to inform education policy discussions. We are guided by the belief that the democratic governance of public education is strengthened when policies are based on sound evidence.  For more information on NEPC, please visit

This review is also found on the GLC website at

NEPC Reviews provide the public, policymakers, and the press with timely, academically sound reviews of selected publications. NEPC Reviews are made possible in part by support provided by the Great Lakes Center for Education Research and Practice: