
Ed in the Apple: Billions of Dollars Later, Researchers Find Teacher Evaluation Reforms Have No Positive Effects on Student Outcomes

I was working on a Network Team in New York City: about a dozen of us “supporting” twenty-five schools that had chosen to work with a specific network leader. The Bloomberg administration had selected Joel Klein, an attorney with no education experience, as chancellor, and Klein morphed from reorganization to reorganization: from ten mega-districts, each with hundreds of schools, evolving into affinity networks in which schools selected whom they wanted to work with, a sort of educational speed dating.

Charlotte Danielson’s Frameworks, published in the nineties, were taught in teacher preparation programs, and the rumor was that New York City was going to adopt them as its teacher evaluation tool.

The network leader invited Danielson to make a presentation to principals and staff; for me, her Frameworks were much too complicated. At the end of the presentation I asked Danielson, “Justice Potter Stewart said you can’t define pornography, but you know it when you see it; isn’t it the same for effective teaching?” (See Justice Stewart’s opinion here.)

Danielson demurred, rather adamantly.

For many years I was the teacher union guy on Schools Under Registration Review (SURR) teams. The lowest-performing schools in the state were cited, and a team led by a Regional Superintendent visited each school. From Monday to Thursday we reviewed data; interviewed students, teachers, and school leadership; observed most of the school staff; and wrote a detailed findings/recommendations report based on a state template. I observed scores of teachers.

The vast majority of teachers were hard-working, conscientious, dedicated and pretty much on their own. Teacher observations were few and far between; schools placed staff development way down the list. School leadership was overwhelmed with discipline, covering classes of absent teachers, filling vacancies, and the innumerable ukases from the District Offices.

Occasionally I’d see a dud and occasionally a star. Working in a SURR school was challenging, and the schools were frequently characterized by ineffective leadership.

Our reports were well-written and ignored.

National research reports rarely affect national policy; the Widget Effect report (2009) from The New Teacher Project (TNTP) was the exception.

If teachers are so important, why do we treat them like widgets?

Effective teachers are the key to student success, yet our school systems treat all teachers as interchangeable parts, not professionals. Excellence goes unrecognized and poor performance goes unaddressed. This indifference to performance disrespects teachers and gambles with students’ lives.

The Widget Effect is a wide-ranging report that studies teacher evaluation and dismissal in four states and 12 diverse districts and reflects survey responses from approximately 15,000 teachers and 1,300 administrators.

Key Findings:

  • All teachers are rated good or great. Less than 1 percent of teachers receive unsatisfactory ratings, making it impossible to identify truly exceptional teachers.
  • Professional development is inadequate. Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.
  • Novice teachers are neglected. Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.
  • Poor performance goes unaddressed. Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years.

The report gives policymakers and school leaders recommendations for acquiring better information about instructional quality to give great teachers the recognition they deserve.

The report exploded across the education landscape, and teacher evaluation was at the core of every innovation. The Obama administration’s $4 billion Race to the Top competitive state grants required teacher evaluation tied to pupil achievement, and most states followed suit. School districts created merit pay plans.

The Denver plan, ProComp, was a model for Race to the Top; the pay-for-performance plan resulted in a teacher strike in 2019. Teachers demanded salary increases, not “bonuses” based on obscure formulas.

A new acronym entered our vocabulary: VAM, value-added measurement, a mathematical formula, an algorithm that compares each student’s achievement with that of similar students across a district and “grades” teachers based on their students’ growth scores. In New York State, teacher evaluation, called the Annual Professional Performance Review (APPR), was embedded in state education law.
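To make the idea concrete, here is a toy sketch of the logic in Python. It is a hypothetical illustration, not New York’s APPR formula or any state’s actual model: it assumes a single predictor (each student’s prior-year score), fits one straight line across the whole district, and treats a teacher’s “value added” as the average amount by which his or her students landed above or below that line. Real value-added models add many more controls and statistical adjustments; the function name `value_added` and the data are made up.

```python
from statistics import mean


def value_added(records):
    """Toy value-added estimate (hypothetical, single-predictor sketch).

    records: list of (teacher, prior_score, current_score) tuples.
    Returns {teacher: average residual of that teacher's students}.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    x_bar, y_bar = mean(priors), mean(currents)

    # Ordinary least squares across the whole district: current ~ a + b * prior
    sxx = sum((x - x_bar) ** 2 for x in priors)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit a growth line")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, currents))
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Residual: how far each student landed above or below "similar" students,
    # i.e. the district-wide prediction for that prior score
    residuals = {}
    for teacher, prior, current in records:
        residuals.setdefault(teacher, []).append(current - (a + b * prior))

    # The teacher's "grade" is simply the mean residual of his or her students
    return {teacher: mean(rs) for teacher, rs in residuals.items()}


if __name__ == "__main__":
    # Hypothetical data: (teacher, prior-year score, current-year score)
    students = [("A", 60, 70), ("A", 80, 90), ("B", 60, 60), ("B", 80, 80)]
    print(value_added(students))  # prints {'A': 5.0, 'B': -5.0}
```

In the made-up example both teachers start with one student at 60 and one at 80; teacher A’s students each finish 5 points above the district line and teacher B’s 5 points below, so A is rated +5 and B −5. Everything the “grade” says about a teacher comes from those leftover differences, which is where the measurement community’s worry about attributing student learning to individual teachers comes in.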

The Gates Foundation embarked on a six-year, $500 million project called Measures of Effective Teaching in four school districts,

… school sites agreed to design new teacher-evaluation systems that incorporated classroom-observation rubrics and a measure of growth in student achievement. They also agreed to offer individualized professional development based on teachers’ evaluation results, and to revamp recruitment, hiring, and placement. Schools also implemented new career pathways for effective teachers and awarded teachers with bonuses for good performance.

Six years later, the lead author of the RAND evaluation report opines,

“The initiative itself tried to pull a bunch of levers to have a big impact on student performance,” said Brian Stecher, a RAND researcher and the lead author of the report. “The sites did in fact modify all of these levers, some more than others, but in the end, there were no big payoffs in terms of improved graduation [rates] or achievement of students in general, and low-income and minority students in particular.”

Read the final report here.

Charlotte Danielson is rethinking the use of her Frameworks and agrees with me about the deleterious impact of the Widget Effect report,

The immediate challenge is that those with the responsibility to ensure good teaching in schools—primarily building administrators—don’t always have the skill to differentiate great teaching from that which is merely good, or perhaps even mediocre. This idea was highlighted in “The Widget Effect,” a 2009 report from the organization TNTP that had enormous influence on the design of Race to the Top, the federal initiative that required states to implement rigorous systems of teacher evaluation to qualify for billions of dollars in federal grant money.

Danielson is sharply critical of the educational establishment,

There is also little consensus on how the profession should define “good teaching.” Many state systems require districts to evaluate teachers on the learning gains of their students. These policies have been implemented despite the objections from many in the measurement community regarding the limitations of available tests and the challenge of accurately attributing student learning to individual teachers.

Even when personnel policies define good teaching as the teaching practices that promote student learning and are validated by independent research, few jurisdictions require their evaluators to actually demonstrate skill in making accurate judgments. But since evaluators must assign a score, teaching is distilled to numbers, ratings, and rankings, conveying a reductive nature to educators’ professional worth and undermining their overall confidence in the system.

I’m deeply troubled by the transformation of teaching from a complex profession requiring nuanced judgment to the performance of certain behaviors that can be ticked off on a checklist. In fact, I (and many others in the academic and policy communities) believe it’s time for a major rethinking of how we structure teacher evaluation to ensure that teachers, as professionals, can benefit from numerous opportunities to continually refine their craft.

A just-released paper, “The Effect of Teacher Evaluation on Achievement and Attainment: Evidence from Statewide Reform” (December 2021), reinforces Danielson’s concern about the movement from treating teaching as a complex activity to checking off boxes on a checklist. Joshua Bleiberg and his colleagues looked at eight years of data (2009-17) and asked: how did the implementation of teacher evaluation reforms affect student achievement? The paper’s answer: it didn’t.

Education Week reports,

More than a decade ago, policymakers made a multi-billion-dollar bet that strengthening teacher evaluation would lead to better teaching, which in turn would boost student achievement. But new research shows that, overall, those efforts failed: Nationally, teacher evaluation reforms over the past decade had no impact on student test scores or educational attainment.

The research is the latest indictment of a massive push between 2009 and 2017, spurred by federal incentives, philanthropic investments, and a nationwide drive for accountability in K-12 education, to implement high-stakes teacher evaluation systems in nearly every state.

“There was a tremendous amount of time and billions of dollars invested in putting these systems into place, and they didn’t have the positive effects reformers were hoping for,” said Joshua Bleiberg, an author of the study…

The evaluation reforms were largely unpopular among teachers and their unions, who argued that incorporating certain metrics, like student test scores, was unfair and would drive good educators out of the profession. Yet proponents—including the Obama administration—argued that tougher evaluations could identify, and potentially weed out, the weakest teachers while elevating the strongest ones.

 “It took away the overall focus on the kid and the overall focus on teaching,” said Erin Scholes, an innovation coordinator … “I felt like [the reforms] hit the science of teaching rather than the art of teaching and tried to fit everyone in the same box.”

Researchers found no positive effects on student outcomes

A team of researchers from Brown and Michigan State Universities and the Universities of Connecticut and North Carolina at Chapel Hill analyzed the timing of states’ adoption of the reforms alongside district-level student achievement data from 2009 to 2018 on standardized math and English/language arts test scores. They also analyzed the impact of the reforms on longer-term student outcomes, including high school graduation and college enrollment. The researchers controlled for the adoption of other teacher accountability measures and reform efforts taking place around the same time, and found that their results remained unchanged.

They found no evidence that, on average, the reforms had even a small positive effect on student achievement or educational attainment.

The study’s authors noted that the design and implementation of the reforms fell short of the recognized best practices for performance management systems …

… implementation proved difficult in most places, with most teachers still receiving satisfactory ratings under the new evaluation systems. Performance-based dismissals were still rare, and states that linked evaluation ratings to compensation often offered only small bonuses or set the bar so low that most teachers qualified.

Also, the reforms decreased job satisfaction among new teachers who felt like they had little autonomy to do their best work, the paper noted. And they added significant demands to administrators’ already burdensome workload.

“It was really the worst of all worlds,” said Michael Petrilli, the president of the Thomas B. Fordham Institute, a conservative education think tank that advocated for more teacher accountability. “It was just a big paperwork exercise. It led to a lot of anxiety and bad morale. Not only did it have no findings [of positive effects on student outcomes], it had real-world consequences that were almost entirely negative.”

Will US Secretary of Education Miguel Cardona say that Arne Duncan was wrong and that the emphasis should be on creating more Community Schools?


If you ask a bureaucracy why it is carrying out a specific policy, the answer is “… that’s the way we’ve always done it”; in other words, Newton’s First Law of Motion drives policy decisions.

An object at rest stays at rest and an object in motion stays in motion with the same speed and in the same direction unless acted upon by an unbalanced force.

The “unbalanced force” could be the President’s Build Back Better legislation.

Stay Tuned

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

The views expressed by the blogger are not necessarily those of NEPC.

Peter Goodman

Peter Goodman is a career NYC high school teacher, education consultant, and district representative for the United Federation of Teachers. ...