The Answer Sheet: How One Great Teacher Was Wronged by Flawed Evaluation System


Principal Carol Burris of South Side High School in New York has for some time been chronicling the consequences of standardized test-driven reform in her state (here, and here and here, for example). Burris was named New York’s 2013 High School Principal of the Year by the School Administrators Association of New York and the National Association of Secondary School Principals, and in 2010 was named New York State Outstanding Educator by the School Administrators Association of New York State. She is the co-author of the New York Principals letter of concern regarding the evaluation of teachers by student test scores. It has been signed by more than 1,535 New York principals and more than 6,500 teachers, parents, professors, administrators and citizens. You can read the letter by clicking here.

In this new post, Burris tells the story of a New York state teacher who was just unfairly smacked by the state’s flawed new teacher and principal evaluation system, known as APPR, which in part uses student standardized test scores to evaluate educators. The method isn’t reliable or valid, as Burris shows here.

By Carol Burris

Jenn is a teacher of middle-school students. Her school is in a small city district that has limited resources. The majority of kids in the school receive free or reduced-price lunch and about 40% are black or Latino. Many are English language learners. Lots of them are homeless.

After learning that she was rated less than effective because of her students’ standardized test scores, she wrote to Diane Ravitch, who posted her letter on her blog. She wrote:

I’m actually questioning whether I can teach for the next 20 years. It’s all I’ve ever wanted to do, but this APPR garbage is effectively forcing out some of the best teachers I’ve worked with. I may be next.

I contacted Jenn to better understand her story. I encountered the kind of teacher I love to hire. She has never imagined herself as anything but a teacher; teaching is not a stepping stone to a career in law or business. She does all the extras. She comes in early and leaves late. She coaches. She understands that she must be a counselor, a nurse and a surrogate parent to her students, the most at-risk students in the seventh grade. Jenn is their support teacher for English Language Arts.

She is valued by her principal who gave her 58 out of 60 points on the measure of teaching behaviors—instruction, lesson plans, professional obligations, understanding of child development, communication with parents—all of the things that matter and that Jenn can truly control.

And then came the test score measures.  The grade-level teachers and the principal had to create a local measure of student performance.  They chose a group measure based on reading growth on a standardized test.  They were required to set targets from a pre-test given in the winter to a post-test given in the spring.  The targets were a guess on the part of the teachers and principal.  How could they not be? The team was shooting in the dark—making predictions without any long-term data.  Such measures can never be reliable or valid.

The state of Massachusetts requires that measures of student learning be piloted and that teachers be evaluated not by one set of scores, but rather by trends over time.  That state’s evaluation model will not be fully implemented for several years because they are building it using phase-in and revision. But New York does not believe in research or caution.  New York is the state where the powerful insist that teachers “perform,” as though they were trained circus seals.  There is no time for a pilot in the Empire State.  Our students and we must jump, as our chancellor advises, “into the deep end of the pool.” In New York, our commissioner warns that we can never let the perfect be the enemy of the good.  We don’t even let nonsense be the enemy of the good.  And so Jenn hoped that she and her colleagues made a reasonable gamble when they set those targets.

Many of the seventh-grade students did not take the standardized reading test seriously. Middle schoolers are savvy—they knew the test didn’t count. So they quickly filled in the bubbles as teachers watched in horror.  Luckily, enough students took their time so that their teachers were able to get 10/20 points on that local measure of learning, which put Jenn in the Effective range.

The final piece in her evaluation was her score from new Common Core-aligned tests that the state gave to students this past spring. The tests were far too difficult for Jenn’s Academic Intervention Services (AIS) students.  They were too long.  The reading passages were dense and many of the questions were confusing. We know that only about 1 in 5 students across the state, who are like the majority of Jenn’s students, scored proficient on the Common Core tests.  Even more importantly, we know that about half of all students like Jenn’s scored in level 1—below Basic. These are the students who, overwhelmed by frustration, give up or guess. The test did not measure their learning—it measured noise.

So Jenn’s students’ scores, along with all the other seventh-grade scores on the Common Core tests, were put in a regression model and the statisticians cranked the model, and they entered their covariates and set confidence levels and scratched their heads over Beta reports and did all kinds of things that make most folks’ eyes glaze over. And all of that cranking and computing finally spit out all of the teachers’ and principals’ places on the modified bell curve. Jenn got 5 points out of 20 on her state growth score, along with the label Developing.

When all of the points were added up, it did not matter that she received 58/60 points in the most important category of all, which is based on New York’s teaching standards. And it did not matter that she was Effective in the local measure of student learning. 5 + 10 + 58 = 73, which meant that Jenn was two points short of being an Effective teacher. Jenn was labeled Developing and put on a mandated improvement plan.
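The arithmetic behind that label can be laid out in a few lines. This is only an illustrative sketch, not the state's actual scoring software: the component names are made up for the example, and the cutoff of 75 is an assumption inferred from the statement that 73 fell two points short of Effective.

```python
# Component scores as reported in the post: (points earned, points possible).
# The dictionary keys are illustrative names, not official APPR terms.
components = {
    "state_growth": (5, 20),
    "local_measure": (10, 20),
    "teaching_practice": (58, 60),
}

# Assumption: cutoff inferred from the post (73 was "two points short").
EFFECTIVE_CUTOFF = 75


def composite(scores):
    """Sum the points earned across all components."""
    return sum(earned for earned, _possible in scores.values())


total = composite(components)
possible = sum(p for _earned, p in components.values())
shortfall = max(0, EFFECTIVE_CUTOFF - total)

print(f"Composite: {total}/{possible}")
print(f"Points short of the assumed Effective cutoff: {shortfall}")
```

Under these assumptions, the two test-based components together cost Jenn 25 of a possible 40 points, while the principal's observation cost her only 2 of 60.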

This seven-year dedicated teacher feels humiliated.  She knows that parents will know and possibly lose confidence in her. She is angry because the label is unfair. She will be under scrutiny for a year. Time she would spend on her students and her lessons will be wasted in meetings and improvement plan measurement. The joy of teaching is gone. It has been replaced by discouragement and fear.

Her principal also knows it is not fair—she gave Jenn 58/60 points.  Over time, however, she may begin to doubt her own judgment—the scores may influence how she rates teachers. After all, principals get a growth score too, and the teachers with low scores will become a threat to principals’ own job security over time.  Those who created this system put Machiavelli to shame.

Jenn is not alone.  There are hundreds, if not thousands, of good teachers and principals across the state who are receiving poor ratings they do not deserve based on a flawed model and flawed tests.  Slowly, stories will come out as they gain the courage to speak out.  There will be others who suffer in silence, and still others who leave the profession in disgust.  None of this is good for children.

During the July 2013 hearing of the Governor’s New Education Reform Commission, David Steiner, the previous New York State Commissioner of Education, said,

 There is a risk, and I want to be honest about this, that very, very, mature, effective teachers are saying you are treating me like a kid.  In the name of getting rid of the bottom 5 percent, we risk losing the top 5 percent….We do not want to infantilize the profession in order to save it.

Steiner directed those remarks to his former deputy, now state commissioner, John B. King. Did King understand what his former mentor was trying to tell him? Because he did not respond to Steiner’s observation, we do not know.

John King told districts to use caution when using this year’s scores to evaluate teachers and principals. He claimed that the tests did not negatively impact teachers’ accountability ratings. Perhaps he should ask Jenn if she agrees. We already know that 1 in 4 teachers and principals moved down at least one growth score category from last year, hardly the hallmark of a reliable system.

There is much that King and the Board of Regents can do. They can ask the governor to pass legislation so that the evaluations remain private.  They can request that teachers like Jenn, who are more than effective in the eyes of their principals, be spared an improvement plan this year.  I hold no hope, however, that John King will do that. He lives in “the fierce urgency of now.”  But for Jenn and her students, now quickly becomes tomorrow. The risk that David Steiner explained is real.  We need to make sure that we have our best teachers tomorrow and not lose them in the deep end of the pool.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

The views expressed by the blogger are not necessarily those of NEPC.

Valerie Strauss

Valerie Strauss is the Washington Post education writer.

Carol C. Burris

Carol Corbett Burris became Executive Director of the Network for Public Education Foundation in August 2015, after serving as principal of South Side High School...