Gadfly on the Wall Blog: Top 10 Reasons You Can’t Fairly Evaluate Teachers on Student Test Scores

Steven Singer

August 8, 2018

I’m a public school teacher.

Am I any good at my job?

There are many ways to find out. You could look at how hard I work, how many hours I put in. You could look at the kinds of things I do in my classroom and examine if I’m adhering to best practices. You could look at how well I know my students and their families, how well I’m attempting to meet their needs.

Or you could just look at my students’ test scores and give me a passing or failing grade based on whether they pass or fail their assessments.

It’s called Value-Added Measures (VAM) and at one time it was the coming fad in education. However, after numerous studies and lawsuits, the shine is fading from this particularly narrow-minded corporate policy.

Most states that evaluate their teachers using VAM do so because under President Barack Obama they were offered Race to the Top grants and/or waivers.

Now that the government isn’t offering cash incentives, seven states have stopped using VAM and many more have reduced the weight given to these assessments. The new federal K-12 education law – the Every Student Succeeds Act (ESSA) – does not require states to have educator evaluation systems at all. And if a state chooses to enact one, it does not have to use VAM.

That’s a good thing because the evidence is mounting against this controversial policy. An evaluation released in June of 2018 found that a $575 million push by the Bill and Melinda Gates Foundation to make teachers (and thereby students) better through the use of VAM was a complete waste of money.

Meanwhile a teacher fired from the Washington, DC, district because of low VAM scores just won a 9-year legal battle with the district and could be owed hundreds of thousands of dollars in back pay as well as getting his job back.

But putting aside the waste of public tax dollars and the threat of litigation, is VAM a good way to evaluate teachers?

Is it fair to judge educators on their students’ test scores?

Here are the top 10 reasons why the answer is unequivocally negative:

1) VAM was Invented to Assess Cows.

I’m not kidding. The process was created by William L. Sanders, a statistician in the college of business at the University of Tennessee, Knoxville. He thought the same kinds of statistics used to model genetic and reproductive trends among cattle could be used to measure growth among teachers and hold them accountable. You’ve heard of the Tennessee Value-Added Assessment System (TVAAS) or TxVAAS in Texas or PVAAS in Pennsylvania or more generically named EVAAS in states like Ohio, North Carolina, and South Carolina. That’s his work. The problem is that educating children is much more complex than feeding and growing cows. Not only is it insulting to assume otherwise, it’s incredibly naïve.

2) You can’t assess teachers on tests that were made to assess students.

This violates fundamental principles of both statistics and assessment. If you make a test to assess A, you can’t use it to assess B. That’s why many researchers have labeled the process “junk science” – most notably the American Statistical Association in 2014. Put simply, the standardized tests on which VAM estimates are based have always been, and continue to be, developed to assess student achievement and not growth in student achievement nor growth in teacher effectiveness. The tests on which VAM estimates are based were never designed to estimate teachers’ effects. Doing otherwise is like assuming all healthy people go to the best doctors and all sick people go to the bad ones. If I fail a dental screening because I have cavities, that doesn’t mean my dentist is bad at his job. It means I need to brush more and lay off the sugary snacks.

3) There’s No Consistency in the Scores.

Valid assessments produce consistent results. This is why doctors often run the same medical test more than once. If the first try comes up positive for cancer, let’s say, they’re hoping the second time will come up negative. However, if multiple runs of the same test produce the same result, that diagnosis gains credence. Unfortunately, VAM scores are notoriously inconsistent. When you evaluate teachers with the same test (but different students) over multiple years, you often get divergent results. And not just by a little. Teachers who do well one year may do terribly the next. This makes VAM estimates extremely unreliable. Teachers who should be (more or less) consistently effective are being classified in sometimes highly inconsistent ways over time. A teacher classified as “adding value” has a 25 to 50% chance of being classified as “subtracting value” the next year, and vice versa. This can make the probability of a teacher being identified as effective no different than the flip of a coin.
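To see how measurement noise alone can produce this kind of flip-flopping, here is a minimal sketch in Python. It is a toy model, not any state's actual VAM formula: each hypothetical teacher has a fixed true effect, and each year's score is that effect plus random noise. The function name `flip_rate` and the `noise_sd` values are assumptions chosen purely for illustration.

```python
import random

def flip_rate(n_teachers=10_000, noise_sd=1.0, seed=0):
    """Share of teachers scored above average in year 1
    who are scored below average in year 2 (toy model)."""
    rng = random.Random(seed)
    above_year1 = 0
    flipped = 0
    for _ in range(n_teachers):
        # Assumption: the teacher's real effectiveness never changes.
        true_effect = rng.gauss(0.0, 1.0)
        # Each year's score is the true effect plus independent noise.
        year1 = true_effect + rng.gauss(0.0, noise_sd)
        year2 = true_effect + rng.gauss(0.0, noise_sd)
        if year1 > 0:
            above_year1 += 1
            if year2 < 0:
                flipped += 1
    return flipped / above_year1

if __name__ == "__main__":
    # Larger noise_sd means the yearly score says less about the teacher.
    for noise_sd in (0.5, 1.0, 2.0):
        rate = flip_rate(noise_sd=noise_sd)
        print(f"noise_sd={noise_sd}: flip rate = {rate:.2f}")
```

In this sketch nothing about the teachers changes from one year to the next, so every flip comes from noise. The noisier the score, the closer the flip rate creeps toward a coin toss.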

4) Changing the test can change the VAM score.

If you know how to add, it doesn’t matter if you’re asked to solve 2 + 2 or 3 + 3. Changing the test shouldn’t have a major impact on the result. If both tests are evaluating the same learning and at the same level of difficulty, changing the test shouldn’t change the result. But when you change the tests used in VAM assessments, scores and rankings can change substantially. Using a different model or a different test often produces a different VAM score. This may indicate a problem with value-added measures or with the standardized tests used in conjunction with them. Either way, it makes VAM scores invalid.

5) VAM measures correlation, not causation.

Sometimes A causes B. Sometimes A and B simply occur at the same time. For example, most people in wheelchairs have been in an accident. That doesn’t mean being in a wheelchair causes accidents. The same goes for education. Students who fail a test didn’t learn the material. But that doesn’t mean their teacher didn’t try to teach them. VAM does not measure teacher effectiveness. At best it measures student learning. Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model. For instance, the student may have a learning disability, the student may have been chronically absent or the test, itself, may be an invalid measure of the learning that has taken place.
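The point about uncaptured factors can be sketched with a small Python simulation. It is only an illustration with made-up numbers: every hypothetical teacher contributes exactly the same gain, but classes differ in how many students are chronically absent. The function name `naive_value_added`, the 10-point gain, and the 6-point absence penalty are all assumptions, not figures from any study.

```python
import random
from statistics import mean

def naive_value_added(absent_share_by_class, class_size=30, seed=1):
    """Return each class's mean gain minus the average across classes,
    which is what a model blind to absenteeism would credit to the teacher."""
    rng = random.Random(seed)
    class_means = []
    for absent_share in absent_share_by_class:
        gains = []
        for _ in range(class_size):
            chronically_absent = rng.random() < absent_share
            # Assumption: every teacher adds the same 10 points;
            # chronic absence costs a student 6 points (made-up numbers).
            gain = 10.0 - (6.0 if chronically_absent else 0.0)
            gain += rng.gauss(0.0, 3.0)  # everything else, as random noise
            gains.append(gain)
        class_means.append(mean(gains))
    overall = mean(class_means)
    return [m - overall for m in class_means]

if __name__ == "__main__":
    shares = [0.05, 0.20, 0.50]
    for share, score in zip(shares, naive_value_added(shares)):
        print(f"absent share {share:.0%}: naive teacher score {score:+.2f}")
```

The teachers here are identical by construction, so any gap between their scores is produced by who sat in the room, not by who stood at the front of it.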

6) VAM Scores are Based on Flawed Standardized Tests.

When you base teacher evaluations on student tests, at the very least the student tests have to be valid. Otherwise, you’ll have unfairly assessed BOTH students AND teachers. Unfortunately, standardized tests are narrow, limited indicators of student learning. They leave out a wide range of important knowledge and skills, leaving only the easiest-to-measure parts of the math and English curriculum. Test scores are not universal, abstract measures of student learning. They greatly depend on a student’s class, race, disability status and knowledge of English. Researchers have been decrying this for decades – standardized tests often measure the life circumstances of the students, not how well those students learn – and therefore by extension they cannot assess how well teachers teach.

7) VAM Ignores Too Many Factors.

When a student learns or fails to learn something, there is so much more going on than just a duality between student and teacher. Teachers cannot simply touch students’ heads and magically make learning take place. It is a complex process involving multiple factors some of which are poorly understood by human psychology and neuroscience. There are inordinate amounts of inaccurate or missing data that cannot be easily replaced or disregarded – variables that cannot be statistically controlled for such as: differential summer learning gains and losses, prior teachers’ residual effects, the impact of school policies such as grouping and tracking students, the impact of race and class segregation, etc. When so many variables cannot be accounted for, any measure returned by VAMs remains essentially incomplete.

8) VAM Has Never been Proven to Increase Student Learning or Produce Better Teachers.

That’s the whole purpose behind using VAM. It’s supposed to do these two things but there is zero research to suggest it can do them. You’d think we wouldn’t waste billions of dollars and generations of students on a policy that has never been proven effective. But there you have it. This is a faith-based initiative. It is the pet project of philanthrocapitalists, tech gurus and politicians. There is no research yet which suggests that VAM has ever improved teachers’ instruction or student learning and achievement. This means VAM estimates are typically of no informative, formative, or instructional value.

9) VAM Often Makes Things Worse.

Using these measures has many unintended consequences that adversely affect the learning environment. When you use VAMs for teacher evaluations, you often end up changing the way the tests are viewed and ultimately the school culture, itself. This is actually one of the intents of using VAMs. However, the changes are rarely positive. For example, this often leads to a greater emphasis on test preparation and specific tested content to the exclusion of content that may lead to better long-term learning gains or increasing student motivation. VAM incentivizes teachers to wish for the most advanced students in their classes and to push the struggling students onto someone else so as to maximize their own personal VAM score. Instead of a collaborative environment where everyone works together to help all students learn, VAM fosters a competitive environment where innovation is hoarded and not shared with the rest of the staff. It increases turnover and job dissatisfaction. Principals stack classes to make sure certain teachers are more likely to get better evaluations or vice versa. Finally, being unfairly evaluated disincentivizes new teachers from staying in the profession and it discourages the best and the brightest from ever entering the field in the first place. You’ve heard about that “teacher shortage” everyone’s talking about. VAM is a big part of it.

10) An emphasis on VAM overshadows real reforms that actually would help students learn.

Research shows the best way to improve education is system-wide reforms – not targeting individual teachers. We need to equitably fund our schools. We can no longer segregate children by class and race and give the majority of the money to the rich white kids while withholding it from the poor brown ones. Students need help dealing with the effects of generational poverty – food security, psychological counseling, academic tutoring, safety initiatives, wide curriculum and anti-poverty programs. A narrow focus on teacher effectiveness overshadows all these other factors and sweeps them under the rug. Researchers calculate teacher influence on student test scores at about 14%. Out-of-school factors are the most important. That doesn’t mean teachers are unimportant – they are the most important single factor inside the school building. But we need to realize that outside the school has a greater impact. We must learn to see the whole child and all her relationships – not just the student-teacher dynamic. Until we do so, we will continue to do these children a disservice with corporate privatization scams like VAM which demoralize and destroy the people who dedicate their lives to helping them learn – their teachers.

NOTE: Special thanks to the amazingly detailed research of Audrey Amrein-Beardsley whose Vamboozled Website is THE on-line resource for scholarship about VAM.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here: Gadfly on the Wall Blog

The views expressed by the blogger are not necessarily those of NEPC.