About Those Dice… Ready, Set, Roll! on the VAM-ification of Tenure

Bruce D. Baker

March 2, 2012

Accountability and Testing

A while back I wrote a post (and here) in which I explained that the relatively high error rates in Value-added modeling might make it quite difficult for teachers to get tenure under some newly adopted and other proposed guidelines and much easier to lose it, even after waiting years to get lucky [& yes I do mean LUCKY] enough to obtain it.

The standard reformy template is that teachers should only be able to get tenure after 3 years of good ratings in a row and that teachers should be subject to losing tenure if they get 2 bad years in a row. Further, it is possible that the evaluations might actually stipulate that you can only get a good rating if you achieve a certain rating on the quantitative portion of the evaluation – or the VAM score. Likewise for bad ratings (that is, the quantitative measure overrides all else in the system).

The premise of the dice rolling activity from my previous post was that it is necessarily much less likely to roll the same number (or subset of numbers) three times in a row than twice (exponentially less likely, in fact). That is, it is much harder to overcome the odds based on error rates to achieve tenure, and much easier to lose it. Again, this is due largely to the noisiness of the data, and less to the difficulty of actually being “good” year after year. The ratings simply jump around a lot. See my previous post.
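To put rough numbers on the dice analogy, here is a minimal sketch that assumes each year's rating is an independent draw, which is the pure-luck extreme and not a description of the actual NYC data. The function name `streak_probability` and the two bands (top half, bottom third) are illustrative choices taken from the cut-points discussed in this post.

```python
from fractions import Fraction


def streak_probability(p, years):
    """Chance of landing in a rating band for `years` consecutive years.

    Assumption (hypothetical): each year is an independent draw, and the
    band is hit with probability p in any single year.
    """
    return Fraction(p) ** years


# Top half (p = 1/2) three years in a row: the tenure hurdle.
top_half_3 = streak_probability(Fraction(1, 2), 3)

# Bottom third (p = 1/3) two years in a row: the de-tenuring trigger.
bottom_third_2 = streak_probability(Fraction(1, 3), 2)

print(f"top half, 3 in a row:     {top_half_3} = {float(top_half_3):.3f}")
print(f"bottom third, 2 in a row: {bottom_third_2} = {float(bottom_third_2):.3f}")
```

Under this independence assumption the two streaks come out at 1/8 and 1/9, so they are about equally likely. The observed shares reported later in this post differ from these pure-chance figures; the sketch only illustrates why each extra required year shrinks the odds multiplicatively.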

So, for those of you energetic young reformy wannabe teachers out there thinkin’ – hey, I can cut it – I’ll take my chances and my “good” teaching will overcome those odds, generating year-after-year top quartile rankings: a lot of that is totally out of your control! [Look, I would have been right there with you when I graduated college]

But my first post on this topic was all in hypothetical-land. Now, with the newly released NYC teacher data, we can see just how many teachers actually got three-in-a-row in the past three years [among those actually teaching the same subject and grade level in the same school], applying different ranges of “acceptableness” or not.

So, here, I give the benefit of the doubt, and set a reasonably low bar for getting a good rating – the median or higher [ignoring error ranges and sticking with the type of firm cut-points that current state policies and local contracts seem to be adopting]. Any teacher who gets the median or higher 3 years in a row can get tenure! Otherwise, keep trying until you get your three in a row. How many teachers is that? How many overcome the odds of the randomness and noise in the data? Well, here it is:

As percentiles dictate (by definition), about half of the teachers in the data are in the upper half in the most recent year. But only about 20% of teachers in any grade or subject are above the median two years in a row. Further, only about 6 to 7% actually were lucky enough to land in the upper half for three years running! Assuming stability remains relatively similar over time, we could expect that in any three-year period, about 7% of teachers might string together three above-the-medians in a row. At that pace, tenure will be awarded rather judiciously. (But actually, stability in the most recent year over the prior year is unusually high.)
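The "keep trying until you get your three in a row" rule can also be read as a waiting-time problem. The sketch below assumes, hypothetically, that each year is an independent coin flip with a fixed chance `p` of a "good" rating; it is not fitted to the NYC data, and the function names are mine. It uses the standard formula for the expected number of trials until a run of `k` successes, and checks it with a small simulation.

```python
import random


def expected_years_to_streak(p, k):
    """Expected years until k 'good' ratings in a row.

    Assumption (hypothetical): years are independent, each 'good' with
    probability p. Closed form: (1 - p**k) / ((1 - p) * p**k).
    """
    return (1 - p**k) / ((1 - p) * p**k)


def simulate_years_to_streak(p, k, trials=20000, seed=0):
    """Monte Carlo check of the closed form under the same assumption."""
    rng = random.Random(seed)
    total_years = 0
    for _ in range(trials):
        run = 0
        years = 0
        while run < k:
            years += 1
            # Extend the streak on a 'good' year, reset it otherwise.
            run = run + 1 if rng.random() < p else 0
        total_years += years
    return total_years / trials


exact = expected_years_to_streak(0.5, 3)
simulated = simulate_years_to_streak(0.5, 3)
print(f"expected years (formula):    {exact:.1f}")
print(f"expected years (simulation): {simulated:.1f}")
```

With a coin-flip chance of clearing the median each year, the formula gives an average wait of 14 years for three in a row. Real ratings are not pure coin flips, so this is only a benchmark for how punishing a streak requirement is when the measure is noisy.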

Let’s say I cut teachers a break and only take tenure away if they get two in a row not in the bottom half, but rather all the way down into the bottom third! What are the odds? How many teachers actually get two years in a row in the bottom third?

Well, here it is:

That’s rather depressing, isn’t it? The chances of ending up in the bottom third two years in a row are about the same as the chances of ending up in the top half three years in a row!

Now, perhaps you’re thinkin’ Big Deal. So you jump into and out of the edges of these categories. That just means you’re not really solidly in the “good” or the “bad” and it should take you longer to get tenure. That’s fair? After all, it’s not like any substantial portion of teachers are actually jumping back and forth between the top half and the bottom third?

In ELA, 14% of those in the top half in 2010 were in the bottom third in 2009
In ELA, 23.9% in the top half in 2009 were in the bottom third in 2010
In Math (where the scores are more stable in part because they appear to retain some biases), 9% of those in the top half in 2010 were in the bottom third in 2009
In Math, 26% of those in the bottom third in 2009 were in the top half in 2010 and nearly 16% of those in the top half in 2009 ended up in the bottom third in 2010.

[corrected]
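For readers who want to run the same kind of tabulation on the released file, here is a minimal sketch of how a "top half one year, bottom third the other" share can be computed from pairs of percentile ranks. The function `share_moving`, the band definitions, and the input are all hypothetical: the pairs below are synthetic, uncorrelated random numbers used as a pure-noise benchmark, not the NYC data.

```python
import random


def share_moving(pairs, from_band, to_band):
    """Share of teachers in `from_band` in year 1 who are in `to_band` in year 2.

    pairs: list of (percentile_year1, percentile_year2) tuples.
    from_band / to_band: functions mapping a percentile to True or False.
    """
    starters = [pair for pair in pairs if from_band(pair[0])]
    if not starters:
        return None
    moved = sum(1 for pair in starters if to_band(pair[1]))
    return moved / len(starters)


def top_half(pct):
    return pct >= 50


def bottom_third(pct):
    return pct < 100 / 3


# Synthetic benchmark (NOT real data): the two years are unrelated noise.
rng = random.Random(1)
noise_pairs = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10000)]

noise_share = share_moving(noise_pairs, top_half, bottom_third)
print(f"pure-noise share, top half -> bottom third: {noise_share:.1%}")
```

If ratings were nothing but noise, roughly a third of top-half teachers would land in the bottom third the next year. The shares listed above can be read against that benchmark.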

Most of these shifts, if not nearly all of them, do not occur because the teacher actually became a good teacher or became a bad teacher from one year to the next.

The big issue here is the human side of this puzzle. None of the existing deselection or tightened tenure requirement simulations of the supposed positive effects of leveraging VAM estimates to improve student outcomes makes even halfhearted attempts to account for human behavioral responses to a system driven by these imprecise and potentially inaccurate metrics. All adopt the oversimplified “all else equal” assumption of an unending supply of new teacher candidates that are equal in quality to the current average teacher and with comparable standard deviation.

Reformy arguments ratchet these assumptions up a notch. The most reformy arguments in favor of moving toward these types of tenure and de-tenuring provisions posit that making tenure empirically performance based and de-selecting the “bad” teachers will strengthen the teaching profession. That better applicants – the top third of college graduates – will suddenly flock to teaching instead of other currently higher paying professions.

But, with so little control over one’s destiny is that really likely to be the case? It certainly stands to be a frustrating endeavor to achieve any level of job stability. And it doesn’t look like average compensation will be rising in the near future to compensate for this dramatic increase in risk. Further, if we tie compensation to these ratings either as one-time bonuses or as salary adjustments, many teachers who, by chance, get good ratings in one year will, by chance again, get bad ratings the next year. Teachers will have a difficult time even guessing at what their compensation might look like the following year. And since the ratings are necessarily relative (based on percentiles) the distribution of additional compensation must involve winners and losers. The luckier one or a handful of teachers get in a given year, the larger the share of the merit pot they receive and the less others receive. Once again, I do mean LUCK.

Who will really be standing in line to take these jobs? In the best case (depending on one’s point of view), perhaps a few additional energetic grads of highly selective colleges will jump into the mix for a couple of years. But as these numbers and frustrations play out over time, the pendulum is certainly likely to swing the other direction.

More risk and more uncertainty without any sign of significantly increased reward is highly unlikely to improve the teaching profession and far more likely to make things much worse, especially in already hard-to-staff schools and districts!

These numbers are fun to play with. I just can’t stop myself. And they have endless geeky academic potential. But I’m increasingly convinced that they have little practical value for improving school quality. And I’m increasingly disturbed by how policy makers have adopted absurd, rigid requirements around these anything but precise and questionably accurate metrics.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

School Finance 101

The views expressed by the blogger are not necessarily those of NEPC.