Living in the Tails of the Rhetorical and Teacher Quality Distributions
A few weeks ago, Students First NY (SFNY) released a report presenting a very simple analysis of the distribution of “unsatisfactory” teacher evaluation ratings (“U-ratings”) across New York City schools in the 2011-12 school year.
The report finds that U-ratings are distributed unequally. In particular, they are more common in schools with higher poverty, more minorities, and lower proficiency rates. Thus, the authors conclude, the students who are most in need of help are getting the worst teachers.
There is good reason to believe that schools serving larger proportions of disadvantaged students have a tougher time attracting, developing and retaining good teachers, and there is evidence of this, even based on value-added estimates, which adjust for these characteristics (also see here). However, the assumptions upon which this Students First analysis is based are better seen as empirical questions, and, perhaps more importantly, the recommendations they offer are a rather crude, narrow manifestation of market-based reform principles.
First things first – just for the record, advocacy groups, including Students First, have spent the past few years arguing that the S/U ratings are inaccurate (and many teachers agree the system was highly flawed). Now, all of a sudden, they’re equated with “teacher quality”? This is a superficial point, but that seems like quite a change of heart (as a few sharp reporters noticed).
SFNY of course acknowledges this inconsistency. The authors address it with a decent amount of discussion (much of which is pretty good). Their primary rationale is that the U ratings, which were given to only three percent of the city’s teachers in 2011-12, almost certainly understate the number of ineffective teachers, but that they are a “reasonable proxy for ineffective teachers, generally.”
Let’s examine this logic a bit. What SFNY is saying is that a better evaluation system would identify more ineffective teachers (as well as differentiate between the large mass of S-rated teachers), but that we can be reasonably confident that the U-rated teachers identified by the S/U system are “truly ineffective.” Put differently, a lot of fish slipped through the net, but the big fish were caught.
It’s plausible to assume, at least for the sake of an illustrative analysis like SFNY’s, that most U-rated teachers are, in general, poor performers. However, remember that SFNY is comparing the distribution of U ratings across school groupings.
Therefore, even if all the U-rated teachers were indeed low performers, that is not sufficient. We must also assume that the evaluations were carried out in a manner that, while imperfect, was consistent enough across schools serving different populations (e.g., by income, race, prior achievement) to permit this comparison. This is an empirical question treated as an assumption in this report.
In fact, some of the primary arguments against the old system speak directly to inconsistent implementation. For example, according to critics of the old system, some principals simply chose to give everyone S-ratings because there was no incentive to do otherwise, or because they wanted these teachers to transfer to other schools, or because they didn’t want to deal with the appeal process. There are plausible reasons why the prevalence of these types of behaviors might vary systematically by school poverty, performance and other characteristics. (And there are compositional issues here as well.)*
In short, the S/U ratings may be as much a proxy for the manner in which the system is implemented as for teacher performance. To their credit, the authors of this report aren’t reckless about this, but they might have been a bit more cautious at points.
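To make the point concrete, here is a minimal sketch of the arithmetic. Every number in it is hypothetical and chosen only for illustration; none comes from the SFNY report, and the function name and the "flag probability" parameter are my own assumptions about how one might model principals' rating behavior.

```python
def observed_u_rate(true_ineffective_share, flag_probability):
    """Share of all teachers rated U when principals give a U to only
    some fraction (flag_probability) of truly ineffective teachers."""
    return true_ineffective_share * flag_probability


# Hypothetical: both groups of schools have the SAME true share of
# ineffective teachers.
TRUE_INEFFECTIVE_SHARE = 0.10

# Hypothetical: principals in one group are twice as likely to actually
# assign a U rating to an ineffective teacher.
high_poverty = observed_u_rate(TRUE_INEFFECTIVE_SHARE, flag_probability=0.40)
low_poverty = observed_u_rate(TRUE_INEFFECTIVE_SHARE, flag_probability=0.20)

print(f"High-poverty schools, observed U share: {high_poverty:.1%}")
print(f"Low-poverty schools, observed U share:  {low_poverty:.1%}")
print(f"Ratio of observed U shares: {high_poverty / low_poverty:.1f}")
```

Under these made-up assumptions, the observed U-rating share is twice as high in one group even though underlying teacher performance is identical by construction. This does not show that this is what happened in New York City; it only shows why consistency of implementation has to be established rather than assumed.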
We could have just left it at that, but for the recommendations at the end of the report.
First off, SFNY’s primary solution for addressing the fact that U-rated teachers are located disproportionately in poor, minority and low-scoring schools is a new evaluation system, followed by nine additional recommendations (a couple of which are items from their overall policy agenda forced into this report). There’s a recurring theme: Most of them are staffing-related policies that differentiate based on measured performance (which is why a new evaluation system is their primary recommendation).
There’s merit to the idea underlying this approach. After all, if you’re interested in altering the distribution of teacher “quality” between schools, targeting can be a useful tool. For example, paying high-performing teachers more to work in hard-to-staff schools is a widely-supported, feasible idea.
But it can’t be all you bring to the table. In reading SFNY’s recommendations, I was reminded of The New Teacher Project’s “irreplaceables” report, which addressed a similar issue – the retention of high-performing teachers. TNTP recommended reforms like performance pay, but they also made a point of discussing the importance of working conditions and the role of good leadership in creating a culture that makes teachers want to stay. They suggested interventions like career ladders and programs to improve instruction.
In contrast, SFNY’s recommendations basically ignore all of this, and I suspect (perhaps unfairly) this was a deliberate choice – they want to be tough and avoid the soft talk about support and working conditions. On the one hand, we need to be realistic about the limited evidence on “what works” when it comes to these broad-based retention strategies. Even policies that seem most promising on the surface don’t always have much of a track record. On the other hand, the Students First approach, for which there’s no evidence at all (mostly because it hasn’t been tried), doesn’t present an innovative alternative vision, but rather a crude, narrow application of old but potentially useful ideas.
They propose a couple of sensible policies like retention incentives and monitoring the situation regularly. Most of the rest of their recommendations, at least those that are germane to the report and specific enough to comment on, are pretty much about firing or punishing teachers who get low ratings, regardless of the schools in which they work. For instance, SFNY proposes that teachers who receive low ratings in one year should be required to get parental consent for each kid in their class the next year, and that any child who has a low-rated teacher in one year can’t get one the next year (these policies, by the way, would probably bias value-added estimates in future years). They also propose a cap on the number of low-rated teachers in each school – if too many teachers got low ratings, there would have to be layoffs to bring the school down to the cap.
This is not a thoughtful use of measurement and incentives. It ignores the source of the problem, and focuses almost entirely on the tails of the “quality” distribution (one tail in particular).
And it’s not how the labor market works. You can’t increase risk several times over without increasing rewards and support. You can’t ignore that this risk will also affect the decisions of teachers who don’t get the low ratings, as well as those of people who are considering teaching as a career (whose numbers are not unlimited). And you can’t let the whole system run on evaluation ratings that, at the absolute best, will be untested, imprecise and viewed with skepticism.
Look, almost half of new teachers in NYC schools serving lower-performing students leave their initial schools within two years. And this is primarily because these are very difficult schools in which to teach, especially when you’re just starting out. Yes, some of this turnover is inevitable and some of it is healthy, but much of it is neither. And there’s plenty of productive middle ground between massive, costly, undifferentiated efforts to reduce attrition across the board and an agenda that focuses solely on rewarding and punishing teachers with performance measures that have zero track record.
The distribution of teachers across schools is, in fact, a problem, and we can give credit to Students First for highlighting this issue. But, when it comes to their ideas about how to move forward, we should look elsewhere.
- Matt Di Carlo
* Due to the constant churn in schools serving more disadvantaged populations, teachers in these schools tend to be less experienced. Principals have more incentive to evaluate new teachers thoroughly, which means that those with larger shares of novices would do so for more of their teachers. This would again mean that implementation varies by school characteristics. Even more generally, newer teachers are presumably more likely to receive poor ratings no matter where they teach. Experience would probably explain some part of the results in this report.
This blog post has been shared by permission from the author.
The views expressed by the blogger are not necessarily those of NEPC.