What Type and Level of Failure Can We Tolerate in Evaluation Systems?

Sherman Dorn

May 27, 2011

Bruce Baker points out the flaws in the Brookings white paper on evaluation, "Passing Muster," but I think there needs to be a broader take on this. Baker properly sees the circular-reasoning aspects of the Brookings approach (everything but value-added measures is judged by value-added measures, which are judged by … how well they are internally consistent). The problem is that every single personnel evaluation system is flawed if you're looking for pure technical capacity. Give a set of smart postdoctoral researchers a personnel evaluation system intended to be used for summative purposes (i.e., hiring/firing, etc.), and I bet they can poke at least 20 holes in it within an hour.

Employers can't wait for the perfect evaluation system, and given my cynical nature I think most of them are so far below the Lee Shulman idea of a "marriage of insufficiencies" that most of them are best characterized as marriages of incompetencies. But I don't think the flaws are generally technical; they are political flaws in the general sense (not the partisan sense). Any evaluation system has an implicit theory of action on what you do with employees. You can design a system that leans towards retention, leaving too many employees in place when someone should intervene to help the employee or counsel the employee out. (In schools, it's very important to keep in mind and remind people that most new teachers are under extraordinary pressures and think of leaving at some point in the first few years, even if they stay. "Annual evaluation leniency" isn't the systematic problem with managing new teachers that many think it is, or at least it's far from the most urgent priority.) You can design a system that leans towards dismissal, kicking too many employees out regardless of their level of competence. You can design a system that leans towards inconsistency, where the idiosyncrasies of individual supervisors dominate decision-making. (That appears to be Rick Hess's preference.) And the way those options dominate real systems leaves little room for a good system of evaluation. Yes, you can have appropriate evaluations in insane systems. But since a lot of the rhetoric revolves around "human capital" strategies for school districts and states, I don't think a proposal for a system should be allowed to live just because a number of supervisors are decent human beings who are smarter than the paperwork.

I know there are examples of better ways of evaluating teachers. They tend to be very local (as in Toledo) or very expensive (as in the Gates Foundation grant supporting peer evaluation in my area). But without extraordinary effort and luck, you're talking primarily about what level of stupid your evaluation system is. That's where you get systems that leave very weak teachers in place (or kicking around from school to school) for years. That's also where you get systems where (despite all pretenses otherwise) outcomes are determined entirely by test scores; see math-teacher blogger JD2718's discussion of a NYC teacher who was denied tenure (technically "extended" for another year on probationary status), at least on first impression entirely based on test scores and apparently part of a broader pattern in the insane NYC system.

So, how can we think around this issue without getting trapped in the unimaginative rhetoric I've read for the last few years? First, we can go back to the overlap between some writings in philosophy and management/I-O psychology research that talks about the relationship between (perceptions of) procedural and substantive justice in personnel decisions. Events that you and I might see as procedurally appropriate can result in decisions (substance) that we disagree with. So if the lived experience of teachers and principals or parents is that the substantive decision is wrong in a case, that can trump an abstract agreement with a procedure. So what appears to be the right substantive judgment in the vast majority of cases is a requirement for political legitimacy of a system to those within it. And the reverse is true for cases near the margin: if a particularly hard case is accompanied by procedural screwiness, lots of peers and community members are going to be unhappy with what happened. That doesn't mean you need perfection; but if I were a principal I'd be very unhappy with the practices of a system that led regularly to questions about the justice of a decision in either a procedural or substantive sense.

If you've worked in a small (and by this I mean a mom-and-pop) business, a lot of this probably seems ridiculous: bosses know those they supervise so well that they're comfortable making decisions based on that holistic judgment. That's if you're in a reasonably healthy job environment; in a boss-from-hell situation, if you have any choice, you're likely to leave long before annual evaluation nuttiness ends the job. If your main job is something generic, such as a programmer with a certain package of skills, you can look for similar jobs. That's not the case with a large number of teachers, especially if their expertise is in an area with relatively few openings each year (and almost none in the middle of a year). So the exit from a particular school for a new teacher is also likely to be the exit from the field.

So we have a nasty combination of structurally asymmetrical exit/voice with new teachers and high vulnerability to lack of credibility from either procedural or substantive flaws. Add to that the idiosyncrasies of individual administrators who may or may not have the professional knowledge and judgment, paperwork skills, spine, political savvy, human touch, and sense of principle to make good decisions in the right way. (And that can be either in favor of retention or letting go, case by case.) It's enough to seduce a number of generally rational people into throwing their hands up in the air and giving in to the technocratic impulse.

I think that way lies both corruption and a politically vulnerable direction; there are loads of people who are claiming the "reformer" mantle and asserting that no personnel decisions based on test scores are occurring and that no one is advocating them. Such claims are mere bullshit, those who voice them know better, and the political consequences of an expanding set of test-based personnel decisions are not what the self-anointed "reformers" really want. (Remember the technocratic AYP, anyone?)

So, if the problems are essentially political, any solution has to be political as well in the best sense of politics (i.e., handling interests in a practical sense). If peer-based systems are politically viable, they're one possible solution because they provide input apart from a school principal (one solution to the potential/reality of both principal-tyrants and principal-wimps). There are other potential partial solutions: one would be giving new teachers greater freedom to move between schools before tenure decisions, to eliminate the monopoly an individual school has on new teachers' careers. (My dream of a required annual rotation of new teachers between schools is impractical in most places, unfortunately, and it would be too different from the current model to avoid seriously disrupting teachers' and principals' internal script for "real teacher careers.") But the best route to better evaluation is not now and never will be the perfection of value-added machinery. That is a fantasy of "reformer" technocrats along the lines of flying cars and robot maids.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

Sherman Dorn

The views expressed by the blogger are not necessarily those of NEPC.