
Rating Ed Schools by Student Outcome Data?

Tweeters and education writers the other day were all abuzz with talk by U.S. Secretary of Education Arne Duncan of the need to crack down on those god-awful schools of education that keep churning out teachers who don’t get sufficient value-added out of their students.


Once again, the conversations were laced with innuendo that it is our traditional public institutions of higher education that have simply failed us in teacher preparation. They accept weak students, give them all “As” they don’t deserve and send them out to be bad teachers. They, along with the lazy, greedy teacher graduates they produce, simply aren’t cutting it, even after decades of granting undergraduate degrees and certifications to elementary and secondary teachers.

This is a long post, so I’ll break it into parts. First, let’s debunk a few myths – a) regarding who is cranking out degrees and credentials in the field of education and b) regarding whether education policy should ever be guided by the actions of Louisiana or Tennessee. Second, let’s take a look at teacher production and distribution across schools in a handful of Midwest & plains states.

Who’s crankin’ out the credentials?

Allow me to begin this post by reminding readers – and POLICYMAKERS – that many initial credentials for teachers these days aren’t granted at the undergraduate level – but rather as expedited graduate credentials. Further, the mix of institutions granting those degrees has changed substantially over the decades, and perhaps that’s the real problem?

Here’s the mix of master’s degree production in 1990:

And again in 2009:

Yes, by 2009, thousands of teaching credentials and advanced degrees were being churned out each year by online mass production machines. Perhaps if we really feel that there has been a precipitous decline in teaching quality, these shifts may be telling us something! What has changed? Who is now cranking out the credentials/degrees?

Now, I’m no big fan of the types of accountability systems and self-regulation that have been in place for education schools (specifically credential granting programs) in recent years. I tend to feel that these systems largely reward those who do the best job filling out the paperwork and listing that they have covered specific content standards (a syllabus matching exercise), while many simply lack qualified faculty to deliver on such promises. For more insights, see:

  • Wolf-Wendel, L., Baker, B.D., Twombly, S., Tollefson, N., & Mahlios, M. (2006).
    Who’s Teaching the Teachers? Evidence from the National Survey of Postsecondary
    Faculty and Survey of Earned Doctorates. American Journal of Education 112 (2) 273-

A colleague of mine at the University of Kansas (we’ve now both moved on) used to joke that we should simply list on our accreditation forms the names of all of the already accredited institutions that are plainly and obviously worse than us (Kansas). That should be sufficient evidence, right?

But, simply because current systems of ed school accountability may not be cutting it does not mean that we should rush to adopt the toxic foolish policies being thrown out on the table in current policy conversations, including the recent punditry of Arne Duncan on the matter.

First, let’s dispose of the notion that Louisiana and Tennessee can ever be used as model states.

Specifically, we are being told that states must look to Louisiana and Tennessee as exemplars for reforming teacher preparation evaluation. Exemplars yes. Positive ones? Not so much. Allow me to point out that I don’t ever intend to consider Louisiana or Tennessee as a model for education policies until or unless either state actually digs its public education system out of the basement of American public schooling. These states are a disgrace at numerous levels, and not because they have high concentrations of low-income children. Rather, because both put little financial effort into their education systems and perform dismally. Both have large shares of children exported entirely. They are not models! Here’s my stat sheet on the two:

Sure, not a single measure in the table above relates to the teacher evaluation proposals on the table. And true, these states have adopted novel (putting the best light on it) models for evaluating teacher preparation programs. But, when put into the context of these states, one will likely never know whether those models of teacher prep program evaluation are worth a damn. Further, when placed into a context of states with such a historic record of deprivation of their public education systems, one might even question the motives of the “crack down” on teacher education. Can a state really be serious about improving public education with the record presented above?

Suggesting that these states are now models because they have decided to rate teacher education programs on the basis of the test scores of students of teachers who graduated from each program does not, cannot, make these states models.

Perils of evaluating teacher preparation programs by value-added scores of the students of teachers who graduated from them?

Here’s where it gets tricky and really messy, for at least three major reasons. The proposals on the table suggest that the quality of teacher preparation programs can somehow be measured indirectly by estimating the average effect on student outcomes of teachers who graduated from institution x versus institution y. Further, somehow, evaluation of these teacher preparation programs can be controlled through state agencies, with specific emphasis on state accredited teacher producing institutions.

  • Reason #1: Teachers accumulate many credentials from many different institutions over time. Attributing student gains of a teacher (or large number of teachers) to those institutions is a complex if not implausible task. Say, for example, that a teacher in St. Louis got an undergraduate degree from Washington University in St. Louis, but not a teaching degree. The teacher got the position on emergency or temporary certification (perhaps through some type of “fellows” program) with little intent to make it a career – decided he/she loved teaching – and eventually got credentialed through William Woods University (a regional mass producer of teacher and administrator credentials). Is the credential institution, or the undergraduate institution, responsible for this teacher’s success or failure?
  • Reason #2: If one looks at the data on the teacher workforce in any given state, one finds that teachers hold their various degrees from many, many institutions – institutions near and far. True, there are major producers and minor producers of teachers for any given labor market. But, in any given labor market or state, one is likely to find teachers with degrees from 10s to 100s of institutions. In some cases, there may be only a few teachers from a given institution (for example, Michigan State graduates teaching in Wisconsin). That makes it hard to generate estimates of effectiveness. Should states simply cut off these institutions? Send their graduates home? Never let them in? Further, while teachers do in many cases come from within-state public institutions, they also come from a scattering of institutions in border states, especially where metropolitan labor markets spread across borders. Value-added estimates of teacher effectiveness will depend partly on state testing systems (ceiling effects, floor effects). What is an institution to think/do when its graduates are rated highly in one state’s value-added model, but low in another? Does that mean they are good, for example, at teaching Iowa kids but not Missouri ones? Iowa curriculum but not Missouri curriculum? Or does it simply mean that the underlying scales of the state tests were biased in opposite directions? Can/should states start to erect walls prohibiting inter-state transfer of credentials? (after years of working toward the opposite!)
  • Reason #3: It will be difficult if not entirely statistically infeasible to generate non-biased estimates of teacher program effectiveness since graduates are NOT RANDOMLY DISTRIBUTED ACROSS SETTINGS. I would have to assume that what most states would try to do is to estimate a value-added model which attempts to sort out the average difference in student gains of teachers from institution A and from institution B, and in the best case, that model would include a plethora of measures about teaching contexts and students. But these models can only do so much in that regard. While this use of the value-added method may actually work better than attempts to rate the quality of individual teachers, it is still susceptible to significant problems, mainly those associated with non-random distribution of graduates. Here are a few examples from the middle of the country:

The first focuses on recent graduates of in-state Kansas institutions and the characteristics of schools in which they worked during their first year out. The average rate of children qualified for subsidized lunch ranges from under 20% to nearly 50%. Further, this average actually varies to this extent largely because teachers are sorted into geographic pockets around the state which differ in many regards. The most legitimate statistical comparisons that can be made across teacher prep graduates from these institutions are the comparisons across those working in similar settings. In some cases, the overlap between working conditions of graduates of one institution and another is minimal. And Kansas is a relatively homogeneous state compared to many!
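To make the "overlap" point concrete, here is a rough sketch of a crude common-support check: what share of each institution's graduates work in schools whose poverty rate falls inside the range observed for both groups. The function name and the numbers are invented purely for illustration; they are not the Kansas data, and a real analysis would look at far more than one school characteristic.

```python
def overlap_share(frl_a, frl_b):
    """Return, for each group, the share of teachers whose school's
    percent free/reduced lunch lies in the range seen in BOTH groups."""
    lo = max(min(frl_a), min(frl_b))   # lower edge of the shared range
    hi = min(max(frl_a), max(frl_b))   # upper edge of the shared range
    if lo > hi:
        return 0.0, 0.0                # the two groups never overlap
    in_a = sum(lo <= x <= hi for x in frl_a) / len(frl_a)
    in_b = sum(lo <= x <= hi for x in frl_b) / len(frl_b)
    return in_a, in_b

# Hypothetical percent free/reduced lunch in schools employing each
# institution's recent graduates (made-up values, not real data).
inst_a = [12, 15, 18, 20, 22, 25, 30]
inst_b = [28, 35, 40, 45, 48, 52, 60]

share_a, share_b = overlap_share(inst_a, inst_b)
print(f"Institution A graduates in shared range: {share_a:.0%}")
print(f"Institution B graduates in shared range: {share_b:.0%}")
```

In this made-up case only one teacher in seven from each institution works in a setting that the other institution's graduates also work in, so any head-to-head comparison would rest on a sliver of the data.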

Here’s Missouri, with teachers having 5 or fewer years of experience, and the percent free or reduced price lunch in schools where the teachers currently work. I’ve limited this figure to only those institutions producing very large numbers of Missouri teachers, which is less than half of the entire list. Notably, many of these institutions are from border states, including University of Northern Iowa and Arkansas State University. These universities tend to produce teachers for the nearest bordering portions of Missouri.

Again, there are substantial differences in the average low-income population in schools of graduates from various universities. Note here that graduates of the state flagship university – University of Missouri at Columbia – tend to be in relatively low poverty schools. Assuming the state testing system does not suffer ceiling effects, this may advantage Mizzou grads. Kansas grads above have a similar advantage in their state context. Graduates of Arkansas State, and of Avila College near Kansas City, may not be so lucky.

Just to beat this issue into the ground… here’s a Wisconsin analysis comparable to the Missouri analysis. Graduates of Milwaukee area teacher prep institutions including UW-Milwaukee, Marquette and Cardinal Stritch may have significant overlap in the types of populations served by their graduates. But most are in higher poverty settings than graduates of the various state regional colleges. Again, only the BIG producers are even included in this graph. And the differences are striking statewide. And graduates are substantially regionally clustered, further complicating effectiveness comparisons across teacher producing institutions.

These are just illustrations of the differences in one single parameter across the schools/students of graduates of teacher preparation programs. The layers of difference in working conditions go much deeper, and include, for example, substantial variations in average class sizes taught, as well as significant, often unmeasured, neighborhood level differences in diverse metropolitan areas. Teacher labor markets remain relatively local. Teachers remain most likely to teach in schools like the ones they attended, if not the exact ones. Teacher placement is non-random. And that non-randomness presents serious problems for evaluating the quality of teacher preparation programs on the basis of student outcomes.
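For readers who want to see the non-random placement problem in miniature, here is a toy simulation. It is only a sketch under loudly stated assumptions: two hypothetical programs, A and B, are given exactly the same true effect on student gains (zero), and the only thing that differs is where their graduates end up teaching. The `context` variable stands in for whatever part of the working environment a value-added model fails to capture, and every number in it is invented.

```python
import random
import statistics

def simulate_gap(sorted_placement, n_teachers=2000, seed=0):
    """Mean student gain for program A grads minus program B grads.

    Both programs have the SAME true effect (zero). Any gap that shows
    up comes from placement, not from the quality of preparation.
    """
    rng = random.Random(seed)
    gains = {"A": [], "B": []}
    for program in ("A", "B"):
        for _ in range(n_teachers):
            if sorted_placement:
                # Assumption: A grads cluster in easier contexts, B in harder.
                center = 0.25 if program == "A" else 0.55
            else:
                # Random assignment: same distribution of contexts for both.
                center = 0.40
            # Unmeasured context, clipped to the 0-1 range.
            context = min(1.0, max(0.0, rng.gauss(center, 0.10)))
            # Gain = zero program effect - context penalty + noise.
            gain = 0.0 - 1.0 * context + rng.gauss(0.0, 0.5)
            gains[program].append(gain)
    return statistics.mean(gains["A"]) - statistics.mean(gains["B"])

gap_sorted = simulate_gap(sorted_placement=True)
gap_random = simulate_gap(sorted_placement=False)
print(f"Apparent A-minus-B gap, sorted placement: {gap_sorted:.2f}")
print(f"Apparent A-minus-B gap, random placement: {gap_random:.2f}")
```

With sorted placement, program A looks better than program B even though the two are identical by construction. With random placement the gap shrinks toward zero. A real value-added model would control for measured characteristics, but it cannot control for what it does not measure, and that is the part this toy `context` term is meant to represent.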

Is it perhaps interesting as exploratory research to attempt to study the relative “efficacy” of teacher prep programs by these and other measures to see what, if anything, we can learn? Perhaps so.

Is it at all useful to enter so blindly into using these tools immediately in making high stakes accountability decisions about institutions of higher education? Heck no! And certainly not because policymakers in Louisiana or Tennessee said so!

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

The views expressed by the blogger are not necessarily those of NEPC.

Bruce D. Baker

Bruce Baker is Professor in the Graduate School of Education at Rutgers University. His primary areas of research include elementary and secondary education...