VAMboozled!: Identifying Effective Teacher Preparation Programs Using VAMs Does Not Work

A New Study [does not] Show Why It’s So Hard to Improve Teacher Preparation” Programs (TPPs). More specifically, it shows why using value-added models (VAMs) to evaluate TPPs, and then ideally improving them using the value-added data derived, is nearly if not entirely impossible.

This is precisely why yet another, perhaps, commonsensical but highly improbable federal policy move to imitate great teacher education programs and shut down ineffective ones, as based on their graduates’ students test-based performance over time (i.e., value-added) continues to fail.

Accordingly, in another, although not-yet peer-reviewed or published study referenced in the article above, titled “How Much Does Teacher Quality Vary Across Teacher Preparation Programs? Reanalyzing Estimates from [Six] States,” authors Paul T. von Hippel, from the University of Texas at Austin, and Laura Bellows, a PhD Student from Duke University, investigated “whether the teacher quality differences between TPPs are large enough to make [such] an accountability system worthwhile” (p. 2). More specifically, using a meta-analysis technique, they reanalyzed the results of such evaluations in six of the approximately 16 states doing this (i.e., in New York, Louisiana, Missouri, Washington, Texas, and Florida), each of which ultimately yielded a peer-reviewed publication, and they found “that teacher quality differences between most TPPs [were] negligible [at approximately] 0-0.04 standard deviations in student test scores” (p. 2).

They also highlight some of the statistical practices that exaggerated the “true” differences noted between TPPs in each of these but also these types of studies in general, and consequently conclude that the “results of TPP evaluations in different states may vary not for substantive reasons, but because of the[se] methodological choices” (p. 5). Likewise, as is the case with value-added research in general, when “[f]aced with the same set of results, some authors may [also] believe they see intriguing differences between TPPs, while others may believe there is not much going on” (p. 6). With that being said, I will not cover these statistical/technical issue more here. Do read the full study for these details, though, as also important.

Related, they found that in every state, the variation that they statistically observed was greater among relatively small TPPs versus large ones. They suggest that this occurs, accordingly, due to estimation or statistical methods that may be inadequate for the task at hand. However, if this is true this also means that because there is relatively less variation observed among large TPPs, it may be much more difficult “to single out a large TPP that is significantly better or worse than average” (p. 30). Accordingly, there are
several ways to mistakenly single out a TPP as exceptional or less than, merely given TPP size. This is obviously problematic.

Nonetheless, the authors also note that before they began this study, in Missouri, Texas, and Washington, that “the differences between TPPs appeared small or negligible” (p. 29), but in Louisiana and New York “they appeared more substantial” (p. 29). After their (re)analyses, however, their found that the results from and across these six different states were “more congruent” (p. 29), as also noted prior (i.e., differences between TPPs around 0 and 0.04 SDs in student test scores).

“In short,” they conclude, that “TPP evaluations may have some policy value, but the value is more modest than was originally envisioned. [Likewise, it] is probably not meaningful to rank all the TPPs in a state; the true differences between most TPPs are too small to matter, and the estimated differences consist mostly of noise” (p. 29). As per the article cited prior, they added that “It appears that differences between [programs] are rarely detectable, and that if they could be detected they would usually be too small to support effective policy decisions.”

To see a study similar to this, that colleagues and I conducted in Arizona, and that was recently published in Teaching Education, see “An Elusive Policy Imperative: Data and Methodological Challenges When Using Growth in Student Achievement to Evaluate Teacher Education Programs’ ‘Value-Added” summarized and referenced here.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

The views expressed by the blogger are not necessarily those of NEPC.

Audrey Amrein-Beardsley

Audrey Amrein-Beardsley, a former middle- and high-school mathematics teacher, received her Ph.D. in 2002 from Arizona State University in the Division of Educational Leadership and Policy Studies with an emphasis on Research Methods. Awarded tenure in 2010 as an ASU Presidential Exemplar, she is currently an Associate Professor in the...