VAMboozled!: Teacher Evaluation Recommendations Endorsed by the Educational Psychology Division of the American Psychological Association (APA)

Recently, the Educational Psychology Division of the American Psychological Association (APA) endorsed a set of recommendations, captured within a research brief for policymakers, pertaining to best practices when evaluating teachers. The brief, which can be accessed here, was authored by Alyson Lavigne, Assistant Professor at Utah State, and Tom Good, Professor Emeritus at the University of Arizona.

In general, they recommend that states’/districts’ teacher evaluation efforts emphasize improving teaching in informed and formative ways versus categorizing and stratifying teachers in terms of their effectiveness in outcome-based and summative ways. As per recent evidence (see, for example, here), since the passage of the Every Student Succeeds Act (ESSA) in December 2015, it seems states and districts are already heading in this direction.

Otherwise, they note that the prior emphasis on using students’ test scores (via, for example, value-added models, or VAMs) to hold teachers accountable for their effects on student achievement, while simultaneously using observational systems, is “problematic and [has] not improved student achievement” as a result of states’ and districts’ past efforts in these regards. These were the two most common teacher evaluation measures of the recent past, and both “fail to recognize the complexity of teaching or how to measure it.”

More specifically in terms of VAMs: (1) VAM scores do not adequately compare teachers given the varying contexts in which teachers teach and the varying factors that influence teaching and student learning; (2) Teacher effectiveness often varies over time, making it difficult to achieve the reliability (i.e., consistency) needed to justify VAM use, especially for high-stakes decision-making purposes; (3) VAMs can only attempt to capture effects for approximately 30% of all teachers, raising serious issues with fairness and uniformity; (4) VAM scores do not help teachers improve their instruction, in part because teachers and their administrators often do not have access, have late access, or simply do not understand their VAM-based data well enough to use them in formative ways; and (5) Using VAMs discourages collegial exchange and sharing of ideas and resources.

More specifically in terms of observations: (1) Because classroom teaching is so complex, dynamic, and contextual, these measures are problematic in that no currently available system captures all aspects of good teaching; (2) Observing and providing teachers with feedback warrants significant time, attention, and resources but often receives little in all regards; (3) Principals have still not been prepared well enough to observe or provide useful feedback to teachers; and (4) The common practice of three formal observations/year/teacher does not adequately account for the fact that teacher practice and performance vary over time, across subject areas and students, and the like. I would add here a (5) in that these observational systems have also been evidenced as biased in that, for example, teachers representing certain racial and ethnic backgrounds might be more likely than others to receive lower observational scores (see prior posts on these studies here, here, and here).

In consideration of the above, what they recommend in terms of moving teacher evaluation systems forward follows:

  • Eliminate high-stakes teacher evaluations based only on student achievement data and especially limited observations (all should consider if and how additional observers, beyond just principals, might be leveraged);
  • Provide opportunities for teachers to be heard, for example, in terms of when and how they might be evaluated and to what ends; 
  • Improve teacher evaluation systems in fundamental ways using technology, collaboration, and other innovations to transform teaching practice;
  • Emphasize formative feedback within and across teacher evaluation systems in that “improving instruction should be at least as important as evaluating instruction.”

You can see more of their criticisms of current teacher evaluation systems and their recommendations for the future, again, in the full report here.

