Yesterday, the Council of the American Education Research Association (AERA) – AERA, founded in 1916, is the largest national professional organization devoted to the scientific study of education – publicly released their “AERA Statement on Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs.” Below is a summary of the AERA Council’s key points, noting for transparency that I contributed to these points in June of 2014, before the final statement was externally reviewed, revised, and vetted for public release.
As per the introduction: “The purpose of this statement is to inform those using or considering the use of value-added models (VAM) about their scientific and technical limitations in the evaluation of educators [as well as programs that prepare teachers].” The purpose of this statement is also to stress “the importance of any educator evaluation system meeting the highest standards of practice in statistics and measurement,” well before VAM output are to carry any “high-stakes, dispositive weight in [such teacher or other] evaluations” ( p. 1).
As per the main body of the statement, the AERA Council highlights eight very important technical requirements that must be met prior to such evaluative use. These eight technical requirements should be officially recognized by all states, and/or used by any of you out there to help inform your states regarding what they can and cannot, or should and should not do when using VAMs, whereas “[a]ny material departure from these [eight] requirements should preclude [VAM] use” (p. 2).
Here are AERA’s eight technical requirements for the use of VAM:
- “VAM scores must only be derived from students’ scores on assessments that meet professional standards of reliability and validity for the purpose to be served…Relevant evidence should be reported in the documentation supporting the claims and proposed uses of VAM results, including evidence that the tests used are a valid measure of growth [emphasis added] by measuring the actual subject matter being taught and the full range of student achievement represented in teachers’ classrooms” (p. 3).
- “VAM scores must be accompanied by separate lines of evidence of reliability and validity that support each [and every] claim and interpretative argument” (p. 3).
- “VAM scores must be based on multiple years of data from sufficient numbers of students…[Related,] VAM scores should always be accompanied by estimates of uncertainty to guard against [simplistic] overinterpretation[s] of [simple] differences” (p. 3).
- “VAM scores must only be calculated from scores on tests that are comparable over time…[In addition,] VAM scores should generally not be employed across transitions [to new, albeit different tests over time]” (AERA Council, 2015, p. 3).
- “VAM scores must not be calculated in grades or for subjects where there are not standardized assessments that are accompanied by evidence of their reliability and validity…When standardized assessment data are not available across all grades (K–12) and subjects (e.g., health, social studies) in a state or district, alternative measures (e.g., locally developed assessments, proxy measures, observational ratings) are often employed in those grades and subjects to implement VAM. Such alternative assessments should not be used unless they are accompanied by evidence of reliability and validity as required by the AERA, APA, and NCME Standards for Educational and Psychological Testing” (p. 3).
- “VAM scores must never be used alone or in isolation in educator or program evaluation systems…Other measures of practice and student outcomes should always be integrated into judgments about overall teacher effectiveness” (p. 3).
- “Evaluation systems using VAM must include ongoing monitoring for technical quality and validity of use…Ongoing monitoring is essential to any educator evaluation program and especially important for those incorporating indicators based on VAM that have only recently been employed widely. If authorizing bodies mandate the use of VAM, they, together with the organizations that implement and report results, are responsible for conducting the ongoing evaluation of both intended and unintended consequences. The monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use within a given evaluation system” (AERA Council, 2015, p. 3).
- “Evaluation reports and determinations based on VAM must include statistical estimates of error associated with student growth measures and any ratings or measures derived from them…There should be transparency with respect to VAM uses and the overall evaluation systems in which they are embedded. Reporting should include the rationale and methods used to estimate error and the precision associated with different VAM scores. Also, their reliability from year to year and course to course should be reported. Additionally, when cut scores or performance levels are established for the purpose of evaluative decisions, the methods used, as well as estimates of classification accuracy, should be documented and reported. Justification should [also] be provided for the inclusion of each indicator and the weight accorded to it in the evaluation process…Dissemination should [also] include accessible formats that are widely available to the public, as well as to professionals” ( p. 3-4).
As per the conclusion: “The standards of practice in statistics and testing set a high technical bar for properly aggregating student assessment results for any purpose, especially those related to drawing inferences about teacher, school leader, or educator preparation program effectiveness” (p. 4). Accordingly, the AERA Council recommends that VAMs “not be used without sufficient evidence that this technical bar has been met in ways that support all claims, interpretative arguments, and uses (e.g., rankings, classification decisions)” (p. 4).
CITATION: AERA Council. (2015). AERA statement on use of value-added models (VAM) for the evaluation of educators and educator preparation programs. Educational Researcher, X(Y), 1-5. doi:10.3102/0013189X15618385 Retrieved from http://edr.sagepub.com/content/early/2015/11/10/0013189X15618385.full.pdf+html
This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:
The views expressed by the blogger are not necessarily those of NEPC.