
Georgia Educators’ Open Letter: Proposed Teacher Evaluation Invalid, Unreliable, & Detrimental to Student Learning


Starting this Fall, Georgia will carry out a teacher and leader assessment system that will use student progress on content tests and administrator evaluation based on a checklist of teacher classroom behaviors to evaluate teachers.

A group of Georgia professors has prepared a letter sent to key politicians including the governor, the GA state school superintendent, and the superintendents of 29 school systems. The letter challenges the teacher and leader evaluation system, identifies the unintended negative consequences, and recommends the state opt out of this unvalidated and unreliable system.

I received this letter from EmpowerED Georgia, a non-partisan citizen activist organization that is lending its support by coordinating with the group of professors to conduct this important campaign.

The complete letter follows. You can also download a copy and forward it to colleagues and urge them to support these educators.



Georgia Researchers, Educators, and Advocates for Teacher Education Reform

An Open Letter of Concern

Regarding Georgia’s Pending Implementation of Its New Teacher/Leader Evaluation System


Governor Nathan Deal
Dr. John D. Barge, Georgia State School Superintendent,
Brooks Coleman, Chair, GA House Education Committee
Fran Millar, Chair, Senate Education and Youth Committee
Jill C. Fike, Director, GA State Senate Research Staff
and Superintendents of the following Georgia school systems:
Atlanta, Ben Hill, Bibb, Burke, Carrollton, Chatham, Cherokee, Clayton, Dade, DeKalb, Dougherty, Gainesville, Gwinnett, Hall, Henry, Meriwether, Muscogee, Peach, Pulaski, Rabun, Richmond, Rockdale, Spalding, Treutlen, Valdosta and Whit

The state of Georgia plans to implement significant systemic changes in teacher evaluation in the 2012-2013 school year. GREATER is a consortium of Georgia university professors, researchers, and educational advocates. We study many disciplines directly connected to education such as education policy, measurement, ethics, multiculturalism, curriculum, and evaluation and we wish to express our deep concern about the initiation of this new evaluation system. GREATER joins with our colleagues in Chicago, IL and New York State, where similar evaluation methods are being implemented [1] and, as professionals, we are reaching out to policymakers and legislators to caution against the imposition of yet another impetuous educational policy change in Georgia without high-quality evidentiary research and support.

As university scholars and professors who specialize in educational research, we recognize that change is an essential component of school improvement. We support accountability and high standards and we want what is best for all students in GA. We firmly believe, however, that Teacher Keys and Leader Keys are unproven evaluation systems that carry unforeseen consequences and they are not the path to lasting school improvement.

The state’s new evaluation system, Teacher Keys and Leader Keys, centers on “value added” measures of student growth. We believe the use of value added measures in teacher and leader evaluation will likely lead to negative educational, social, and emotional outcomes for Georgia’s children. We believe it is our ethical, moral, and professional obligation to raise awareness about how the proposed evaluation changes not only lack a sound research basis but also, in some instances, have already proven to be detrimental.

Knowing that many in the legislature would benefit from the input of those who work full time in the field before making a final decision, we offer four concerns and two recommendations, substantiated by rigorous and relevant educational research. Research supports our primary recommendation that the state return the federal monies related to this project and choose to “opt out” as Idaho, Indiana, Kansas, Minnesota, Oregon, South Dakota, Virginia, West Virginia and Wyoming have.

However, at a minimum, we encourage (1) more extensive piloting and evaluation of the system before implementing it on a large scale and (2) a drastic reduction of the percentage of student growth as a measure of teacher or leader effectiveness. The Value Added Model is predicated on the belief that tools and processes can be used to accurately determine an educator’s impact on student learning. However, Value Added Models do not address the fact that even in the best of circumstances a teacher’s efforts are one of many indistinguishable conditions for student success.

GREATER does not propose to negate an educator’s responsibility to provide a high quality learning experience. However, it is clear that the tools used to measure educator value (Keys) and the tools used to measure student learning provide an incomplete picture. To base an educator’s evaluation and eventual livelihood on incomplete data does not advance the goals that we all have, which include increasing student academic knowledge, skills and habits and developing productive, informed citizens who are inquiring adults and lifelong learners.

Further, as the majority of decision makers and consultants in this process do not have a concentrated background in the field of educational research or, in some cases, are not located in the state of GA, we urge the State Board of Education and the Georgia Legislature to consult on the recommendations proposed in this letter, as well as other proposed educational reforms, with the professors and researchers among us who are local and accessible. We bring both scholarly and practical expertise of national renown to these issues. Our forthcoming recommendations are based on the following concerns:

  1. Value Added Models are not proven;
  2. GA is not prepared to implement this evaluation model;
  3. This model is not the most useful way to spend education funds;
  4. Students will be adversely affected by this Value Added Model.

Concern #1: Educational research and researchers strongly caution against teacher evaluation approaches that use Value Added Models (VAMs).

Georgia has already used a value-added statistical model to determine which schools were to be put on probation, closed, or turned around under No Child Left Behind (NCLB) and found this model wanting. For the new teacher evaluation system, “student academic growth” will be measured with VAMs or similar models. Myriad researchers have found that value added models (VAMs) of teacher effectiveness do not produce stable ratings of teachers. For example, different statistical models (all based on reasonable assumptions) can yield different effectiveness scores [2]. Even when models try to control for prior achievement and student demographic variables, teachers are unduly advantaged or disadvantaged based on the students they teach. Researchers have found that teacher evaluation scores can fluctuate from class to class, from year to year, and from test to test [3]. In making the decision to use VAMs, we encourage the state to consider that ten prominent researchers of assessment, teaching, and learning recently wrote an open letter that included some of the following concerns about using student test scores to evaluate educators [4]:

a. No evidence exists that evaluation systems that incorporate student test scores produce gains in student achievement. In order to determine if there is such a relationship, researchers recommend long-term, small-scale pilot testing of such systems. Furthermore, student test scores have not been found to be a strong predictor of the quality of teaching as measured by other instruments or approaches.[5]

b. Testing companies themselves advise against using their instruments to evaluate educators or to provide supporting evidence linking test scores to any type of teacher pay-for-performance model [6].

c. Validity of the testing instruments used to evaluate the students’ value added scores is a large concern in this case. Validity refers to the degree to which an interpretation of a test score is supported by evidence. For a measure of teacher effectiveness to be valid, evidence must support the argument that the measure can actually determine the teacher effectiveness it claims to measure. This is essential. An assessment instrument must be validated before it can be used for particular purposes [7]. Assessments designed to evaluate student learning are not necessarily valid for measuring teacher effectiveness or student learning growth [8]. Using them to measure the latter is akin to using a meter stick to weigh a person: you might be able to develop a formula that links height and weight, but there will be plenty of error in your calculations.

Concern #2: Georgia is not ready to implement a teacher evaluation model that is based on the use of “student growth” as a significant determinant of teacher effectiveness.

A pilot of GA’s proposed evaluation system began in January 2012 and ended May 2012. The state has acknowledged that it intends to adjust the model based on the findings of the pilot and implement a finalized iteration in Fall 2012. Unfortunately, the state has not allowed itself enough time to analyze data and evaluate the outcomes of the semester-long pilot with validity and reliability. The current plan leaves only two months to analyze data from the pilot and make appropriate adjustments and assumes that the outcomes of the pilot will be valid, reliable, or even desirable. These are serious assumptions to make about an instrument that will have such a powerful effect on the lives of teachers, principals, students, and families.

For the student growth and academic achievement portion of the Teacher/Leader Keys evaluations, the state and local school systems must take into consideration:

a. The influence of certain student characteristics such as placement in special education, limited English language proficiency, and residence in low-income households.

b. How they will accurately match teachers to their actual students (e.g. who gets the “credit” for student outcomes in a co-taught class, a class with a paraprofessional, a class with a change of teacher, a new student mid-year arrival, or a class with a support teacher).

c. That teachers, principals, “coaches” and other school administrators have to be trained in the use of student assessments for teacher evaluation and this training will take place in addition to training already planned for the new Common Core State Standards (CCSS); thus, becoming a burden to already overworked individuals.

d. That there is little point in providing value-added teacher evaluations unless they will trigger continuous goal-setting for areas teachers want to work on and provide coaching, remediation, and support through high-quality professional development. At a time when school systems are furloughing teachers and cutting school days, the possibility of any system having appropriate and sufficient funding to support high-quality professional development is seriously in doubt.

Further, for teachers of “non-tested subjects” (e.g. art or music), a standardized value added student assessment does not exist. While some work is being done nationally to develop assessments for “non-tested” subjects, this work is in its infancy. Despite this fact, participating GA school districts must identify at least one “non-tested” assessment for every grade and every subject; determine how student growth will be measured on these assessments; and translate the student growth from these different assessments into teacher evaluation ratings in a fair manner [9].

Concern #3: Feasibility – This evaluation model is not the most responsible, realistic use of state funds and human resources.

At a time when class sizes in GA are being increased, teachers furloughed, and school years shortened due to lack of school funding, spending so much taxpayer money on an untested, un-validated instrument is fiscally irresponsible. Furthermore, this assessment model places a heavy unsupported burden on local school administrators, teachers, and colleges of education.

a. This assessment model places a HUGE burden on school administrations. The induction phase of this system is a continuous (up to two-year) cycle in which evaluators will often be asked to evaluate content delivery and successful implementation of content-based research. Will evaluators, who are often principals or lead teachers, be trained to recognize effectiveness in subject matter not their own? When will this “training” actually take place and who will conduct it? Furthermore, in an effort to provide flexibility for the 26 partner school districts, many of the provisions require the districts to develop their own evaluations of teacher induction systems. In an already understaffed and overworked school system, who will design and implement these instruments and who will pay the salaries of those hired to do so?

b. This assessment model places a HUGE burden on teachers. A primary (and problematic) presumption of a value added model is that a teacher’s effectiveness can be identified independently through students’ standardized test scores. This evaluation system makes teachers solely responsible for student success when, in reality, quite the opposite is true. Teachers do not work in isolation because schools are learning communities where all parts contribute to student development. An evaluation system that even partially bases an individual teacher’s evaluation on his or her students’ scores ignores the reality that student success is often predicated on the work of many in a school, including reading teachers, resource teachers, reading and English Language Learner specialists, guidance counselors, social workers, psychologists, and other personnel. Most importantly, out-of-school factors are actually more responsible for student success [10]. Non-classroom-teacher factors, including parents’ income level and level of education, account for roughly 85-90% of the statistical variation in students’ test scores [11]. How could we possibly begin to disaggregate each individual’s effect? And why would we want to? Schools operate best when there is cooperation among all caretakers, faculty, and staff members [12] and when all are accountable for each student’s learning.

c. This assessment model places a HUGE burden on colleges of education. The effective preparation of teachers requires practice time spent with mentor teachers in actual classrooms. The model of preparing new teachers, called clinical practice or student teaching, is similar to the one used to prepare medical professionals. The student-teaching portion of an educational program has been determined by some research, such as that done by the Blue Ribbon Commission of NCATE, to be one of the most important and influential parts of a teacher education program [13]. This requires districts, schools, and mentor teachers to willingly allow colleges of education to place student teachers in their classrooms. Yet, even now, many of us who spend time in schools working with students and their mentor teachers have found it increasingly difficult to find placements because of teacher and administrator concerns over student test scores. Given that the new teacher effectiveness measure places so much emphasis on test scores, will student teachers still be welcomed? We fear the answer will be a resounding no.

Concern #4: Students will be adversely affected by the implementation of this new teacher evaluation system.

Our undue focus on testing affects teacher-student relationships and makes it more difficult to establish a classroom community of academically successful learners and critical thinkers.

a. Since the initiation of NCLB, a focus on test preparation to the exclusion of other content has come to be known as “narrowing the curriculum” [14]. Enrichment activities in the arts, music, civics, and other non-tested areas have diminished. Using student test scores as a measure of teacher “value” will further restrict what is taught. Educators have spoken of their lived experience with unintended consequences of the narrow curriculum from NCLB over the past decade. Children arrive in middle school without fundamental abilities in non-tested areas. Children are not taught that ideas and issues are multi-faceted; instead, their learning is artificially constrained to language arts, math, and science.

b. There has been and, with the implementation of these evaluations, inevitably will be a further narrowing of the curriculum as teachers focus more on test preparation and skill-and-drill teaching – particularly in low-scoring schools, which are largely attended by students of low socioeconomic status and students of color [15]. By focusing on testing to the exclusion of true teaching, we further catalyze a pending civil rights battle for equal educational opportunities.

c. Teachers will subtly but surely be incentivized to avoid students with health issues, students with disabilities, students who are English Language Learners, or students suffering from emotional issues. Research has shown that no evaluation model yet developed can adequately account for all of these ongoing factors [16].

d. The dynamic between students and teacher will change. Instead of “teacher and student versus the exam,” it will be “teacher versus students’ performance on the exam.”

e. Collaboration among teachers will be replaced by competition. With a “value-added” system, a 5th grade teacher has little monetary incentive to make sure that his or her incoming students score well on the 4th grade exams because incoming students with high scores would make his or her job more challenging. When competition replaces collaboration, every student loses.

f. When student test scores take a front and center position and the livelihoods of teachers, administrators, and schools are dependent on these scores, we must stop acting surprised that cheating scandals emerge. Georgia is one of many states already marred by allegations of cheating on standardized tests. Are we assuming that linking standardized test scores to teacher and administrator evaluations will make things better? If so, how?

Our Recommendations

1. Further pilot and adjust the evaluation system before large-scale implementation.

Any annual evaluation system should be piloted and adjusted on a small scale for a length of time that provides sufficient feedback before being implemented statewide. Delaware spent years piloting and fine-tuning its system before formally putting it in place statewide. Conversely, Tennessee’s and New York City’s teacher evaluation systems made headlines when their rapid implementation led to unintended negative consequences.

2. Minimize the percentage that “student growth and academic achievement” counts toward teacher or leader effectiveness and look for more valid and reliable ways to measure effectiveness.

We are aware of the complexity of developing teacher evaluation models. However, until standardized student-growth measures are found to be valid and reliable sources of information on teacher or principal performance, they should in no way play a role, or should play a very limited role, in summative ratings. There are other types of instruments and evaluative programs that provide a better, more precise picture of teacher effectiveness. These measures focus on what a teacher does and how practice can be strengthened through non-value added measures, such as paid peer mentoring of first and second-year teachers, seminars, personal portfolios and reflections, and an ongoing analysis of teachers’ holistic performance. One clear example of these methods may be found in the TEAM (teacher education and mentoring) program currently implemented in CT, which provides differentiated professional learning for beginning teachers as they reflect on instructional strategies and analyze student data and outcomes [17]. Students benefit when objective feedback is part of their teachers’ experience [18]. Similar frameworks for principals can serve the same purpose.

The GREATER consortium concludes that hurried implementation of teacher evaluation using student value added growth models will result in inaccurate assessments of our teachers, a demoralized profession, decreased learning, and harm to the children in our care. Further, it is a waste of our state’s increasingly limited resources to widely implement a program that has not yet been thoroughly field-tested or fully strategized. Our students are more than the sum of their test scores, and research clearly shows that an overemphasis on test scores will not result in increased learning, increased well-being, or greater success. According to a nine-year study by the National Research Council [19], the past decade’s emphasis on high-stakes standardized testing has yielded little learning progress. This is particularly troubling when we consider the cost of this emphasis to taxpayers.

None of us can afford to lose sight of what matters the most: the academic, social, and emotional growth and well-being of Georgia’s children. Our students, teachers, and communities deserve better. They deserve thoughtful, reliable, valid reforms that will improve teaching and learning for all students. It is in this spirit that we write this letter.


Signed by 9 educational researchers across the State of Georgia, as of June 8th, 2012. University affiliations are listed for identification purposes only and do not imply affiliated consent.

Since then, many other professors and educators have signed on to the letter, including this writer.

1. (Primary Contact) Mari Ann Roberts, Ph.D., Clayton State University, 404-374-9154
2. Alyssa Hadley Dunn, Ph.D., GA State University
3. Erica Dotson, Ph.D., Clayton State University
4. Karen Falkenberg, Ph.D., Emory University
5. Jillian Carter Ford, Ph.D., Kennesaw State University
6. Rebecca Hill, Ph.D., Kennesaw State University
7. Marquita Jackson-Minot, Ph.D., Georgia Gwinnett College
8. Regina Meeler, Ph.D., Gainesville State College
9. Vera Stenhouse, Ph.D., National Association for Multicultural Education, President, Georgia Chapter

Show your support for this important campaign to improve the way teachers will be evaluated. Add your name and comment at the bottom of this article.


[1] Note: This letter was adapted from the letter written by Dr. Kevin Kumashiro, University of Illinois at Chicago, which was signed by more than 80 university professors and researchers in the Chicago area. It was also inspired by the letter written by Sean C. Feeney, Ph.D. and Carol C. Burris, Ed.D., which was signed by more than 1400 New York State principals in opposition to New York’s evaluation plan.

[2] Papay, J. (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163-193.

[3] Darling-Hammond, L. (2012). Creating a Comprehensive System for Evaluating and Supporting Effective Teaching. Stanford, CA. Stanford Center for Opportunity Policy in Education.

[4] Baker, E., et al. (2011). Correspondence to the New York State Board of Regents. Retrieved October 16, 2011 from… 2011/05/21/AFJHIA9G_blog.html.

[5] See Burris, C., & Welner, K. (2011). Conversations with Arne Duncan: Offering advice on educator evaluations. Phi Delta Kappan, 93(2), 38-41.

[6] Fair Test (2007). Organizations and Experts Opposed to High-Stakes Testing.

[7] Goe, L., Bell, C. & Little, O. (2008). Approaches to Evaluating Teacher Effectiveness: A Research Synthesis. National Comprehensive Center for Teacher Quality.

[8] Goe, L., & Holdheide, L. (2011). Measuring teachers’ contributions to student learning growth for nontested grades and subjects. Retrieved February 2, 2012 from

[9] Note: RT3 information procured from Strickland Design (2012), RT3 Newsletter Archives. Retrieved April 7, 2012 from and the GA Department of Education Top/Pages/default.aspx

[10] Goldhaber, D., Brewer, D. & Anderson, D. (1999). A three-way error components analysis of educational productivity. Education Economics 7(3), 199-208.

[11] See Hanushek, E., Kain, J., & Rivkin, S. (1998). Teachers, schools and academic achievement. Retrieved October 16, 2011 from

[12] DuFour, R., & Eaker, R. (1998). Professional learning communities at work: Best practices for enhancing student achievement. Bloomington, IN: National Educational Service.

[13] NCATE (2010). Transforming Teacher Education Through Clinical Practice: A National Strategy to Prepare Effective Teachers. Retrieved June 1, 2012 from

[14] Crocco, M. S., & Costigan, A. T. (2007). The Narrowing of Curriculum and Pedagogy in the Age of Accountability: Urban Educators Speak Out. Urban Education, 42(6), 512-535.

[15] Committee on Incentives and Test-Based Accountability in Education of the National Research Council. (2011). Incentives and Test-Based Accountability in Education. Washington, DC: National Academies Press.

[16] Baker, E., et al. (2010). Problems with the use of test scores to evaluate teachers. Washington, DC: Economic Policy Institute. Retrieved October 16, 2011 from

[17] TEAM Retrieved from… and

[18] National Board for Professional Teaching Standards (2012). Impact of National Board Certification on Teacher Practice & Schools. Retrieved April 7, 2012 from

[19] Committee on Incentives and Test-Based Accountability in Education of the National Research Council. (2011). Incentives and Test-Based Accountability in Education. Washington, DC: National Academies Press.


This blog post has been shared by permission from the author.

The views expressed by the blogger are not necessarily those of NEPC.

Jack Hassard

Jack Hassard is a former high school science teacher and Professor Emeritus of Science Education, Georgia State University. While at Georgia State he was coordina...