
Why Giving Standardized Tests to Young Children is "Really Dumb"


Nobel Prize-winning physicist Richard Feynman opined that science is the belief that everyone in authority is ignorant. I am a social scientist; politicians have authority; therefore, politicians are quite likely to be ignorant. Their ignorance cannot show up any more clearly than in their recent desire to give tests to very young children.

Some states are currently preparing proposals to engage in another round of Race to the Trough [otherwise known as Race to the Top]. They are seeking a share of the $700 million in federal funds allocated for early learning in the 2011 education budget. States can get this money if they design, develop, and administer pre-kindergarten assessments and kindergarten readiness tests. Common sense and research both suggest that this is really dumb!

If any of these authorities could remember what their own children were like at ages 3, 4, and 5, they would immediately know that any assessments of children at this age are unreliable.

Distinguished developmental psychologist Samuel Meisels believes that most young children have a restricted ability to comprehend the formal, spoken instructions required for most standardized assessments; thus they fail to pick up the cues that older children use to determine what is expected of them in an assessment environment. Younger children also lack the sophistication to interpret situational cues or written instructions.

Similarly, questions that require complex information-processing skills, such as giving differential weights to alternative choices, distinguishing recency from primacy, or responding correctly to multistep directions, may easily cause a child to give the wrong answer to a question.

In fact, what a child had for lunch, whether they could play outside that day, and whether Sarah hit Johnny or told him he could play dress up with her have more to do with a test score than the knowledge stored in memory. Furthermore, the knowledge assessed is not exactly what might be called “critical thinking.” The questions often do no more than ask the young child to correctly identify which color is green, which stick is bigger, and which one of the pictures shown is a cow!

Suppose a child missed all three of these common item types used for assessing young children. Is that child’s life ruined? Or is it more plausible to expect that the child will pick up that knowledge eventually? Except for the severely cognitively challenged, who are easily identified by all early childhood teachers, children will learn these “big” ideas eventually.

In fact, when longitudinal studies of testing were examined to see if the achievement test scores of young children could predict the achievement test scores received by those same children a few years later, the answer was that the tests did not predict well at all. And the scores received by young children on assessments of their social and behavioral skills turned out to be completely useless as predictors of the scores the children received on the same measures a few years later. The research quite convincingly shows that for young children, even over relatively short time periods, predictions from one administration of a test to the next are not usually accurate enough to engender any confidence that this year’s performance will tell us much about next year’s performance.

This is what any rational parent or informed politician should expect, since young children are undergoing significant changes in brain growth, physiology, and emotional regulation throughout their first eight years of life. As Meisels noted, any brief snapshot of a child’s skills and abilities taken on a single occasion is simply unable to capture the shifts and changes in that child’s development. In addition to the developmental changes we expect of all children, among poor children there are also more frequent changes in family income, housing, caretakers, food security, and so forth. That is, the instability seen in the scores of middle-class children is expected to be even greater among lower-class children.

The current efforts to assess young children also reflect America’s remarkable amnesia. The federal government tried to assess young children once before, when it mandated a test to assess the effects of Head Start. The government spent millions of dollars to develop the National Reporting System (NRS) to assess 4-year-olds in Head Start programs. But the NRS was a complete failure.

It failed because a compelling purpose for the test could never be clearly specified. Too few people asked why it is we needed those tests. Too few wondered whether teachers already knew most of what we needed to know about the children that were served.

The test also failed because it could not conform to basic professional standards of test development. The test developers could not provide evidence that reliability and validity were sufficient for the test to be useful. The test failed also because it tapped a narrow sample of children’s skills, as so many tests do. Finally, the test failed because, like so many high-stakes tests, it promoted a curriculum for Head Start that was drill oriented so it would look like the students in that program had rising test scores.

In retrospect, the NRS failed most of all because it ignored the complexity of early development. Meisels and virtually all other scholars in this field teach us that no single indicator, especially a formal test, can reliably and validly assess a young child’s skills, achievements, or personality. It is quite fair to say that no collection of standardized indicators can produce an assessment of any lasting value.

So now, less than a decade after the NRS debacle, Secretary of Education Arne Duncan and other federal and state personnel are out to test young children again. Ignorance redux. I was taught that one definition of insanity was repeating the same thing time and time again, expecting a different outcome. Secretary Duncan has apparently never come across that bit of wisdom.

My own scholarship bears on this issue as well. Arizona, like some other states, tests all children at second grade, a grade below that required by the NCLB legislation. And it is not unusual for districts to test children in first grade and in kindergarten.

When I asked Arizona State Department of Instruction and district personnel why this was done, the answer was always the same: So they could learn which children needed help and which did not. I asked if they could get that information from teachers, but I was told that such information would not be “objective,” that teacher ratings were “untrustworthy.”

The state and district administrators I talked with believed that professional teachers did not and could not know enough about the skills and abilities of their students, even after spending eight months with them. I thought they were wrong. So in a series of studies with colleagues Annapurna Ganesh, Joseph Riley, and others, I tried to get the information the State of Arizona wanted from their testing of young children by means of an alternative. I simply asked teachers. I thought that if teachers could reliably identify children who need more help, we could save time and money, as well as reduce the anxiety that teachers and students feel at assessment time.

These were simple studies. We asked classroom teachers in grades two to six to rank order the students in their classes in terms of how they would do on the state’s No Child Left Behind accountability test. Following is some information obtained from only the two lowest grades.

In grade two, 36 teachers participated, with class sizes ranging from 17 to 30; in grade three, 30 teachers participated, with class sizes ranging from 22 to 32 students. The correlations between the teachers’ rankings of their students’ expected performance and the students’ ranks on the state test were all strongly positive. In third-grade reading and mathematics, the teachers’ rankings of their students correlated about .84 with the ranks the students obtained on the test, about as high as the reliability of the tests themselves. Many teachers exhibited correlations greater than .90, indicating that teachers are quite capable of providing the state with information about who needs help and who does not in about 10 minutes, and at a savings of millions of dollars.

In second grade, we expected lower correlations because, as described above, the test scores of children at this age are less reliable. Yet we still found correlations between the teachers’ rankings and the children’s ranks on the test to be about .70 in both reading and mathematics. This correlation is probably as high as the test would correlate with itself a week later (its one-week stability reliability), and at the extremes, the rankings by the teachers of the highest and lowest performing students were remarkably accurate.
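For readers curious about how a teacher's ranking can be compared with the ranking produced by a test, the following is a minimal sketch in Python. It assumes the Spearman rank correlation, which is a common choice for rank-ordered data; the studies described here do not say which coefficient was used. The function name `spearman_rho` and the eight-student class are hypothetical and purely illustrative; they are not data from the Arizona studies.

```python
def spearman_rho(teacher_ranks, test_ranks):
    """Spearman rank correlation for two rankings of the same students.

    Uses the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), which is
    valid only when neither ranking contains ties.
    """
    if len(teacher_ranks) != len(test_ranks):
        raise ValueError("both rankings must cover the same students")
    n = len(teacher_ranks)
    if n < 2:
        raise ValueError("need at least two students to correlate ranks")
    # d is the gap between the teacher's rank and the test rank per student
    d_squared = sum((a - b) ** 2 for a, b in zip(teacher_ranks, test_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical class of eight students (illustrative values only).
# Position i holds student i's rank: 1 = expected or actual top scorer.
teacher = [1, 2, 3, 4, 5, 6, 7, 8]
test = [2, 1, 3, 5, 4, 6, 8, 7]

print(round(spearman_rho(teacher, test), 3))
```

In this made-up class the teacher swaps a few adjacent students but never misplaces anyone by more than one position, so the coefficient comes out above .90. A value near 1 means the teacher and the test order the students almost identically; a value near 0 means the two orderings are unrelated.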

These results once again indicate that if the state’s interest is identifying students who need help, teachers can do this as well as the test.

Moreover, it is likely that the teachers’ ranking provides more valid information about a child’s performance vis-à-vis others, since it is ordinarily based on thousands of hours of teaching experience, hundreds of hours of observation of that particular child, and interviews with parents and guardians, rather than on just a few hundred minutes of standardized testing.

To compound the irony, even when the State of Arizona identified through its testing program the young children who appeared to need help, little or no help was given because there were no funds to do so.

Testing young children may be cruel, has not worked out well in the past, and often provides unreliable scores, so invalid inferences about the abilities of children are made too often. Potentially more valid information, at least as reliable as the tests themselves and unlikely to elicit anxiety on the part of teachers or students, can be obtained from professional educators much more quickly and for drastically less money. The funds saved, of course, in any sane world would be used to help the children that teachers identify as needing help.

We certainly do not need more formal testing of young children, but I do think we need sanity tests for those in authority who deny the experiences they have had with their own or other people’s children.

This blog post has been shared by permission from the author.

The views expressed by the blogger are not necessarily those of NEPC.

David C. Berliner

David C. Berliner is Regents’ Professor of Education Emeritus at Arizona State University. He has also taught at the Universities of Arizona and ...