VT Digger: William Mathis: What Achievement Tests Tell Us

Editor’s note: This commentary is by William J. Mathis, who is the managing director of the National Education Policy Center and vice chair of the Vermont State Board of Education. He previously served as a design consultant for the National Assessment for Educational Progress and as director of the New Jersey Assessment and Evaluation programs, and consulted on assessment for a number of state agencies. The views expressed here are his own and do not necessarily reflect the views of any group with which he is affiliated.

Like the unexplained monoliths in the classic movie “2001: A Space Odyssey,” our standardized test scores float untethered in space, free of the very things they are supposed to measure, yet they hold great power.

They claim to measure “college and career readiness.” Yet, it takes no particular insight to know that being ready for the forestry program at the community college is not the same as astrophysics at MIT. Likewise, “career ready” means many different things depending upon whether you are a health care provider, a convenience store clerk, or a road foreman.

The fundamental flaw is pretending that we can measure an educated person with one narrow set of tests. There is no one universal knowledge base for all colleges and careers. This mistake is fatal to the test-based reform theory.

When the two test batteries (PARCC and SBAC) are themselves put to the test, they don’t score very well. Princeton-based Mathematica Policy Research compared PARCC test scores with freshman grade point average and found that, in the best case, only 16 percent of the variation in grades could be predicted by the math test and less than 1 percent by the English Language Arts score. The SBAC doesn’t have such a validity study, but they say it “appears in their crystal ball.” (p.72 1). Since the future of schools and children hangs in the balance, this is no place for murky crystal balls.

Building a test is conceptually simple. You get an elaborate web of subject matter specialists together to outline the content based on what they think is important. For those tests that have a pass-fail point, the cut score is likewise based on expert opinion. Aided and abetted by advocates and politicians seeking to create a scientific “proof” of the failure of American education, the cut scores are knowingly set to have a majority of students fail 2.

The irony is the tests have a major predictive validity problem. They can’t tell you whether they are measuring what they claim but they know how many will fail. Like our monolith, they float untethered in space yet have immense but ungrounded power.

Now, why do we have such a state of affairs?

As former American Educational Research Association president and Stanford professor Richard Shavelson argued in a distinguished awards address,3 test-makers get caught up in the latest testing fad. This results in the tail wagging the dog.

In the current latent traits fad, here’s how the tail has to wag:

The approach assumes that knowledge can only have one line from easiest to hardest, that children within a grade are equally distributed within and across all classrooms, and that all children learn the same things in the same way, in the same order and at the same time. As any parent of two or more children can tell you, that is not reality.

Another fatal tail wagging is that no matter how important the item, if it doesn’t fit the latest test fad, it is tossed out. The result is that the test drifts off in space. This problem is made worse when politicians dangle money in front of test experts to do things with tests that cannot and should not be done, says Shavelson.

If we redesigned our measures to address what our state constitutions and citizens tell us is important, we would concentrate on the skills that define success as a citizen, worker and human being. These include clear and effective communication, creative and practical problem-solving, informed and integrative thinking, responsible and involved citizenship, and self-direction.

This is not to say that standardized testing should be eliminated. It is the single uniform measure across schools. But the very standardized attributes that make the tests valuable cause harm to those things that are truly important for our children and our communities.

Since the “recommended” SBAC tests’ standards are currently set to fail about two-thirds of students, the data will wrongly and dishonestly provide fodder for school critics. In high-scoring states, a mere half of students will be declared failures even though they would rank in the top 10 percent of the world. The test scores measure neither college readiness nor career readiness nor success in life. They simply float free in monolithic space, radiating glossy ignorance. As far as informing us about our schools, they are a cold, silent and misleading void.


3 Shavelson, R. (April 28, 2017). “On Assumptions and Applications of Measurement Models: Is the Tail Wagging the Dog?”

This blog post has been shared by permission from the author.

The views expressed by the blogger are not necessarily those of NEPC.

William J. Mathis

Following a decade as the Managing Director of the National Education Policy Center at the University of Colorado, William J. Mathis serves as a Senior Policy Adv...