Symposium Entry

Standardized assessments must account for non-standardized institutions

By Dr. Fredrik deBoer
Academic researcher, Purdue University

May 31st, 2016

Though differentiation is the enemy of sound social science, national efforts at assessing learning in college must make certain allowances for the variability inherent to the higher education system.

Assessment has become an unavoidable topic in higher education circles. The past two American presidential administrations, one Republican and one Democratic, have both made assessment of college learning outcomes a cornerstone of their higher education policy. More and more state governments are putting pressure on public institutions to gather data about student learning gains. Accreditation reform has also entered the national conversation, with many calling for more rigorous collection of data on student outcomes and student learning. The ongoing tuition and student debt crisis, meanwhile, naturally leads parents and students to question whether a given college delivers strong learning opportunities in exchange for all that money. For many administrators and educators, therefore, the question is not whether to conduct assessment of student learning but how.

But assessing college learning presents a serious challenge. Colleges and universities have long emphasized their individuality – the particular culture, systems, and values that separate them from other institutions. From the standpoint of attracting students, this makes sense; in a crowded landscape of competing colleges and universities, there is an obvious incentive for schools to differentiate themselves. But differentiation is the enemy of sound social science. Without consistency between institutions, making our assessments valid and reliable – that is, ensuring that they measure what we intend to measure and do so in a consistent and fair way – becomes a much more challenging endeavor. National efforts at assessing learning in college, therefore, must make certain allowances for the variability inherent to our system. If they do not, we will fail to develop an accurate picture of how well our students and institutions are doing, and we risk making bad decisions based on bad information.

Here are some of the particular dynamics of higher education assessment that stakeholders in the assessment process should understand.

Adjusting for ability effects is essential.

Educational testing designed to assess schools and teachers rather than learners always faces a major hurdle: because different students arrive with different levels of ability, it can be difficult to fairly evaluate a given institution's or instructor's quality. For example, we know that socioeconomic status is strongly associated with educational outcomes, with students from poorer backgrounds tending, on average, to perform significantly worse than students from richer backgrounds. Simply comparing the average test results of a school full of poor children to those of a school full of rich children is a sure way to unfairly judge the teachers at the poorer school.

In the college context, these problems are multiplied. We know for a fact that the incoming populations of different colleges are deeply unequal in prerequisite ability. The most obvious and strongest reason for this is the college admissions process itself. Competitive schools invest tremendous resources in ensuring that their student body is not like that of other schools. Admissions departments seek out the best-performing, most talented and accomplished high school students, whom they then attempt to woo to their institutions. The inevitable result is deep stratification in incoming student ability across the university system. We should note, however, that attending a competitive institution is quite rare overall; most American institutions of higher education accept the vast majority of students who apply. Still, these differences in incoming ability are troubling, as they potentially represent serious confounds in our effort to sort out how much students are learning at different institutions. The problem is compounded by the fact that the biggest criterion for selecting a college, for the average student, is not its perceived quality but its geography, with most college students choosing to attend schools close to home. As we know that there are strong geographic trends in educational outcomes – with students in Massachusetts, for example, performing far better than students in Mississippi – this represents another challenge to our analysis.

There are several ways to address these issues. First, score results can be normed against incoming SAT scores, an imperfect but powerful means of sorting students into ranks of incoming ability. Scores on tests of higher education learning tend to be highly correlated with SAT and ACT results, so we can quantitatively adjust the former to help control for ability effects. Second, test-retest systems, in which students are tested in their freshman and senior years, can help determine how much growth has occurred, giving us scores based not on where students end up but on how much they have improved over the course of their education. Sometimes these efforts take advantage of complex Value-Added Models, though such procedures are controversial.
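To make the norming idea concrete, here is a minimal sketch, using entirely hypothetical numbers, of one common approach: fit a least-squares line predicting institutional outcome scores from mean incoming SAT, then treat each school's residual (its distance above or below the line) as an ability-adjusted measure of performance. The school names and scores are invented for illustration only.

```python
# Hypothetical data: for each school, (mean incoming SAT, mean senior outcome score).
schools = {
    "A": (1400, 78.0),
    "B": (1150, 70.0),
    "C": (1000, 66.0),
    "D": (1300, 73.0),
}

sats = [sat for sat, _ in schools.values()]
outcomes = [score for _, score in schools.values()]
n = len(sats)
mean_x = sum(sats) / n
mean_y = sum(outcomes) / n

# Ordinary least-squares slope and intercept for outcome ~ SAT.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sats, outcomes)) / \
        sum((x - mean_x) ** 2 for x in sats)
intercept = mean_y - slope * mean_x

# A school's residual is its performance after accounting for incoming ability:
# positive means it outperforms what its SAT profile predicts.
for name, (sat, score) in schools.items():
    predicted = intercept + slope * sat
    residual = score - predicted
    print(f"School {name}: raw {score:.1f}, ability-adjusted {residual:+.2f}")
```

Note that a school with the highest raw average can still have a negative adjusted score if its incoming students were also the strongest; that reversal is the whole point of the adjustment, and real analyses would add more covariates and far more caution.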

Perhaps the best check is the simplest: everyone involved in the assessment process, and everyone evaluating the results of assessment, should remain clear that colleges and universities will always produce deeply unequal outcomes based on the admissions procedures of selective colleges.

The testing industry is big business.

Whether assessments should be developed “in-house” or provided by testing corporations and nonprofits is one of the perpetual controversies in the assessment literature. There are clear advantages to developing assessments internally. For one, internally developed assessments can better adapt to the kinds of institution-specific complexity discussed above. They can also better involve faculty, helping them feel like stakeholders in the process and thereby easing the tensions that often result from assessment efforts. And they keep funding within the university community, often providing money for graduate assistants and other staff. But there are major hurdles to developing assessments internally. They represent a significant investment of time, manpower, energy, and money. And in many cases, state administrators and accreditors will insist on standardized instruments developed externally.

What everyone involved in the assessment process must understand is that the testing industry is just that – an industry, made up of institutions primarily motivated by the drive for profit. The testing industry generates hundreds of millions of dollars, and many companies are fighting to gain purchase in the higher education space. And while prominent nonprofit organizations like the Educational Testing Service and ACT are active in test development, their nonprofit status has been repeatedly challenged, including legally. Those involved in assessment must bear in mind that when organizations attempt to sell them tests, they are receiving a marketing pitch like any other. Skepticism of the claims of the institutions that develop tests is perfectly warranted.

This is not to dismiss these instruments or the organizations that make them. Tests developed by for-profit entities are frequently valid and reliable, and many aspects of the collegiate learning experience, such as the production of textbooks, are already farmed out to the private sector. It does mean, however, that everyone involved in the process should apply critical reasoning when considering these instruments and recognize that the developers have the profit motive in mind when selling them. We can fairly expect test developers to frequently exaggerate the validity and reliability of their tests in an effort to sell them to colleges and universities. Let the buyers beware.

We’re all in this together.

Controversy is a constant in this debate, and for good reason. When we discuss assessment, we are discussing, in a very real way, what the academy does and should value. It’s natural and healthy for such issues to invite debate, even heated debate. What we must all strive for as a community of educators is to make these debates constructive rather than destructive. Faculty, administrators, students, politicians, and parents all have legitimate points of view to bring to bear in this discussion, and in order to serve the interests of our colleges and universities, all of them must be heard. We may never arrive at perfect agreement about how to assess college student learning. But we can create constructive compromises that protect the interests of all involved. If we do, we can gain invaluable knowledge about our institutions, our students, and our value in a world where many colleges and universities are threatened. It’s up to all of us to start this conversation.