Viewpoint: Having bad data often isn’t a ‘technology problem’

Campuses are producing high-quality data, but much of this information is not being used.

I have worked with many different schools on reporting and institutional research projects. The most common question I hear from Information Technology (IT) and Institutional Research (IR) offices is, “What tool should we be using to pull data from our system?”

People continue to struggle to get the information they need out of their institution’s administrative systems. In their minds, the problem—and answer—always seems to be the technology.

I have seen schools that have been successful with a variety of reporting tools, ranging from Microsoft Excel to the most advanced business intelligence (BI) and statistical analysis tools. I have also seen many schools fail with the exact same tools.

My conclusion is that institutional reporting is not a technology problem. The key to success is communication, collaboration, knowledge of the data, and trust between all users—all of which are categories of best practices in data management.

“Data management” is a broad term and often includes each of the following:

  1. Data definitions: Do we agree on what is meant when we talk about our data?
  2. Data governance: Who owns and manages the data in different areas on campus?
  3. Data knowledge: How do we find the data we need?
  4. Data quality: Are the data “good”? Are they accurate?
  5. Data access: Who is allowed to see or change the data?
  6. Data integrations: How do we manage data across disparate systems?

These are all important concepts, but when extracting data from administrative systems, the most important concepts are data definitions, governance, knowledge, and quality.

Data definitions: Have a conversation

For anyone outside of a college or university, it’s hard to understand the complexity of higher-education data. How is it possible to argue over the definition of a “first-time freshman”?

To understand what this term means, specifics such as minimum registered credits, add/drop periods, transfer credits, non-degree seeking students, registration status, census dates, academic levels, and dual degree students need to be considered.
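To see why these details matter, it helps to write one candidate definition down as explicit rules. The Python sketch below is purely illustrative: the field names, the academic-level code "UG", and the minimum-credit threshold are hypothetical choices a campus would have to agree on, not a standard definition. It evaluates a student as of a census date, so add/drop activity changes the answer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical policy value: each campus must agree on its own threshold.
MIN_REGISTERED_CREDITS = 1.0


@dataclass
class Registration:
    credits: float
    added_on: date
    dropped_on: Optional[date] = None  # None means the course was never dropped

    def active_on(self, as_of: date) -> bool:
        """True if the course was added by `as_of` and not dropped on or before it."""
        return self.added_on <= as_of and (
            self.dropped_on is None or self.dropped_on > as_of
        )


@dataclass
class Student:
    academic_level: str            # hypothetical codes, e.g. "UG" or "GR"
    degree_seeking: bool
    prior_college_after_hs: bool   # enrolled in college after high school graduation
    registrations: List[Registration] = field(default_factory=list)


def is_first_time_freshman(s: Student, census_date: date) -> bool:
    """One possible technical definition of "first-time freshman", as of a census date."""
    # Only credits still registered on the census date count toward the minimum.
    credits = sum(r.credits for r in s.registrations if r.active_on(census_date))
    return (
        s.academic_level == "UG"
        and s.degree_seeking
        # In this sketch, transfer credit earned while still in high school
        # (e.g. dual enrollment) does not disqualify a student; college
        # attendance after graduation does.
        and not s.prior_college_after_hs
        and credits >= MIN_REGISTERED_CREDITS
    )
```

Moving the census date, or deciding that dual-enrollment credit does count, changes who is included. Those are exactly the decisions that have to be settled in conversation before anyone writes a query.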

Even individuals familiar with higher-education data can lose sight of these intricacies. In this environment, remember a simple fact: Computers are not as smart as their users; they don’t understand the assumptions and details that go into a data request.

It is important that users have a conversation about the data in order to avoid assumptions and uncover underlying details. When IR, IT, functional offices, faculty, and administration talk about data, everyone must understand and communicate all of the details for the data definitions.

Campuses should have conversations about data definitions every day and will need a place to capture this information. These definitions must be saved in a single, transparent location where all parties can see the results of these conversations.

Many schools have started initiatives for creating data dictionaries.

These projects often start in either an IR or an IT office and tend to produce one-sided definitions. IR offices tend to capture only the functional definition, while IT offices tend to capture only the technical definition. The goal should be to capture both the functional and technical definitions in data dictionaries.

Once an IR office or registrar agrees on the definition of “first-time freshman,” the process for extracting the data that fits that definition should be documented so that it is accurate and repeatable; this documentation might include multiple technical definitions.

How do you pull the necessary data from the transactional student system? How do you pull the necessary information from the data warehouse? Is there a different definition for different time contexts (such as “current” or “as of a given date”)?
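A data dictionary entry that answers these questions needs room for one functional definition and several technical ones. The sketch below models that in Python; the class, the source names ("student_system", "warehouse"), the context labels, and the SQL table and column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DictionaryEntry:
    """One data dictionary term with a functional definition and technical definitions."""
    term: str
    functional_definition: str  # plain-language meaning agreed on by the functional office
    # Technical definitions keyed by (source system, time context).
    technical_definitions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_technical(self, source: str, context: str, extraction: str) -> None:
        self.technical_definitions[(source, context)] = extraction

    def technical(self, source: str, context: str) -> str:
        """Return the documented extraction, or fail loudly if none has been agreed on."""
        try:
            return self.technical_definitions[(source, context)]
        except KeyError:
            raise LookupError(
                f"No documented extraction for {self.term!r} from {source!r} ({context})"
            ) from None


entry = DictionaryEntry(
    term="first-time freshman",
    functional_definition=(
        "Degree-seeking undergraduate with no college attendance after high school, "
        "registered for credit as of the term census date."
    ),
)
# Hypothetical tables and columns; real ones depend on the campus's systems.
entry.add_technical(
    "student_system", "current",
    "SELECT student_id FROM registrations "
    "WHERE term = :term AND status = 'ACTIVE' AND cohort = 'FTF'",
)
entry.add_technical(
    "warehouse", "as_of_census",
    "SELECT student_id FROM census_snapshot "
    "WHERE term = :term AND cohort = 'FTF'",
)
```

Failing loudly when a (source, context) pair has no documented extraction is a deliberate choice in this sketch: it sends the requester back to the conversation instead of letting someone improvise an undocumented query.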

Data governance: Create structured collaboration

It is not surprising to find half a dozen different definitions for “student” on a campus. The interesting thing is that all of these definitions can be correct; they are just used by different departments in different contexts.

The bursar’s office, the admissions office, and the state government all have different ways of identifying a “student.” When creating a data dictionary, recognize and structure these different points of view. Create a data governance community to allow for “approved” or “official” definitions in different functional areas.

Data stewards can moderate an institution’s conversations about data when assigned to areas like academic records, admissions, HR, and financial aid.

These data stewards should be responsible for approving all terms in their area.
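The structure described here, with several valid definitions of one term, each owned by a functional area and approved by that area's steward, can be sketched as a small Python model. The class names, area names, steward identifiers, and sample definitions are hypothetical; a real campus would keep this in its data dictionary tool rather than in code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Definition:
    term: str
    area: str                          # functional area that owns this definition
    text: str
    approved_by: Optional[str] = None  # steward who approved it, if anyone


class Glossary:
    def __init__(self, stewards: Dict[str, str]) -> None:
        self.stewards = stewards  # functional area -> assigned data steward
        self.definitions: Dict[Tuple[str, str], Definition] = {}

    def propose(self, term: str, area: str, text: str) -> Definition:
        if area not in self.stewards:
            raise ValueError(f"No data steward assigned to {area!r}")
        definition = Definition(term, area, text)
        self.definitions[(term, area)] = definition
        return definition

    def approve(self, term: str, area: str, steward: str) -> None:
        # Only the steward assigned to an area may approve that area's terms.
        if self.stewards.get(area) != steward:
            raise PermissionError(f"{steward!r} is not the data steward for {area!r}")
        self.definitions[(term, area)].approved_by = steward

    def official(self, term: str) -> Dict[str, str]:
        """Every approved definition of a term, keyed by functional area."""
        return {
            area: d.text
            for (t, area), d in self.definitions.items()
            if t == term and d.approved_by is not None
        }
```

Note that `official("student")` can legitimately return more than one definition: each one is correct within its own area, which is the point of structuring the dictionary this way.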

Outside of data definitions, data stewards also can take on critical roles in data quality and data security. Leadership of the data stewardship community should be shared between IT and IR, because stewards play a role for both departments. A collaborative environment with a data stewardship community will help to create a culture of trust and buy-in for data usage.

Data knowledge: Democratize user access

For any individual, the biggest indicator of success in a reporting project is knowledge of both the data structures and business needs. Many campuses have super-users who, over time, have developed a deep understanding of the data systems.

Super-users often gain this knowledge from a combination of timing, circumstances, and specific project assignments. These individuals are very valuable to a campus, but they’re also frequently bottlenecks or silos for access to information.

When I started out in higher-education technology, I was lucky enough to work on multiple data migration projects that forced me to learn the data structures.

The experience I acquired from these projects gave me invaluable knowledge about student data systems. Like many super-users on campus, I have found that this knowledge is both useful and valuable, and it has made me a resource for the university.

Is it good to have knowledge limited to a handful of people?

I would argue that it’s time to let go of our “super-user pride” and become more open about data knowledge. Institutions need to think of ways to democratize the knowledge and accelerate the learning curve for anyone who needs access to data.

There are four ways to do this:

1) Allow users direct access to the data. This does not mean granting access to production systems or violating FERPA or HR privacy rules; instead, open secure query databases for direct access with a variety of tools. This requires careful monitoring, but the rewards can be enormous.

2) Give users the necessary documentation. Access to data dictionaries with functional and technical definitions is a great start. Create forums for asking questions and mentoring.

3) Allow users to answer their own questions with data. There is no better way to learn than to have your own question to work on. Training is often theoretical or unrelated to actual jobs (take a SQL class and see what I mean). Real projects create real learning.

4) Get users to share the knowledge as it is learned. Collaboration through data dictionaries, shared documentation, and user groups will ensure the investment in learning does not create a new class of super-user gatekeepers.
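For the first suggestion, here is a minimal sketch of a read-only, monitored connection to a reporting copy of the data. It uses Python's built-in sqlite3 module purely as a stand-in; a campus reporting database would normally enforce read-only access through database roles and audit logging instead, and the function name and table are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import List


def open_reporting_copy(db_path: str, query_log: List[str]) -> sqlite3.Connection:
    """Open a reporting copy of the data read-only and log every statement run on it."""
    # mode=ro makes SQLite reject any write made through this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    # A minimal monitoring hook: record the SQL text of each statement executed.
    conn.set_trace_callback(query_log.append)
    return conn
```

Users can run any read query they like against the copy, writes are rejected by the database itself, and the log gives the data stewards something concrete to review.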

Data quality: Build trust among all users

Several things affect data quality. Some “bad” data come from the old “garbage in, garbage out” principle. But there are also “good” data paired with “bad” communication: even clean data can produce “garbage out” when no one explains what the data mean.

Many schools are producing vast amounts of high-quality data for their campus. However, owing to a lack of trust in the data, much of this information is not being used. When people are not included in conversations about the data definitions, or they just see data as numbers coming out of a black box, it can lead to doubt.

It’s important to create transparency and shared documentation in data management. Without both transparency and open collaboration during the process, it is very difficult to create trust. No report is ever easy enough or pretty enough to overcome a lack of trust.

Improving data management is the key to success

When institutions struggle with institutional reporting, they assume that the existing technology is not good enough, so they look for a new tool. I have seen schools move through the landscape of reporting and BI tools only to find that they continue to have the same problems. There are some great tools out there, and you can do some amazing things with new technologies.

However, before throwing more money at the problem with a new tool, you should investigate how your campus is communicating about data.

With a small investment in data management best practices, campuses can create a “value multiplier” for existing investments in reporting technologies. The institutions that are seeing a return on their technology investments have started with the fundamental understanding that, in order to turn data into information, everyone must be involved in the conversation.

Brian Parish has been working in education technology since 1995, with a primary focus on enterprise data systems. Parish is the president of IData Inc., a higher-education technology consulting and software solutions firm. IData provides services in IR, IT, and system implementation; its product line includes an online collaboration tool designed to serve as a central repository for data management and governance.