At a conference hosted by Harvard and MIT, schools using the open-source edX platform agreed on a common data structure for their online courses, with the goal of facilitating research on how students learn.
With online courses now part of the mainstream, colleges and universities are collecting terabytes of data on how students interact with their systems and content. But most schools gather this data according to their own specs, which makes comparisons difficult for researchers trying to identify broader trends.
However, this may all change in the wake of a conference hosted by Harvard and MIT this August that saw a dozen schools implement a standardized data structure for MOOCs and other online courses using the Open edX platform. The goal: Create a better understanding of how students learn online and improve instructional approaches accordingly.
“The biggest issue we’re trying to address right now is the feedback loop in online learning—going from content back to better instruction and better material,” said Daniel Seaton, a research scientist at Harvard’s Office of the Vice Provost for Advances in Learning. “We’re helping our partner schools build infrastructure that is capable of analyzing large data sets, which can be quite detailed and messy.”
A New Infrastructure
At the heart of this infrastructure are data standards intended to give researchers a common foundation on which to build. “A big part of the conference involved helping individual institutions set up a workflow that will allow them to extract data about how students interact in MOOCs and online courses, and put them in a usable format,” added Dustin Tingley, a professor of government at Harvard. “An important part of any future collaborative process is a common standard for how certain things are calculated and what specific types of data sets are produced.”
During the conference, which included institutions as diverse as the University of Arizona and Hamilton College, schools learned how to pull a fake edX data set into a dashboard-style display, piggybacking off work MIT and Harvard have already done in identifying the most germane data sets. “One of the greatest challenges for us as a joint team is getting the right kind of data to answer relevant questions in a privacy-respecting manner,” said Isaac Chuang, a professor of physics and electrical engineering at MIT who also serves as senior associate dean of digital learning. “At MIT, we’re very good at finding the data and recording every click. And, given the long history of Harvard’s Graduate School of Education, its faculty knows the right questions to ask. Together, we can shape the data to answer questions about socio-economic status, gaps in education, and growth in computer science.”
Utilizing Online Learning’s Potential
While edX has been around since 2012, the need for a common data standard—and its inherent potential—stems from a more recent phenomenon: the explosion of online learning across the higher ed spectrum. “What brings it all together is the fact that there is now a wide variety of classes,” said Tingley. “It’s not just classes from Harvard and MIT, but hundreds of courses from more than 100 institutions in edX. Suddenly, you can start to see patterns. You can see the flow of learners going from one class to another.”
Although analysis of the data is in its early stages, researchers are already gleaning important lessons about online learning. A perfect example, says Chuang, is the ChinaX MOOC, which started out as a huge course lasting 12-14 weeks. “Professor Peter Bol of Harvard learned that students had better outcomes and discussions when he broke the course into smaller modules and then made those modules even smaller,” he said. “The increased use of modularity has been one of the greatest lessons that Harvard and MIT have gleaned about how online learning can be different and better.”
In another research project, Chuang and Andrew Ho, a professor at Harvard’s GSE, used a set of statistical tools to evaluate the characteristics of assessment questions asked in a MOOC. “The purpose was to see if we were asking the right questions to assess student comprehension and knowledge,” said Tingley. “By putting data in a common format, researchers can analyze these things more systematically.”
The long-term hope is to accelerate the pace of analysis and discovery by involving researchers at more and more institutions. “We’re trying to empower staff at the different universities to launch this workflow themselves, so we can all compare against numbers that were generated using the same code,” said Seaton.
While researchers are eager to exchange their findings with colleagues from other schools, privacy laws such as FERPA make the sharing of the data itself unlikely. “We can’t share our data in its raw form—there can’t be one place that hosts everything,” said Seaton. “The next best thing is to say, ‘Here are all these open-source tools that will allow us to compare apples to apples.’”
The issue of privacy was a major topic at the conference, with participating schools discussing a wide range of policy considerations that extend well beyond the issue of data sharing between institutions. At what level, for example, do schools start to anonymize data for internal purposes? “Part of the goal of the conference was to identify our common challenges and then supply a technical operating structure to address them,” said Tingley.
Moving Beyond MOOCs
While MOOCs tend to take up all the oxygen in any discussion about online learning, the creation of a common data structure is intended to pave the way for advances in all forms of online learning. As many as 50 courses at MIT, for example, use Residential MITx, an on-campus version of Open edX, which opens up interesting research possibilities for Chuang and his team. “Having this common standard between the big sets of MOOC data and the smaller sets of on-campus MIT student data allows us to connect and compare the two,” he said. A soon-to-be-released paper, for example, will compare the grades received by students in an edX MOOC on quantum mechanics with those of MIT students who covered the same material on campus.
According to Tingley, Harvard is also moving toward more blended courses, and faculty are looking to the university’s experience with edX MOOCs for pointers. The shift, says Tingley, “has been tremendously facilitated by both the technical expertise and the educational perceptions that have been developed through the HarvardX domain. The conversations are decidedly more focused now than they would have been otherwise.”
Developing a common data structure represents just the beginning, however. As colleges and universities begin to analyze the incoming data, forums are needed to help disseminate and discuss research findings. Some of this is already happening: The Association for Computing Machinery, for example, holds an annual conference called Learning at Scale, and new periodicals such as the Journal of Learning Analytics cover research into online learning.
Researchers from Harvard and MIT meet every other week in workshops to discuss their findings, too, but Seaton is looking to expand these workshops to include the university partners that have just set up their own systems. “I hope to try to do something similar to the August conference in the spring, where we get together to understand the movement collaboratively,” he said. “We need to have more people talking about real data—what’s happening at Wellesley, what’s happening at ASU, what’s happening elsewhere. I think it will all happen very fast.”
For now, though, Seaton is ecstatic that all the participating colleges were able to move from raw edX data to a functional dashboard during the course of the conference, and he fully expects the majority to have something up and running on their campuses within a few weeks. Down the road, he envisages a scenario where this kind of technical conference is not even needed to help schools set up their data workflow. Instead, he foresees an automated download and a setup tutorial. “Probably toward the end of this month we’ll think about how to package the materials so that schools can live on their own without a workshop,” said Seaton. “In future, you’ll simply download a virtual machine that installs everything you need, along with tutorial Google Docs that contain a step-by-step process of how to set this up.”