Big Data has become a focal point for college IT departments.
The storage, management, and security of Big Data are of paramount importance to colleges and universities, with many schools using the reams of student data to stem dropout rates and identify at-risk students, among other uses.
Big Data extends well beyond the walls of the Ivory Tower. Every day, another 2.5 quintillion bytes of online information is created. This data can be used to find patterns in student learning and help quickly diagnose patients and track consumer trends.
Researchers in the University of California system recently created a platform to store and analyze all of this information. By using large, “shared-nothing” computing clusters – a design in which each computer node is self-sufficient and relies on no other – the Big Data platform can better process the large volume of data.
Jose Ferreira, founder and CEO of Knewton, an adaptive learning company, released a list July 18 of five types of Big Data that matter most in education, providing a summary of how this sort of data is critical to the future of higher education.
“Education has always had the capacity to produce a tremendous amount of data, more than maybe any other industry,” Ferreira wrote. “First, academic study requires many hours of schoolwork and homework, 5+ days per week, for years. These extended interactions with materials produce a huge quantity of information. Second, education content is tailor-made for big data, generating cascade effects of insights thanks to the high correlation between concepts.”
Here’s his list of the five kinds of Big Data that matter the most.
1) Identity Data: Identifying who students are and whether they're allowed to use a certain application. This kind of data is also used to determine what administrative rights a person might have on campus, along with their socioeconomic status.
2) User Interaction Data: This includes engagement metrics such as click rate, page views, and bounce rate. “These metrics have long been the cornerstone of internet optimization for consumer web companies, which use them to improve user experience and retention. This is the easiest to collect of the data sets that affect student outcomes. Everyone who creates an online app can and should get this for themselves,” Ferreira wrote.
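To see how easy these metrics are to compute, here is a small sketch over a hypothetical page-view log (the session IDs and paths are made up for illustration). A “bounce” is counted in the common way: a session that views exactly one page.

```python
from collections import Counter

# Hypothetical page-view log: (session_id, page) pairs.
log = [
    ("s1", "/home"),
    ("s1", "/course/algebra"),
    ("s2", "/home"),          # s2 views one page and leaves: a bounce
    ("s3", "/home"),
    ("s3", "/quiz/1"),
    ("s3", "/quiz/2"),
]

page_views = len(log)

# Group views by session to find single-page sessions.
views_per_session = Counter(session for session, _ in log)
sessions = len(views_per_session)
bounces = sum(1 for n in views_per_session.values() if n == 1)
bounce_rate = bounces / sessions

print(page_views)   # total page views
print(bounce_rate)  # share of sessions that bounced
```

Real apps collect the same counts from server or client-side event logs; the arithmetic is no more complicated than this.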
3) Inferred Content Data: “How well does a piece of content ‘perform’ across a group, or for any one subgroup, of students? What measurable student proficiency gains result when a certain type of student interacts with a certain piece of content? How well does a question actually assess what it intends to?” Ferreira wrote. “Efficacy data on instructional materials isn’t easy to generate — it requires algorithmically normed assessment items. However it’s possible now for even small companies to ‘norm’ small quantities of items.”
4) System-Wide Data: “Rosters, grades, disciplinary records, and attendance information are all examples of system-wide data,” he wrote on Knewton’s blog. “Assuming you have permission (e.g. you’re a teacher or principal), this information is easy to acquire locally for a class or school. But it isn’t very helpful at small scale because there is so little of it on a per-student basis.”
“At very large scale it becomes more useful, and inferences that may help inform system-wide recommendations can be teased out. But even a lot of these inferences are tautological … or unactionable. So these data sets — which are extremely wide but also extremely shallow on a per-student basis — should only be used with many grains of salt,” Ferreira added.
5) Inferred Student Data: “Exactly what concepts does a student know, at exactly what percentile of proficiency? Was an incorrect answer due to a lack of proficiency, or forgetfulness, or distraction, or a poorly worded question, or something else altogether? What is the probability that a student will pass next week’s quiz, and what can she do right this moment to increase it?” he wrote.
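One standard way to formalize the “probability of passing next week's quiz” question is an item response model. The sketch below uses the one-parameter Rasch model, a textbook approach and not Knewton's actual algorithm; the ability and difficulty values are invented for illustration.

```python
import math
from itertools import product

def p_correct(theta, b):
    """Rasch model: probability that a student with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical student ability and quiz-item difficulties.
theta = 0.8
difficulties = [0.2, 0.5, 1.1]
probs = [p_correct(theta, b) for b in difficulties]

# Probability of passing if "passing" means at least 2 of 3 items
# correct, assuming the items are answered independently:
p_pass = sum(
    math.prod(p if hit else 1 - p for p, hit in zip(probs, outcome))
    for outcome in product([True, False], repeat=3)
    if sum(outcome) >= 2
)
print(round(p_pass, 3))
```

The same machinery answers the “what can she do right now” question: recomputing `p_pass` after a simulated gain in `theta` shows which practice items would move the probability most.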