Five considerations for Big Data privacy
According to Reif, MIT is at the forefront of digital learning through its Online X programs, the pilot program developed by edX, an MIT and Harvard partnership. The not-for-profit venture has drawn students from around the world since its 2012 launch. (Read: “MIT’s Big Data education set to get underway.”)
Reif noted that the X programs have more than 760,000 unique registered learners from more than 90 countries, and more than 700 million records.
“We want to measure what works to improve learning, but we also want to share this data with other institutions so that they can learn from our data. However, we have to think about privacy and for us that means FERPA.”
Governed by FERPA (Family Educational Rights and Privacy Act), universities must now decide who counts as a student under the law.
“Are you a student if you take a MOOC? What if you only take so many courses? Are you only considered a student if you receive a certificate? These are some of the new questions higher education institutions must discuss concerning Big Data and privacy,” explained Reif.
He also emphasized that online forums as part of online courses are proving the biggest challenge, since many students post personal information that can be aggregated.
“It’s about setting boundaries while balancing competing interests,” he said.
2. Non-predicated data
In 2012, 2.4 billion internet users shared enough information to surpass 2 zettabytes of digital information, leading to a jump in analytics technology reminiscent of something straight out of “Minority Report.”
With so much information available, said John Podesta, White House counselor, there’s been a move from predicated data, or data individuals have given, to non-predicated data, or data that can profile an individual based on information they themselves may not know.
“The discussion has currently become: How do we inform and develop privacy policies on non-predicated data and what are the social implications of this?”
For higher-ed institutions, this could mean predictive analytics that, based on things like learning trends, financial aid statistics, and teaching costs, can determine which students will succeed at the college before they are ever admitted. (Read: “Higher Education’s Big (Data) Bang: Part One.”)
“Is it fair to use Big Data in this way, and should incoming students be notified of these analytics? How do we set up policy to protect student learning data that’s non-predicated? These are the questions we are trying to answer,” explained Podesta.
3. Personnel, not software
“There’s a misconception that many data breaches are caused by software malfunctions,” said Mike Stonebraker, adjunct professor at MIT CSAIL. “The truth is, the tools used to manage Big Data, like the Hadoop/Hive world, don’t make security mistakes. It’s the human element.”
Stonebraker argues that if Big Data users, like universities—often targets of data breaches—want to manage privacy risks, the database should have a command log.
“It’s perfectly acceptable to have a command log to know who’s doing what on the database. I’d also recommend creating a detection system to measure suspicious behavior in personnel when accessing the database. Yes, someone may have to create a breach for the system to become predictive, but it’s a worthwhile investment,” he said.
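For readers who want a concrete picture of a command log paired with a simple detection check, a minimal Python sketch follows. It is an illustration only, not Stonebraker's design or any particular database's feature: the CommandLog class, the flag_suspicious function, the user names, the queries, and the fixed max_commands threshold are all hypothetical. A real system would more likely learn what counts as normal from past activity, as his remark about the system becoming predictive suggests.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CommandLog:
    """Append-only record of who ran which command, and when (hypothetical)."""
    entries: list = field(default_factory=list)

    def record(self, user: str, command: str) -> None:
        # Store a UTC timestamp, the user, and the command text.
        self.entries.append((datetime.now(timezone.utc), user, command))

    def counts_by_user(self) -> Counter:
        # Number of logged commands per user.
        return Counter(user for _, user, _ in self.entries)


def flag_suspicious(log: CommandLog, max_commands: int) -> list:
    """Return users who issued more commands than an assumed fixed limit."""
    return sorted(
        user for user, count in log.counts_by_user().items()
        if count > max_commands
    )


# Made-up activity: one routine lookup, and one account reading many records.
log = CommandLog()
log.record("registrar", "SELECT name FROM students WHERE id = 17")
for student_id in range(50):
    log.record("temp_contractor", f"SELECT * FROM students WHERE id = {student_id}")

print(flag_suspicious(log, max_commands=10))  # prints ['temp_contractor']
```

The point of the sketch is the division of labor: the log only records activity, while the detection step is a separate pass over the log, so the rule for what counts as suspicious can change without touching how commands are recorded.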