There’s quite a kerfuffle going on in the world of big data, with a range of prominent articles in the past month suggesting it’s not the analytical holy grail it’s been made out to be, ABC reports.
Taken together, these pieces suggest the start of a serious rethink of what big data can and can’t actually do.
Perhaps most prominent is a piece in the journal Science on March 14. It builds on an article in Nature last year reporting that Google Flu Trends (GFT), after a promising start, flopped in 2013, drastically overestimating peak flu levels. Science now reports that GFT overestimated flu prevalence in 100 of 108 weeks from August 2011 on, in some cases with estimates that were double the CDC’s prevalence data.
As well as picking apart GFT’s problems (an inconsistent data source, possibly inconsistent measurement terms), the authors blame “big data hubris,” which they define as “the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” Fundamentally, they add: “The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”
While “enormous scientific possibilities” remain, the authors say, “quantity of data does not mean that one can ignore foundational issues of measurement.”