Big Data 101: Myths and realities

IT systems generate massive amounts of data these days, Lifehacker reports. Some of it is carefully tracked and analysed by well-defined reporting systems (finance and payment information, for instance).

However, much of it is either stored in logs that are never referred to again (website visitors) or dumped after a very limited period of time (security camera footage).

Processing these large volumes of data and correlating them with other business information and external sources of data can lead to useful insights. Businesses might discover, for example, that particular goods are often purchased in combination, but that those combinations vary by time of day or the location of the customer. That can make it easier to cross-sell to those customers.
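As a rough illustration of the cross-selling idea, the sketch below counts which pairs of goods are bought together, split by time of day. The sample transactions, the time-of-day buckets and the function names are all hypothetical; a real system would work on far larger and messier data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample transactions: (time-of-day bucket, items in the basket).
TRANSACTIONS = [
    ("morning", {"coffee", "muffin"}),
    ("morning", {"coffee", "muffin", "juice"}),
    ("morning", {"coffee", "newspaper"}),
    ("evening", {"pizza", "soda"}),
    ("evening", {"pizza", "soda", "ice cream"}),
    ("evening", {"coffee", "ice cream"}),
]


def pair_counts_by_segment(transactions):
    """Count how often each pair of items is bought together, per segment."""
    counts = {}
    for segment, basket in transactions:
        segment_counts = counts.setdefault(segment, Counter())
        # Sort the basket so each pair has a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            segment_counts[pair] += 1
    return counts


def top_pairs(transactions, n=1):
    """Return the n most frequent item pairs for each segment."""
    return {
        segment: segment_counts.most_common(n)
        for segment, segment_counts in pair_counts_by_segment(transactions).items()
    }


if __name__ == "__main__":
    for segment, pairs in top_pairs(TRANSACTIONS).items():
        print(segment, pairs)
```

In this toy data the most common pairing differs between morning and evening, which is the kind of pattern the paragraph above describes. Swapping the time-of-day bucket for a customer location would give the same analysis by place.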

Big data is often discussed in terms of the “three Vs”: volume, velocity and variety. By that definition, a project isn’t big data unless it deals with a very large, continuous stream of data that comes from a wide range of sources and arrives at an unpredictable pace.

The analysis process isn’t straightforward: it is more than a matter of matching up columns of information in a predictable structure that stays the same over time. Data sources need to be rated for relevancy, and the mere act of churning through large volumes of data requires significant processing power, storage and I/O. It also requires constant attention, since the analyses themselves have to adjust to changing incoming data.

Expertise is thin on the ground. One constant theme in big data is that it’s hard to find people with the skills to do it well. The ideal combination includes skill in scientific analysis and confidence with large databases, a skill-set you’ll pay a premium for.
