Forty

Big data

John said “big data” is the phrase describing today’s ability to harvest, store and analyze large quantities of data. In fact, he went so far as to say that the phrase implies this ability wasn’t widely available during the first decade of the century; it demands a whole new set of tech.

He described another way in which “big data” may be considered to be different to “the normal sized stuff” – data helps us answer questions; big data also helps us conceive new questions.

Calling big data “big” is sort of underplaying it. It’s really really REALLY big! It’s often measured in terms of petabytes, where a petabyte is a thousand terabytes, or a billion megabytes. John put that into perspective: a 1-petabyte mp3 music track (128kbps) would play for 1,980 years.

I asked John to repeat the statement he’d confronted me with a few weeks earlier. He did. “Data paucity was the problem of the 20th Century. Having too much of the stuff is rapidly becoming the challenge and the opportunity of the 21st.”

I’d begun to think about the skills aspects of this. We weren’t overflowing with people competent in statistics or research methodologies, let alone people who understood the vagaries of big data collection, storage and analysis. Perhaps the heavy tech can be outsourced – per my conversation with John about the ‘T’ in IT – but we still need to understand the analytical insight it gives us. And as John says, perhaps the challenge isn’t so much understanding the answers as knowing what to ask.

I didn’t raise this concern because it seemed more to do with the ‘how’ when we were still focusing on the ‘what’.

Future sources feeding our big data include the social web, test data, and performance data kicked off by our products in the field.

John said that if every one of our commercial products fed back a couple of kilobytes of data an hour, this could add up to more than 300 gigabytes a day, or about a tenth of a petabyte a year. He was at pains to point out that this was an estimate, because right now we don’t even have the data to tell us how many of our products are in use. That simple observation alone underlined the starkness of the transition we’re confronting.
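John's estimate can be reproduced as a back-of-envelope sketch. The fleet size below is purely hypothetical: it is simply the number that makes 2 kilobytes an hour per product come out at 300 gigabytes a day, not a known count of products in use. Decimal units, "a couple of kilobytes" read as 2 KB, and a 365-day year are also assumptions.

```python
# Back-of-envelope version of the telemetry estimate.
# Assumptions: decimal units, "a couple of kilobytes" = 2 KB, 365-day year.
# FLEET_SIZE is hypothetical, chosen only so the daily total matches the
# quoted 300 GB; the real number of products in use is unknown.
KB, GB, PB = 10**3, 10**9, 10**15
BYTES_PER_PRODUCT_PER_HOUR = 2 * KB
FLEET_SIZE = 6_250_000  # hypothetical


def daily_bytes(fleet_size, bytes_per_hour):
    """Total bytes reported by the whole fleet in one day."""
    return fleet_size * bytes_per_hour * 24


def yearly_bytes(fleet_size, bytes_per_hour, days=365):
    """Total bytes reported by the whole fleet in one year."""
    return daily_bytes(fleet_size, bytes_per_hour) * days


per_day = daily_bytes(FLEET_SIZE, BYTES_PER_PRODUCT_PER_HOUR)
per_year = yearly_bytes(FLEET_SIZE, BYTES_PER_PRODUCT_PER_HOUR)
print(f"{per_day / GB:,.0f} GB a day")
print(f"{per_year / PB:.2f} PB a year")
```

The useful part is less the totals than the sensitivity: the result scales linearly with the fleet size, which is exactly the input we don't yet have.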