Recently, “Big Data” has become one of those things you hear everybody talking about but few really understand. Just like the story of the blind men and the elephant, Big Data appears to mean different things to different people. For starters, we know it has something to do with data, and we also know that the data is big. But isn’t big or small relative? In other words, just how big is big enough for a data collection to qualify as Big Data? Another important issue is the application of Big Data in the real world, especially in developing countries. Sometimes one gets the feeling that we are yet to conquer “little data” in the developing world, and that going for Big Data at this stage would be unwise.
But if you look around, you realize that one thing doesn’t have to wait for another. Working in the tech field in Nigeria, I have many times witnessed the disparity in technology uptake between different organizations within the same locality. For example, a private clinic in Maiduguri has deployed a custom-developed patient information management system, while the state’s education authority still performs a manual headcount to know the total number of students in its public schools. The analogy of the slow hippo and the fast cheetah living in the same forest comes to mind.
But first of all, what is Big Data? Even though it is difficult to put a finger on one perfect definition of Big Data, this one from Webopedia seems reasonable:
Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.
In essence, if you collect large amounts of data and extract patterns, analytics and intelligence from it for decision making, then you may be playing the big data game. It’s really more about how the data is used than about its size.
While the big data trend may appear to have emerged in recent years, the idea has been in gestation for quite some time. As far back as the 1940s, some experts were anticipating a coming “information explosion” and the challenges of managing it.
In 1944, Fremont Rider, a librarian at Wesleyan University, published a paper titled The Scholar and the Future of the Research Library. In it, he estimated that American university libraries were doubling in size every sixteen years. By that estimate, he projected that by 2040 the Yale library would have approximately 200 million volumes, requiring over 6,000 miles of shelves to house them and a cataloging staff of over six thousand people.
Over the years, others have worked on the issue of exponential data growth. For example, in his 1961 publication Science Since Babylon, Derek Price concluded that the number of new journals grows exponentially rather than linearly, doubling every fifteen years and increasing by a factor of 10 every 50 years. He called this the Law of Exponential Growth.
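To get a feel for what a fixed doubling time means, here is a small Python sketch of the arithmetic behind Rider’s and Price’s figures. It assumes simple exponential growth, where a quantity that doubles every T years grows by a factor of 2^(t/T) over t years, and it assumes Rider’s sixteen-year doublings are counted from 1944, the year his paper appeared. The function name growth_factor is just an illustrative choice.

```python
def growth_factor(years: float, doubling_time: float) -> float:
    """Multiplicative growth over `years` for a quantity that doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)

# Rider (1944): university libraries double every 16 years.
# Assumption: doublings are counted from 1944, so 2040 is 96 years (six doublings) away.
rider_factor = growth_factor(2040 - 1944, 16)

# Price (1961): journals double every 15 years. Over 50 years that should be roughly x10,
# which is how his two figures ("doubling every 15 years", "x10 every 50 years") fit together.
price_factor_50 = growth_factor(50, 15)

print(f"Rider: libraries grow x{rider_factor:.0f} between 1944 and 2040")
print(f"Price: journals grow x{price_factor_50:.1f} over 50 years")
```

Under these assumptions Rider’s rule implies a 64-fold increase by 2040, and Price’s fifteen-year doubling works out to about a tenfold increase every 50 years, so his two statements are consistent with each other.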
While people like Rider and Price were paying attention to the size of the data being generated, others tried to find out how that data was being consumed. For example, Ithiel de Sola Pool, in a 1983 article in Science titled “Tracking the Flow of Information”, wrote the following:
“words made available to Americans (over the age of 10) through these media grew at a rate of 8.9 percent per year… words actually attended to from those media grew at just 2.9 percent per year…. In the period of observation, much of the growth in the flow of information was due to the growth in broadcasting… But toward the end of that period the situation was changing: point-to-point media were growing faster than broadcasting.”
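The gap between those two rates compounds quickly. The Python sketch below assumes Pool’s percentages are steady annual compound rates, and it uses an arbitrary 20-year horizon purely for illustration (that horizon is not from his study). It computes how long each series takes to double and how far apart they drift if both start at the same level.

```python
import math

def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at `annual_rate` per year (0.089 means 8.9%) to double."""
    return math.log(2) / math.log(1 + annual_rate)

supplied = 0.089   # growth rate of words made available (Pool, 1983)
attended = 0.029   # growth rate of words actually attended to (Pool, 1983)

print(f"Words supplied double every {doubling_time(supplied):.1f} years")
print(f"Words attended to double every {doubling_time(attended):.1f} years")

# Assumption: both series start at the same level; the 20-year horizon is illustrative only.
years = 20
gap = ((1 + supplied) / (1 + attended)) ** years
print(f"After {years} years, words supplied outnumber words attended to by a factor of {gap:.1f}")
```

Under that reading, the supply of words doubles roughly every eight years while attention takes about three times as long, which is one way to see why the flow of information started to outstrip people’s capacity to consume it.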
These early predictions and studies of the information explosion may have paid off by spurring the development of technologies ready to handle such a phenomenon. Storage technology and data compression techniques are also advancing at an unprecedented rate. While it is not clear whether we are generating data faster than our technology can handle it, one only needs to look back a few years to appreciate how far we’ve come. For example, my smartphone today, with 64 GB of storage, 4 GB of memory and a quad-core processor, is far more powerful than my PC in 2001, which had a 20 GB hard disk, 128 MB of memory and an 800 MHz CPU.
In the next episode, we will touch on Who Is Using Big Data and How.