The first platform saw users connecting to a central computer, a mainframe or some other host system, through a terminal. The second platform saw this evolve into the use of personal computers in a client-server relationship and then, as the Internet came into the equation, application servers and web-enabled applications.
The third platform, the current iteration, sees a democratisation of technology across enterprises and consumers through such trends as mobile devices, cloud computing, social media, application development platforms and analytics.
The third platform does not restrict access to any of the above. Where the first and second platforms were the domain of the enterprise, the third platform now lies in the palms of consumers of all ages, both to create and access information. This contributes to what IDC calls the digital universe, and it is the exponential growth of the digital universe which has led to the phenomenon known as big data.
By IDC's estimation, 90 per cent of the world's total data has been created in the last two years and 70 per cent of it by individuals. IDC predicts the digital universe will expand in 2013 by almost 50 per cent to just under 4 trillion gigabytes. Despite this, 38 per cent of organisations still don't understand what big data is.
Big data is a term that has been thrown around quite extensively over the last couple of years, and in the process has been misused, misaligned, misconceived and misinterpreted.
At the heart of it, big data has described the sudden explosion of data through the proliferation of smartphones, tablets, sensors, scanners, machines and any other receptacle of electronic information, but the concept is far more encompassing than that.
IDC defines big data as a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.
The real problem of big data is not so much about volume. Technologies are continually evolving to manage the growth of data, and the Hadoop Distributed File System seems to be the emerging standard most solutions are adopting. The real problem lies in the variety and velocity of data.
Big data is messy. It is unstructured and does not fit neatly into the rows and columns of the relational database. It is varied and comes in different types and from different sources.
Organisations are now collecting social media feeds, images, streaming video, text files, documents, telemetry data and so on, capturing everything from sentiment and expression to electronic forms, genomes, soil temperatures and pH levels. This variety of data is hard to render into a structured format and almost impossible for a standard query language to interpret.
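A minimal sketch can make the schema problem concrete. The records below are hypothetical examples of the data types the article lists (a social media post, a sensor reading, a scanned document); none of the field names come from a real system. The point is that the set of columns a relational table would need keeps growing with each new data type, while the fields shared by all records shrink to almost nothing:

```python
# Hypothetical records of the kinds described above: a social media post,
# a soil sensor reading and a scanned document. Each has a different shape,
# so no single fixed set of columns fits all three.
records = [
    {"type": "social", "user": "alice", "text": "Loving the new phone!", "likes": 12},
    {"type": "telemetry", "sensor": "soil-7", "temperature_c": 18.4, "ph": 6.8},
    {"type": "document", "title": "Q3 report", "pages": 42},
]

def columns(record):
    """Return the field names a relational table would need for this record."""
    return set(record.keys())

# The union of fields grows with every new data type, while the intersection
# (fields common to all records) shrinks: schema-on-write breaks down.
all_fields = set().union(*(columns(r) for r in records))
shared_fields = set.intersection(*(columns(r) for r in records))

print(len(all_fields))   # -> 9 distinct fields across just three record types
print(shared_fields)     # -> {'type'}: only the type tag is common to all
```

This is why such data tends to be stored in its raw, semi-structured form and interpreted at read time, rather than forced into rows and columns up front.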