Today, the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: weblogs, social media, e-mail, sensors, and photographs that can be mined for useful information. Decreases in the cost of both storage and compute power have made it feasible to collect this data - which would have been thrown away only a few years ago. As a result, more and more companies are looking to include non-traditional yet potentially very valuable data with their traditional enterprise data in their business intelligence analysis.
To derive real business value from big data, you need the right tools to capture and organise a wide variety of data types from different sources, and to be able to easily analyse it within the context of all your enterprise data. The broadest and most integrated portfolio of products will help you acquire and organise these diverse data types and analyse them alongside your existing data to find new insights and capitalise on hidden relationships.
Defining Big Data
Big data typically refers to the following types:
- Traditional enterprise data - includes customer information from CRM systems, transactional ERP data, Web store transactions, general ledger data.
- Machine-generated /sensor data - includes Call Detail Records ("CDR"), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust), trading systems data.
- Social data - includes customer feedback streams, micro-blogging sites like Twitter, social media platforms like Facebook
The McKinsey Global Institute estimates that data volume is growing at 40 percent per year, and will grow 44x between 2009 and 2020. But while it's often the most visible parameter, volume of data is not the only characteristic that matters. In fact, there are four key characteristics that define big data:
- Volume. Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine can generate 10 terabytes of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem.
- Velocity. Social media data streams - while not as massive as machine-generated data - produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over eight terabytes per day).
- Variety. Traditional data formats tend to be relatively well described and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.
- Value. The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.
Sign up for Computerworld eNewsletters.