Hadoop: How open source can whittle Big Data down to size

Rohan Pearce | March 5, 2012
In 2011 'Big Data' was, next to 'Cloud', the most dropped buzzword of the year. In 2012 Big Data is set to become a serious issue that many IT organisations across the public and private sectors will need to come to grips with.

Cloudera sells support and a licence to its proprietary software, Cloudera Manager, which helps deploy and monitor CDH (Cloudera's Distribution including Apache Hadoop). The Oracle Big Data Appliance, released in January 2012, runs CDH.

"Appliances are a great way to get a customer in the door, but most folks end up buying a customised cluster," Cutting says. "Some folks may find the appliance itself to be the right solution, but more frequently people want something that's more suited to their particular uses.

"Folks tend to start with a small proof-of-concept system, perhaps 10 or 20 nodes. Once they've gained some experience with this then they have an idea of both how big their production system needs to be and what its bottlenecks are. This informs the balance of storage, compute, memory and networking that will serve them best.

"Over time, as workloads evolve and grow, folks may gravitate towards common configurations, but we're not yet seeing a lot of one-size-fits-all solutions."

Cutting says when he started Hadoop, which was named after his son's toy elephant, he didn't realise just how significant the project would end up being. "I thought it would probably be useful to lots of folks, but I didn't think much about how many or how they might use it," Cutting says. "I certainly didn't think that it would become the central component of a new paradigm for enterprise data computing."

However, the software is "ultimately the product of a community," he adds. "I contributed the name and parts of the software and am proud of these contributions. The Apache Software Foundation has been a wonderful home for my work over the past decade, and I am pleased to be able to help sustain it."

Cutting uses the example of a hypothetical large retailer to explain what Hadoop can do with an enterprise's data: "Instead of just being able to analyse national sales over the past month, it can with Hadoop analyse sales trends over many years. This lets them better manage pricing, inventory and other core aspects of their business: They get a higher resolution picture of their business.

"Similarly, credit card companies can better guess whether a transaction is fraudulent, banks can better guess whether someone is credit worthy, oil companies can better guess where to drill, and so on. In nearly every case they can use data they were formerly discarding to improve the quality and profitability of their products."
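The aggregation Cutting describes is typically expressed as a MapReduce job: a mapper turns each raw record into a key/value pair, and a reducer combines all values sharing a key. The sketch below shows the idea in the style of a Hadoop Streaming job written in Python; the data format, field names and function names are illustrative assumptions, not anything from Cloudera's software.

```python
from collections import defaultdict

def mapper(lines):
    """Emit (region, amount) for each hypothetical CSV sales record
    of the form date,region,amount."""
    for line in lines:
        date, region, amount = line.strip().split(",")
        yield region, float(amount)

def reducer(pairs):
    """Sum sale amounts per region. In a real Hadoop job the framework
    sorts and groups pairs by key before they reach the reducer."""
    totals = defaultdict(float)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)

if __name__ == "__main__":
    records = [
        "2011-06-01,NSW,120.50",
        "2011-06-01,VIC,80.00",
        "2011-06-02,NSW,45.25",
    ]
    print(reducer(mapper(records)))  # totals per region
```

On a real cluster, Hadoop would run many copies of the mapper and reducer in parallel across the nodes, which is what makes scanning years of sales records, rather than one month, practical.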

Cutting predicts continued exponential growth in Big Data analytics. "We're still in the steep part of the adoption curve and will be for at least a few more years," he says.

"It will be a while before growth merely tracks that of the larger economy. Developing economies like China and India will fuel continued growth in this space."
