Vyacheslav Arkharov, application platform business development manager at Microsoft Russia, listed the following examples of real-world tasks that may require implementing big data technologies: risk assessment, prevention of money laundering, trend analysis and forecasting in the financial segment; inquiry examination; Web and social network analysis, advertising intelligence and digital image analysis in the mass media and online content sectors; customer behavior analysis and sales intelligence both in online and traditional commerce; fraud prevention in online games; various national security tasks; gene and pharmaceutical research; as well as scientific and educational research.
Further use scenarios for big data technology include assessing the impact of weather and road traffic conditions on cargo delivery and fuel consumption; examining conversation records in call centers for customer behavior analysis; operation and fault analysis in telecom networks; probing the impact of weather changes on energy generation; interpreting smart meter data in electric grids; and analyzing system transaction logs in various verticals, said Sergey Likharev, head of information management solutions at IBM Eastern Europe and Asia.
Solutions for working with big data should be able to provide easy access to the entire body of corporate information; process both structured and unstructured data; map relations between various pieces of data regardless of their format; work with original data sources to prevent duplication; understand the meaning and context of all data; identify similar phone calls, emails, documents and IM messages; as well as process and analyze data on the fly using predefined rules, according to HP's Wagner.
When addressing the big data issue, it is critical to evaluate the total costs of collecting, storing and processing the data, and of course, the priority is to increase ROI for the corresponding technologies, pointed out Nick Rossiter, regional director of Informatica Russia and CIS. This could be done by raising the value of data or lowering its cost, said Rossiter. Increased value is achieved primarily by acquiring new business capabilities and advantages (such as speeding up customer request processing, widening the customer audience, taking measures to reduce customer complaints, lowering the risk of fraudulent transactions, increasing employee efficiency, etc.). Meanwhile, among the first steps toward lowering data costs is optimizing and updating IT infrastructure and processes, which in turn leads to lower total IT costs.
"Is it possible to derive 10 times more value from data than we can now?" asked Luke Lonergan, co-founder and chief technology officer of Greenplum (now part of EMC). "Definitely yes, by using the data, which is usually neglected or not processed due to technical constraints."
Most often mentioned at the forum was Apache Hadoop, a distributed computing framework capable of automatically replicating data across numerous nodes, as well as searching and analyzing the data on all of them. Based on Google's MapReduce model, Hadoop enables the analysis of petabytes of unstructured data distributed across a cluster not necessarily made up of high-end servers. The technology is used by many high-profile companies, including Facebook, Twitter, LinkedIn, Apple, Amazon and Yahoo. Not surprisingly, almost every company whose executives delivered keynote speeches at the forum also stated its support for Hadoop in some form or other.
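The MapReduce model behind Hadoop can be illustrated with a minimal single-process sketch: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is not the Hadoop API itself, just a toy Python illustration of the three phases, using the classic word-count example; in a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input document
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate each key's values; here, sum the counts per word
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big clusters", "data across many nodes"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["data"])  # -> 2
```

Because map and reduce are independent per key, the framework can split the input across cheap commodity machines and rerun failed tasks on replicas, which is what lets Hadoop scale to petabytes without high-end servers.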