The burgeoning tech industry movement around big data is producing a variety of new applications, but the field is still evolving and faces lingering challenges, judging from an event held Wednesday at a Microsoft research facility in Cambridge, Massachusetts.
Big data -- or Big Data as some in the industry call it -- refers to the ever-growing quantity and variety of data, particularly in unstructured form, being generated by websites, sensors, social media and other sources, as well as a growing array of technologies aimed at deriving insights from it.
Startup Recorded Future seeks to perform "temporal analysis" of information found in the public Web, said Christopher Ahlberg, CEO and co-founder, during a panel discussion at the Massachusetts Technology Leadership Council's Big Data Disruption event.
Recorded Future's system taps into some 70,000 sources, including news sites, trade publications, blogs and financial databases, sifting through the information and identifying references to individual entities and events, Ahlberg said. "We're ingesting 100,000 to 300,000 documents every hour."
Publication dates and other time-related bits of information are associated with the references, allowing them to be organized in a historical manner. Then they are analyzed for sentiment and tone.
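As a rough illustration of the time-ordering step described above, the following Python sketch groups extracted references by entity and sorts each group by date. It is not Recorded Future's implementation; the `Reference` type, its fields and the `build_timelines` function are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reference:
    """A mention of an entity found in a document (hypothetical structure)."""
    entity: str
    event_date: date
    source: str


def build_timelines(references):
    """Group references by entity and sort each group chronologically."""
    timelines = defaultdict(list)
    for ref in references:
        timelines[ref.entity].append(ref)
    for refs in timelines.values():
        refs.sort(key=lambda r: r.event_date)
    return dict(timelines)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Reference("ExampleCorp", date(2012, 3, 1), "trade publication"),
        Reference("ExampleCorp", date(2011, 11, 15), "blog"),
        Reference("OtherCo", date(2012, 1, 9), "news site"),
    ]
    for entity, refs in build_timelines(sample).items():
        print(entity, [r.event_date.isoformat() for r in refs])
```

A sentiment or tone score could be attached to each reference as an extra field once the timeline exists, but how such scores are computed is outside what the panel described.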
Recorded Future's capabilities are used by defense agencies, financial services firms and competitive intelligence experts. The system can be used to pinpoint "broad signals," such as indications that a stock may rise or fall, or for "fine-grained alerting" on a specific type of news event, Ahlberg said.
Startup DataXu, which was also featured at Wednesday's event, offers analytics meant to help digital marketing executives. Its software analyzes data derived from tracking pixels embedded in online ads and builds predictive models showing which types of ad impressions are most likely to lead to sales, said CTO Bill Simmons during another panel talk. DataXu's customers "want to change the minds of consumers and build a brand," he said. To do so, they may need to show an advertising message 100 times, but "where do you show it?" he added.
DataXu is applying machine learning "to a very imbalanced problem," given that today, thousands of ad impressions may lead to only one person buying anything, Simmons added. His company also has to make its service more cost-effective than simply buying and running ads at saturation levels, he said.
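To see why such an imbalance is hard for machine learning, consider the short Python sketch below. It assumes, purely for illustration, one sale per 1,000 impressions; the figure and the function name `accuracy_and_recall` are not from DataXu. A model that always predicts "no sale" scores 99.9 percent accuracy yet identifies none of the buyers.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall) for binary labels where 1 means a sale."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    true_positives = sum(1 for y, p in pairs if y == 1 and p == 1)
    positives = sum(labels)
    accuracy = correct / len(labels)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Assumed ratio for illustration: 1 sale in 1,000 impressions.
    labels = [1] + [0] * 999
    always_no_sale = [0] * 1000
    accuracy, recall = accuracy_and_recall(labels, always_no_sale)
    print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

This is why accuracy alone is a poor yardstick for problems of this kind; measures such as recall or precision on the rare class say more about whether a model is useful.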
Many speakers on Wednesday referred to their companies' use of Hadoop, one of the technologies most closely associated with big data. Hadoop is an open-source programming framework that allows users to split up large processing jobs and run them in parallel across clusters of servers.
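The split-and-run-in-parallel idea can be sketched in a few lines of Python. This is not Hadoop's API and it runs on a single machine using threads, so it only imitates the pattern: the input is divided into chunks, each chunk is processed independently (the "map" step), and the partial results are merged (the "reduce" step). All function names here are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Divide the input into roughly equal slices, one per worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(records, n_workers=4):
    """Count words across all records using the split/map/reduce pattern."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data", "big clusters", "data data"]
    print(word_count(lines, n_workers=2))
```

In a real cluster the chunks would live on different servers and the framework would handle scheduling and failures; the sketch leaves all of that out.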
But Hadoop in its current form has serious limitations, said Michael Stonebraker, a Massachusetts Institute of Technology professor and founder of a number of database vendors. He was also the primary architect for the Ingres and Postgres database systems and is currently CTO of VoltDB.