How to close the big data skills gap by training your IT staff

Thor Olavsrud | Oct. 3, 2013
Research firms paint a dire picture of a massive big data skills gap that will get worse over time. But companies like Persado, which uses big data to help marketers optimize their messages, are finding success training their existing staff in the new big data technologies.

"We can look at the different 'genes' of a marketing message and break it down and build it back up using mathematics, linguistics and technology to make it a marketing message that a marketer would be happy to bring to market and a consumer would be more likely to interact with and click on," says Matthew Novick, chief financial officer at Persado.

Achieving this requires continuous data collection and the ability to query that massive volume of data. Persado's business depends upon its data warehouse.

Persado's development team is focused on ensuring that the company's infrastructure is aligned with the needs of its data scientists, including regularly generating key performance indicator (KPI) reports, managing data from heterogeneous sources, preparing customized analyses and implementing specific statistical algorithms in Java based on reference implementations in R.

But in 2010, not long after Persado was born, the relational database management system (RDBMS) the company was using to power its data warehouse was becoming unwieldy. The development team, led by Christos Soulios, software team leader and application architect at Persado, began the process of migrating to a NoSQL environment. With its analytics and reporting needs becoming more sophisticated, it then needed to decouple the online analytical processing (OLAP) system into a technology stack of its own.

Soulios decided that Apache Hadoop was the right solution to collect, aggregate and process data from Persado's heterogeneous data sources, including MongoDB, MySQL config servers and Apache logs populated by structured and semi-structured files in Amazon Web Services (AWS) S3 buckets using libraries built on Apache Kafka and Apache ZooKeeper.

But those tasks were easier said than done. Persado lacked the big data engineers it needed to grow its capabilities and scale its systems. Moreover, while Persado is a global company with headquarters in London and New York, its development team is based in Athens, Greece, making big data talent even harder to come by.

"Most of our development team and the resources are here in Athens, Greece," says Xinyu Huang, vice president of Engineering at Persado. "Unlike in the U.S., where big data is all over the place, in Greece it's still in the early stage."

Persado Looks to Train Its Teams to Use Big Data Tools
Without the ability to buy the talent it needed, Persado decided to create its own, Huang says. Soulios brought in Cloudera, specifically Cloudera University. He and the development team worked with Cloudera University's curriculum team to tailor a private, week-long onsite training course for Persado.

"We started benefiting from our decision to work with Cloudera almost right away, since no other company offers a full Data Analyst Training targeted at both developers and analysts, which was one of our biggest priorities," says Soulios, speaking of a course on Apache Hive and Apache Pig. "The intensive workshop also included the full Cloudera Developer Training for Apache Hadoop with the option of testing for the sought-after CCDH certification following the class."

