"The way we use the data allows us to differentiate between user groups and helps optimize the experience for buyers and for the crafters and small businesses that are trying to sell their goods to people around the world," Thomas says.
Externally, Etsy builds analytics products that let each shop owner see how the shop is performing. Shop Stats, an analytics system for sellers, shows, for instance, what people were searching for, how they navigated to the shop and how many purchases were made. More broadly, Etsy publishes a monthly report that shares overall business metrics such as total goods sold by the Etsy community, number of items listed, site membership and page views.
Hadoop and critical thinkers
Making data usable throughout the company requires a combination of people and technology.
On the people front, working with data at Etsy requires a blend of business, analytics and engineering skills.
"When we hire people who we intend to be analysts, we look for qualities like critical thinking skills, skepticism, and an ability to think statistically. We expect that we'll be able to train them in whatever programming language they'll need to do their day-to-day job," McKinley says. "Typically we're hiring engineers and training them in basic statistics, or we're hiring people in statistics and training them in basic engineering. The people who are awesome at both of those things are very few and far between."
On the technology front, Etsy uses a wide range of tools. The company collects transactional data, which covers products, listings and purchases, as well as behavioral data, which covers any interaction people have while browsing the site. As site traffic has grown, so have Etsy's analytic capabilities. The e-commerce site has beefed up its event-logging platforms, its analytics infrastructure and its presentation tools.
Ensuring data consistency and accuracy is one of the biggest challenges. "We're making decisions with data, yet it's very hard to actually make sure that the data is correct," says Mardenfeld, who's focused on building the infrastructure that powers Etsy's big data projects. "We put a lot of work into error checking, making sure our collection pipelines are working. Data is a little bit of a different beast. You can't just get your code to compile. You have to compile and also make sure that it makes sense. I think that's the hardest part about this."
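To make the idea of data that "makes sense" concrete, here is a minimal sketch of one kind of sanity check a collection pipeline might run. It is not Etsy's code: the function name, the 50 percent tolerance and the sample counts are all assumptions chosen for illustration.

```python
from statistics import mean


def check_daily_count(today: int, history: list[int], tolerance: float = 0.5) -> list[str]:
    """Return a list of problems found in today's event count (hypothetical check).

    `history` holds counts from previous days; `tolerance` is the largest
    relative deviation from their average that is still considered normal.
    """
    problems = []

    # A pipeline that collected nothing is almost certainly broken.
    if today <= 0:
        problems.append("no events collected")

    # Compare against the recent average, if there is one to compare with.
    if history:
        baseline = mean(history)
        if baseline > 0 and abs(today - baseline) / baseline > tolerance:
            problems.append("count deviates sharply from recent average")

    return problems
```

A job that "compiles" can still fail a check like this, for example if a logging change silently drops most events. An empty list means the day's count looks plausible; anything else would be flagged for a person to investigate.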
In terms of platforms and tooling, Hadoop plays a key role in storing and processing the data. Etsy runs dozens of workflows each night on Amazon's cloud-based Elastic MapReduce service. Rather than keeping a single cluster running continuously, Etsy brings up a new cluster for each job so it can tailor the number and types of instances to the workload.
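The per-job sizing described above can be sketched in a few lines. This is an illustration under stated assumptions, not Etsy's implementation: `ClusterSpec`, `plan_cluster`, the 50 GB-per-node rule of thumb, the 40-node cap and the two instance-type labels are all hypothetical, and the sketch deliberately stops short of calling any cloud API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a short-lived cluster for one job."""
    instance_type: str
    instance_count: int


def plan_cluster(input_gb: float, memory_heavy: bool,
                 gb_per_node: float = 50.0, max_nodes: int = 40) -> ClusterSpec:
    """Pick a cluster shape for a single job from its expected workload.

    All thresholds here are illustrative assumptions, not measured values.
    """
    # Scale node count with input size, but always use at least one node
    # and never exceed the cap.
    nodes = max(1, min(max_nodes, math.ceil(input_gb / gb_per_node)))

    # Placeholder labels; a real setup would map these to actual instance types.
    instance_type = "memory-optimized" if memory_heavy else "general-purpose"

    return ClusterSpec(instance_type, nodes)
```

With these assumed defaults, a 120 GB job that is not memory-heavy would get three general-purpose nodes, while a 5,000 GB memory-heavy job would hit the 40-node cap. Each workflow gets only the capacity it needs, and nothing stays running once the job finishes.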