With all that activity, Etsy generates enormous quantities of data. Every interaction with the site -- a page view, a click, a pop-up -- is collected. "We're doing about 175 million events per day, which amounts to roughly 75 gigabits of event data that we store per day," Mardenfeld says.
Democratization of data
Data analysis is everywhere at Etsy; it's not the domain of any single group. "We try to have it be a work in progress, be part of the culture, and be embedded throughout different parts of the site and parts of the company," Thomas says.
The lack of centralization is deliberate. "There's less central control, which can mean more opportunity for people to bite off different parts of the data we have and use it. That can lead to really positive things, and it can also be a challenge in terms of making sure people understand the data and are making decisions based on correct interpretations," says Thomas, who acts as an ambassador between Etsy's data teams and the rest of the company.
"It might be simpler if there were one monolithic group that came down as the source of data truth, but it would create a bottleneck and a silo that wouldn't necessarily help us move quickly and use data to inform what we're doing."
Internally, data analysis is incorporated throughout the product life cycle, helping development teams to design and prioritize site changes.
"The engineers and product people who are building features on the site are doing experimentation, and a majority of features are A/B tested, so everybody in those groups, to some extent, uses big data in order to analyze those things," McKinley says.
"We also use data to decide what we're going to do going forward, working with our product road map," Mardenfeld adds. "We use it all over. We use data to make sure that our products are behaving the way we're expecting them to. We use data to understand and gain insight into how people are using the site, and we use it to iterate as well. It's part of all these different steps."
Sharing the data
With such a massive volume of merchandise for sale, making sellers' items more discoverable by shoppers is a constant challenge. Etsy uses big data to power the content shown to site visitors, for example through its product recommendation system and its search ranking. The clickstream data is processed in real time and used to deliver relevant content to a user.
At the feature level, big data powers Etsy's Taste Test, which takes users through a product quiz of sorts before recommending products they might like. It also powers recommendations for visitors who come to Etsy via Google Product Listing Ads.