
Data scientists: Question the integrity of your data

Rebecca Merrett (CIO) | Aug. 5, 2015
If there’s one lesson website traffic data can teach you, it’s that information is not always genuine. Yet, companies still base major decisions on this type of data without questioning its integrity.


At ADMA's Advancing Analytics in Sydney this week, Claudia Perlich, chief scientist of Dstillery, a marketing technology company, spoke about the importance of filtering out noisy or artificial data that can skew an analysis.

"Big data is killing your metrics," she said, pointing to the large portion of bot traffic on websites.

"If the metrics are not really well aligned with what you are truly interested in, they can find you a lot of clicking and a lot of homepage visits, but these are not the people who will buy the product afterwards because they saw the ad."

Predictive models that look at which users visit certain brands' home pages, for example, can be completely flawed if no one questions the integrity of the data, she said.

"It turns out it is much easier to predict bots than real people. People write apps that skim advertising, so a model can very quickly pick up what that traffic pattern of bots was; it can predict very, very well who would go to these brands' homepages as long as there was bot traffic there."

In this case, the predictive model will deliver accurate results when its predictions are tested. However, that doesn't bring marketers or the business any closer to their real objective: ad conversions by actual humans, Perlich said.

In addition, any model that looks too good to be true probably is, and data scientists should not let themselves be blinded by shiny models that boast perfect performance.

"This is [how we found] the [fake data] issue, because the performance of our model doubled to [help us] predict who goes to a brand's home page. Now, a double in performance in predictive modelling usually takes a lot of work. But I didn't do anything, it just happened," Perlich said, on her experience with website traffic data.

"At that point you start to wonder what the hell is going on? We started seeing a huge amount of traffic of basically cookies circling from sequences of sites over and over again and in between hitting some of the brands that we were running campaigns for.

"There is a network of sites that are created just so that bots can 'buy' ads on those sites."
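Perlich did not say how Dstillery flagged this traffic. One simple signal consistent with her description is a cookie whose browsing history keeps repeating the same short run of sites. The following is a minimal Python sketch under that assumption; the function names, the run length `n`, the `min_repeats` threshold and the sample site names are hypothetical illustrations, not Dstillery's method.

```python
from collections import Counter
from typing import Dict, List


def max_cycle_repeats(visits: List[str], n: int = 3) -> int:
    """Return how often the most common run of n consecutive sites recurs."""
    if len(visits) < n:
        return 0
    # Count every ordered window of n consecutive site visits.
    grams = Counter(tuple(visits[i:i + n]) for i in range(len(visits) - n + 1))
    return max(grams.values())


def flag_looping_cookies(
    histories: Dict[str, List[str]], n: int = 3, min_repeats: int = 3
) -> List[str]:
    """Flag cookies that repeat the same n-site sequence at least min_repeats times."""
    return sorted(
        cookie
        for cookie, visits in histories.items()
        if max_cycle_repeats(visits, n) >= min_repeats
    )


# Hypothetical browsing histories: cookie_a circles the same sites, hitting a brand each loop.
histories = {
    "cookie_a": ["siteX", "siteY", "siteZ", "brand.com"] * 4,
    "cookie_b": ["news.com", "brand.com", "shop.com", "blog.com", "news.com"],
}
print(flag_looping_cookies(histories))  # ['cookie_a']
```

Filtering flagged cookies out before training would keep a model from "learning" the bots' loop instead of human behaviour; in practice the thresholds would need tuning against traffic known to be genuine.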

Click-through data can also be misleading. Perlich gave the example of ads appearing in the Flashlight app, which drew a high click-through rate not because people were interested in the ads, but because fumbling in the dark with a torch app usually results in accidental clicks.
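One way to catch this kind of accidental-click traffic is to look past click-through rate to what happens after the click. The following minimal Python sketch assumes per-placement counts of impressions, clicks and attributed conversions are available; the `Placement` type, both thresholds and the sample numbers are hypothetical, chosen only to illustrate a placement with many clicks but almost no buyers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    name: str          # hypothetical placement label, e.g. an app or site
    impressions: int
    clicks: int
    conversions: int   # purchases attributed to those clicks


def ctr(p: Placement) -> float:
    """Click-through rate: clicks per impression."""
    return p.clicks / p.impressions if p.impressions else 0.0


def conversion_rate(p: Placement) -> float:
    """Post-click conversion rate: conversions per click."""
    return p.conversions / p.clicks if p.clicks else 0.0


def suspicious_placements(
    placements: List[Placement], ctr_threshold: float = 0.05, min_conv_rate: float = 0.01
) -> List[str]:
    """Return placements with high click-through but almost no post-click conversions."""
    return [
        p.name
        for p in placements
        if ctr(p) >= ctr_threshold and conversion_rate(p) < min_conv_rate
    ]


placements = [
    Placement("flashlight_app", impressions=10_000, clicks=900, conversions=2),
    Placement("recipe_site", impressions=10_000, clicks=120, conversions=6),
]
print(suspicious_placements(placements))  # ['flashlight_app']
```

Judged on click-through alone, the flashlight placement looks like the better buy; pairing it with post-click conversions exposes the gap between clicks and intent that Perlich describes.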

 
