Credit: Daniel Oines
At technology company Graphiq, web-scraping bots were becoming more than just a nuisance. They were impacting its bottom line.
The company collects and interprets billions of data points from thousands of online data sources and turns them into visual graphs that website visitors can use for free. Scrapers were extracting data from hundreds of millions of these pages and building duplicate sites.
“We don’t want people to reuse [our data] commercially for free because there is a cost associated with creating that content,” says Ivan Bercovich, vice president of engineering. “It also undermines the value of our content,” he adds, and it steals traffic away from Graphiq’s site. Then there are the operational costs associated with blocking those web-scraping attempts. “We may have months where we block 5 percent to 6 percent of all requests,” Bercovich says. “For a site of our volume, about 30 million visitors a month, that’s a lot of wasted requests.”
Web scraping is on the rise, especially attacks aimed at stealing businesses’ intellectual property or competitive intelligence. Scraping attacks increased 17 percent in 2014, the fifth consecutive annual increase, according to a report from ScrapeSentry, an anti-scraping service. Some 22 percent of all site visitors are considered to be scrapers, according to the report.
“One of the big things driving this up is that it’s just getting easier,” says Gus Cunningham, CEO at ScrapeSentry. Would-be scrapers can pay for commercial scraping services, write the code themselves using step-by-step online tutorials or even get free automated tools that do all the work.
Web-scraping tools are out in the open because web scraping is legal in some cases, such as gathering data for personal use. But that openness also creates a loophole for nefarious scrapers and a security hole for companies that don’t update their legal terms or their IT security processes.
“A lot of people are under the misconception that this kind of thing is considered ‘fair use,’ which is absolutely incorrect,” says Michael R. Overly, a partner and intellectual property lawyer focusing on technology at Foley & Lardner LLP in Los Angeles. “They think that because [the data provider’s] website doesn’t require any payment of fees there’s this exception. In general, if you (the scraper) are selling ads on your site, even if your end users don’t pay you any money, you’re getting revenues from ad displays. It’s a commercial purpose, so it’s highly unlikely it’s going to be fair use.”
Ticketmaster and the Massachusetts Institute of Technology have successfully gone after scrapers of their data who claimed that their actions were fair use or didn’t violate copyright laws.