Hackathon geared toward the 'liberation' of data from public PDF documents

Chris Kanaracus | Jan. 17, 2014
The Sunlight Foundation and others will sponsor a three-day hackathon starting Friday.

"It's worth remembering that there's a multi-stage process here," Monash added. "For example, a PDF can be converted to text (and image) data. (Name, value) pairs can be extracted. Those can have their spelling corrected. Then the company names can be regularized. In real life, there can be tens of steps."
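
To make the stages Monash describes concrete, here is a minimal Python sketch of such a pipeline. It is an illustration under stated assumptions, not a method used by the hackathon or its sponsors: it assumes the PDF has already been converted to plain text with lines shaped like "Name: value", and the field vocabulary, the suffix list and every function name are hypothetical.

```python
import difflib
import re

# Hypothetical vocabulary of known field names, used for spelling correction.
KNOWN_FIELDS = ["company", "amount", "date"]

# Hypothetical corporate suffixes dropped when regularizing company names.
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def extract_pairs(text):
    """Pull (name, value) pairs out of lines shaped like 'Name: value'."""
    pairs = []
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def correct_field(name):
    """Snap a possibly misspelled field name to the closest known one."""
    close = difflib.get_close_matches(name.lower(), KNOWN_FIELDS, n=1, cutoff=0.7)
    return close[0] if close else name.lower()


def regularize_company(value):
    """Lowercase, strip punctuation, and drop trailing corporate suffixes."""
    words = re.sub(r"[^\w\s]", "", value.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def run_pipeline(text):
    """Chain the stages: extract pairs, fix field spelling, regularize names."""
    record = {}
    for name, value in extract_pairs(text):
        field = correct_field(name)
        if field == "company":
            value = regularize_company(value)
        record[field] = value
    return record


if __name__ == "__main__":
    sample = "Compnay: Acme Corp.\nAmuont: 1,200\n"
    print(run_pipeline(sample))
```

Each stage is a separate function so that more steps can be slotted in between them, which fits Monash's point that real pipelines can run to tens of steps. A production system would also need an actual PDF-to-text stage and far more robust matching than this toy vocabulary allows.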

As for the hackathon's potential value, "a large fraction of the world's interesting information is on paper, or in paper-like formats such as PDF," he added. "Of course it's worthwhile to make all that more accessible."

 

 
