When the Curiosity rover arrived on Mars two months ago, it was just about the best public relations exercise that NASA could have hoped for, short of actually landing a human on the red planet.
"They've really done a lot for the agency to make people think it's cool to work at NASA again," says senior computer scientist Chris Mattmann, who works at the Jet Propulsion Laboratory (JPL), one of ten NASA centres.
Mattmann is speaking at the ITEX conference in Auckland on November 8 at the Viaduct Events Centre. He has worked for NASA since he was an undergraduate at the University of Southern California, when he took a part-time academic position.
While not directly involved in the Curiosity mission, Mattmann has worked on Apache (the open source software foundation) data processing and information integration software projects that help power NASA's planetary data system -- the archive for all its space missions.
Mattmann became involved in Nutch, an open source search engine program, while studying for his doctorate. Nutch was created by Doug Cutting, who went on to create the big data system Hadoop.
Cutting was inspired to create Nutch because of a frustration with the 'black box' approach that Google had towards its search technology.
"He really felt that search should be more open and people should be able to tinker with things like ranking," says Mattmann.
Mattmann has used Nutch in his work at NASA. At JPL he leads teams that build large-scale data systems managing hundreds of terabytes of information.
"Part of the stuff that I help to do is organise the information for scientists," he says.
"Organisation can range from the way that files are specifically named and the information that's captured in file names, to their organisation on discs, to the way the information is disseminated to the public."
Using Nutch within NASA, contributing code and helping people on the mailing lists has led to Mattmann becoming an Apache 'committer'. According to the Apache website that means he has access to the source code repository, and can help make strategic decisions around bug fixes and new software releases.
Mattmann explains that Nutch could originally scale only to about 100 million web pages, whereas the big search engines such as Yahoo and Google were in the four-billion-page range. So Cutting set about creating a new system, which he called Hadoop, reportedly after his son's stuffed toy elephant.
Inspired by Cutting's work and sense of humour, Mattmann has started his own project called Tika, a text analysis tool that detects and extracts metadata and structured text content from various document types using existing parser libraries. It is named after a soft toy belonging to the daughter of his partner in the project.