"It was such a large base of users, it suggests that what they were able to find wasn't just a fluke because they had a small amount of data," Golbeck said.
"It may not be that the specific correlations would work on a large population. But it certainly seems true that the general methodology to make the connections between personal attributes and the way we behave online is really promising," Golbeck said.
Facebook lends itself well to such analysis, but similar digital footprints can be found elsewhere online too, Kosinski said. Public forums such as Twitter, statistics on the types of music or movies people stream, or even a company's Web server logs could provide a basis for further user analysis.
Online companies already scrutinize such footprints, though the types of conclusions they draw tend to be limited, Kosinski said, adding that further data about people could identify more general traits.
For example, the publisher of a site about Salvador Dali already knows that frequent visitors are art lovers. But the additional insight that they may also be more open to new ideas than the population at large could help guide site development decisions, such as redesigning the site in a more experimental style.
Of course, the use of such data can raise privacy concerns. Kosinski said Facebook could use such approaches to infer the personalities of its users, if it hasn't already.
And while not all companies have the range of user data that Facebook has, it would be pretty easy to cobble together different sources of public and proprietary data to form the basis of a personality analysis that goes beyond the profiling that online advertisers have done for years, Golbeck said.
The work should raise awareness of the kind of insights that companies can glean about their users. "There is some really scary stuff that could happen with this too," Golbeck said. For instance, your credit scores or insurance rates could be affected by traits that providers infer about you, rightly or wrongly.
Just as behavioral analysis could be used for unscrupulous ends, it could also be used for good: to adjust applications to better fit users' needs.
"This potentially has promise to uncover more of those serendipitous things, where I see stuff that I would never be able to find myself, but because the system knows so much about me and has access to what a billion other people are doing, it can look for tiny little things in the data I couldn't find myself," Golbeck said.
The researchers saw no major barriers to scaling up their algorithms to identify personality traits for billions of users without much computational heft. It could even be done in near real time, producing a personality profile in milliseconds.
"You can run predictions for very huge populations in no time whatsoever, with very little cost," Kosinski said.