Using an iPhone 4 and some malware they wrote, a team of researchers at Georgia Tech has managed to capture complete sentences from a nearby keyboard with up to 80 percent accuracy.
"The way we see this attack working is that you, the phone's owner, would request or be asked to download an innocuous-looking application, which doesn't ask you for the use of any suspicious phone sensors," team member Henry Carter, a PhD student in computer science and one of the paper's co-authors, explains. "Then the keyboard-detection malware is turned on, and the next time you place your phone next to the keyboard and start typing, it starts listening."
The team initially tried to use an iPhone 3GS in their experiments, but the results were too difficult to read.
"But then we tried an iPhone 4," says Georgia Tech School of Computer Science Assistant Professor Patrick Traynor, who is a member of the team along with Carter, Georgia Tech grad student Arunabh Verma, and MIT Lincoln Laboratory's Philip Marquardt.
"[The iPhone 4] has an added gyroscope to clean up the accelerometer noise [and] the results were much better. We believe that most smartphones made in the past two years are sophisticated enough to launch this attack."
Other researchers have attempted to steal keystrokes using a phone's microphone, but there are drawbacks to that method. For example, a phone's microphone samples roughly 44,000 times per second. That data is much more difficult to analyze than the output of an accelerometer, which samples just 100 times per second.
Also, handset makers typically restrict app access to phone microphones. When an app tries to grab hold of the mic, your phone will usually ask you if you want that to happen. Such protections aren't placed around accelerometers.
How it Works
The malware builds a probabilistic model of keystroke pairs. For each pair, it determines whether each key is on the left or right side of the keyboard, and then whether the two keys are close together or far apart. After analyzing that data for a series of pairs, it compares what it has sensed to a pre-loaded dictionary in which words are classified by the same left-right, near-far characteristics.
For example, the word "canoe" consists of four pairs: C-A, A-N, N-O and O-E. The malware would interpret those pairs as Left-Left-Near (LLN), LRF, RRF and RLF. When that data is compared to the entries in the pre-loaded dictionary, the statistically most probable match is produced: in this case, "canoe."
For the technique to work reliably, words must be three letters or more. Working with a 58,000-word dictionary, the researchers found their word recovery rate was as high as 80 percent.
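To make the left-right, near-far encoding concrete, here is a minimal Python sketch of how a word could be turned into pair labels and matched against a word list. It is an illustration only, not the researchers' code. The simplified grid layout of the keyboard, the left/right split of the keys, the three-key "far" threshold, and the names `encode_word` and `rank_candidates` are all assumptions made for this example. The real malware has to infer the labels from noisy accelerometer readings, which this sketch does not attempt.

```python
import math

# Assumption: a simplified QWERTY grid that ignores the physical row stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

# Assumption: keys normally typed with the left hand count as "Left".
LEFT_KEYS = set("qwertasdfgzxcvb")

# Assumption: two keys at least three key-widths apart count as "Far".
FAR_THRESHOLD = 3.0


def side(ch):
    """Return 'L' or 'R' for the half of the keyboard a key sits on."""
    return "L" if ch in LEFT_KEYS else "R"


def distance(a, b):
    """Straight-line distance between two keys, in key-widths."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2)


def encode_word(word):
    """Encode each consecutive letter pair as side + side + Near/Far."""
    word = word.lower()
    return tuple(
        side(a) + side(b) + ("F" if distance(a, b) >= FAR_THRESHOLD else "N")
        for a, b in zip(word, word[1:])
    )


def rank_candidates(observed, dictionary):
    """Rank same-length dictionary words by how many pair labels disagree."""
    scored = []
    for word in dictionary:
        code = encode_word(word)
        if len(code) != len(observed):
            continue
        mismatches = sum(o != c for o, c in zip(observed, code))
        scored.append((mismatches, word))
    return [word for _, word in sorted(scored)]


print(encode_word("canoe"))  # ('LLN', 'LRF', 'RRF', 'RLF')
print(rank_candidates(encode_word("canoe"), ["table", "horse", "canoe"]))
```

Ranking by the number of mismatched labels is a simple stand-in for the statistical matching the article describes. It lets a word still surface as a candidate when one or two pairs have been misread.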