During the past nine months, an Nvidia engineering team built a self-driving car with one camera, one Drive-PX embedded computer and only 72 hours of training data. Nvidia published the results of the DAVE2 project in an academic preprint, End to End Learning for Self-Driving Cars, on arXiv.org, hosted by the Cornell University Library.
The Nvidia project, called DAVE2, is named after a 10-year-old Defense Advanced Research Projects Agency (DARPA) project known as DARPA Autonomous Vehicle (DAVE). Although neural networks and autonomous vehicles may seem like brand-new technologies, researchers such as Google’s Geoffrey Hinton, Facebook’s Yann LeCun and the University of Montreal’s Yoshua Bengio have collaboratively researched this branch of artificial intelligence for more than two decades. And the DARPA DAVE application of neural networks to autonomous vehicles was itself preceded by the ALVINN project, developed at Carnegie Mellon in 1989. What has changed is that GPUs have made building on their research economically feasible.
Neural networks and image recognition applications such as self-driving cars have exploded recently for two reasons. First, the graphics processing units (GPUs) used to render graphics in mobile phones became powerful and inexpensive. GPUs densely packed onto board-level supercomputers are very good at solving massively parallel neural network problems, and they are cheap enough for every AI researcher and software developer to buy. Second, large, labeled image datasets have become available to train those massively parallel, GPU-based neural networks to see and perceive the world of objects captured by cameras.
Mapping human driving patterns
The Nvidia team trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. Nvidia’s breakthrough is that the vehicle taught itself by watching how a human drove: it learned the internal representations of the processing steps needed to see the road ahead and steer, without ever being explicitly trained to detect features such as roads and lanes.
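Mapping raw pixels to a single steering value means stacking convolutional layers that progressively shrink the image until only a handful of values feed the steering output. The layer kernels, strides and the 66x200 input below are illustrative assumptions, not Nvidia’s published architecture; this sketch only traces how the spatial dimensions collapse through such a stack.

```python
def conv_out(size, kernel, stride):
    """Output size of a 'valid' convolution along one dimension."""
    return (size - kernel) // stride + 1

def trace_shapes(height, width, layers):
    """Trace (height, width) through a list of (kernel, stride) conv layers."""
    shapes = [(height, width)]
    for kernel, stride in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        shapes.append((height, width))
    return shapes

# Hypothetical stack: three 5x5 stride-2 layers, then two 3x3 stride-1 layers.
layers = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]
print(trace_shapes(66, 200, layers))
# [(66, 200), (31, 98), (14, 47), (5, 22), (3, 20), (1, 18)]
```

The remaining 1x18 strip of features would then be flattened into fully connected layers ending in one neuron: the predicted steering angle.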
Although in operation the system uses one camera and one Drive-PX embedded computer, the training system used three cameras and two computers to acquire three-dimensional video images, along with the steering angles of the human-driven vehicle, which were used to train the system to see and drive.
Nvidia monitored changes in the steering angle and used them as the training signal, mapping the human’s driving patterns onto the bitmap images recorded by the cameras. Through the CNN, the system learned internal representations of the processing steps of driving, such as detecting useful road features like lanes, cars and road outlines.
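In this kind of setup, each training example pairs a camera frame with the steering angle the human chose at that moment, and the network is adjusted to shrink the gap between its predicted angle and the recorded one. The mean-squared-error loss and the sample data below are illustrative assumptions; the article does not specify the loss Nvidia used.

```python
def mean_squared_error(predicted, recorded):
    """Average squared difference between predicted and recorded steering angles."""
    assert len(predicted) == len(recorded)
    return sum((p - r) ** 2 for p, r in zip(predicted, recorded)) / len(predicted)

# Hypothetical data: each frame is paired with the human's steering angle (degrees).
frames = ["frame_0", "frame_1", "frame_2"]
recorded_angles = [0.0, -2.5, 1.0]          # training signal from the human driver
predicted_angles = [0.5, -2.0, 0.0]         # what an untrained network might output

print(mean_squared_error(predicted_angles, recorded_angles))
# 0.5
```

Training drives this error down across many such frame/angle pairs, which is how the human’s steering behavior gets baked into the network’s internal representations.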
The open-source machine learning framework Torch 7 was used to turn that learning into the processing steps that autonomously perceived the road, other vehicles and obstacles to steer the test vehicles. The actual training occurred at 10 frames per second (fps), because adjacent frames at 30 fps did not differ enough to make learning from them worthwhile. The test vehicles were a 2016 Lincoln MKZ and a 2013 Ford Focus.