Self-Driving Cars Learn to Read the Body Language of People on the Street

Articles

Game developers use motion-capture tech to teach Cruise’s autonomous vehicles to understand gestures

By Casey Weaver

A four-lane street narrows to two to accommodate workers repairing a large pothole. One worker holds a stop sign loosely in his left hand as he waves cars through with his right. Human drivers don’t think twice about whether to follow the gesture or the sign; they move smoothly forward without stopping.

This situation, however, would likely stop an autonomous vehicle in its tracks. It would understand the stop sign and how to react, but that hand gesture? That’s a lot more complicated.

And drivers, human and computer, daily face this and far more complex situations in which reading body language is the key. Consider a city street corner: A pedestrian, poised to cross with the light, stops to check her phone and waves a right-turning car forward. Another pedestrian lifts a hand up to wave to a friend across the way, but keeps moving. A human driver can decode these gestures with a glance.

Navigating such challenges safely and seamlessly, without interrupting the flow of traffic, requires that autonomous vehicles understand the common hand motions used to guide human drivers through unexpected situations, along with the gestures and body language of pedestrians going about their business. These are signals that humans react to without much thought, but they present a challenge for a computer system that’s still learning about the world around it.

Autonomous-vehicle developers around the world have been working for several years to teach self-driving cars to understand at least some basic hand gestures, initially focusing on signals from cyclists. Generally, developers rely on machine learning to improve vehicles’ abilities to identify real-world situations and understand how to deal with them. At Cruise we gather that data from our fleet of more than 200 self-driving cars. These vehicles have logged hundreds of thousands of miles every year for the past seven years; before the pandemic hit, they were on the road around the clock, taking breaks only to recharge (our cars are all-electric) and for regular maintenance. Our cars are learning fast because they are navigating the hilly streets of San Francisco, one of the most complex driving environments in the United States.

But we realized that our machine-learning models don’t always have enough training data because our cars don’t experience important gestures in the real world often enough. Our vehicles need to recognize each of these situations from different angles and distances and under different lighting conditions—a combination of constraints that produce a huge number of possibilities. It would take us years to gain enough information on these events if we relied only on the real-world experiences of our vehicles.

We at Cruise found a creative solution to the data gap: motion capture (or mo-cap) of human gestures, a technique that game developers use to create characters. Cruise has been hiring game developers—including me—for expertise in simulating detailed worlds, and some of us took on the challenge of capturing data to use in teaching our vehicles to understand gestures. [READ MORE]