By adding randomness to a relatively simple simulation, OpenAI’s robot hand learned to perform complex in-hand manipulation

In-hand manipulation is one of those things that’s fairly high on the list of “skills that are effortless for humans but extraordinarily difficult for robots.” Without even really thinking about it, we’re able to adaptively coordinate four fingers and a thumb with our palm and friction and gravity to move things around in one hand without using our other hand—you’ve probably done this a handful (heh) of times today already, just with your cellphone.

It takes us humans years of practice to figure out how to do in-hand manipulation robustly, but robots don’t have that kind of time. Learning through practice and experience is still the way to go for complex tasks like this, and the challenge is finding a way to learn faster and more efficiently than just giving a robot hand something to manipulate over and over until it learns what works and what doesn’t, which would probably take about a hundred years.

Rather than wait a hundred years, researchers at OpenAI have used reinforcement learning to train a neural network to control a five-fingered Shadow hand as it manipulates objects, all in just 50 hours. They managed this by training in simulation, a technique that is notoriously “doomed to succeed.” But by carefully randomizing the simulation to better match real-world variability, they got a real Shadow hand to successfully perform in-hand manipulation on real objects without any retraining at all.

Ideally, all robots would be trained in simulation, because simulation is something that can be scaled without having to build more physical robots. Want to train a bajillion robots for a bajillion hours in one bajillionth of a second? You can do it, with enough computing power. But try to do that in the real world, and the fact that nobody knows exactly how much a bajillion is will be the least of your problems.

The issue with using simulation to train robots is that the real world is impossible to precisely simulate, and it’s even more impossible to precisely simulate when it comes to thorny little things like friction and compliance and object-object interaction. So the accepted state of things has always been that simulation is nice, but that there’s a big scary step between simulation success and real world success that somewhat diminishes the value of the simulation in the first place. It doesn’t help that the things it would be really helpful to simulate (like in-hand manipulation) are also the things that tend to be the most difficult to simulate accurately, because of how physically finicky they are.

A common approach to this problem is to try to make the simulation as accurate as possible, in the hope that it’ll be close enough to the real world that you’ll be able to get some useful behaviors out of it. OpenAI is instead making accuracy secondary to variability, giving its moderately realistic simulations a bunch of slightly different tweaks with the goal of making the behaviors that they train robust enough to function outside of simulation as well.
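To make the “variability over accuracy” idea (often called domain randomization) a bit more concrete, here’s a minimal Python sketch of how a training loop might draw a slightly different simulated world for every episode. Everything in it is an assumption for illustration: the `SimParams` fields, the nominal values, and the randomization ranges are hypothetical and are not OpenAI’s actual settings, and the reinforcement-learning update itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Physical quantities a simulator might expose (hypothetical names)."""
    friction: float       # friction coefficient, unitless
    object_mass: float    # kg
    object_size: float    # edge length of a cube, m
    actuator_gain: float  # scales commanded joint torque
    obs_noise_std: float  # std. dev. of noise added to observations


# Illustrative nominal values and ranges only; NOT OpenAI's actual settings.
NOMINAL = SimParams(friction=1.0, object_mass=0.08, object_size=0.055,
                    actuator_gain=1.0, obs_noise_std=0.0)


def randomize(rng: random.Random) -> SimParams:
    """Draw one slightly different 'world' around the nominal simulation."""
    return SimParams(
        friction=NOMINAL.friction * rng.uniform(0.7, 1.3),
        object_mass=NOMINAL.object_mass * rng.uniform(0.5, 1.5),
        object_size=NOMINAL.object_size * rng.uniform(0.95, 1.05),
        actuator_gain=NOMINAL.actuator_gain * rng.uniform(0.75, 1.5),
        obs_noise_std=rng.uniform(0.0, 0.005),
    )


def collect_training_worlds(num_episodes: int, seed: int = 0) -> list[SimParams]:
    """Give every training episode its own freshly randomized world, so the
    policy never trains on exactly the same physics twice."""
    rng = random.Random(seed)
    worlds = []
    for _ in range(num_episodes):
        params = randomize(rng)
        # In a real pipeline (hypothetical here): reset the simulator with
        # `params`, run the policy, and update it with an RL algorithm.
        worlds.append(params)
    return worlds


if __name__ == "__main__":
    for world in collect_training_worlds(3, seed=1):
        print(f"friction={world.friction:.2f}  mass={world.object_mass * 1000:.0f} g")
```

The design choice is the point: instead of tuning one set of numbers until the simulation matches reality, the policy has to succeed across a whole spread of plausible worlds, and the hope is that the real hand ends up looking like just one more sample from that spread.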