DeepMind Teaches AI Teamwork

AIs that were given a “social” drive and rewarded for influence learned to cooperate

The U.S. women’s soccer team has been putting on a commanding performance at the World Cup in France. What would it take for a group of robotic players to show such skill (besides agility and large batteries)? For one thing, teamwork. But coordination in even simple games has been difficult for artificial intelligence to learn without explicit programming. New research takes a step in the right direction, showing that when virtual players are rewarded for social influence, cooperation can emerge.

Humans are driven not just by extrinsic motivations—for money, food, or sex—but also by intrinsic ones—for knowledge, competence, and connection. Research shows that giving robots and machine-learning algorithms intrinsic motivations, such as a sense of curiosity, can boost their performance on various tasks. In the new work, presented last week at the International Conference on Machine Learning, AIs were given a “social” drive.

“This is a truly fascinating article with a huge potential for expansions,” says Christian Guckelsberger, a computer scientist at Queen Mary University of London who studies AI and intrinsic motivation but was not involved in the work.

The virtual creatures played two games in which they collectively navigated a two-dimensional world to gather apples. In Harvest, apples regrow faster when more apples are nearby, so once they’re all gone, no new ones appear. Coordinated restraint is required. (In soccer, if everyone on your team runs toward the ball, you’ll lose.) In Cleanup, apples stop growing if a nearby aquifer isn’t continuously cleaned. (A team needs both offense and defense.)
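The regrowth rule described above can be sketched as a per-step respawn probability that rises with the number of apples left nearby; the specific rates below are illustrative placeholders, not values from the paper.

```python
def respawn_probability(nearby_apples: int) -> float:
    """Per-step chance that a new apple appears at an empty cell.

    Regrowth speeds up with the number of apples nearby; once no
    apples remain in range, none ever reappear. The rates here are
    hypothetical, chosen only to illustrate the dynamic.
    """
    if nearby_apples == 0:
        return 0.0      # over-harvested: this patch never recovers
    if nearby_apples == 1:
        return 0.005
    if nearby_apples == 2:
        return 0.02
    return 0.05         # dense patches regrow fastest
```

This is why coordinated restraint pays off: agents that leave some apples standing keep everyone’s regrowth rate high.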

The creatures relied on a form of AI called reinforcement learning, in which an algorithm learns by trial and error, guided by rewards. In this work, each creature earned rewards not only for collecting apples, but also for altering the choices of other players, whether that influence helped or hurt the others.
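The combined objective can be sketched as a weighted sum of the extrinsic apple reward and the intrinsic influence bonus; the weight `beta` is a hypothetical hyperparameter introduced here for illustration, not a value from the paper.

```python
def total_reward(env_reward: float, influence_reward: float,
                 beta: float = 0.1) -> float:
    """Mix the game's own reward (apples collected) with an
    intrinsic bonus for influencing other agents' choices.

    The bonus is paid whether the effect on the others is helpful
    or harmful; beta is an illustrative trade-off knob.
    """
    return env_reward + beta * influence_reward
```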

In one experiment, the creatures estimated their influence using something like humans’ “theory of mind,” the ability to understand others’ thoughts. Through observation, they learned to predict the behavior of others. They could then predict what neighbors would do in response to one action versus another, using counterfactual, “what-if” reasoning. If a particular action would change a neighbor’s behavior more than the alternatives would, it was deemed more influential and thus more desirable.
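One plausible way to score that counterfactual comparison, sketched here under the assumption that the learned model outputs a probability distribution over a neighbor’s next actions, is the divergence between the neighbor’s predicted behavior given the action actually taken and the average prediction over the what-if alternatives. The function name and dict-based representation are illustrative, not from the paper.

```python
import math

def influence_score(policy_given_action, policies_given_counterfactuals):
    """Sketch of a counterfactual influence measure.

    policy_given_action: dict mapping the neighbor's possible actions
    to predicted probabilities, conditioned on the action the agent
    actually took. policies_given_counterfactuals: a list of such
    dicts, one per counterfactual action. The score is the KL
    divergence between the actual prediction and the marginal over
    counterfactuals: zero when the agent's choice makes no difference
    to the neighbor, larger the more it shifts the neighbor's behavior.
    """
    actions = policy_given_action.keys()
    n = len(policies_given_counterfactuals)
    # Marginal: average the neighbor's predicted behavior over
    # the counterfactual ("what-if") actions.
    marginal = {a: sum(p[a] for p in policies_given_counterfactuals) / n
                for a in actions}
    return sum(policy_given_action[a] *
               math.log(policy_given_action[a] / marginal[a])
               for a in actions if policy_given_action[a] > 0)
```

If moving left would make the neighbor far more likely to respond in a particular way than any alternative move would, the left move earns a high influence score and therefore a larger intrinsic reward.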