Baidu’s AI Can Do Simultaneous Translation Between Any Two Languages

Baidu Research reveals a translation tool that keeps up by predicting the future

Would-be travelers of the galaxy, rejoice: The Chinese tech giant Baidu has invented a translation system that brings us one step closer to a software Babel fish.

For those unfamiliar with Douglas Adams’s science fiction masterworks, let me explain. The Babel fish is a slithery fictional creature that takes up residence in a human’s ear canal, tapping into its host’s nervous system to provide instant translation of any language the host hears.

In the real world, until now, we’ve had to make do with human and software interpreters that do their best to keep up. But the new AI-powered tool from Baidu Research, called STACL, could speed things up considerably. It uses a sophisticated type of natural language processing that lags only a few words behind, and keeps up by predicting the future.

“What’s remarkable is that it predicts and anticipates the words a speaker is about to say a few seconds in the future,” says Liang Huang, principal scientist of Baidu’s Silicon Valley AI Lab. “That’s a technique that human interpreters use all the time—and it’s critical for real-world applications of interpretation technology.”

The STACL (Simultaneous Translation with Anticipation and Controllable Latency) tool is comparable to the human interpreters who sit in booths during UN meetings. These humans have a tough job. As a dignitary speaks, the interpreters must simultaneously listen, mentally translate, and speak in another language, usually lagging only a few words behind. It’s such a difficult task that UN interpreters usually work in teams and take shifts of only 10 to 30 minutes.

A task requiring that kind of parallel processing (listening, translating, and speaking at once) seems well suited for computers. But until now, it was too hard for them as well. The best “real-time” translation systems still do what’s called consecutive translation: they wait for each sentence to end before rendering its equivalent in another language. These systems provide quite accurate translations, but they’re slow.

Huang tells IEEE Spectrum that the big challenge in simultaneous interpretation comes from differences in word order between languages. “In the UN, there’s a famous joke that an interpreter who’s translating from German to English will pause, and seem to get stuck,” he says. “If you ask why, they say, ‘I’m waiting for the German verb.’” In English, the verb comes early in the sentence, he explains, while in German it often comes at the very end.

STACL gets around that problem by predicting the verb to come, based on patterns in the sentences it saw during training. For their current paper, the Baidu researchers trained STACL on newswire articles, where the same story appeared in multiple languages. As a result, it’s good at making predictions about sentences dealing with international politics.

Huang gives an example of a Chinese sentence, which would be most directly translated as “Xi Jinping French president visit expresses appreciation.” STACL, however, would guess from the beginning of the sentence that the visit would go well, and translate it into English as “Xi Jinping expresses appreciation for the French president’s visit.”
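The STACL paper describes a “wait-k” policy: the system first reads k source words, then alternates between writing one target word and reading one more source word. That is what keeps it only a few words behind the speaker. The Python sketch below is a toy illustration of that schedule, not Baidu’s implementation. It leaves out the neural model entirely, uses the English gloss of Huang’s example as stand-in source words, and only works out which source words would have been heard when each English word is written. The function names are hypothetical.

```python
# Toy illustration of a wait-k read/write schedule (not Baidu's code).
# Source words are the English gloss of the Chinese example, used as stand-ins.
SOURCE = "Xi Jinping French president visit expresses appreciation".split()
TARGET = "Xi Jinping expresses appreciation for the French president's visit".split()


def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> list[int]:
    """Source words read before writing target word t (t = 1..tgt_len).

    Wait-k: g(t) = min(k + t - 1, src_len), i.e. stay k words behind the
    speaker until the source sentence ends.
    """
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def consecutive_schedule(src_len: int, tgt_len: int) -> list[int]:
    """Consecutive translation reads the whole sentence before writing anything."""
    return [src_len] * tgt_len


def words_anticipated(source: list[str], target: list[str], schedule: list[int]) -> list[str]:
    """Target words written before their glossed source word has been heard."""
    anticipated = []
    for t, word in enumerate(target):
        heard = source[: schedule[t]]
        if word in source and word not in heard:
            anticipated.append(word)
    return anticipated


if __name__ == "__main__":
    k = 2  # hypothetical latency setting
    schedule = wait_k_schedule(len(SOURCE), len(TARGET), k)
    for word, read in zip(TARGET, schedule):
        print(f"after hearing {read} source words -> write {word!r}")
    print("anticipated:", words_anticipated(SOURCE, TARGET, schedule))
```

With k = 2, “expresses” and “appreciation” are written after only four and five source words have been heard, before the glossed verb phrase arrives, which is the kind of anticipation Huang describes. The consecutive schedule never has to anticipate anything, but the listener waits for the whole sentence.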