Are You Still Using Real Data to Train Your AI?

It may be counterintuitive. But some argue that the key to training AI systems that must work in messy real-world environments, such as self-driving cars and warehouse robots, is not, in fact, real-world data. Instead, some say, synthetic data is what will unlock the true potential of AI. Synthetic data is generated instead of collected, and the consultancy Gartner has estimated that 60 percent of data used to train AI systems will be synthetic. But its use is controversial, as questions remain about whether synthetic data can accurately mirror real-world data and prepare AI systems for real-world situations.

Nvidia has embraced the synthetic data trend, and is striving to be a leader in the young industry. In November, Nvidia founder and CEO Jensen Huang announced the launch of the Omniverse Replicator, which Nvidia describes as “an engine for generating synthetic data with ground truth for training AI networks.” To find out what that means, IEEE Spectrum spoke with Rev Lebaredian, vice president of simulation technology and Omniverse engineering at Nvidia.

Rev Lebaredian on…

The Omniverse Replicator is described as “a powerful synthetic data generation engine that produces physically simulated synthetic data for training neural networks.” Can you explain what that means, and especially what you mean by “physically simulated”?

Rev Lebaredian: Video games are essentially simulations of fantastic worlds. There are attempts to make the physics of games somewhat realistic: When you blow up a wall or a building, it crumbles. But for the most part, games aren’t trying to be truly physically accurate, because that’s computationally very expensive. So it’s always about: What approximations are you willing to do in order to make it tractable as a computing problem? A video game typically has to run on a small computer, like a console or even on a phone. So you have those severe constraints. The other thing with games is that they’re fantasy worlds and they’re meant to be fun, so real-world physics and accuracy is not necessarily a great thing.

With Omniverse, our goal is to do something that really hasn’t been done before in real-time world simulators. We’re trying to make a physically accurate simulation of the world. And when we say physically accurate, we mean all aspects of physics that are relevant. How things look in the physical world is the physics of how light interacts with matter, so we simulate that. We simulate how atoms interact with each other with rigid-body physics, soft-body physics, fluid dynamics, and whatever else is relevant. Because we believe that if you can simulate the real world closely enough, then you gain superpowers.

What kind of superpowers?

Lebaredian: First, you get teleportation. If I can take this room around me and represent it in a virtual world, now I can move my camera around in that world and teleport to any location. I can even put on a VR headset and feel like I’m inside it. And if I can synchronize the state of the real world with the virtual one, then there’s really no difference. I might have sensors on Mars that ingest the real world and send over a copy of that info to Earth in real time—or 8 minutes later or whatever it takes for the speed of light to travel from Mars. If I can reconstruct that world virtually and immerse myself in it, then effectively it’s like I’m teleporting to Mars 8 minutes ago.

And given some initial conditions about the state of the world, if you can simulate accurately enough, then you can potentially predict the future. Say I have the state of the world right now in this room and I’m holding this phone up. I can simulate what happens the moment I let go and it falls—and if my simulation is close enough, then I can predict how this phone is going to fall and hit the ground. What’s really cool about that is you can change the initial conditions and do some experiments. You can say, What can alternate futures look like? What if I reconfigure my factory or make different decisions about how I manipulate things in my environment? What would these different futures look like? And that allows you to do optimizations. You can find the best future. [READ MORE]