How Google’s DeepMind will train its AI inside Unity’s video game worlds

Unity’s AI boss Danny Lange explains how the Google sibling will use reinforcement learning and virtual worlds to “evolve” smarter algorithms.

[Animation: courtesy of Unity Technologies]

Intelligent design versus evolution isn’t just a divide in people’s worldviews. It’s also been a divide in the artificial intelligence community. Until just a few years ago, AI was mainly about humans coding smart algorithms–from bank fraud detectors to autonomous video game characters. But with massive server farms, machine learning AI can run wild in the boundless fields of data that society generates–and often intuit algorithms faster and better than humans could program them. The next stage is to let algorithms loose in environments that look and act like actual fields, or houses, or highways, or anything else that can be simulated in 3D gaming environments.


DeepMind, part of Google parent company Alphabet, is going big on virtual world AI training through a deal with game-making software provider Unity Technologies (which powers games like Monument Valley and Pokémon Go). DeepMind will run the software at giant scale to train algorithms in physics-realistic environments–part of a growing trend in AI. Game engines like Unity or Unreal provide customizable settings for advanced AI techniques such as reinforcement learning (a kind of machine learning), in which an algorithm pursues a goal through trial and error until it has mastered the task.

“Games are in many, many ways . . . much closer to nature than people think,” says Danny Lange, Unity’s VP of machine learning and AI. “You get the visual, the physics, the cognitive, and . . . the social aspect–the interaction.” These all put evolutionary pressures on algorithms, just as nature does on living things, he says.

DeepMind did not comment for this story beyond a press release. Unity wouldn’t reveal many details of the deal–such as how much money is changing hands. The tie-up follows a deal announced in June between Unity and DeepMind sibling company Google Cloud to provide services for online game developers.

Starting with no knowledge of how to move, this stick figure eventually learned to run–albeit awkwardly. [Image: courtesy of Unity Technologies]

Video vérité

As a cute example of how games train AI, Lange shows me a virtual dog that learned how to fetch. All that the algorithm driving it knew was that it had to get the stick: Progress toward the goal triggers numerical rewards that encourage more of the successful behavior. In the beginning, the poor dog didn’t even know how to use its legs. But it kept trying, bound within the parameters of simulated physiology and the laws of physics, until that good boy finally got it.
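
For readers who want to see the mechanics, here is a minimal sketch of that reward loop in Python. It is not Unity's or DeepMind's code: the one-dimensional "track," the function names, and the learning settings are all assumptions chosen for illustration, and the method shown (tabular Q-learning) is just one simple form of reinforcement learning. The "dog" earns +1 for each step toward the stick and -1 for each step away, and starts out knowing nothing.

```python
import random

ACTIONS = (-1, +1)  # step away from the stick / step toward it


def train_fetch(track_length=6, episodes=300, seed=0):
    """Tabular Q-learning on a toy 1-D track (hypothetical setup).

    The dog starts at position 0; the stick lies at position track_length.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(track_length + 1) for a in ACTIONS}
    alpha, gamma, epsilon = 0.5, 0.9, 0.2  # assumed learning settings
    for _ in range(episodes):
        pos = 0
        for _ in range(50):  # cap the length of each attempt
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # trial and error
            else:
                action = max(ACTIONS, key=lambda a: q[(pos, a)])
            new_pos = min(max(pos + action, 0), track_length)
            reward = float(new_pos - pos)  # numerical reward for progress
            done = new_pos == track_length
            best_next = 0.0 if done else max(q[(new_pos, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(pos, action)] += alpha * (reward + gamma * best_next - q[(pos, action)])
            pos = new_pos
            if done:
                break
    return q


def greedy_policy(q, track_length=6):
    """Best-known action at each position short of the stick."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(track_length)]


q_table = train_fetch()
print(greedy_policy(q_table))
```

The printed list holds the learned move for each position on the track. Nothing in the code tells the dog which direction is correct; the preference comes only from the rewards it collects while trying.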

None of this technology is new, per se. Researchers and companies have been using game engines for some time to train AI. Nvidia, for instance, has created a virtual robot-training system called Isaac that runs on Unity rival Unreal Engine, the system underlying the blockbuster Fortnite.

Self-driving car algorithms learn from traveling billions of miles of accurately simulated roads–to bolster what they can learn from covering far less territory in real-world driving. Lange knows this well, having built Uber’s machine learning platform before going to Unity in December 2016.


But Lange is quite expansive in his vision of what reinforcement learning and other AI can achieve in game worlds. Beyond robots, examples include using virtual people to develop more livable building designs. “You can actually test a thousand different designs on a thousand different virtual families living in that house,” he says.


Usually reinforcement learning is about pushing virtual agents to achieve a discrete task, for as long as it takes. But in this example, the amount of difficulty the characters have learning their way around could help designers evaluate how intuitive (or unintuitive) their floor plans are.


Going even grander, simulated physics could allow virtual chemistry experiments, in which software conducts far more experiments with virtual chemicals than humans can with real ones, says Lange. That could at least narrow down the candidates for real-world testing. Lange predicts that AI based on gaming engines may be able to achieve this in about five years. (Incidentally, that’s also the timeline some advocates of quantum computing reckon for it to start simulating complex chemistry.)

As players of sprawling, open-world titles know, games are about more than physics. Grand Theft Auto simulates how rubber tires grip asphalt, but it also simulates interactions between colorful characters. “It’s an emerging area,” says Lange, of modeling social dynamics. “You simulate multiple agents and they interact with each other. They invent what they say.”

That provides insights into how crowds behave, for instance. As a potentially practical–but still theoretical–example, he describes the ability to model how chatter affects stock prices. “One guy says the stock is going to go up, another guy says this stock is going to go down,” says Lange. “How do they influence the crowd?”

[Image: courtesy of Unity Technologies]

Training 1,000 dogs for 24 hours at 10,000 fps

In nature, animals must learn to crawl before they can walk, or run, or buy stocks. Reinforcement learning follows the same stepwise progression. “You would think that you just take the hard problem and throw a big computer at it, but that does not lead to a good result,” says Lange.

Instead, challenges have to be broken into a series of increasingly difficult tasks, an approach known as curriculum learning. An algorithm masters one challenge, and uses what it’s learned to master the next one, and the next one. Having learned to fetch, Unity’s virtual dog quickly learned how to jump through a hoop, says Lange.
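
A curriculum of this kind can be sketched in a few lines of Python. The class below is a hypothetical illustration, not the ML-Agents API: the lesson names, the 80% success threshold, and the three-attempt window are all assumptions. It moves the learner to the next lesson only once recent attempts at the current one mostly succeed.

```python
from collections import deque


class Curriculum:
    """Advance through lessons once the recent success rate is high enough."""

    def __init__(self, lessons, threshold=0.8, window=3):
        self.lessons = list(lessons)
        self.threshold = threshold  # assumed mastery bar
        self.index = 0
        self.recent = deque(maxlen=window)  # outcomes of the latest attempts

    @property
    def current(self):
        return self.lessons[self.index]

    def report(self, success):
        """Record one attempt; return the lesson to practice next."""
        self.recent.append(1.0 if success else 0.0)
        window_full = len(self.recent) == self.recent.maxlen
        mastered = window_full and sum(self.recent) / len(self.recent) >= self.threshold
        if mastered and self.index < len(self.lessons) - 1:
            self.index += 1
            self.recent.clear()  # start fresh on the harder lesson
        return self.current


curriculum = Curriculum(["stand", "walk", "fetch", "jump through hoop"])
for outcome in [False, True, True, True]:
    lesson = curriculum.report(outcome)
print(lesson)
```

In a real training run, the agent would keep whatever it learned in one lesson (its network weights, say) when it moves on to the next; this sketch tracks only the schedule.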

That’s also how games work. Players level up–squaring off against bigger and bigger “bosses” as they go. (But here, the “players” are algorithms.) Another great thing about game engines: They can generate levels on their own. The signature example is the 2016 game No Man’s Sky. Using Hello Games’s own in-house engine, the title can generate, landscape, and populate 18,446,744,073,709,551,616 unique planets.



I ask Lange if he enjoys playing video games, and he hesitates a bit. “Uh, I play a lot of AI games,” he says. “We are really super excited about this relationship [with DeepMind] because it really shows that Unity has so much more than pure gaming to it. It’s more than people playing Pokémon Go,” he says.

DeepMind, in fact, has been using its own game-engine software for some time. In a prepared quote, DeepMind cofounder and CEO Demis Hassabis says, “Games and simulations have been a core part of DeepMind’s research program from the very beginning and this approach has already led to significant breakthroughs in AI research.”


Perhaps his former life as a game designer made Hassabis receptive to collaborating with a maker of consumer games instead of doing everything in-house. Popular engines like Unity and Unreal are commercially driven to develop ever better simulations, and they benefit from huge developer communities.

A year ago, Unity also extended AI development to the public with its ML-Agents toolkit–open-source software linking its game engine to machine learning programs. Participants include AI researchers and some “notable game developers,” says Lange. Whatever anyone develops is available to everyone–including Unity, which wants to employ AI to evolve better “non-playable characters” that human players face in games.

The deal with DeepMind is more than just selling software licenses, according to Lange, who calls it a collaboration.


“When you build a gaming engine, it runs fast in iOS and runs fast on Android and it runs fast on your Xbox,” he says. But running Unity on thousands or even tens of thousands of servers to drive deep learning is a very different task, which requires tweaking and configuring Unity for those demands, says Lange.

And only at massive scale does deep learning pay off. “If I train one dog for five minutes, I’m not really going to get there,” says Lange. “If I train a thousand dogs for 24 hours at 10,000 frames per second, then all these dogs are doing all kinds of crazy things.” And eventually, from all those attempts, one dog ends up jumping through a hoop.
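
To put rough numbers on that comparison, here is a back-of-the-envelope calculation in Python. It rests on two assumptions that Lange's quote does not settle: that the 10,000 frames per second applies to each dog rather than to the whole pack, and that the lone five-minute dog runs at the same rate.

```python
SECONDS_PER_MINUTE = 60


def simulated_frames(agents, minutes, frames_per_second):
    """Total frames of experience, assuming every agent runs at the full rate."""
    return agents * minutes * SECONDS_PER_MINUTE * frames_per_second


big_run = simulated_frames(agents=1_000, minutes=24 * 60, frames_per_second=10_000)
small_run = simulated_frames(agents=1, minutes=5, frames_per_second=10_000)

print(f"{big_run:,} frames vs. {small_run:,} frames")
print(f"{big_run // small_run:,} times as much experience")
```

Under those assumptions, the large run collects 864 billion frames of trial and error against 3 million for the single dog, a 288,000-fold difference.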


About the author

Sean Captain is a business, technology, and science journalist based in North Carolina. Follow him on Twitter @seancaptain.

