Google's AlphaGo Levels Up From Board Games to Power Grids

By redesigning how its AlphaGo AI learns, Google has made a system that can tackle much more than just board games.
Demis Hassabis.

WUZHEN, CHINA — When researchers inside Google's DeepMind artificial intelligence lab first built AlphaGo—the machine that plays the ancient game of Go better than any human—they needed human help. The machine learned to play this exceedingly complex game by analyzing about 30 million moves by professional Go players. Then, once AlphaGo could mimic human play, it reached an even higher level by playing game after game against itself, closely tracking the results of each move. In the end, the machine was good enough to beat the Korean grandmaster Lee Sedol, the best player of the last decade.

But then, about a year ago, DeepMind redesigned the system. In essence, they built the new AlphaGo without help from human moves. They trained it entirely from games where the machine plays against itself—part of a continuing progression toward AI techniques that truly learn on their own. "AlphaGo has become its own teacher," says David Silver, the project's lead researcher.

Self-Taught

Silver unveiled the new design this week in Wuzhen, China, where AlphaGo is playing the current number one player in the world, 19-year-old grandmaster Ke Jie. Demis Hassabis, the founder and CEO of DeepMind, says that because the system can do more learning on its own, with less existing data, it's better suited to learning a wide range of tasks beyond Go. The system could help optimize power grids, he says, or streamline shipping routes, or refine scientific research.

Indeed, the techniques that underpin AlphaGo—known as deep reinforcement learning—have become increasingly influential across the world of AI research. Researchers inside Google Brain, the company's other AI lab, now use reinforcement learning in training robotic arms to open doors and pick up objects on their own. Uber uses the technique in teaching AI agents to play driving games like Grand Theft Auto—a stepping stone to systems that handle real cars on real roads. And much like DeepMind, researchers at OpenAI, the lab bootstrapped by Tesla founder Elon Musk, are applying the same ideas to a wide range of games and simulations.

"What we're going to move towards is: Can systems learn more on their own? Can they interact with their environment in some way and learn how to do well in that environment?" says Jeff Dean, who oversees the work at Google Brain.

If researchers can build the right simulation and AI agents spend enough time training inside it, many researchers believe, they can learn to handle almost any task. That includes physical tasks like navigation, but intellectual tasks as well. Given the right simulation, Hassabis says, an agent could learn to understand the natural way we humans talk—something DeepMind is already exploring.

The end game is a long way off. But AlphaGo shows very real progress toward such lofty goals.

The Master

The original AlphaGo relied on two deep neural networks, complex pattern-recognition systems that can learn by analyzing vast amounts of data. Initially, both learned by analyzing that corpus of 30 million human moves. The new AlphaGo relies on a pair of similar neural networks, but they train from the beginning on games that AlphaGo plays against itself.

This new incarnation of the system still owes a debt to human players. It trained on moves by the original version of AlphaGo, which trained on human moves. But Hassabis says that the current architecture could potentially learn from random play, with no help from humans at any point in the process. And even today, the system can continue to improve without help from additional human play.
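To make the idea concrete, here is a minimal sketch of self-play reinforcement learning in Python. The toy game (a take-the-last-stone variant of Nim), the tabular policy, and every name in it are illustrative assumptions; this is not DeepMind's code, and AlphaGo's actual networks and search are far more elaborate.

```python
# A minimal sketch of self-play reinforcement learning, in the spirit of the
# redesigned AlphaGo described above. The game, the tabular policy, and all
# parameters are illustrative assumptions, not DeepMind's implementation.
import numpy as np

PILE = 15            # stones on the table at the start of each game
ACTIONS = [1, 2, 3]  # a player may remove 1, 2, or 3 stones per turn

# Tabular "policy network": one row of action preferences per pile size.
logits = np.zeros((PILE + 1, len(ACTIONS)))

def policy(state):
    """Softmax over the three possible moves for the current pile size."""
    z = np.exp(logits[state] - logits[state].max())
    return z / z.sum()

def play_self_play_game(rng):
    """Both sides use the same policy; returns each player's moves and the winner."""
    state, player = PILE, 0
    history = {0: [], 1: []}
    while state > 0:
        probs = policy(state)
        a = rng.choice(len(ACTIONS), p=probs)
        history[player].append((state, a))
        state -= min(ACTIONS[a], state)
        if state == 0:
            return history, player   # the player who takes the last stone wins
        player = 1 - player

def train(games=20000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(games):
        history, winner = play_self_play_game(rng)
        for player, moves in history.items():
            reward = 1.0 if player == winner else -1.0
            for state, a in moves:
                # REINFORCE-style update: push probability toward winning moves.
                probs = policy(state)
                grad = -probs
                grad[a] += 1.0
                logits[state] += lr * reward * grad

if __name__ == "__main__":
    train()
    # After self-play training, the policy should tend to prefer moves that
    # leave the opponent on a multiple of four, the known winning strategy.
    print(policy(5))   # expect most mass on "take 1" (leaving 4)
```

The point is the shape of the loop: the system generates its own games, scores them by who won, and nudges its policy toward the moves that led to wins, with no human examples anywhere in the process.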

That continued progress was evident as far back as January, when AlphaGo, under the pseudonym "Master," played several grandmasters over the internet. It won all sixty of its games. And on Tuesday, in Wuzhen, the machine topped Ke Jie in the first round of their three-game match. It's clear that the Chinese grandmaster has little chance of topping the machine's new incarnation.

Hassabis and team also believe they've fixed a notable flaw in the system that Lee Sedol exposed when he took one of the five games in Seoul. And he says that the new algorithms are significantly more efficient than those that underpinned the original incarnation of AlphaGo. The DeepMind team can train AlphaGo in weeks rather than months, and during a match like the one in Wuzhen, the system can run on just one of the new TPU chip boards that Google built specifically to run this kind of machine learning software. In other words, it needs only about a tenth of the processing power used by the original incarnation of AlphaGo.

On the Grid

But Go isn't the only aim. After building what Hassabis calls a more general system, DeepMind is already pushing the technology into new places. According to Hassabis, the lab is beginning to work with National Grid UK, aiming to use AlphaGo's underlying infrastructure as a way of improving the efficiency of the British power grid.

DeepMind has already done something similar with the computer data centers that underpin Google's online empire. In essence, Hassabis and team have created a simulation of these data centers where the AI can learn to more efficiently control fans and other hardware, much as AlphaGo learns to more effectively play the game of Go. Only now, the scale, and the stakes, are so much greater.
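The same loop transfers, at least in spirit, to a problem like cooling. Below is a hedged sketch: a made-up thermal simulation and a simple Q-learning agent that learns how hard to run the fans as load and temperature change. The physics, the reward, and every parameter are invented for illustration; DeepMind has not published the details of its data-center work.

```python
# A hedged sketch of reinforcement learning applied to data-center cooling:
# an agent learns, inside an invented thermal simulation, how aggressively to
# run the fans. Nothing here reflects DeepMind's or Google's actual systems.
import numpy as np

FAN_LEVELS = np.linspace(0.2, 1.0, 5)   # allowed fan speeds (fraction of max)
TEMP_BUCKETS = 20                       # discretized temperature states

def step(temp, load, fan, rng):
    """Toy thermodynamics: server load heats the room, fans cool it, plus noise."""
    temp = temp + 2.0 * load - 3.0 * fan + rng.normal(0, 0.2)
    energy = fan ** 3                        # fan power grows steeply with speed
    penalty = 10.0 if temp > 30.0 else 0.0   # overheating is heavily punished
    return temp, -(energy + penalty)         # reward = negative cost

def bucket(temp):
    return int(np.clip(temp - 15.0, 0, TEMP_BUCKETS - 1))

def train(episodes=5000, horizon=50, eps=0.1, lr=0.1, gamma=0.95, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((TEMP_BUCKETS, len(FAN_LEVELS)))   # Q-table: state x action
    for _ in range(episodes):
        temp = 22.0
        for _ in range(horizon):
            s = bucket(temp)
            a = rng.integers(len(FAN_LEVELS)) if rng.random() < eps else q[s].argmax()
            load = rng.uniform(0.3, 1.0)             # fluctuating server load
            temp, reward = step(temp, load, FAN_LEVELS[a], rng)
            s2 = bucket(temp)
            # Standard Q-learning update toward the observed reward.
            q[s, a] += lr * (reward + gamma * q[s2].max() - q[s, a])
    return q

if __name__ == "__main__":
    q = train()
    # The learned policy should run the fans harder as the room gets hotter.
    print([float(FAN_LEVELS[q[s].argmax()]) for s in range(TEMP_BUCKETS)])
```

After training, the learned policy ramps fan speed up as the simulated room heats, spending a little energy to avoid the heavy overheating penalty: the same trade-off the article describes, at vastly larger scale.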