Teaching a computer to master the game of Go has been a grail hotly pursued by computer scientists because, unlike chess, it cannot be solved by brute force logic alone.
"Go has been played for more than 3,000 years. It’s considered more an art form than a game because, with more possible positions on the board than there are atoms in the universe, human players rely more on intuition than logic," DeepMind founder Demis Hassabis told a rapt audience at New Scientist Live on Friday.
"Even if you took all the world’s computers and ran it for a million years you could not brute force solve the game of Go. That is why computer scientists are so obsessed with Go."
Essentially, by teaching a computer to play Go, one would be teaching a computer intuition.
"It was also thought that it was impossible to write an evaluation function – the function that tells the system if one side or the other is winning and by how much," he said.
In making AlphaGo, the team at DeepMind could not fall back on the approach behind IBM’s chess-playing computer Deep Blue, because that approach relies on an evaluation programme that can be hand-crafted for chess but not for Go.
"Computer engineers talk to teams of chess grandmasters and try to distil the tactics and knowledge into an evaluation function. Go, however, is so esoteric that it doesn’t lend itself to a rules-based approach," Hassabis explained.
Ask a Go player why they chose a move and, unlike a chess player, they won’t explain the plan they had in mind. Instead, they will often say the move "felt right", because master Go players rely mostly on intuition.
"It’s much more how an artist would approach their art. Rather than how a mathematician approaches a problem – which is how the game of chess is played," he continued.
So rather than hand-coding the game of Go based on a set of rules, Hassabis and his team set about teaching the computer to play the game.
First, the team built a neural network, called the policy network, and showed it around 100,000 games of Go played between human amateurs. The network was trained to mimic those games: given any board position, it would output the probability that a human would play each possible move.
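As a rough illustration of that supervised stage, the toy model below learns to raise the probability of the move a human actually played. All of the names and the single-layer architecture are hypothetical; AlphaGo’s real policy network was a deep convolutional net trained on millions of positions. This is a minimal sketch, assuming a flattened 19×19 board encoding:

```python
import numpy as np

BOARD_POINTS = 19 * 19  # one output per board intersection

rng = np.random.default_rng(0)

# Toy stand-in for the policy network: a single linear layer with a
# softmax over all 361 points. This only illustrates the training signal.
W = rng.normal(scale=0.01, size=(BOARD_POINTS, BOARD_POINTS))

def policy(board):
    """Probability, for every intersection, that a human plays there.

    board: flattened 19x19 array (+1 own stone, -1 opponent, 0 empty).
    """
    logits = W @ board
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def supervised_step(board, human_move, lr=0.1):
    """One supervised step: push probability toward the move the human
    actually played (cross-entropy gradient: probs minus one-hot target)."""
    global W
    probs = policy(board)
    target = np.zeros(BOARD_POINTS)
    target[human_move] = 1.0
    W -= lr * np.outer(probs - target, board)

# Toy usage: a single (position, move) pair from a human game record.
board = rng.choice([-1.0, 0.0, 1.0], size=BOARD_POINTS)
supervised_step(board, human_move=72)
```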
"So, instead of searching through all 200 possible moves you could make in each turn of Go, the computer would look at just the top three most probable moves," Hassabis said. "We then made the system play against itself millions of times. Each time it won or lost it would adjust its neural network patterns to determine if it was more or less likely to play those moves the next time."
As this first network improved incrementally, the team gained enough data to train a second network that would act as the evaluation function deemed impossible to create.
"It is impossible to hand-write this. So we didn’t. We fed it positions from games we knew the outcomes of and asked it to predict. In many was the network learnt for itself how to tell if it was going to win," Hassabis recounted.
DeepMind then pitted AlphaGo against the European Go champion, Fan Hui, in 2015. After the AI beat him, the team moved on to challenge the reigning world champion, Lee Sedol.
The 37th move
While most know that AlphaGo won its match against Lee Sedol, few are aware that it wasn’t the AI’s victory that gave its creators, and Go players around the world, chills. It was the 37th move it made in the second game of the match.
"It was in the second game when AlphaGo played a move that was unthinkable for a human professional to play," Hassabis explained.
In Go, there are two critical lines. There is the third line from the edge, where playing a stone signals the player’s intent to surround the edge of the board and take that territory. Then there is the fourth line, which signals an intent to build influence and power towards the centre that will translate into territory elsewhere on the board.
"For the past thousand years of recorded Go history, the trade-off between playing the third and fourth lines were considered equal. AlphaGo played on the fifth line," Hassabis said.
It was a move so unthinkable that several game commentators double-checked it. They thought someone had misreported the move, that there had been a mistake.
Of course, in Go as in any other game, a move is only elegant or beautiful if it wins, and that 37th move went on to shape the battle in the bottom-left corner of the board. Some 100 moves later, the fight swept back across the board, and the 37th stone was in the perfect position to swing the game.
"The game was telling us that in 3,000 years of playing the game, human beings had vastly underestimated the value of central power," Hassabis said.
Was it intuition?
After the game, Hassabis and his team interrogated AlphaGo and found that the system had given the move less than a one-in-a-thousand chance of being played by a professional.
"It knew it was unexpected, but it played it anyway," Hassabis said.
This is an AI that had been trained to play the way humans play, to favour the moves human players were most likely to make. And yet it innovated away from its training to redefine the game.
"AlphaGo, arguably, demonstrated both intuition and creativity in that move. It synthesised knowledge to create an original idea in the service of a goal, and it did so using knowledge it had gained by experience but wasn’t directly expressable," Hassabis said.
Humanity redefined
But AlphaGo’s story doesn’t end there. In the fourth game, Lee Sedol played a move that confused the AI and won him the game.
That 78th move is regarded with as much awe by the Go-playing community as AlphaGo’s 37th move.
By playing against the AI, Lee Sedol had intuitively learnt something that caused him to evolve his game.
Ke Jie, the current world Go champion, also went on to challenge AlphaGo this year. He too lost to the AI, but afterwards went on a 22-game winning streak because he had been forced to rethink the game of Go. He has since said that humans and computers will usher in a new era of Go.
And this is the root of Hassabis’ dream for humanity and AI: "The power of humans and machines collaborating together to achieve amazing things. Where human ingenuity, combined with AI, will unlock our potential."
Already, DeepMind is working on areas where AlphaGo’s techniques will apply, including drug development and the discovery of new materials.
"We are working with the NHS on improving diagnostics. On education, on smartphones… every aspect of our lives will be touched by AI and its systems," he said.
DeepMind and parent company Alphabet have used a variant of AlphaGo to manage the cooling of one of Google’s data centres.
"By controlling hundreds of variables, the AI has saved 40% of the power that the cooling systems used. So we’ve been thinking. Why just data centres? Why not optimise national grids? We’re working on saving 10-15% of the power a while country uses now."
Humanity has reached a point where the next things we want to know are so complex that even teams of the smartest humans struggle with them. Problems such as climate change, macroeconomics and particle physics are regarded as almost too complex to solve.
"AI could turbocharge our efforts in all these areas," Hassabis concluded. "This is why I spend my life working on AI and have dedicated a team to it."