The Big Nine
In Go, the traditional board is a 19 × 19 grid of lines, and stones are played on the intersections. Unlike the pieces in chess, Go stones are all equally weighted. Between the two players, there are 181 black and 180 white stones (black always goes first, hence the uneven number). In chess—which uses pieces that have different strengths—the white player has 20 possible opening moves, and then black has 20 possible replies, so after each player’s first move there are 400 possible board positions. But in Go, there are 361 possible opening plays, one at every intersection of what’s essentially a completely blank grid. After the first round of moves by each player, there are 129,960 possible board positions. Altogether, there are roughly 10^170 possible board configurations—for context, that’s more than all of the atoms in the known universe. With so many conceivable positions and potential moves, there is no set playbook like there is for checkers and chess. Instead, Go masters rely on scenarios: If the opponent plays on a particular point, then what are the possible, plausible, and probable outcomes given her personality, her patience, and her overall state of mind?
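A quick check of those counts, using the standard estimate of roughly 10^80 atoms in the observable universe:

```latex
\begin{aligned}
\text{chess, after one move by each player:}\quad & 20 \times 20 = 400 \ \text{positions}\\
\text{Go, after one move by each player:}\quad & 361 \times 360 = 129{,}960 \ \text{positions}\\
\text{legal Go configurations:}\quad & \approx 10^{170} \gg 10^{80}
\end{aligned}
```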
Like chess, Go is a deterministic, perfect-information game: there is no hidden information and no element of chance. To win, players have to keep their emotions balanced, and they must become masters in the art of human subtlety. In chess, it is possible to calculate a player’s likely future moves; a rook can only move vertically or horizontally across the board. That limits the potential moves. Therefore, it’s easier to understand who is winning a chess game well before any pieces have been captured or a king is put in checkmate. That isn’t the case in Go. Sometimes it takes a high-ranking Go master to even figure out what’s happening in a game and determine who’s winning at a particular moment. Go’s complexity is what’s made the game a favorite among emperors, mathematicians, and physicists—and the reason why AI researchers have always been fascinated with teaching machines to play Go.
Go had always proved a significant challenge for AI researchers. While a computer could be programmed to know the rules, what about rules to understand the human characteristics of the opponent? No one had ever built an algorithm strong enough to deal with the game’s wild complexities. In 1971, an early program created by computer scientist Jon Ryder worked from a technical point of view, but it lost to a novice. In 1987, a stronger computer program called Nemesis competed against a human for the first time in a live tournament. By 1994, the program known as Go Intellect had proven itself a competent player. But even with the advantage of a significant handicap, it still lost all three of its games—against kids. In all of these cases, the computers would make incomprehensible moves, or they’d play too aggressively, or they’d miscalculate their opponent’s posture.
In the middle of all that work, a handful of researchers were once again workshopping neural networks, an idea with roots in the era of the Dartmouth meeting, explored early on by Marvin Minsky and championed by Frank Rosenblatt, inventor of the perceptron. Cognitive scientist Geoff Hinton and computer scientists Yann LeCun and Yoshua Bengio each believed that neural net–based systems would not only have serious practical applications—like automatic fraud detection for credit cards and automatic optical character recognition for reading documents and checks—but that they would become the basis of artificial intelligence itself.
It was Hinton, a professor at the University of Toronto, who imagined a new kind of neural net, one made up of multiple layers that each extracted different information until it recognized what it was looking for. The only way to get that kind of knowledge into an AI system, he thought, was to develop learning algorithms that allowed computers to learn on their own. Rather than teaching them to perform a single narrow task really well, the networks would be built to train themselves.
These new “deep” neural networks (DNNs) would require a more advanced kind of machine learning—“deep learning”—to train computers to perform humanlike tasks but with less (or even without) human supervision. One immediate benefit: scale. In a neural network, a few neurons make a few choices—but the number of possible choices could rise exponentially with more layers. Put another way: humans learn individually, but humanity learns collectively. Imagine a massive deep neural net, learning as a unified whole—with the possibility to increase speed, efficiency, and cost savings over time.
Another benefit was turning these systems loose to learn on their own, without being limited by our human cognitive abilities and imagination. The human brain has metabolic and chemical thresholds, which limit the processing power of the wet computers inside our heads. We can’t evolve significantly on our own, and the existing evolutionary timeframe doesn’t suit our current technological aspirations. The promise of deep learning was an acceleration of the evolution of intelligence itself, which would only temporarily involve humans.
A deep neural net would be given a basic set of parameters about the data by a person, and then the system would go out and learn on its own by recognizing patterns using many layers of processing. For researchers, the attraction of deep learning is that by design, machines make decisions unpredictably. Thinking in ways we humans have never imagined—or been able to do ourselves—is vitally important when trying to solve big problems for which there haven’t ever been clear solutions.
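To make that layered idea concrete, here is a minimal sketch in Python. The layer sizes and the input are invented for illustration, and the weights are left random; in a real system they would be learned from data:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 10]            # e.g., raw pixel values in, class scores out
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    # Each hidden layer transforms the previous layer's output, so later layers
    # can represent patterns built out of the patterns found by earlier ones.
    for w in weights[:-1]:
        x = np.maximum(0, x @ w)            # linear step followed by a ReLU nonlinearity
    return x @ weights[-1]                  # final layer: one raw score per possible label

scores = forward(rng.normal(size=784))      # one made-up "image" worth of input
print(scores.shape)                         # (10,)
```

Deep learning is the process of automatically adjusting all of those weights, layer by layer, from examples, until the scores coming out of the final layer are reliably right; with enough layers and data, the same recipe scales from toy inputs like this one to speech, photographs, and Go positions.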
The AI community dismissed deep neural networks as the nonsensical ramblings of a scientist working on the fringe. Their doubt only intensified once it became clear that because deep-learning processes happen in parallel, they wouldn’t really be observable by AI researchers in real time. Someone would have to build the system and then trust that the decisions it was making were the right ones.
Winning and Losing
Hinton kept working, workshopping the idea with his students as well as with LeCun and Bengio, and published papers beginning in 2006. By 2009, Hinton’s lab had applied deep neural nets to speech recognition, and a chance meeting with a Microsoft researcher named Li Deng meant that the technology could be piloted in a meaningful way. Deng, a Chinese-born deep-learning specialist, was a pioneer in applying large-scale deep learning to speech recognition. By 2010, the technique was being tested at Google. Just two years later, deep neural nets were being used in commercial products. If you used Google Voice and its transcription services, that was deep learning, and the technique became the basis for all the digital assistants we use today. Siri, Google Assistant, and Amazon’s Alexa are all powered by deep learning. The AI community of interdisciplinary researchers had grown significantly since the Dartmouth summer. But those three key practices—that the big tech companies and academic researchers would work together, that commercial success would drive the progress of AI, and that the network of researchers would tend to be homogeneous—were still very much in play.
All of the advancements being made in America weren’t going unnoticed in Beijing. China now had a nascent but growing AI ecosystem of its own, and the state government was incentivizing researchers to publish their work. The number of scientific papers on AI published by Chinese researchers more than doubled between 2010 and 2017.35 To be fair, papers and patents don’t necessarily mean that research will find its way into widespread use, but it was an early indication of how rattled Chinese leaders were at all the progress being made in the West—especially when it came to Go.
By January 2014, Google had begun investing significantly in AI, which included more than $500 million to acquire a hot deep-learning startup called DeepMind and its three founders: neuroscientist Demis Hassabis, a former child prodigy in chess; machine-learning researcher Shane Legg; and entrepreneur Mustafa Suleyman. Soon after the acquisition, the team set to work on a program called AlphaGo.
By the fall of 2015, they were ready to test AlphaGo against a professional human player. A match was arranged between DeepMind and Fan Hui, a Chinese-born professional Go player and one of the strongest masters in Europe. Since playing Go on a computer isn’t quite the same as playing on a physical board, it was decided that one of DeepMind’s engineers would place the computer’s moves on the board and communicate Fan’s moves back to the computer.
Before the game, Toby Manning, who was one of the heads of the British Go Association, played AlphaGo in a test round—and lost by 17 points. Manning made some errors, but so did the program. An eerie thought crossed his mind: What if AlphaGo was just playing conservatively? Was it possible that the program was playing only aggressively enough to beat Manning, rather than to clobber him entirely?
The players sat down at a table, Fan Hui wearing a pinstriped button-down shirt and brown leather jacket, Manning in the center, and the engineer on the other side. Game play started. Fan opened a bottle of water and considered the board. Playing black, he had the first move. The first 50 moves made for a quiet game—Fan was clearly trying to suss out the strengths and weaknesses of AlphaGo. One early tell: the AI would not play aggressively unless it was behind. It was a tight first match. AlphaGo earned a very narrow victory, by just 1.5 points.
Fan used that information going into the second game. If AlphaGo wasn’t going to play aggressively, then Fan decided that he’d fight early. But then AlphaGo started playing more quickly. Fan mentioned that perhaps he needed a bit more time to think between turns. On move 147, Fan tried to prevent AlphaGo from claiming a big territory in the center of the board, but the move misfired, and he was forced to resign.
By game three, Fan’s moves were more aggressive, and AlphaGo followed suit. Halfway through, Fan made a catastrophic overplay, which AlphaGo punished, and then another big mistake, which rendered the game effectively over. Reeling from frustration, Fan had to excuse himself for a walk outside so that he could regain his composure and finish the match. Yet again, stress had gotten the better of a great human thinker—while the AI, unencumbered, ruthlessly pursued its goal.
AlphaGo—an AI program—had beaten a professional Go player 5–0. And it had won while evaluating orders of magnitude fewer positions than IBM’s Deep Blue had examined in its chess matches. When AlphaGo beat a human, it didn’t know it was playing a game, what a game means, or why humans get pleasure out of playing games.
Hanjin Lee, a high-ranking professional Go player from Korea, reviewed the games afterward. In an official public statement, he said, “My overall impression was that AlphaGo seemed stronger than Fan, but I couldn’t tell by how much… maybe it becomes stronger when it faces a stronger opponent.”36
Focusing on games—that is, beating humans in direct competition—has defined success using a relatively narrow set of parameters. And that brings us to a perplexing new philosophical question for our modern era of AI. In order for AI systems to win—to accomplish the goals we’ve created for them—do humans have to lose in ways that are both trivial and profound?
AlphaGo continued playing tournaments, besting every opponent with masterful abilities and demoralizing the professional Go community. After beating the world’s number one champion 3–0, DeepMind announced that it was retiring the AI system from competition, saying that the team would work on a new set of challenges.37 What the team started working on next was a way to evolve AlphaGo from a powerful system that could be trained to beat brilliant Go players to a system that could train itself to become just as powerful, without having to rely on humans.
The first version of AlphaGo required humans in the loop and an initial data set of 100,000 Go games in order to learn how to play. The next generation of the system was built to learn from zero. Just like a human player new to the game, this version—called AlphaGo Zero—would have to learn everything from scratch, completely on its own, without an opening library of moves or even a definition of what the pieces did. The system would not just make decisions—which were the result of computation and could be explicitly programmed—it would make choices, which had to do with judgment.38 This meant that the DeepMind architects wielded an enormous amount of power, even if they didn’t realize it. From them, Zero would learn the conditions, values, and motivations for making its decisions and choices during the game.
Zero competed against itself, tweaking and adjusting its decision-making processes alone. Each game would begin with a few random moves, and after every game, Zero would update its system and then play again, improved by what it had learned. It took only 70 hours of play for Zero to gain the same level of strength AlphaGo had when it beat the world’s greatest players.39
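The real system pairs deep networks with Monte Carlo tree search, but the basic shape of the loop (play yourself, score the result, adjust your own judgment, repeat) can be sketched on a toy game. Below is a hedged illustration in Python using the simple take-away game Nim; every name and number in it is invented for the example:

```python
import random
from collections import defaultdict

# Q[(stones_left, take)] = estimated chance that the player about to move wins.
Q = defaultdict(lambda: 0.5)
EPSILON, LEARNING_RATE, GAMES = 0.1, 0.1, 20000

def choose(stones):
    moves = [t for t in (1, 2, 3) if t <= stones]
    if random.random() < EPSILON:                      # occasionally explore a random move
        return random.choice(moves)
    return max(moves, key=lambda t: Q[(stones, t)])    # otherwise play the best move found so far

def play_one_game():
    stones, player, history = 10, 0, []                # ten stones; whoever takes the last one wins
    while stones > 0:
        move = choose(stones)
        history.append((player, stones, move))
        stones -= move
        player ^= 1
    winner = history[-1][0]
    for mover, state, move in history:                 # nudge every decision toward the final result
        outcome = 1.0 if mover == winner else 0.0
        Q[(state, move)] += LEARNING_RATE * (outcome - Q[(state, move)])

for _ in range(GAMES):
    play_one_game()

best_moves = {s: max([t for t in (1, 2, 3) if t <= s], key=lambda t: Q[(s, t)]) for s in range(1, 11)}
print(best_moves)   # learned policy: with 10 stones it should take 2, leaving a multiple of 4
```

After a few thousand self-played games, the table of move preferences converges on the game’s known optimal strategy (leave your opponent a multiple of four stones), knowledge the program was never given, only discovered.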
And then something interesting happened. The DeepMind team applied its technique to a second instance of AlphaGo Zero using a larger network and allowed it to train and self-play for 40 days. It not only rediscovered the sum total of Go knowledge accumulated by humans, but it also beat the most advanced version of AlphaGo 90% of the time—using completely new strategies. This means that Zero evolved into both a better student than the world’s greatest Go masters and a better teacher than its human trainers, and we don’t entirely understand what it did to make itself that smart.40 Just how smart, you may be wondering? Well, a Go player’s strength is measured using something called an Elo rating, which estimates win/loss probabilities based on past performance. Grandmasters and world champions tend to have ratings near 3,500. Zero had a rating of more than 5,000. By comparison, those brilliant world champions played like amateurs, and it would be statistically improbable that any human player could ever beat the AI system.
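The standard Elo formula makes “statistically improbable” concrete. Using the approximate ratings quoted above (illustrative figures, not official ones):

```python
def expected_score(rating, opponent_rating):
    # Standard Elo expectation: E = 1 / (1 + 10 ** ((R_opponent - R) / 400))
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

print(expected_score(3500, 5000))   # ~0.00018: a 3,500-rated champion wins fewer than 2 games in 10,000
```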
We do know one condition that enabled this kind of learning. By not using any human data or expertise, Zero’s creators removed the constraints of human knowledge on artificial intelligence. Humans, as it turned out, would have held the system back. The achievement was architecting a system that had the ability to think in an entirely new way and to make its own choices.41 It was a sudden, unexpected leap, one that portended a future in which AI systems could look at cancer screenings, evaluate climate data, and analyze poverty in nonhuman ways—potentially leading to breakthroughs that human researchers never would have thought of on their own.
As Zero played games against itself, it actually discovered Go strategies that humans had developed over 1,000 years—which means it had learned to think just like the humans who created it. In the early stages, it made the same mistakes, figured out the same patterns and variations, and ran into the same obstacles as we would. But once Zero got strong enough, it abandoned our human moves and came up with something it preferred.42 Once Zero took off on its own, it developed creative strategies that no one had ever seen before, suggesting that maybe machines were already thinking in ways that are both recognizable and alien to us.
What Zero also proved was that algorithms were now capable of learning without guidance from humans, and that we humans were the ones who’d been holding AI systems back. It meant that in the near future, machines could be let loose on problems that we, on our own, could not predict or solve.
In December 2017, the DeepMind team published a paper showing that Zero was now generally capable of learning—not just Go but other information. On its own, Zero was playing other games, like chess and shogi (a Japanese game similar to chess), which are admittedly less complex but still require strategy and creativity. Only now, Zero was learning much faster than before. It managed to develop incomprehensible, superhuman power with less than 24 hours of game play. The team then started to work on applying the techniques they developed for Zero to build a “general-purpose learning machine,” a set of adaptive algorithms that mimic our own biological systems, capable of being trained. Rather than filling AI systems with a massive amount of information and a set of instructions for how they can be queried, the team is instead teaching machines how to learn. Unlike humans, who might get tired, bored, or distracted when studying, machines will ruthlessly pursue a goal at all costs.
This was a defining moment in the long history of AI for a few reasons. First, the system behaved in unpredictable ways, making decisions that didn’t entirely make sense to its creators. And it beat a human player in ways that could neither be replicated nor fully understood. It portended a future in which AI could build its own neural pathways and gain knowledge that we may never understand. Second, it cemented the two parallel tracks AI is now moving along: China, alarmed, throws money and people
at making its domestic products more competitive, while in the United States, our expectations are that fantastical AI products will soon hit the marketplace. The viability of deep neural networks and deep learning is what’s behind the current frenzy surrounding AI—not to mention the sudden explosion of funding in the US and China’s national proclamations about its plans for the future.
As a business unit within Alphabet (Google’s parent company), DeepMind has 700 employees, some of whom have been tasked with developing commercial products as quickly as possible. In March 2018, Google’s cloud business announced that it was selling a DeepMind-powered text-to-speech service for $16 per million characters of processed text.43 One of the breakout announcements from Google’s 2018 I/O conference was Duplex, a voice assistant that will automatically make calls on behalf of customers and talk to human receptionists to make restaurant reservations or appointments at salons, complete with “ums” and “ahs.” That product’s voice is generated using WaveNet, a deep generative model developed by DeepMind.44
Meanwhile, AI researchers in a different division of Alphabet called Google Brain revealed that they had built an AI that’s capable of generating its own AIs. (Got that?) The system, called AutoML, automated the design of machine-learning models using a technique called “reinforcement learning.” AutoML operated as a sort of “parent”—a top-level controller DNN that would decide how to create “child” AI networks for narrow, specific tasks. Given the task of recognizing objects like people, cars, traffic lights, purses, and more in videos, AutoML generated a child called NASNet and trained it for the job. Not burdened by stress, ego, doubt, or a lack of self-confidence—traits found in even the most brilliant computer scientists—NASNet achieved an 82.7% accuracy rate at recognizing images. This meant that the child system was outperforming comparable systems built by human coders—including the humans who originally created its parent.45
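The controller-and-child idea can be sketched in a few lines of Python. This is a hedged toy, not Google’s AutoML: the architecture choices, the preference-update rule, and especially the child_accuracy stand-in (which replaces actually training a child network) are all invented for illustration:

```python
import math
import random

depth_options = [2, 4, 8]        # candidate number of layers for the child network
width_options = [32, 64, 128]    # candidate layer width for the child network
prefs = {('depth', d): 0.0 for d in depth_options}
prefs.update({('width', w): 0.0 for w in width_options})
baseline, lr = 0.0, 0.5

def probabilities(kind, options):
    weights = [math.exp(prefs[(kind, o)]) for o in options]     # softmax over preferences
    total = sum(weights)
    return [w / total for w in weights]

def child_accuracy(depth, width):
    # Stand-in for training and validating a real child network on real data:
    # pretend deeper and wider helps somewhat, plus evaluation noise.
    return 0.6 + 0.02 * math.log2(depth) + 0.01 * math.log2(width) + random.gauss(0, 0.01)

for step in range(300):
    # The "controller" samples an architecture, the "child" is scored,
    # and choices that beat the running average are reinforced (REINFORCE-style).
    d = random.choices(depth_options, weights=probabilities('depth', depth_options))[0]
    w = random.choices(width_options, weights=probabilities('width', width_options))[0]
    reward = child_accuracy(d, w)
    advantage = reward - baseline
    baseline += 0.05 * (reward - baseline)
    for kind, options, chosen in (('depth', depth_options, d), ('width', width_options, w)):
        for o, p in zip(options, probabilities(kind, options)):
            grad = (1.0 - p) if o == chosen else -p              # gradient of the log-probability
            prefs[(kind, o)] += lr * advantage * grad

print(max(depth_options, key=lambda d: prefs[('depth', d)]),
      max(width_options, key=lambda w: prefs[('width', w)]))     # architecture the controller now favors
```

In the real system, each child score comes from actually training a candidate network on data, which is what makes this kind of search so computationally expensive.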