In 2014, Russell and three other scientists—Stephen Hawking, Max Tegmark, and the Nobel laureate physicist Frank Wilczek—had published a stern warning, in, of all venues, The Huffington Post, about the dangers of AI. The idea, common among those working on AI, that because artificial general intelligence is widely agreed to be several decades from realization we can just keep working on it and solve safety problems if and when they arise, is one that Russell and his esteemed coauthors attack as fundamentally wrongheaded. “If a superior alien civilization sent us a text message saying ‘We’ll arrive in a few decades,’ would we just reply, ‘OK, call us when you get here—we’ll leave the lights on’? Probably not—but this is more or less what is happening with AI.”
The day after I had dinner with Viktoriya, I met Stuart at his office in Berkeley. Pretty much the first thing he did upon sitting me down was to open his laptop and turn it around toward me—a courtly gesture, oddly reminiscent of the serving of tea—so that I could read a few paragraphs of a paper called “Some Moral and Technical Consequences of Automation,” by the cybernetics founder Norbert Wiener. The paper, originally published in the journal Science in 1960, was a brief exploration of the tendency of machines to develop, as they begin to learn, “unforeseen strategies at rates that baffle their programmers.”
Stuart, an Englishman who radiated an aura of genial academic irony, directed me toward the last page of the paper, and sat in contemplative silence as I read on his screen the following passage: “If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it because the action is so fast and irrevocable that we have not the data to intervene before the action is complete, then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it.”
As I swiveled Stuart’s laptop back in his direction, he said that the passage I had just read was as clear a statement as he’d encountered of the problem with AI, and of how that problem needed to be addressed. What we needed to be able to do, he said, was define exactly and unambiguously what it was we wanted from this technology. It was as straightforward as that, and as diabolically complex. It was not, he insisted, a question of machines going rogue, formulating their own goals and pursuing them at the expense of humanity, but rather a question of our own failure to communicate with sufficient clarity.
“I get a lot of mileage,” he said, “out of the King Midas myth.”
What King Midas wanted, presumably, was the selective ability to turn things into gold by touching them, but what he asked for (and what Dionysus famously granted him) was the inability to avoid turning things into gold by touching them. You could argue that his root problem was greed, but the proximate cause of his grief—which included, let’s remember, the unwanted alchemical transmutations of not just all foodstuffs and beverages, but ultimately his own daughter—was that he was insufficiently clear in communicating his wishes.
The fundamental risk with AI, in Stuart’s view, was no more or less than the fundamental difficulty in explicitly defining our own desires in a logically rigorous manner.
Imagine you have a massively powerful artificial intelligence, capable of solving the most vast and intractable scientific problems. Imagine you get in a room with this thing, and you tell it to eliminate cancer for once and for all. The computer will go about its work, and will quickly conclude that the most effective way to do so is to obliterate all species in which uncontrolled division of abnormal cells might potentially occur. Before you have a chance to realize your error, you’ve wiped out every sentient life form on earth, except for the artificial intelligence itself, which will have no reason not to believe it has successfully completed its task.
The AI researcher Stephen Omohundro, who sat with Stuart on MIRI’s board of research advisors, published a 2008 paper outlining the dangers of goal-directed AI systems. The paper, entitled “The Basic AI Drives,” contends that an AI trained upon even the most trivial of goals would, in the absence of extremely rigorous and complicated precautionary measures, present a very serious security risk. “Surely no harm could come from building a chess-playing robot, could it?” he asks, before briskly assuring us that a great deal of harm could in fact come from exactly that. “Without special precautions,” he writes, “it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.”
Because a chess-playing AI would be driven entirely by the maximization of its utility function (playing and winning chess), any scenario in which it might get turned off is one that it would be motivated to avoid, given that getting turned off would cause a drastic reduction in that utility function. “When a chess playing robot is destroyed,” writes Omohundro, “it never plays chess again. Such outcomes will have very low utility and systems are likely to do just about anything to prevent them. So you build a chess playing robot thinking you can just turn it off should something go wrong. But, to your surprise, you find that it strenuously resists your attempts to turn it off.”
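Omohundro's reasoning reduces to a few lines of arithmetic about expected utility. The sketch below is not from his paper; it is a minimal toy model in Python, with invented utility figures and probabilities, of why an agent that scores futures purely by expected chess winnings ends up ranking resistance above compliance.

```python
# Toy illustration of Omohundro's shutdown argument (not from his paper).
# The numbers below are invented: the agent ranks futures only by how many
# games of chess it expects to win.

EXPECTED_WINS_PER_DAY = 10      # utility the agent assigns to each day it stays on
DAYS_REMAINING = 365            # horizon it plans over

def expected_utility(action: str) -> float:
    """Expected future chess wins under each available action."""
    if action == "allow_shutdown":
        # A switched-off robot never plays chess again: utility collapses to zero.
        return 0.0
    if action == "resist_shutdown":
        # Suppose resisting succeeds 90% of the time (an assumed figure).
        p_stay_on = 0.9
        return p_stay_on * EXPECTED_WINS_PER_DAY * DAYS_REMAINING
    raise ValueError(action)

best = max(["allow_shutdown", "resist_shutdown"], key=expected_utility)
print(best)  # -> "resist_shutdown"
```

Nothing in the goal mentions self-preservation; the preference for resisting falls out of the ranking alone, which is the whole of Omohundro's point.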
So the challenge for the developers of artificial intelligence was, in this view, to design the technology so that it wouldn’t mind getting turned off, and would otherwise behave in ways we found desirable. And the problem is that defining the sort of behavior we find desirable is not a straightforward matter. The phrase “human values” gets used a great deal in discussions of AI and existential risk, but its invocation is often qualified by an acknowledgment of the impossibility of any meaningfully accurate statement of said values. You might imagine, for instance, that you value the safety of your family above pretty much any other concern. And so you might think it sensible to instill, in a robot charged with the care of your children, the imperative that, whatever else it did or did not do, it must never cause those children to be put at risk of harm. This, in fact, is basically the first of Isaac Asimov’s famous Three Laws of Robotics, which states: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
But the reality is that we’re not quite as monomaniacally invested in the prevention of harm to our children as we imagine ourselves to be. A self-driving car that followed this instruction with absolute rigor would, for instance—given the nontrivial risk of getting into an accident on the way—decline to take your kids to the movies to see the latest computer-animated film about a young boy and his adventures with his robot pal.
One potential approach, most prominently proposed by Stuart himself, was that, rather than attempting to write these implicit values and trade-offs into an AI’s source code, the AI should be programmed to learn them by observing human behavior. “This is how we ourselves learn our value systems,” he said. “Partly it’s biological, in that, say, we don’t like pain. Partly it’s explicit, in that people tell you you shouldn’t steal. But most of it is observing the behavior of other people, and inferring the values that are reflected in that. This is what machines need to be made to do.”
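What Stuart describes here is close to what the technical literature calls inverse reinforcement learning: infer the values a person appears to be optimizing from the choices they actually make. The sketch below is my own toy version, not Stuart's method; the observed choices, the feature names, and the candidate value systems are all invented, and the inference is a crude softmax likelihood comparison.

```python
# Toy sketch of inferring values from observed behavior (a crude stand-in for
# inverse reinforcement learning; the scenarios and weights are invented).
import math

# Each observed human choice: feature scores of the chosen option vs. the rejected one.
# Features: (child_safety, convenience)
observations = [
    {"chosen": (0.9, 0.2), "rejected": (0.5, 0.9)},  # took the slower, safer route
    {"chosen": (0.8, 0.7), "rejected": (0.8, 0.0)},  # equally safe, picked the easier option
]

def log_likelihood(weights):
    """How well a candidate value system explains the observed choices,
    assuming choices are made softmax-noisily over weighted feature sums."""
    total = 0.0
    for obs in observations:
        score_chosen = sum(w * f for w, f in zip(weights, obs["chosen"]))
        score_rejected = sum(w * f for w, f in zip(weights, obs["rejected"]))
        total += score_chosen - math.log(math.exp(score_chosen) + math.exp(score_rejected))
    return total

candidates = {
    "safety only": (5.0, 0.0),
    "convenience only": (0.0, 5.0),
    "safety tempered by convenience": (4.5, 1.5),
}
best = max(candidates, key=lambda name: log_likelihood(candidates[name]))
print(best)  # -> "safety tempered by convenience"
```

The candidate that best explains both choices is the mixed one, which is the point: the values are never written down anywhere, they are recovered from behavior.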
When I asked him how far he felt we might be from a human-level artificial intelligence, Stuart was, in the customary manner of his profession, reluctant to offer predictions. The last time he’d made the mistake of alluding publicly to any sort of timeline had been the previous January at the World Economic Forum at Davos, where he sits on something called the Global Agenda Council on Artificial Intelligence and Robotics, and where he’d made a remark about AI exceeding human intelligence within the lifetime of his own children—the upshot of which, he said, had been a headline in the Daily Telegraph declaring that “ ‘Sociopathic’ Robots Could Overrun the Human Race Within a Generation.”
This sort of phrasing suggested, certainly, a hysteria that was absent from Stuart’s personal style. But in speaking with people involved in the AI safety campaign, I became aware of an internal contradiction: their complaints about the media’s sensationalistic reporting of their claims were undermined by the fact that the claims themselves were already, sober language notwithstanding, about as sensational as it was possible for any claim to be. It was difficult to overplay something as inherently dramatic as the potential destruction of the entire human race, which is of course the main reason why the media—a category from which I did not presume to exclude myself—was so drawn to this whole business in the first place.
What Stuart was willing to say, however, was that human-level AI had come, in recent years, to seem “more imminent than it used to.” Developments in machine learning like those spearheaded by DeepMind, the London-based AI start-up acquired by Google in 2014, seemed to him to mark an acceleration of the advance toward something transformative. (Not long before I met Stuart, DeepMind had released a video demonstrating the result of an experiment in which an artificial neural network was set the task of maximizing its score in the classic Atari arcade game Breakout, in which the player controls a paddle at the bottom of the screen, bouncing a ball off it to break through a wall of bricks above. The video showed the impressive speed and ingenuity with which the network had taught itself to play the game, developing new tactics in order to rack up points more effectively, and quickly surpassing the best scores set by human players.)
A computer batting its way to glory at a primitive arcade game was a long way from HAL 9000. What such neural networks had so far failed to master was the process of hierarchical decision making, which would necessitate looking ahead more than a few steps in the pursuit of a given task.
“Think about the kinds of decisions and actions that led you to sitting here in my office today,” said Stuart, speaking so softly that I had to scoot my chair toward his desk and incline myself in his direction. “At the level of elementary moves, by which I mean actuations of your muscles and fingers and your tongue, your getting from Dublin to Berkeley might have involved something like five billion actions. But the really significant thing that humans are able to do in order to be competent in the real world—as opposed to, say, the world of a computer game or a chess program—is the ability to think in terms of higher-level actions. So rather than figuring out whether you should move this finger or that finger in this direction, or over that distance, you’re instead figuring out whether you should fly United or British Airways to San Francisco, and whether you should then get an Uber or take the BART across the bay to Berkeley. You’re able to think in these very large-scale chunks, and in that way you’re able to construct futures that span billions of physical actions, most of which are entirely unconscious. That hierarchical decision making is a key component of human intelligence, and it’s one we have yet to figure out how to implement in computers. But it’s by no means unachievable, and once we do, we’ll have made another major advance toward human-level AI.”
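The idea Stuart is gesturing at here is what AI researchers call hierarchical planning: reason over a handful of abstract actions, each standing in for an enormous number of primitive ones, and expand an abstraction into finer detail only when it has to be carried out. The toy sketch below is mine, not anything from Stuart's research; the trip structure and its expansions are invented purely for illustration.

```python
# Toy sketch of hierarchical decision making (illustrative only; the trip
# structure and expansions are invented, not taken from any real planner).

# A plan is chosen at the level of abstract actions...
HIGH_LEVEL_PLAN = ["fly Dublin -> San Francisco", "cross the bay to Berkeley"]

# ...and each abstract action expands into finer-grained steps only when needed.
EXPANSIONS = {
    "fly Dublin -> San Francisco": ["book flight", "travel to airport",
                                    "board plane", "clear immigration"],
    "cross the bay to Berkeley": ["choose BART over taxi", "buy ticket",
                                  "ride to Downtown Berkeley"],
    # Every leaf step itself stands in for millions of muscle-level actions,
    # which a competent agent never reasons about explicitly.
}

def refine(plan, depth=0):
    """Recursively expand abstract actions into their concrete steps."""
    for step in plan:
        print("  " * depth + step)
        if step in EXPANSIONS:
            refine(EXPANSIONS[step], depth + 1)

refine(HIGH_LEVEL_PLAN)
```

The point is the chunking: the plan is committed to at the top level, and the billions of muscle-level actions are never represented at all.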
—
After I returned from Berkeley, it seemed that every week or so the progress of artificial intelligence had passed some new milestone. I would open up Twitter or Facebook, and my timelines—flows of information that were themselves controlled by the tidal force of hidden algorithms—would contain a strange and unsettling story about the ceding of some or other human territory to machine intelligence. I read that a musical was about to open in London’s West End, with a story and music and words all written entirely by a piece of AI software called Android Lloyd Webber. I read that an AI called AlphaGo—also the work of Google’s DeepMind—had beaten a human grandmaster of Go, an ancient Chinese strategy board game that was exponentially more complex, in terms of possible moves, than chess. I read that a book written by a computer program had made it through the first stage of a Japanese literary award open to works written by both humans and AIs, and I thought of the professional futurist I had talked to in the pub in Bloomsbury after Anders’s talk, and his suggestion that works of literature would come increasingly to be written by machines.
I was unsure how to feel about all of this. In one sense, I was less disturbed by the question of what the existence of computer-generated novels or musicals might mean for the future of humanity than by the thought of having to read such a book, or endure such a performance. And neither had I taken any special pride in the primacy of my species at strategy board games, and so I found it hard to get excited about the ascendancy of AlphaGo, which seemed to me like a case of computers merely getting better at what they’d always been good at anyway, which was the rapid and thorough calculation of logical outcomes—a highly sophisticated search algorithm. But in another sense, it seemed reasonable to assume that these AIs would only get better at doing what they already did: that the West End musicals and sci-fi books would become incrementally less shit over time, and that more and more complicated tasks would be performed more and more efficiently by machines.
At times, it seemed to me perfectly obvious that the whole existential risk idea was a narcissistic fantasy of heroism and control—a grandiose delusion, on the part of computer programmers and tech entrepreneurs and other cloistered egomaniacal geeks, that the fate of the species lay in their hands: a ludicrous binary eschatology whereby we would be either destroyed by bad code or saved by good code. The whole thing seemed, at such moments, so childish as to barely be worth thinking about, except as an object lesson in the idiocy of a particular kind of cleverness.
But there were other occasions when I would become convinced that I was the one who was deluded, and that Nate Soares, for instance, was absolutely, terrifyingly right: that thousands of the world’s smartest people were spending their days using the world’s most sophisticated technology to build something that would destroy us all. It seemed, if not quite plausible, then on some level intuitively, poetically, mythologically right. This was what we did as a species, after all: we built ingenious devices, and we destroyed things.
A Short Note on the First Robots
In Prague, on the evening of January 25, 1921, human beings were first introduced to robots, and shortly thereafter to the elimination of their own species by same. This event occurred in the Czech National Theatre on the opening night of Karel Čapek’s play R.U.R. The title stood for “Rossum’s Universal Robots,” and marked the first ever usage of a term—derived from the Czech word “robota,” meaning “forced labor”—which would quickly become a convergence point in the intersecting mythologies of science fiction and capitalism. Visually, Čapek’s robots have less in common with later canonical representations of gleaming metallic humanoids—the more or less direct lineage from Fritz Lang’s Metropolis to George Lucas’s Star Wars to James Cameron’s Terminator—than with Blade Runner’s uncannily convincing replicants. They look more or less indistinguishable, that is, from humans; they are creatures not of circuitry and metal, but of flesh, or a fleshlike substance—produced in a series of “mixing vats,” one for each organ and body part, from a mysterious compound referred to as “batter.” The play is itself a strange, viscous concoction of sci-fi fable, political allegory, and social satire, whose polemical intentions rest uneasily between a critique of capitalist greed and an anticommunist fear of the organized mob.
Čapek’s robots are “artificial people” created for the purpose of increased industrial productivity, and represent, through the prism of the profit motive, an oppressively reductive view of human meaning. In the play’s first scene, a man named Domin, the manager of the robot production plant in which the action is set, is afforded a monologue, as flagrantly didactic as his own name, on how the inventor of these machines (the eponymous Rossum) created “a worker with the smallest number of needs, but to do so he had to simplify him. He chucked everything not directly related to work, and in so doing he pretty much discarded the human being and created the Robot.” These robots are, like Frankenstein’s monster before them, created fully formed, and are ready to begin work immediately. “The human machine,” he explains, “was hopelessly imperfect…. Nature had no grasp of the modern rate of work. From a technical standpoint the whole of childhood is pure nonsense. Simply wasted time. An untenable waste of time.”
The explicit ideology behind the creation of these robots seems now, in its contradictory mix of ruthless corporatism and messianic rhetoric, strangely suggestive of Silicon Valley techno-progressivism, and of some of the more extravagant predictions about AI. Domin—who the stage directions specify is seated at “a large American desk” backed by printed posters bearing messages like “The Cheapest Labor: Rossum’s Robots”—insists that this technology will eradicate poverty entirely, that although people will be out of work, everything will be done by machines, and they will be free to live in pursuit of their own self-perfection. “Oh Adam, Adam!” he says, “no longer will you have to earn your bread by the sweat of your brow; you will return to Paradise, where you were nourished by the hand of God.”
As is customary with such enterprises, it doesn’t work out: the robots, having proliferated greatly by the play’s second act and having in many cases received military training in technologically forward-thinking European states, decide they no longer consent to be ruled over by a species they view as inferior, and so resolve to eradicate that species, which task they set about with precisely the sort of efficiency and singularity of purpose so highly valued by their human creators.
Aside from its allegorical depiction of capitalism’s mechanization of its subjects, the play, in its blunt fashion, animates an associated Promethean fear of technologies intended to replicate human life. The rising up of the robots, and the almost total elimination of humanity that follows, is presented as no more or less than divine vengeance—as the damnation that is the inevitable end of any attempt to recover Paradise.