
AI Superpowers


by Kai-Fu Lee


  I believe that the skillful application of AI will be China’s greatest opportunity to catch up with—and possibly surpass—the United States. But more important, this shift will create an opportunity for all people to rediscover what it is that makes us human.

  To understand why, we must first grasp the basics of the technology and how it is set to transform our world.

  A BRIEF HISTORY OF DEEP LEARNING

  Machine learning—the umbrella term for the field that includes deep learning—is a history-altering technology but one that is lucky to have survived a tumultuous half-century of research. Ever since its inception, artificial intelligence has undergone a number of boom-and-bust cycles. Periods of great promise have been followed by “AI winters,” when a disappointing lack of practical results led to major cuts in funding. Understanding what makes the arrival of deep learning different requires a quick recap of how we got here.

  Back in the mid-1950s, the pioneers of artificial intelligence set themselves an impossibly lofty but well-defined mission: to recreate human intelligence in a machine. That striking combination of the clarity of the goal and the complexity of the task would draw in some of the greatest minds in the emerging field of computer science: Marvin Minsky, John McCarthy, and Herbert Simon.

  When I was a wide-eyed computer science undergrad at Columbia University in the early 1980s, all of this seized my imagination. I was born in Taiwan in the early 1960s but moved to Tennessee at the age of eleven and finished middle and high school there. After four years at Columbia in New York, I knew that I wanted to dig deeper into AI. When applying for computer science Ph.D. programs in 1983, I even wrote this somewhat grandiose description of the field in my statement of purpose: “Artificial intelligence is the elucidation of the human learning process, the quantification of the human thinking process, the explication of human behavior, and the understanding of what makes intelligence possible. It is men’s final step to understand themselves, and I hope to take part in this new, but promising science.”

  That essay helped me get into the top-ranked computer science department of Carnegie Mellon University, a hotbed for cutting-edge AI research. It also displayed my naiveté about the field, both overestimating our power to understand ourselves and underestimating the power of AI to produce superhuman intelligence in narrow spheres.

  By the time I began my Ph.D., the field of artificial intelligence had forked into two camps: the “rule-based” approach and the “neural networks” approach. Researchers in the rule-based camp (also sometimes called “symbolic systems” or “expert systems”) attempted to teach computers to think by encoding a series of logical rules: If X, then Y. This approach worked well for simple and well-defined games (“toy problems”) but fell apart when the universe of possible choices or moves expanded. To make the software more applicable to real-world problems, the rule-based camp tried interviewing experts in the problems being tackled and then coding their wisdom into the program’s decision-making (hence the “expert systems” moniker).

  The “neural networks” camp, however, took a different approach. Instead of trying to teach the computer the rules that had been mastered by a human brain, these practitioners tried to reconstruct the human brain itself. Given that the tangled webs of neurons in animal brains were the only thing capable of intelligence as we knew it, these researchers figured they’d go straight to the source. This approach mimics the brain’s underlying architecture, constructing layers of artificial neurons that can receive and transmit information in a structure akin to our networks of biological neurons. Unlike the rule-based approach, builders of neural networks generally do not give the networks rules to follow in making decisions. They simply feed lots and lots of examples of a given phenomenon—pictures, chess games, sounds—into the neural networks and let the networks themselves identify patterns within the data. In other words, the less human interference, the better.
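
  To make the idea concrete, here is a minimal sketch, not taken from the book, of a tiny two-layer neural network written in Python with NumPy. The data, layer sizes, and learning rate are all invented for illustration; the point is only that the weights are adjusted from labeled examples rather than from hand-written rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled examples: four inputs with three features each, and a 0/1 label.
X = rng.normal(size=(4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer of five artificial "neurons" feeding a single output neuron.
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # Forward pass: each layer receives signals, weights them, and passes them on.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Compare predictions with the labels and nudge every weight to shrink the
    # error (plain gradient descent on a squared-error loss).
    grad_out = (p - y) * p * (1 - p)
    grad_hidden = grad_out @ W2.T * h * (1 - h)

    W2 -= 0.5 * (h.T @ grad_out)
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ grad_hidden)
    b1 -= 0.5 * grad_hidden.sum(axis=0)

print(np.round(p, 2))  # predictions drift toward the 0/1 labels above
```

  A real deep-learning system stacks many more such layers and learns from millions of examples, but the principle is the same: the patterns come from the data, not from rules supplied by a programmer.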

  Differences between the two approaches can be seen in how they might approach a simple problem, identifying whether there is a cat in a picture. The rule-based approach would attempt to lay down “if-then” rules to help the program make a decision: “If there are two triangular shapes on top of a circular shape, then there is probably a cat in the picture.” The neural network approach would instead feed the program millions of sample photos labeled “cat” or “no cat,” letting the program figure out for itself what features in the millions of images were most closely correlated to the “cat” label.
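
  The contrast can be sketched in a few lines of Python. Everything here is invented for illustration: the shape-counting rule, the three-number “photo” features, and the use of scikit-learn’s logistic regression as a stand-in for a real image model trained on millions of labeled photos.

```python
from sklearn.linear_model import LogisticRegression

def rule_based_cat_detector(shapes_in_image):
    """Hand-coded 'if-then' rule: two triangles (ears) above a circle (head)."""
    triangles = sum(1 for s in shapes_in_image if s == "triangle")
    circles = sum(1 for s in shapes_in_image if s == "circle")
    return triangles >= 2 and circles >= 1

print(rule_based_cat_detector(["triangle", "triangle", "circle"]))  # True

# The learning approach never sees a rule. It is handed labeled examples and
# left to find its own correlations between the features and the "cat" label.
photos = [
    [0.9, 0.8, 0.1],  # toy numbers loosely standing in for ear shape,
    [0.8, 0.9, 0.2],  # fur texture, whisker-like edges, and so on
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]
labels = [1, 1, 0, 0]  # 1 = "cat", 0 = "no cat"

model = LogisticRegression().fit(photos, labels)
print(model.predict([[0.85, 0.75, 0.15]]))  # the model's own learned guess
```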

  During the 1950s and 1960s, early versions of artificial neural networks yielded promising results and plenty of hype. But then in 1969, researchers from the rule-based camp pushed back, convincing many in the field that neural networks were unreliable and limited in their use. The neural networks approach quickly went out of fashion, and AI plunged into one of its first “winters” during the 1970s.

  Over the subsequent decades, neural networks enjoyed brief stints of prominence, followed by near-total abandonment. In 1988, I used a technique akin to neural networks (Hidden Markov Models) to create Sphinx, the world’s first speaker-independent program for recognizing continuous speech. That achievement landed me a profile in the New York Times. But it wasn’t enough to save neural networks from once again falling out of favor, as AI reentered a prolonged ice age for most of the 1990s.

  What ultimately resuscitated the field of neural networks—and sparked the AI renaissance we are living through today—were changes to two of the key raw ingredients that neural networks feed on, along with one major technical breakthrough. Neural networks require large amounts of two things: computing power and data. The data “trains” the program to recognize patterns by giving it many examples, and the computing power lets the program parse those examples at high speeds.

  Both data and computing power were in short supply at the dawn of the field in the 1950s. But in the intervening decades, all that has changed. Today, your smartphone holds millions of times more processing power than the cutting-edge computers that NASA used to send Neil Armstrong to the moon in 1969. And the internet has led to an explosion of all kinds of digital data: text, images, videos, clicks, purchases, tweets, and so on. Taken together, all of this has given researchers copious amounts of rich data on which to train their networks, as well as plenty of cheap computing power for that training.

  But the networks themselves were still severely limited in what they could do. Accurate results to complex problems required many layers of artificial neurons, but researchers hadn’t found a way to efficiently train those layers as they were added. Deep learning’s big technical break finally arrived in the mid-2000s, when leading researcher Geoffrey Hinton discovered a way to efficiently train those new layers in neural networks. The result was like giving steroids to the old neural networks, multiplying their power to perform tasks such as speech and object recognition.

  Soon, these juiced-up neural networks—now rebranded as “deep learning”—could outperform older models at a variety of tasks. But years of ingrained prejudice against the neural networks approach led many AI researchers to overlook this “fringe” group that claimed outstanding results. The turning point came in 2012, when a neural network built by Hinton’s team demolished the competition in an international computer vision contest.

  After decades spent on the margins of AI research, neural networks hit the mainstream overnight, this time in the form of deep learning. That breakthrough promised to thaw the ice from the latest AI winter, and for the first time truly bring AI’s power to bear on a range of real-world problems. Researchers, futurists, and tech CEOs all began buzzing about the massive potential of the field to decipher human speech, translate documents, recognize images, predict consumer behavior, identify fraud, make lending decisions, help robots “see,” and even drive a car.

  PULLING BACK THE CURTAIN ON DEEP LEARNING

  So how does deep learning do this? Fundamentally, these algorithms use massive amounts of data from a specific domain to make a decision that optimizes for a desired outcome. They do this by training themselves to recognize deeply buried patterns and correlations connecting the many data points to the desired outcome. This pattern-finding process is easier when the data is labeled with that desired outcome—“cat” versus “no cat”; “clicked” versus “didn’t click”; “won game” versus “lost game.” The algorithm can then draw on its extensive knowledge of these correlations—many of which are invisible or irrelevant to human observers—to make better decisions than a human could.
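
  A small sketch in Python, with every feature name and number invented for illustration, shows why the labels matter: once each record carries a “clicked” versus “didn’t click” outcome, even a crude pattern-finder (plain correlation here) can surface which signals are connected to it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Toy browsing records: three numeric signals per visit (all made up).
time_on_page = rng.exponential(30.0, n)
past_purchases = rng.poisson(2, n).astype(float)
noise_signal = rng.normal(0.0, 1.0, n)

# Hidden ground truth the algorithm never sees: clicks depend on the first two.
clicked = (0.02 * time_on_page + 0.4 * past_purchases + rng.normal(0, 1, n)) > 1.5

# Because every record is labeled with the outcome, measuring which signals
# correlate with "clicked" is enough to reveal the buried pattern.
for name, signal in [("time_on_page", time_on_page),
                     ("past_purchases", past_purchases),
                     ("noise_signal", noise_signal)]:
    print(name, round(np.corrcoef(signal, clicked)[0, 1], 2))
```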

  Doing this requires massive amounts of relevant data, a strong algorithm, a narrow domain, and a concrete goal. If you’re short any one of these, things fall apart. Too little data? The algorithm doesn’t have enough examples to uncover meaningful correlations. Too broad a goal? The algorithm lacks clear benchmarks to shoot for in optimization.

  Deep learning is what’s known as “narrow AI”—intelligence that takes data from one specific domain and applies it to optimizing one specific outcome. While impressive, it is still a far cry from “general AI,” the all-purpose technology that can do everything a human can.

  Deep learning’s most natural application is in fields like insurance and making loans. Relevant data on borrowers is abundant (credit score, income, recent credit-card usage), and the goal to optimize for is clear (minimize default rates). Taken one step further, deep learning will power self-driving cars by helping them to “see” the world around them—recognize patterns in the camera’s pixels (red octagons), figure out what they correlate to (stop signs), and use that information to make decisions (apply pressure to the brake to slowly stop) that optimize for your desired outcome (deliver me safely home in minimal time).
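
  A hedged sketch of the lending case, using synthetic borrowers and a simple scikit-learn model as a stand-in for a production system, might look like the following. The feature names, the synthetic “ground truth,” and the 10 percent risk cutoff are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 5_000

# Abundant (here: synthetic) borrower data.
credit_score = rng.normal(680, 60, n)
annual_income = rng.lognormal(10.8, 0.5, n)
card_usage = rng.uniform(0, 1, n)  # fraction of credit limit currently used

# Synthetic ground truth: defaults are likelier with low scores and heavy usage.
default = (3.0 * card_usage - 0.01 * (credit_score - 680) + rng.normal(0, 1, n)) > 2.2

X = np.column_stack([credit_score, annual_income, card_usage])
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, default)

# The clear goal: approve only applicants whose predicted default risk is low.
applicant = np.array([[710.0, 55_000.0, 0.25]])
risk = model.predict_proba(applicant)[0, 1]
print("approve" if risk < 0.10 else "decline", round(risk, 3))
```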

  People are so excited about deep learning precisely because its core power—its ability to recognize a pattern, optimize for a specific outcome, make a decision—can be applied to so many different kinds of everyday problems. That’s why companies like Google and Facebook have scrambled to snap up the small core of deep-learning experts, paying them millions of dollars to pursue ambitious research projects. In 2013, Google acquired the startup founded by Geoffrey Hinton, and the following year scooped up British AI startup DeepMind—the company that went on to build AlphaGo—for over $500 million. The results of these projects have continued to awe observers and grab headlines. They’ve shifted the cultural zeitgeist and given us a sense that we stand at the precipice of a new era, one in which machines will radically empower and/or violently displace human beings.

  AI AND INTERNATIONAL RESEARCH

  But where was China in all this? The truth is, the story of the birth of deep learning took place almost entirely in the United States, Canada, and the United Kingdom. After that, a smaller number of Chinese entrepreneurs and venture-capital funds like my own began to invest in this area. But the great majority of China’s technology community didn’t properly wake up to the deep-learning revolution until its Sputnik Moment in 2016, a full decade behind the field’s breakthrough academic paper and four years after it proved itself in the computer vision competition.

  American universities and technology companies have for decades reaped the rewards of the country’s ability to attract and absorb talent from around the globe. Progress in AI appeared to be no different. The United States looked to be out to a commanding lead, one that would only grow as these elite researchers leveraged Silicon Valley’s generous funding environment, unique culture, and powerhouse companies. In the eyes of most analysts, China’s technology industry was destined to play the same role in global AI that it had for decades: that of the copycat who lagged far behind the cutting edge.

  As I demonstrate in the following chapters, that analysis is wrong. It is based on outdated assumptions about the Chinese technology environment, as well as a more fundamental misunderstanding of what is driving the ongoing AI revolution. The West may have sparked the fire of deep learning, but China will be the biggest beneficiary of the heat the AI fire is generating. That global shift is the product of two transitions: from the age of discovery to the age of implementation, and from the age of expertise to the age of data.

  Core to the mistaken belief that the United States holds a major edge in AI is the impression that we are living in an age of discovery, a time in which elite AI researchers are constantly breaking down old paradigms and finally cracking longstanding mysteries. This impression has been fed by a constant stream of breathless media reports announcing the latest feat performed by AI: diagnosing certain cancers better than doctors, beating human champions at the bluff-heavy game of Texas Hold’em, teaching itself how to master new skills with zero human interference. Given this flood of media attention to each new achievement, the casual observer—or even expert analyst—would be forgiven for believing that we are consistently breaking fundamentally new ground in artificial intelligence research.

  I believe this impression is misleading. Many of these new milestones are, rather, merely the application of the past decade’s breakthroughs—primarily deep learning but also complementary technologies like reinforcement learning and transfer learning—to new problems. What these researchers are doing requires great skill and deep knowledge: the ability to tweak complex mathematical algorithms, to manipulate massive amounts of data, to adapt neural networks to different problems. That often takes Ph.D.-level expertise in these fields. But these advances are incremental improvements and optimizations that leverage the dramatic leap forward of deep learning.

  THE AGE OF IMPLEMENTATION

  What they really represent is the application of deep learning’s incredible powers of pattern recognition and prediction to different spheres, such as diagnosing a disease, issuing an insurance policy, driving a car, or translating a Chinese sentence into readable English. They do not signify rapid progress toward “general AI” or any other similar breakthrough on the level of deep learning. This is the age of implementation, and the companies that cash in on this time period will need talented entrepreneurs, engineers, and product managers.

  Deep-learning pioneer Andrew Ng has compared AI to Thomas Edison’s harnessing of electricity: a breakthrough technology on its own, and one that once harnessed can be applied to revolutionizing dozens of different industries. Just as nineteenth-century entrepreneurs soon began applying the electricity breakthrough to cooking food, lighting rooms, and powering industrial equipment, today’s AI entrepreneurs are doing the same with deep learning. Much of the difficult but abstract work of AI research has been done, and it’s now time for entrepreneurs to roll up their sleeves and get down to the dirty work of turning algorithms into sustainable businesses.

  That in no way diminishes the current excitement around AI; implementation is what makes academic advances meaningful and what will truly end up changing the fabric of our daily lives. The age of implementation means we will finally see real-world applications after decades of promising research, something I’ve been looking forward to for much of my adult life.

  But making that distinction between discovery and implementation is core to understanding how AI will shape our lives and what—or which country—will primarily drive that progress. During the age of discovery, progress was driven by a handful of elite thinkers, virtually all of whom were clustered in the United States and Canada. Their research insights and unique intellectual innovations led to a sudden and monumental ramping up of what computers can do. Since the dawn of deep learning, no other group of researchers or engineers has come up with innovation on that scale.

  THE AGE OF DATA

  This brings us to the second major transition, from the age of expertise to the age of data. Today, successful AI algorithms need three things: big data, computing power, and the work of strong—but not necessarily elite—AI algorithm engineers. Bringing the power of deep learning to bear on new problems requires all three, but in this age of implementation, data is the core. That’s because once computing power and engineering talent reach a certain threshold, the quantity of data becomes decisive in determining the overall power and accuracy of an algorithm.

  In deep learning, there’s no data like more data. The more examples of a given phenomenon a network is exposed to, the more accurately it can pick out patterns and identify things in the real world. Given much more data, an algorithm designed by a handful of mid-level AI engineers usually outperforms one designed by a world-class deep-learning researcher. Having a monopoly on the best and the brightest just isn’t what it used to be.
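
  The point can be made with a quick experiment in Python, using synthetic data and a deliberately simple scikit-learn model as stand-ins: holding the algorithm fixed and only growing the training set is enough to push test accuracy up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 20))
y = (X @ rng.normal(size=20) + rng.normal(0, 2.0, 50_000)) > 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0)

# Same simple algorithm, progressively more training examples.
for n in [100, 1_000, 10_000, 40_000]:
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"{n:>6} examples -> test accuracy {model.score(X_test, y_test):.3f}")
```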

  Elite AI researchers still have the potential to push the field to the next level, but those advances have occurred once every several decades. While we wait for the next breakthrough, the burgeoning availability of data will be the driving force behind deep learning’s disruption of countless industries around the world.

  ADVANTAGE CHINA

  Realizing the newfound promise of electrification a century ago required four key inputs: fossil fuels to generate it, entrepreneurs to build new businesses around it, electrical engineers to manipulate it, and a supportive government to develop the underlying public infrastructure. Harnessing the power of AI today—the “electricity” of the twenty-first century—requires four analogous inputs: abundant data, hungry entrepreneurs, AI scientists, and an AI-friendly policy environment. By looking at the relative strengths of China and the United States in these four categories, we can predict the emerging balance of power in the AI world order.

  Both of the transitions described on the previous pages—from discovery to implementation, and from expertise to data—now tilt the playing field toward China. They do this by minimizing China’s weaknesses and amplifying its strengths. Moving from discovery to implementation reduces the importance of one of China’s greatest weak points (a relative lack of outside-the-box approaches to research questions) and leverages the country’s most significant strength: scrappy entrepreneurs with sharp instincts for building robust businesses. The transition from expertise to data has a similar benefit, downplaying the importance of the globally elite researchers that China lacks and maximizing the value of another key resource that China has in abundance: data.

 
