AI Superpowers

by Kai-Fu Lee

He knows my weakness for California wines, and I take him up on the offer. It’s indeed fantastic.

“I love it,” I say, handing the wineglass to the young man. “I’ll take two bottles.”

“Excellent choice—you can continue with your shopping, and I’ll bring those bottles to you in just a moment. If you’d like to schedule regular deliveries to your home or need recommendations on what else to try, you can find those in the Yonghui app or with me here.”

All the concierges are knowledgeable, friendly, and trained in the art of the upsell. It’s far more socially engaged work than traditional supermarket jobs, with all employees ready to discuss recipes, farm-to-table sourcing, and how each product compares with what I’ve tried in the past.

The shopping trip goes on like this, with my cart leading me through our typical purchases, and concierges occasionally nudging me to splurge on items that algorithms predict I’ll like. As a concierge is bagging my goods, my phone buzzes with this trip’s receipt in my WeChat Wallet. When they’re finished, the shopping cart guides itself back to its rack, and I stroll the two blocks home to my family.

Perception AI–powered shopping trips like this will capture one of the fundamental contradictions of the AI age before us: it will feel both completely ordinary and totally revolutionary. Much of our daily activity will still follow our everyday established patterns, but the digitization of the world will eliminate common points of friction and tailor services to each individual. They will bring the convenience and abundance of the online world into our offline reality. Just as important, by understanding and predicting the habits of each shopper, these stores will make major improvements in their supply chains, reducing food waste and increasing profitability.

And a supermarket like the one I’ve described isn’t far off. The core technologies already exist, and it’s largely a matter now of working out minor kinks in the software, integrating the back end of the supply chain, and building out the stores themselves.

AN OMO-POWERED EDUCATION

These kinds of immersive OMO scenarios go far beyond shopping. These same techniques—visual identification, speech recognition, creation of a detailed profile based on one’s past behavior—can be used to create a highly tailored experience in education.

Present-day education systems are still largely run on the nineteenth-century “factory model” of education: all students are forced to learn at the same speed, in the same way, at the same place, and at the same time. Schools take an “assembly line” approach, passing children from grade to grade each year, largely irrespective of whether or not they absorbed what was taught. It’s a model that once made sense given the severe limitations on teaching resources, namely, the time and attention of someone who can teach, monitor, and evaluate students.

But AI can help us lift those limitations. The perception, recognition, and recommendation abilities of AI can tailor the learning process to each student and also free up teachers for more one-on-one instruction time.

The AI-powered education experience takes place across four scenarios: in-class teaching, homework and drills, tests and grading, and customized tutoring. Performance and behavior in these four settings all feed into and build off of the bedrock of AI-powered education, the student profile. That profile contains a detailed accounting of everything that affects a student’s learning process, such as what concepts they already grasp well, what they struggle with, how they react to different teaching methods, how attentive they are during class, how quickly they answer questions, and what incentives drive them. To see how this data is gathered and used to upgrade the education process, let’s look at the four scenarios described above.

During in-class teaching, schools will employ a dual-teacher model that combines a remote broadcast lecture from a top educator and more personal attention by the in-class teacher. For the first half of class, a top-rated teacher delivers a lecture via a large-screen television at the front of the class. That teacher lectures simultaneously to around twenty classrooms and asks questions that students must answer via handheld clickers, giving the lecturer real-time feedback on whether students comprehend the concepts.

During the lecture, a video conference camera at the front of the room uses facial recognition and posture analysis to take attendance, check for student attentiveness, and assess the level of understanding based on gestures such as nodding, shaking one’s head, and expressions of puzzlement. All of this data—answers to clicker questions, attentiveness, comprehension—goes directly into the student profile, filling in a real-time picture of what the students know and what they need extra help with.

But in-class learning is just a fraction of the whole AI-education picture. When students head home, the student profile combines with question-generating algorithms to create homework assignments precisely tailored to the students’ abilities. While the whiz kids must complete higher-level problems that challenge them, the students who have yet to fully grasp the material are given more fundamental questions and perhaps extra drills.

At each step along the way, students’ time and performance on different problems feed into their student profiles, adjusting the subsequent problems to reinforce understanding. In addition, for classes such as English (which is mandatory in Chinese public schools), AI-powered speech recognition can bring top-flight English instruction to the most remote regions. High-performance speech recognition algorithms can be trained to assess students’ English pronunciation, helping them improve intonation and accent without the need for a native English speaker on site.

From a teacher’s perspective, these same tools can be used to alleviate the burden of routine grading tasks, freeing up teachers to spend more time on the students themselves. Chinese companies have already used perception AI’s visual recognition abilities to build scanners that can grade multiple-choice and fill-in-the-blank tests. Even in essays, standard errors such as spelling or grammar can be marked automatically, with predetermined deductions of points for certain mistakes. This AI-powered technology will save teachers’ time in correcting the basics, letting them shift that time to communicating with students about higher-level writing concepts.

Finally, for students who are falling behind, the AI-powered student profile will notify parents of their child’s situation, giving a clear and detailed explanation of what concepts the student is struggling with. The parents can use this information to enlist a remote tutor through services such as VIPKid, which connects American teachers with Chinese students for online English classes. Remote tutoring has been around for some time, but perception AI now allows these platforms to continuously gather data on student engagement through expression and sentiment analysis. That data continually feeds into a student’s profile, helping the platforms filter for the kinds of teachers that keep students engaged.

Almost all of the tools described here already exist, and many are being implemented in different classrooms across China. Taken together, they constitute a new AI-powered paradigm for education, one that merges the online and offline worlds to create a learning experience tailored to the needs and abilities of each student. China appears poised to leapfrog the United States in education AI, in large part due to voracious demand from Chinese parents. Chinese parents of only children pour money into their education, a result of deeply entrenched Chinese values, intense competition for university spots, and a public education system of mixed quality. Those parents have already driven services like VIPKid to a valuation of over $3 billion in just a few years’ time.

PUBLIC SPACES AND PRIVATE DATA

Creating and leveraging these OMO experiences requires vacuuming up oceans of data from the real world. Optimizing traffic flows via Alibaba’s City Brain requires slurping up video feeds from around the city. Tailoring OMO retail experiences for each shopper requires identifying them via facial recognition. And accessing the power of the internet via voice commands requires technology that listens to our every word.

That type of data collection may rub many Americans the wrong way. They don’t want Big Brother or corporate America to know too much about what they’re up to. But people in China are more accepting of having their faces, voices, and shopping choices captured and digitized. This is another example of the broader Chinese willingness to trade some degree of privacy for convenience. That surveillance filters up from individual users to entire urban environments. Chinese cities already use a dense network of cameras and sensors to enforce traffic laws. That web of surveillance footage is now feeding directly into optimization algorithms for traffic management, policing, and emergency services.

It’s up to each country to make its own decisions on how to balance personal privacy and public data. Europe has taken the strictest approach to data protection by introducing the General Data Protection Regulation, a law that sets a variety of restrictions on the collection and use of data within the European Union. The United States continues to grapple with implementing appropriate protections to user privacy, a tension illustrated by Facebook’s Cambridge Analytica scandal and subsequent congressional hearings. China began implementing its own Cybersecurity Law in 2017, which included new punishments for the illegal collection or sale of user data.

There’s no right answer to questions about what level of social surveillance is a worthwhile price for greater convenience and safety, or what level of anonymity we should be guaranteed at airports or subway stations. But in terms of immediate impact, China’s relative openness with data collection in public places is giving it a massive head start on implementation of perception AI. It is accelerating the digitization of urban environments and opening the door to new OMO applications in retail, security, and transportation.

But pushing perception AI into these spheres requires more than just video cameras and digital data. Unlike internet and business AI, perception AI is a hardware-heavy enterprise. As we turn hospitals, cars, and kitchens into OMO environments, we will need a diverse array of sensor-enabled hardware devices to sync up the physical and digital worlds.

MADE IN SHENZHEN

Silicon Valley may be the world champion of software innovation, but Shenzhen (pronounced “shun-jun”) wears that crown for hardware. In the last five years, this young manufacturing metropolis on China’s southern coast has turned into the world’s most vibrant ecosystem for building intelligent hardware. Creating an innovative app requires almost no real-world tools: all you need is a computer and a programmer with a clever idea. But building the hardware for perception AI—shopping carts with eyes and stereos with ears—demands a powerful and flexible manufacturing ecosystem, including sensor suppliers, injection-mold engineers, and small-batch electronics factories.

When most people think of Chinese factories, they envision sweatshops with thousands of underpaid workers stitching together cheap shoes and teddy bears. These factories do still exist, but the Chinese manufacturing ecosystem has undergone a major technological upgrade. Today, the greatest advantage of manufacturing in China isn’t the cheap labor—countries like Indonesia and Vietnam offer lower wages. Instead, it’s the unparalleled flexibility of the supply chains and the armies of skilled industrial engineers who can make prototypes of new devices and build them at scale.

These are the secret ingredients powering Shenzhen, whose talented workers have transformed it from a dirt-cheap factory town to a go-to city for entrepreneurs who want to build new drones, robots, wearables, or intelligent machines. In Shenzhen, those entrepreneurs have direct access to thousands of factories and hundreds of thousands of engineers who help them iterate faster and produce goods cheaper than anywhere else.

At the city’s dizzying electronics markets, they can choose from thousands of different variations of circuit boards, sensors, microphones, and miniature cameras. Once a prototype is assembled, the builders can go door to door at hundreds of factories to find one capable of producing their product in small batches or at large scale. That geographic density of parts suppliers and product manufacturers accelerates the innovation process. Hardware entrepreneurs say that a week spent working in Shenzhen is equivalent to a month in the United States.

As perception AI transforms our lived environment, the ease of experimentation and the production of smart devices gives Chinese startups an edge. Shenzhen is open to international hardware startups, but locals have a heavy home-court advantage. The many frictions of operating in a foreign country—language barrier, visa issues, tax complications, and distance from headquarters—can slow down American startups and raise the cost of their products. Massive multinationals like Apple have the resources to leverage Chinese manufacturing to the fullest, but for foreign startups small frictions can spell doom. Meanwhile, homegrown hardware startups in Shenzhen are like kids in a candy store, experimenting freely and building cheaply.

MI FIRST

The Chinese hardware startup Xiaomi (pronounced “sheow-me”) gives a glimpse of what a densely woven web of perception-AI devices could look like. Launched as a low-cost smartphone maker that took the country by storm, Xiaomi is now building a network of AI-empowered home devices that will turn our kitchens and living rooms into OMO environments.

Central to that system is the Mi AI speaker, a voice-command AI device similar to the Amazon Echo but at around half the price, thanks to the Chinese home-court manufacturing advantage. That advantage is then leveraged to build a range of smart, sensor-driven home devices: air purifiers, rice cookers, refrigerators, security cameras, washing machines, and autonomous vacuum cleaners. Xiaomi doesn’t build all of these devices itself. Instead, it has invested in 220 companies and incubated 29 startups—many operating in Shenzhen—whose intelligent home products are hooked into the Xiaomi ecosystem. Together they are creating an affordable, intelligent home ecosystem, with WiFi-enabled products that find each other and make configuration easy. Xiaomi users can then simply control the entire ecosystem via voice command or directly on their phone.

It’s a constellation of price, diversity, and capability that has created the world’s largest network of intelligent home devices: 85 million by the end of 2017, far ahead of any comparable U.S. networks. It’s also an ecosystem built on the Made-in-Shenzhen advantage. Low prices and China’s massive market are turbocharging the data-gathering process for Xiaomi, fueling a virtuous cycle of stronger algorithms, smarter products, better user experience, more sales, and even more data. It’s also an ecosystem that has produced four unicorn startups within Xiaomi’s ecosystem alone and is driving Xiaomi toward an IPO predicted to value the company at around $100 billion.

As perception AI finds its way into more pieces of hardware, the entire home will feed into and operate off digitized real-world data. Your AI fridge will order more milk when it sees that you’re running low. Your cappuccino machine will kick into gear at your voice command. The AI-equipped floors in your elderly parents’ home will alert you immediately if they’ve tripped and fallen.

Third-wave AI products like these are on the verge of transforming our everyday environment, blurring lines between the digital and physical worlds until they disappear entirely. During this transformation, Chinese users’ cultural nonchalance about data privacy and Shenzhen’s strength in hardware manufacturing give China a clear edge in implementation. Today, China’s edge is slight (60–40), but I predict that in five years’ time, the above factors will give China a more than 80–20 chance of leading the United States and the rest of the world in the implementation of perception AI.

These third-wave AI innovations will create tremendous economic opportunities and also lay the foundation for the fourth and final wave, full autonomy.

FOURTH WAVE: AUTONOMOUS AI

Once machines can see and hear the world around them, they’ll be ready to move through it safely and work in it productively. Autonomous AI represents the integration and culmination of the three preceding waves, fusing machines’ ability to optimize from extremely complex data sets with their newfound sensory powers. Combining these superhuman powers yields machines that don’t just understand the world around them; they shape it.

Self-driving cars may be on everyone’s mind these days, but before we dive into autonomous vehicles, it’s important to widen the lens and recognize just how deep and wide a footprint fourth-wave AI will have. Autonomous AI devices will revolutionize so much of our daily lives, including our malls, restaurants, cities, factories, and fire departments. As with the different waves of AI, this won’t happen all at once. Early autonomous robotics applications will work only in highly structured environments where they can create immediate economic value. That means primarily factories, warehouses, and farms.

But aren’t these places already highly automated? Hasn’t heavy machinery already taken over many blue-collar line jobs? Yes, the developed world has largely replaced raw human muscle with high-powered machines. But while these machines are automated, they are not autonomous. While they can repeat an action, they can’t make decisions or improvise according to changing conditions. Entirely blind to visual inputs, they must be controlled by a human or operate on a single, unchanging track. They can perform repetitive tasks, but they can’t deal with any deviations or irregularities in the objects they manipulate. But by giving machines the power of sight, the sense of touch, and the ability to optimize from data, we can dramatically expand the number of tasks they can tackle.

STRAWBERRY FIELDS AND ROBOTIC BEETLES

Some of these applications are already at hand. Picking strawberries sounds like a straightforward task, but the ability to find, judge, and pluck fruits from plants proved impossible to automate before autonomous AI. Instead, tens of thousands of low-paid workers had to walk, hunched over, through strawberry fields all day, using their eyes and dexterous fingers to get the job done. It’s grueling and tedious work, and many California farmers have watched fruit rot in their fields when they can’t find people willing to take it on.
