Hit Refresh
The intellectual history of how computers augment the human intellect and build a collective IQ has always fascinated me. Doug Engelbart in the 1960s performed “the mother of all demos,” introducing the mouse, hypertext, and shared-screen teleconferencing. Engelbart’s Law states that the rate of human performance is exponential; that while technology will augment our capabilities, our ability to improve upon improvements is a uniquely human endeavor. He essentially founded the field of human-computer interaction. There are many other visionaries who influenced me and the industry, but around the time I joined Microsoft in 1992, two futuristic novels were being eagerly consumed by engineers all over campus. Neal Stephenson’s Snow Crash popularized the term metaverse, envisioning a collective virtual and shared space. David Gelernter wrote Mirror Worlds, foreseeing software that would revolutionize computing and transform society by replacing reality with a digital imitation. These ideas are now within sight.
* * *
It is a magical feeling, at least for me, the first time you experience a profound new technology. In the 1980s, when I first learned to write a few lines of BASIC code for that Z80 computer my dad bought for me, the lightbulb went off. Suddenly I was communicating with a machine. I wrote something and it generated output, a response. I could change the program and instantly change the response. I had discovered software, the most malleable resource humans have made. It was like lightning in a bottle. I clearly remember the excitement I felt the first time I encountered the spreadsheet. Data structures like pivot tables became second nature to the way one thought about numbers.
Our industry is full of those eureka moments of discovery. My most startling moment arrived, surprisingly, on the surface of planet Mars—standing in the basement of Microsoft’s Building 92.
It was there that I first slipped on a HoloLens device, a small head-mounted computer that is completely self-contained. Suddenly HoloLens transported me—virtually, of course—onto the surface of the Red Planet, 250 million miles away, thanks to a feed from NASA’s Mars rover, Curiosity. Through HoloLens, I could see my two street shoes walking, in the most convincing and baffling way, on the dusty Martian plain near a rocky waypoint called Kimberley along the rover’s journey to Murray Buttes. HoloLens made it possible for me both to walk around the actual room—to see a desk and to interact with people around me—and to inspect rocks on Mars’s surface. That’s the amazing, unprecedented nature of what we call mixed reality. The experience was so inspiring, so moving, that one member of my leadership team cried during that virtual excursion.
What I saw and experienced that day was a glimpse of Microsoft’s future. Perhaps this particular moment will be remembered as the advent of a mixed reality revolution, one in which everyone works and plays in an immersive environment that blends the real world and a virtual world. Will there one day be mixed reality natives—young people who expect all of their computer experiences to be immersive blends of the real and the virtual—just as today we recognize digital natives, those for whom the Internet has always been there?
Companies are taking different approaches with head-mounted computers. Virtual reality, as provided by our Windows 10 MR devices or Facebook’s Oculus Rift, largely blocks out the real world, immersing the user in a completely digital one. Google Glass, by contrast, projects information onto a small display in front of your eye. Snapchat Spectacles let you augment what you see with relevant content and filters. HoloLens provides access to mixed reality, in which users can navigate both their current location (interacting with people in the same room) and a remote environment while also manipulating holograms and other digital objects. Analysts at Gartner Inc., the technology research firm, have made an art of studying the hype cycles that new technologies follow as they move from invention to widespread adoption (or demise), and they believe virtual reality technologies are likely five to ten years away from mainstream adoption.
Just getting to the starting line proved difficult for us. My colleague Alex Kipman had been perfecting a prototype of HoloLens for some time. Alex and his team had already created one breakthrough: They’d developed Microsoft Kinect, the motion-sensing technology that today is an ingredient in leading-edge robots (enabling them to move in a more human-like manner), while also providing a fun way of using your body to play games on Xbox. However, Alex’s HoloLens project had bounced around the company in search of continued funding. It was unclear whether Microsoft would invest in mixed reality, a new business in an unproven market. The quest seemed so ridiculous at times that Alex whimsically code-named the project Baraboo in honor of a town in Wisconsin that is home to a circus and clown museum.
Once I got a chance to see what HoloLens could do, I was sold. While HoloLens has obvious applications in video gaming, I instantly saw its potential in classrooms, hospitals, and, yes, space exploration. NASA was, in fact, one of the first organizations to see the value of HoloLens, adopting an early version to enable astronauts on Earth to collaborate with astronauts in space. If anyone was on the fence after the Mars demonstration, Bill Gates’s email after his experience convinced even the most skeptical.
I was VERY impressed with 2 things about the Mars demo:
First, the fidelity was VERY good. The image looks real and when I moved my head it felt real. I felt like I was there.
Second, the ability to move physically around the space was quite natural while using peripheral vision to avoid hitting anything. Although I am still not sure what applications will take off, the latest demo really has me enthused about the project and that we will find a way to make this a success. I have been converted.
Yes, Alex, we’ll invest.
To understand the soul of HoloLens, it helps to understand Alex and his past. In some ways, our stories have a lot in common. The son of a career government diplomat in Brazil, Alex moved around a lot as a kid and found that math, science, and eventually computers were his only consistent companions. “If you know how to paint with math and science, you can make anything,” he once told me. His parents bought him an Atari 2600 home video console that he broke repeatedly but eventually learned to program. His passion for technology led him to the Rochester Institute of Technology, an internship with NASA, and, later, highly sophisticated computer programming roles in Silicon Valley.
His quest, however, was to find a place where he could design software for the sake of software, a place that treated software as an art form. He came to Microsoft, where he would play a role in designing Windows Vista, the long-awaited successor to Windows XP. When Vista received lukewarm reviews despite its advanced features, no one was more disappointed than Alex. He took it personally and returned to Brazil to reflect and to hit refresh on his own career outlook. Alex is very philosophical and turned to Nietzsche for direction: “He who has a ‘why’ to live for can bear almost any ‘how.’” Alex was upset with himself because he did not yet have his “why,” a point of view about where computing should be headed.
He would later tell journalist Kevin Dupzyk that he visited a farm on Brazil’s eastern shore, wandering around with a notebook and pondering the contribution he wanted to make to computing. He began to think about how computing could displace time and space. Why are we chained to keyboards and screens? Why can’t I use my computer to be with anyone I want, no matter where they are? Alex sensed that the evolution of computing had only reached the equivalent of prehistoric cave paintings. MR was to become a new paintbrush that would create an entirely new computing paradigm.
Alex defined a new career quest for himself: “I am going to make machines that perceive the real world.” Perception—not a mouse, keyboard, and screen—would be the protagonist of his story. Machines that perceive us became his “why.”
The “how,” the blueprint, was to build a new computing experience designed around sensors that can perceive humans, their environment, and the objects around them. This new computing experience must enable three kinds of interactions: the ability to input analog data, the ability to output digital data, and the ability to feel or touch data, something known as haptics.
Kinect was the first step in this journey—it checked the box for a human to provide input to a computer simply by moving. Suddenly we could dance with a computer. Now HoloLens is checking multiple boxes. It enables humans, the environment, and objects to give and receive input and output across time and space. Suddenly an astronaut on Earth can inspect a crater on Mars. The final piece, the haptics, will include the ability to touch and feel. When we dance using Kinect or reach for a rock using HoloLens, we cannot yet feel our dance partner or that rock. But one day we will.
Today, our focus at Microsoft is to democratize mixed reality, to make it available to everyone. Our launch of HoloLens has been based on a proven strategy for Microsoft—inviting outside developers to help us create imaginative applications for the HoloLens platform. Soon after we announced HoloLens, more than five thousand developers submitted ideas for applications they wanted to build. We ran a twenty-four-hour Twitter poll to ask which idea we should build first. Developers and fans chose Galaxy Explorer, which enables you to look out your window and navigate the Milky Way—moving through it at your own pace, zooming in, annotating what you see, and storing the experience for later. It replicates the environment of a planet on your room’s walls—dusty winds, hot plasma, and ice formations.
Now other developers are crafting tremendously useful new applications for HoloLens. Lowe’s home improvement stores, for example, are using HoloLens to allow their customers to stand in their own kitchens and bathrooms, and superimpose holograms of new cabinets, appliances, and accessories so they can see exactly what their remodel will look like.
The trajectory of this technology begins with simply tracking what the machine sees, but someday it will fully understand more complex tasks, as we’ll see when we turn to artificial intelligence. Kinect gave a computer the ability to track your movements: to see you and make sense of your actions. That’s where AI, machine learning, and mixed reality are today. Technology can increasingly see, speak, and analyze, but it cannot yet feel. Mixed reality, however, may help machines empathize with humans. Through these technologies, we will increasingly be able to experience what a refugee or a crime victim experiences, potentially enhancing our ability to make emotional connections across barriers that currently divide people from one another. In fact, I had a chance to meet several student developers from Australia who participated in our Imagine Cup competition. They built an MR application that helps certain caregivers learn to see the world through the eyes of someone with autism.
* * *
Artificial intelligence has been portrayed in myriad ways by Hollywood, which has practically made the technology its own subgenre. In 1973’s Westworld, Yul Brynner plays a robot (an AI-infused, tough-guy cowboy) who walks into a saloon itching for a gunfight. Years later, Disney offered a different depiction. In Big Hero 6, a pillowy giant robot named Baymax lovingly helps his fourteen-year-old owner through a suspenseful journey. “He’ll change your world,” the film proclaims.
And that’s just it. AI will change our world. It will augment and assist humans, much more like Baymax than Brynner.
A confluence of three breakthroughs (Big Data, massive computing power, and sophisticated algorithms) is accelerating AI from sci-fi to reality. At astonishing rates, data is being gathered and made available thanks to the exponential growth of cameras and sensors in our everyday life. AI needs data to learn. The cloud has made tremendous computing power available to everyone, and complex algorithms can now be written to discern insights and intelligence from the mountains of data.
But far from Baymax or Brynner, AI today is still some way from becoming what’s known as artificial general intelligence (AGI), the point at which a computer matches or even surpasses human intellectual capabilities. Like human intelligence, artificial intelligence can be categorized by layer. The bottom layer is simple pattern recognition. The middle layer is perception, sensing more and more complex scenes. It’s estimated that 99 percent of human perception is through speech and vision. Finally, the highest level of intelligence is cognition: deep understanding of human language.
These are the building blocks of AI, and for many years Microsoft has invested in advancing each of these tiers: statistical machine learning tools to make sense of data and recognize patterns; computers that can see, hear, and move, and even begin to learn and understand human language. Under the leadership of our chief speech scientist, Xuedong Huang, and his team, Microsoft set the accuracy record with a computer system that can transcribe the contents of a phone call more accurately than a human professional trained in transcription. On the computer vision and learning front, in late 2015 our AI group swept first prize across five challenges even though we trained our system for only one of those challenges. In the Common Objects in Context challenge, an AI system attempts to solve several visual recognition tasks. We trained our system to accomplish just the first one, simply to look at a photograph and label what it sees. Yet, through early forms of transfer learning, the neural network we built managed to learn and then accomplish the other tasks on its own. It could not only explain the photograph but also draw a circle around every distinct object in it and produce an English sentence describing the action it saw in the photo.
I believe that in ten years AI speech and visual recognition will be better than a human’s. But just because a machine can see and hear doesn’t mean it can truly learn and understand. Natural language understanding, the interaction between computers and humans, is the next frontier.
And so how will AI ever live up to its hype? How will AI scale to benefit everyone? Again, the answer is layered.
Bespoke. Today we are very much on the ground floor of AI. It is bespoke, customized. Tech companies with privileged access to data, computing power, and algorithms handcraft an AI product and make it available to the world. Only a few can make AI for the many. This is where most AI is today.
Democratized. The next level is democratization. As a platform company, one that has always built foundational technologies and tools upon which others can innovate, Microsoft aims to put the tools for building AI in the hands of everyone. Democratizing AI means enabling every person and every organization to dream about and create amazing AI solutions that serve their specific needs. It’s analogous to the democratization that movable type and the printing press created. It’s estimated that in the 1450s there were only about thirty thousand books in Europe, each one handcrafted by someone working in a monastery. The Gutenberg Bible was the first major book printed in Europe using movable type, and within fifty years the number of books grew to an estimated 12 million, unleashing a renaissance in learning, science, and the arts.
That’s the same trajectory we need for AI. To get there we have to be inclusive, democratic. And so our vision is to build tools that have true artificial intelligence infused across agents, applications, services, and infrastructure:
We’re harnessing artificial intelligence to fundamentally change how people interact with agents like Cortana, which will become more and more common in our lives.
Applications like Office 365 and Dynamics 365 will have AI baked in so that they can help us focus on the things that matter most and get more out of every moment.
We’ll make the underlying intelligence capabilities of our own services—the pattern recognition, perception, and cognitive capabilities—available to every application developer in the world.
And, lastly, we’re building the world’s most powerful AI supercomputer and making that infrastructure available to anyone.
A range of industries are using these AI tools. McDonald’s is creating an AI system that can help its workers take your order in the drive-through line, making ordering food simpler, more efficient, and more accurate. Uber is using our cognitive services tools to prevent fraud and improve passenger safety by matching the driver’s photograph to ensure the right driver is at the wheel. And Volvo is using our AI tools to help recognize when drivers are distracted to warn them and prevent accidents.
If you’re a business owner or manager, imagine if you had an AI system that could literally see your entire operation, understand what’s happening, and notify you about the things you care most about. Prism Skylabs has innovated on top of our cognitive services so that computers monitor video surveillance cameras and analyze what’s happening. If you have a construction company, the system will notify you when it sees the cement truck arrive at one of your work sites. For retailers, it can keep track of inventory or help you find a manager in one of your stores. One day, in a hospital setting, it might watch the surgeon and supporting staff to warn the team, before it’s too late, if it sees a medical error.
Learn to Learn. Ultimately, the state of the art is when computers learn to learn—when computers generate their own programs. Like humans, computers will go beyond mimicking what people do and will invent new, better solutions to problems. Deep neural networks and transfer learning are leading to breakthroughs today, but AI is like a ladder and we are just on the first step of that ladder. At the top of the ladder is artificial general intelligence and complete machine understanding of human language. It’s when a computer exhibits intelligence that is equal to or indistinguishable from a human.