Non-sticky-mode keys must be held down for as long as their mode is in use, like the Alt or Control keys on personal computers. So, pressing only the S key while editing a document types the letter “s,” but holding Control while pressing S (in Windows) saves the document.
Sometimes, the application you’re running changes what your keyboard’s keys do entirely. Many computer games use the A, S, D, and W keys for moving a character through a game environment, corresponding to left, back, right, and forward, respectively—until a “chat” mode is activated, in which case typing displays letters and numbers, as usual.
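To make the distinction concrete, here is a minimal sketch of how the same key press can resolve to different actions. The function name `interpret`, the `app_mode` labels, and the action strings are all hypothetical, chosen only for illustration; no real keyboard API is being modeled.

```python
# Toy model: the meaning of a key depends on held keys and the active mode.
# All names here are illustrative assumptions, not a real input API.
MOVES = {"a": "move left", "s": "move back", "d": "move right", "w": "move forward"}


def interpret(key, held=frozenset(), app_mode="editor"):
    """Return the action a key press would trigger in the given context."""
    # Non-sticky mode: Control only counts while it is physically held down.
    if app_mode == "editor" and "ctrl" in held and key == "s":
        return "save document"
    # Application mode: a game remaps A, S, D, and W to movement.
    if app_mode == "game" and key in MOVES:
        return MOVES[key]
    # Everywhere else (including the game's chat mode), keys just type.
    return f"type {key!r}"


print(interpret("s"))                        # plain typing
print(interpret("s", held={"ctrl"}))         # held modifier changes the meaning
print(interpret("s", app_mode="game"))       # the application changes the meaning
print(interpret("s", app_mode="game_chat"))  # chat mode restores typing
```

Nothing about the physical S key changes between these four calls; only the surrounding state does, which is exactly what makes mode errors possible.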
When users are aware of modes and motivated to learn them, the negative effects can be minimal. However, when users are not aware of the mode, or their “muscle memory” causes them to take an action for effect A because, in the moment, they forgot the system’s control was engaged for effect B, the results can range from annoying to disastrous.
I was recently using a rental car that had a new, interactive dashboard system. It was a horrible mess of confused signifiers and functions, in which controls that are usually physical knobs or buttons were subsumed by multilevel, menu-driven interfaces. That’s not such a big deal if you’re doing something infrequent and complicated such as syncing your phone to the car’s Bluetooth system. But in this case, the car required me to interact with several layers of buttons just to switch between ventilation modes.
I’ve been driving for many years, so I’m used to controlling heat output with a quick twist of a knob or tap of an obvious button. Because auto temperature can change quickly on a trip, depending on the sun and other factors, it’s one of those things I prefer to have “ready-to-hand” (as discussed in Chapter 6). In this rental car, however, the controls were more “unready-to-hand.” It took at least four different interactions—all while staring at the system and not the road—to not only switch the screen into the Climate mode, but then select the mode of climate control I wished to use. And after all that, as Figure 13-12 shows, I then had to confirm my selection by pressing Done. Modes nested within modes within modes—something that could be truly dangerous while in motion.
Figure 13-12. Poking at a confusing modal interface (photo by author)
Design leaders have argued against using poorly implemented modes almost since consumer software was invented. Donald Norman was writing about them as early as 1981.[268] And interaction design pioneer Jef Raskin infamously railed against using any sort of mode-based inputs, because they almost always result in problems.[269] (He actually preferred the Control-key sort of mode that required a continuous action, calling them “quasimodes.”) Even Raskin’s son, designer Aza Raskin, has continued the mission against their misuse. He writes, “If a system contains modes, people will make mode errors; if we design systems that are not humane—responsive to human needs and considerate of human frailties, we can be guaranteed that people will make mistakes with sometimes cataclysmic consequences.”[270] For example, between 1988 and 1996, at least five fatal airplane crashes were directly attributable to mode errors; the pilots used systems in the cockpit in ways that were incorrect for the current mode setting.[271] The affordance of “pulling a switch” actually meant something entirely different depending on the setting of some other overall mode.
What modes do is change the fundamental meaning of action, and this is not something our perceptual systems intuitively know how to handle. In the natural world, a physical action has the same invariant effect, as long as you’re nested in the same environmental layout—that is, physically situated in the same spot, with the same objects and surfaces. When we learn that doing X in layout Y has effect Z, we don’t have to keep learning it. And our bodies evolved with that assumption nicely tucked away.
It took advanced technology to change the way the world worked. Wireless microphones, left in the “on” mode, can unwittingly broadcast private conversations. Opening your home’s door with the alarm enabled can prompt the police to show up at your doorstep. These are very simple mechanisms, though, compared to the complexly and sometimes incoherently nested rulesets that digital technology now adds to our environment.
The poster child for digital modes is the post-iPhone smartphone. Before the iPhone, even the smartest of smartphones was essentially a cell phone with a physical keyboard, such as the Palm Treo or the once-dominant RIM Blackberry. The iPhone fundamentally changed the nature of what a phone is. Now, a phone is a “phone” like Doctor Who’s Sonic Screwdriver is a “screwdriver”—the label is a quaint, vestigial nickname.
The iPhone turned the phone into an almost entirely modal device: a slab of glass that can be just about anything, depending on what application it is running. It can be a pedometer, a walkie-talkie, a synthesizer, a video-game console, a musical keyboard, and on and on. It transforms into anything that can be simulated on its surface. Apple’s more recent Watch product continues this mode-device tradition, using its “crown” as a mode-based control that has totally different functions depending on the currently active software.
And unlike a physical device—such as an analog wristwatch—digital software allows a small object like a smartphone or smartwatch to contain an overwhelming legion of modal rules, all nested many layers deep.
For example, most smartphones now have geolocation technology that can provide your latitude and longitude at any given time, based on GPS satellites and other network location information, including WiFi access points. The capability is always present, but it’s a mode that is only “on” when certain apps need to access it. Usually there is some visual clue that the mode is active, but it’s easy to overlook whether your phone is in “object that tracks my location” mode or not.
Imagine the surprise of someone like software magnate John McAfee who, while hiding in Belize from serious criminal charges, found his location was disclosed by people on the Internet. How? Because reporters he’d invited to his confidential location took his picture with a smartphone, with his permission, and posted it as part of their story on their publication’s website. Normally, that wouldn’t be a big deal, but someone downloaded the picture and discovered that it still held the geolocation metadata added to it by the phone’s camera application.[272] The folks who committed this gaffe were not techno-neophytes—they were net-savvy reporters and a guy who made his millions from designing and selling computer security software. It just didn’t occur to them that a hidden mode in a smartphone was adding obscure location data to what looked, to the eye, like a harmless photo with no location-specific information.
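A rough sketch shows how little effort it takes to recover a location once that metadata is present. EXIF metadata stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference (tags such as GPSLatitude and GPSLatitudeRef). The example below assumes the metadata has already been parsed into a plain dictionary; the dictionary layout, the function names, and the coordinates are made up for illustration and are not taken from the actual photo.

```python
# Sketch: reading and removing location data from already-parsed photo metadata.
# The dict layout and all values are illustrative assumptions.

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W into decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


def extract_location(metadata):
    """Return (latitude, longitude), or None if no GPS block is present."""
    gps = metadata.get("GPSInfo")
    if not gps:
        return None
    lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return lat, lon


def strip_location(metadata):
    """Return a copy of the metadata without the GPS block."""
    return {k: v for k, v in metadata.items() if k != "GPSInfo"}


photo = {
    "Model": "hypothetical phone",
    "GPSInfo": {
        "GPSLatitude": (15, 30, 0), "GPSLatitudeRef": "N",
        "GPSLongitude": (88, 45, 0), "GPSLongitudeRef": "W",
    },
}

print(extract_location(photo))                  # coordinates are recoverable
print(extract_location(strip_location(photo)))  # None once the block is removed
```

The image itself looks identical before and after stripping, which is why the mode is so easy to miss.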
The photo metadata issue is only one of thousands of mode-based complexities in a garden-variety smartphone. Celebrities whose personal pictures are hacked and shared can attest to this problem, as well: most were likely not even aware their phones were in an “object that puts all my pictures on a distant computer” mode.
Modes are not going away. In spite of their challenges or the thought-leaders who complain about them, there’s an insatiable market demand for more capabilities, but a limited amount of space for visible settings controls. It’s likely that mode complexity will only increase. So, our systems will need even more complex and multilayered mode-control settings that will make our current situation seem primitive in just a few years. We comprehend context most easily at a human scale, not at vast microscopic or macroscopic levels that we can’t see or easily learn through exploration. This is another reason why digital agency is becoming more necessary: we need digital agents to help manage our digital agents.
* * *
[263] Wikimedia Commons http://bit.ly/1rpOXdA
[264] Photo by author.
[265] Photo by author.
[266] All photos and screenshots by author.
[267] Carr, Austin. “AOL May Have Invented Email’s Next UI Paradigm,” Fast Company (fastcodesign.com) October 18, 2012 (http://bit.ly/1yV8zPD).
[268] Norman, D. A. “Categorization of action slips.” Psychological Review, 1981, 88(1):1–15.
[269] Raskin, Jef. The Humane Interface: New Directions for Designing Interactive Systems. Boston: Addison-Wesley Professional, 2000.
[270] http://www.azarask.in/blog/post/is_visual_feedback_enough_why_modes_kill/
[271] Degani, Asaf et al. “MODE USAGE IN AUTOMATED COCKPITS: SOME INITIAL OBSERVATIONS.” Proceedings of International Federation of Automatic Control (IFAC). Boston, MA. June 27–29, 1995.
[272] Honan, Mat. “Oops! Did Vice Just Give Away John McAfee’s Location With Photo Metadata?” Wired magazine (wired.com) December 3, 2012 (http://wrd.cm/1s34PCT).
Chapter 14. Digital Environment
No metaphor is more misleading than “smart.”
—MARK WEISER
Variant Modes and Digital Places
The rule-driven modes and simulated affordances of interfaces are also the objects, events, and layouts that function as places, whether on a screen alone or in the ambient digital activity in our surroundings. So, changing the mode of objects can affect the mode of places as well, especially when objects work as parts of interdependent systems.
For example, let’s take a look at a software-based place: Google’s search site. Suppose that we want to research what qualities to look for in a new kitchen knife. When we run a query using Google’s regular mobile Web search, we see the sort of general results to which we’ve become accustomed: supposedly neutral results ordered by Google’s famed algorithm, which prioritizes pages by a general “rank” of authority determined by analyzing links across the Web. Google also defines “quality” for sites in general. On the desktop Google site, paid ad-word links are clearly labeled as sponsored results, so the main results are easy to distinguish from the paid ones.
But Google’s Shopping site is a different story: when searching for products there, all of the results are affected by paid promotion, as illustrated in Figure 14-1. I actually wasn’t aware of this until a colleague pointed it out to me at lunch one day.[273]
Figure 14-1. When in the Shopping tab, search results are driven by different rules, which you can see by clicking the “Why these products?” link
Several factors are in play here:
Google’s long-standing web-search function has established cultural expectations in its user base: Google is providing the most relevant results based on the search terms entered. Although it’s arguable as to whether these results are effective—or if they’ve been corrupted by site owners’ search-engine manipulation—we still tend to equate Google with no-nonsense, just-the-facts search results, as a cultural invariant.
Google’s Shopping tab is listed right next to the Images, Maps, and “more” tabs—which, when expanded, show other Google services. The way these are displayed together implies that the rules behind how they give us results are equivalent. Objects that look the same are perceived as having similar properties. So, we aren’t expecting results for Images to be prioritized by advertising dollar, just as we don’t assume searching for a town in Maps will take us to a different-but-similar town because of someone’s marketing plan.
In the object layout of the search results view, Google Shopping provides no clear indication of this tab’s different mode. The “Near: Philadelphia, PA” label indicates location is at work in some way, but the only way to know about paid-priority results is to tap the vaguely named “Why these products?” link at the upper right. It’s nice of Google to provide this explanation, but users might not engage it because they assume they already know the answer, based on learned experience within Google’s other environments.
There’s nothing inherently wrong with making a shopping application function differently from a search application. But the invariant features of the environment need to make the difference clearer by using semantic function to better establish context.[274]
Foraging for Information
When we use these information environments, we’re not paying explicit attention to a lot of these factors. In fact, we’re generally feeling our way through with little conscious effort. So, the finer points of logical difference between one mode of an environment and another are easily lost on us. According to a number of related theories on information-seeking, humans look for information by using behavior patterns similar to those used by other terrestrial animals when foraging for food. Research has shown that people don’t formulate fully logical queries and then go about looking in a rationally efficient manner; instead, they tend to move in a direction that feels right based on various semantic or visual cues, wandering through the environment in a nonlinear, somewhat unconsciously driven path, “sniffing out” whatever seems related to whatever it is for which they’re searching.
We take action in digital-semantic environments using the same bodies that we evolved to use in physical environments. Instead of using a finger to poke a mango to see if it’s ripe, or cocking an ear to listen for the sound of water, we poke at the environment with words, either by tapping or clicking words and pictures, or giving our own words to the environment through search queries, to see what it says back to us.
Marcia Bates’ influential article from 1989, “The Design of Browsing and Berrypicking Techniques for the Online Search Interface,” argues that people look for information by using behaviors that are repurposed from early human evolution. Bates points out that the classic information retrieval model, which assumes a linear, logical approach to matching a query with the representation of a document, was becoming inadequate as technology was presenting users with information environments of even greater scale and complexity.[275]
Bates has gone on to further develop her theoretical framework by folding “berrypicking” into a more comprehensive approach (see Figure 14-2), such as her 2002 article, “Toward an Integrated Model of Information Seeking and Searching.” In that article, Bates argues that an “enormous part of all we know and learn...comes to us through passive undirected behavior.”[276] That is, most search activity is really tacit, nondeliberate environmental action, not unlike the way we find our way through a city. Bates also points out that people tend to arrange their physical and social surroundings in ways to help them find information, essentially extending their cognition into the structures of their environment.
Bates’ work often references another, related theoretical strand called information foraging theory. Introduced in a 1999 article by Stuart Card and Peter Pirolli, information foraging makes use of anthropological research on food-foraging strategies. It also borrows from theories developed in ecological psychology. Card and Pirolli propose several mathematical models for describing these behaviors, including information scent models. These models “address the identification of information value from proximal cues.”[277] Card, Pirolli, and others have continued developing information foraging theory and have been influential in information science and human-computer interaction fields.
Figure 14-2. Bates’ “berrypicking” model
I should point out that information foraging theory originated from a traditional cognitive-science perspective, assuming the brain works like a computer to sort out all the mathematics of how much energy might be conserved by using one path over another. However, even though Bates, Card, and Pirolli come from that tradition, it’s arguable their work ends up being more in line with embodiment than not. The essential issue these theories address is that environments made mostly of semantic information lack most of the physical cues our perceptual systems evolved within; so our perception does what it can with what information is available, still using the same old mechanisms our cognition relied on long before writing existed.
Inhabiting Two Worlds at Once
There are also on-screen capabilities that change the meaning of physical places, without doing anything physical to those places. Here’s a relatively harmless example: in the podcast manager and player app called Downcast, you can instruct the software to update podcast feeds based on various modality settings, including geolocation, as illustrated in Figure 14-3.
Figure 14-3. Refresh podcasts by geolocation in the Downcast podcast app
This feature allows Downcast to save on cellular data usage by updating only at preselected locations that have WiFi. It’s a wonderful convenience. But we should keep in mind that it adds a layer of digital behavior that changes, if only slightly, the functional meaning of one physical context versus another. Even without a legion of intelligent objects, our smartphones make every place we inhabit potentially “smart.”
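As a rough illustration of how a location can become a behavioral rule, the sketch below checks whether a device is within a set radius of any saved place before allowing a refresh. This is not Downcast’s actual implementation; the function names, the 100-meter radius, and the coordinates are assumptions made for the example, and the distance uses the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_refresh(current, saved_places, radius_m=100):
    """True if the current position falls within radius_m of any saved place."""
    return any(haversine_m(*current, *place) <= radius_m for place in saved_places)


# Hypothetical saved location (illustrative coordinates only).
home = (39.9526, -75.1652)
elsewhere = (40.7128, -74.0060)

print(should_refresh(home, [home]))       # inside the saved area
print(should_refresh(elsewhere, [home]))  # far outside it
```

The same phone, running the same app, behaves differently depending on which side of an invisible circle it happens to be on.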
Another way digital has changed how we experience “place” is by replicating on the Web much of the semantic information we’d find in a physical store, and allowing us to shop online instead of in a building. These on-screen places for shopping are expected more and more to be integrated with physical store locations. For big-box retailers that want to grow business across all channels, this means interesting challenges for integrating those dimensions.
Large traditional retailers are in the midst of major transition. Most of them were already big corporations long before the advent of the Web, so they have deeply entrenched, legacy infrastructures, supply chains, and organizational silos. These structures evolved in an environment in which everything about the business was based on terrestrially bound, local stores. Since the rise of e-commerce, most have struggled with how to exist in both dimensions at once. For example, if a customer wants to order something to be delivered or picked up in the store, it means the system must know where that user is and what store is involved to provide accurate price and availability information, as illustrated in Figure 14-4.