Algorithms to Live By
Page 13
Each day, Ebbinghaus would sit down and memorize a list of nonsense syllables. Then he would test himself on lists from previous days. Pursuing this habit over the course of a year, he established many of the most basic results in human memory research. He confirmed, for instance, that practicing a list multiple times makes it persist longer in memory, and that the number of items one can accurately recall goes down as time passes. His results mapped out a graph of how memory fades over time, known today by psychologists as “the forgetting curve.”
Ebbinghaus’s results established the credibility of a quantitative science of human memory, but they left open something of a mystery. Why this particular curve? Does it suggest that human memory is good or bad? What’s the underlying story here? These questions have stimulated psychologists’ speculation and research for more than a hundred years.
In 1987, Carnegie Mellon psychologist and computer scientist John Anderson found himself reading about the information retrieval systems of university libraries. Anderson’s goal—or so he thought—was to write about how the design of those systems could be informed by the study of human memory. Instead, the opposite happened: he realized that information science could provide the missing piece in the study of the mind.
“For a long time,” says Anderson, “I had felt that there was something missing in the existing theories of human memory, including my own. Basically, all of these theories characterize memory as an arbitrary and non-optimal configuration.… I had long felt that the basic memory processes were quite adaptive and perhaps even optimal; however, I had never been able to see a framework in which to make this point. In the computer science work on information retrieval, I saw that framework laid out before me.”
A natural way to think about forgetting is that our minds simply run out of space. The key idea behind Anderson’s new account of human memory is that the problem might be not one of storage, but of organization. According to his theory, the mind has essentially infinite capacity for memories, but we have only a finite amount of time in which to search for them. Anderson made the analogy to a library with a single, arbitrarily long shelf—the Noguchi Filing System at Library of Congress scale. You can fit as many items as you want on that shelf, but the closer something is to the front the faster it will be to find.
The key to a good human memory then becomes the same as the key to a good computer cache: predicting which items are most likely to be wanted in the future.
Barring clairvoyance, the best approach to making such predictions in the human world requires understanding the world itself. With his collaborator Lael Schooler, Anderson set out to perform Ebbinghaus-like studies not on human minds, but on human society. The question was straightforward: what patterns characterize the way the world itself “forgets”—the way that events and references fade over time? Anderson and Schooler analyzed three human environments: headlines from the New York Times, recordings of parents talking to their children, and Anderson’s own email inbox. In all domains, they found that a word is most likely to appear again right after it had just been used, and that the likelihood of seeing it again falls off as time goes on.
In other words, reality itself has a statistical structure that mimics the Ebbinghaus curve.
This suggests something remarkable. If the pattern by which things fade from our minds is the very pattern by which things fade from use around us, then there may be a very good explanation indeed for the Ebbinghaus forgetting curve—namely, that it’s a perfect tuning of the brain to the world, making available precisely the things most likely to be needed.
Human memory and human environments. The left panel shows the percentage of nonsense syllables Ebbinghaus correctly recalled from a list, as a function of the number of hours he waited after first memorizing the list. The right panel shows the chance that a word appears in the headlines of the New York Times on a given day, as a function of the time since its previous appearance in print.
In putting the emphasis on time, caching shows us that memory involves unavoidable tradeoffs, and a certain zero-sumness. You can’t have every library book at your desk, every product on display at the front of the store, every headline above the fold, every paper at the top of the pile. And in the same way, you can’t have every fact or face or name at the front of your mind.
“Many people hold the bias that human memory is anything but optimal,” wrote Anderson and Schooler. “They point to the many frustrating failures of memory. However, these criticisms fail to appreciate the task before human memory, which is to try to manage a huge stockpile of memories. In any system responsible for managing a vast data base there must be failures of retrieval. It is just too expensive to maintain access to an unbounded number of items.”
This understanding has in turn led to a second revelation about human memory. If these tradeoffs really are unavoidable, and the brain appears to be optimally tuned to the world around it, then what we refer to as the inevitable “cognitive decline” that comes with age may in fact be something else.
The Tyranny of Experience
A big book is a big nuisance.
—CALLIMACHUS (305–410 BC), LIBRARIAN AT ALEXANDRIA
Why don’t they make the whole plane out of that black box stuff?
—STEVEN WRIGHT
The need for a computer memory hierarchy, in the form of a cascade of caches, is in large part the result of our inability to afford making the entire memory out of the most expensive type of hardware. The fastest cache on current computers, for instance, is made with what’s called SRAM, which costs roughly a thousand times as much per byte as the flash memory in solid-state drives. But the true motivation for caching goes deeper than that. In fact, even if we could get a bespoke machine that used exclusively the fastest form of memory possible, we’d still need caches.
As John Hennessy explains, size alone is enough to impair speed:
When you make something bigger, it’s inherently slower, right? If you make a city bigger, it takes longer to get from point A to point B. If you make a library bigger, it takes longer to find a book in the library. If you have a stack of papers on your desk that’s bigger, it takes longer to find the paper you’re looking for, right? Caches are actually a solution to that problem.… For example, right now, if you go to buy a processor, what you’ll get is a Level 1 cache and a Level 2 cache on the chip. The reason that there are—even just on the chip there are two caches!—is that in order to keep up with the cycle rate of the processor, the first-level cache is limited in size.
Unavoidably, the larger a memory is, the more time it takes to search for and extract a piece of information from it.
Brian and Tom, in their thirties, already find themselves more frequently stalling a conversation as, for instance, they wait for the name of someone “on the tip of the tongue” to come to mind. Then again, Brian at age ten had two dozen schoolmates; twenty years later he has hundreds of contacts in his phone and thousands on Facebook, and has lived in four cities, each with its own community of friends, acquaintances, and colleagues. Tom, by this point in his academic career, has worked with hundreds of collaborators and taught thousands of students. (In fact, this very book involved meeting with about a hundred people and citing a thousand.) Such effects are by no means limited to social connections, of course: a typical two-year-old knows two hundred words; a typical adult knows thirty thousand. And when it comes to episodic memory, well, every year adds a third of a million waking minutes to one’s total lived experience.
Considered this way, it’s a wonder that the two of us—or anyone—can mentally keep up at all. What’s surprising is not memory’s slowdown, but the fact that the mind can possibly stay afloat and responsive as so much data accumulates.
If the fundamental challenge of memory really is one of organization rather than storage, perhaps it should change how we think about the impact of aging on our mental abilities. Recent work by a team of psychologists and linguists led by Michael Ramscar at the University of T
übingen has suggested that what we call “cognitive decline”—lags and retrieval errors—may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger. Regardless of whatever other challenges aging brings, older brains—which must manage a greater store of memories—are literally solving harder computational problems with every passing day. The old can mock the young for their speed: “It’s because you don’t know anything yet!”
Ramscar’s group demonstrated the impact of extra information on human memory by focusing on the case of language. Through a series of simulations, the researchers showed that simply knowing more makes things harder when it comes to recognizing words, names, and even letters. No matter how good your organization scheme is, having to search through more things will inevitably take longer. It’s not that we’re forgetting; it’s that we’re remembering. We’re becoming archives.
An understanding of the unavoidable computational demands of memory, Ramscar says, should help people come to terms with the effects of aging on cognition. “I think the most important tangible thing seniors can do is to try to get a handle on the idea that their minds are natural information processing devices,” he writes. “Some things that might seem frustrating as we grow older (like remembering names!) are a function of the amount of stuff we have to sift through … and are not necessarily a sign of a failing mind.” As he puts it, “A lot of what is currently called decline is simply learning.”
Caching gives us the language to understand what’s happening. We say “brain fart” when we should really say “cache miss.” The disproportionate occasional lags in information retrieval are a reminder of just how much we benefit the rest of the time by having what we need at the front of our minds.
So as you age, and begin to experience these sporadic latencies, take heart: the length of a delay is partly an indicator of the extent of your experience. The effort of retrieval is a testament to how much you know. And the rarity of those lags is a testament to how well you’ve arranged it: keeping the most important things closest to hand.
*For unknown reasons, My Own Private Idaho is best loved in Maine.
*You can force your computer to show your electronic documents in a pile, as well. Computers’ default file-browsing interface makes you click through folders in alphabetical order—but the power of LRU suggests that you should override this, and display your files by “Last Opened” rather than “Name.” What you’re looking for will almost always be at or near the top.
5 Scheduling
First Things First
How we spend our days is, of course, how we spend our lives.
—ANNIE DILLARD
“Why don’t we write a book on scheduling theory?” I asked.… “It shouldn’t take much time!” Book-writing, like war-making, often entails grave miscalculations. Fifteen years later, Scheduling is still unfinished.
—EUGENE LAWLER
It’s Monday morning, and you have an as-yet blank schedule and a long list of tasks to complete. Some can be started only after others are finished (you can’t load the dishwasher unless it’s unloaded first), and some can be started only after a certain time (the neighbors will complain if you put the trash out on the curb before Tuesday night). Some have sharp deadlines, others can be done whenever, and many are fuzzily in between. Some are urgent, but not important. Some are important, but not urgent. “We are what we repeatedly do,” you seem to recall Aristotle saying—whether it’s mop the floor, spend more time with family, file taxes on time, learn French.
So what to do, and when, and in what order? Your life is waiting.
Though we always manage to find some way to order the things we do in our days, as a rule we don’t consider ourselves particularly good at it—hence the perennial bestseller status of time-management guides. Unfortunately, the guidance we find in them is frequently divergent and inconsistent. Getting Things Done advocates a policy of immediately doing any task of two minutes or less as soon as it comes to mind. Rival bestseller Eat That Frog! advises beginning with the most difficult task and moving toward easier and easier things. The Now Habit suggests first scheduling one’s social engagements and leisure time and then filling the gaps with work—rather than the other way around, as we so often do. William James, the “father of American psychology,” asserts that “there’s nothing so fatiguing as the eternal hanging on of an uncompleted task,” but Frank Partnoy, in Wait, makes the case for deliberately not doing things right away.
Every guru has a different system, and it’s hard to know who to listen to.
Spending Time Becomes a Science
Though time management seems a problem as old as time itself, the science of scheduling began in the machine shops of the industrial revolution. In 1874, Frederick Taylor, the son of a wealthy lawyer, turned down his acceptance at Harvard to become an apprentice machinist at Enterprise Hydraulic Works in Philadelphia. Four years later, he completed his apprenticeship and began working at the Midvale Steel Works, where he rose through the ranks from lathe operator to machine shop foreman and ultimately to chief engineer. In the process, he came to believe that the time of the machines (and people) he oversaw was not being used very well, leading him to develop a discipline he called “Scientific Management.”
Taylor created a planning office, at the heart of which was a bulletin board displaying the shop’s schedule for all to see. The board depicted every machine in the shop, showing the task currently being carried out by that machine and all the tasks waiting for it. This practice would be built upon by Taylor’s colleague Henry Gantt, who in the 1910s developed the Gantt charts that would help organize many of the twentieth century’s most ambitious construction projects, from the Hoover Dam to the Interstate Highway System. A century later, Gantt charts still adorn the walls and screens of project managers at firms like Amazon, IKEA, and SpaceX.
Taylor and Gantt made scheduling an object of study, and they gave it visual and conceptual form. But they didn’t solve the fundamental problem of determining which schedules were best. The first hint that this problem even could be solved wouldn’t appear until several decades later, in a 1954 paper published by RAND Corporation mathematician Selmer Johnson.
The scenario Johnson examined was bookbinding, where each book needs to be printed on one machine and then bound on another. But the most common instance of this two-machine setup is much closer to home: the laundry. When you wash your clothes, they have to pass through the washer and the dryer in sequence, and different loads will take different amounts of time in each. A heavily soiled load might take longer to wash but the usual time to dry; a large load might take longer to dry but the usual time to wash. So, Johnson asked, if you have several loads of laundry to do on the same day, what’s the best way to do them?
His answer was that you should begin by finding the single step that takes the least amount of time—the load that will wash or dry the quickest. If that shortest step involves the washer, plan to do that load first. If it involves the dryer, plan to do it last. Repeat this process for the remaining loads, working from the two ends of the schedule toward the middle.
Intuitively, Johnson’s algorithm works because regardless of how you sequence the loads, there’s going to be some time at the start when the washer is running but not the dryer, and some time at the end when the dryer is running but not the washer. By having the shortest washing times at the start, and the shortest drying times at the end, you maximize the amount of overlap—when the washer and dryer are running simultaneously. Thus you can keep the total amount of time spent doing laundry to the absolute minimum. Johnson’s analysis had yielded scheduling’s first optimal algorithm: start with the lightest wash, end with the smallest hamper.
Beyond its immediate applications, Johnson’s paper revealed two deeper points: first, that scheduling could be expressed algorithmically, and second, that optimal scheduling solutions existed. Thi
s kicked off what has become a sprawling literature, exploring strategies for a vast menagerie of hypothetical factories with every conceivable number and kind of machines.
We’re going to focus on a tiny subset of this literature: the part that, unlike bookbinding or laundry, deals with scheduling for a single machine. Because the scheduling problem that matters the most involves just one machine: ourselves.
Handling Deadlines
With single-machine scheduling, we run into something of a problem right off the bat. Johnson’s work on bookbinding was based on minimizing the total time required for the two machines to complete all of their jobs. In the case of single-machine scheduling, however, if we are going to do all the tasks assigned, then all schedules will take equally long to complete; the order is irrelevant.
This is a sufficiently fundamental and counterintuitive point that it’s worth repeating. If you have only a single machine, and you’re going to do all of your tasks, then any ordering of the tasks will take you the same amount of time.
Thus we encounter the first lesson in single-machine scheduling literally before we even begin: make your goals explicit. We can’t declare some schedule a winner until we know how we’re keeping score. This is something of a theme in computer science: before you can have a plan, you must first choose a metric. And as it turns out, which metric we pick here will directly affect which scheduling approaches fare best.
The first papers on single-machine scheduling followed quickly on the heels of Johnson’s bookbinding work and offered several plausible metrics to consider. For each metric, they discovered a simple, optimal strategy.
It is of course common, for instance, for tasks to have a due date, with the lateness of a task being how far it has gone overdue. So we can think of the “maximum lateness” of a set of tasks as the lateness of whatever task has gone furthest past its due date—the kind of thing your employer might care about in a performance review. (Or what your customers might care about in a retail or service setting, where the “maximally late” task corresponds to the customer subjected to the longest wait time.)