Algorithms to Live By
Page 37
LRU consistently performed the closest to clairvoyance: A couple of years later, Bélády also showed that FIFO has some curious additional drawbacks—in particular, rare cases where increasing the cache size can actually worsen performance, a phenomenon known as Bélády’s Anomaly. Bélády, Nelson, and Shedler, “An Anomaly in Space-Time Characteristics of Certain Programs Running in a Paging Machine.”
“the digital equivalent of shuffling papers”: Aza Raskin, “Solving the Alt-Tab Problem,” http://www.azarask.in/blog/post/solving-the-alt-tab-problem/.
The literature on eviction policies: If you’re interested in trying a more complex caching algorithm, some popular variants on LRU are the following:
• LRU-K: O’Neil, O’Neil, and Weikum, “The LRU-K Page Replacement Algorithm for Database Disk Buffering,” which looks at the time elapsed since the K-th most recent use (which is maximal for items in the cache that have not been used K times). This introduces a frequency bias. LRU-2, which focuses on the penultimate use, is most common.
• 2Q: Johnson and Shasha, “2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm,” which organizes items into two separate “queues” to capture a little bit of frequency information. Items start in the first queue, and are promoted to the second queue if they are referred to again while they are in the cache. Items are evicted from this second queue back into the first queue using LRU, which is also used to evict items from the first queue.
• LRFU: Lee et al., “LRFU: A Spectrum of Policies That Subsumes the Least Recently Used and Least Frequently Used Policies,” which combines recency and frequency by assigning a numerical score to each item that is incremented when the item is used but decreases gradually over time.
• The Adaptive Replacement Cache (ARC): Megiddo and Modha, “Outperforming LRU with an Adaptive Replacement Cache Algorithm,” which uses two queues in a similar fashion to 2Q but adapts the length of the queues based on performance.
All of these algorithms have been shown to outperform LRU in tests of cache-management performance.
overwhelming favorite of computer scientists: For instance, Pavel Panchekha wrote an article in 2012 for the Dropbox blog where he lays out Dropbox’s reasoning for using LRU, at https://tech.dropbox.com/2012/10/caching-in-theory-and-practice/.
Deep within the underground Gardner Stacks: For those curious to know exactly what UC Berkeley students had been reading when we visited: Thoreau’s Walden; critical texts on Song of Myself, Cormac McCarthy, James Merrill, Thomas Pynchon, Elizabeth Bishop, J. D. Salinger, Anaïs Nin, and Susan Sontag; Drown by Junot Díaz; Telegraph Avenue and The Yiddish Policemen’s Union by Michael Chabon; Bad Dirt and Bird Cloud by Annie Proulx; Mr. and Mrs. Baby by Mark Strand; The Man in the High Castle by Philip K. Dick; the collected poetry and prose of William Carlos Williams; Snuff by Chuck Palahniuk; Sula by Toni Morrison; Tree of Smoke by Denis Johnson; The Connection of Everyone with Lungs by Juliana Spahr; The Dream of the Unified Field by Jorie Graham; Naked, Me Talk Pretty One Day, and Dress Your Family in Corduroy and Denim by David Sedaris; Ariel by Sylvia Plath and Oleanna by David Mamet; D. T. Max’s biography of David Foster Wallace; Like Something Flying Backwards, Translations of the Gospel Back into Tongues, and Deepstep Come Shining by C. D. Wright; the prose of T. S. Eliot; Eureka by Edgar Allan Poe; Billy Budd, Sailor and a collection of short works in poetry and prose by Herman Melville; The Aspern Papers, The Portrait of a Lady, and The Turn of the Screw by Henry James; Harold Bloom on Billy Budd, Benito Cereno, and “Bartleby the Scrivener”; the plays of Eugene O’Neill; Stardust by Neil Gaiman; Reservation Blues by Sherman Alexie; No Country for Old Men by Cormac McCarthy; and more.
“twelve years, that’s the cutoff”: Elizabeth Dupuis, personal interview, September 16, 2014.
“on the scale of a mile to the mile!”: Carroll, Sylvie and Bruno Concluded.
A quarter of all Internet traffic: Stephen Ludin, “Akamai: Why a Quarter of the Internet Is Faster and More Secure than the Rest,” lecture, March 19, 2014, International Computer Science Institute, Berkeley, California. As Akamai claims on their own site, “Akamai delivers between 15–30% of all Web traffic” (http://www.akamai.com/html/about/facts_figures.html).
“distance matters”: Ludin, “Akamai.”
eschew any type of human-comprehensible organization: Amazon’s “chaotic storage” system is described here: http://www.ssi-schaefer.de/blog/en/order-picking/chaotic-storage-amazon/.
Amazon was granted a patent: The patent on preshipping commonly requested items is US Patent No. 8,615,473, granted December 24, 2013, “Method and system for anticipatory package shipping” by Joel R. Spiegel, Michael T. McKenna, Girish S. Lakshman, and Paul G. Nordstrom, on behalf of Amazon Technologies Inc.
which the press seized upon: See, e.g., Connor Simpson, “Amazon Will Sell You Things Before You Know You Want to Buy Them,” The Wire, January 20, 2014, http://www.thewire.com/technology/2014/01/amazon-thinks-it-can-predict-your-future/357188/; Chris Matyszczyk, “Amazon to Ship Things Before You’ve Even Thought of Buying Them?,” CNET, January 19, 2014, http://www.cnet.com/news/amazon-to-ship-things-before-youve-even-thought-of-buying-them/.
each state’s “Local Favorites” from Netflix: Micah Mertes, “The United States of Netflix Local Favorites,” July 10, 2011, http://www.slacktory.com/2011/07/united-states-netflix-local-favorites/.
the enormous files that comprise full-length HD video: In 2012, Netflix announced that it was tired of paying firms like Akamai and had started building its own global CDN. See Eric Savitz, “Netflix Shifts Traffic to Its Own CDN,” Forbes, June 5, 2012, http://www.forbes.com/sites/ericsavitz/2012/06/05/netflix-shifts-traffic-to-its-own-cdn-akamai-limelight-shrs-hit/. More information about Netflix’s Open Connect CDN can be found at https://www.netflix.com/openconnect.
“Caching is such an obvious thing”: John Hennessy, personal interview, January 9, 2013.
“a crate on the floor of my front coat closet”: Morgenstern, Organizing from the Inside Out.
“extra vacuum cleaner bags behind the couch”: Jones, Keeping Found Things Found.
search engines from a cognitive perspective: See Belew, Finding Out About.
recommended the use of a valet stand: Rik Belew, personal interview, October 31, 2013.
“a very fundamental principle in my method”: Yukio Noguchi, personal interview, December 17, 2013.
the “super” filing system was born: Noguchi’s filing system is described in his book Super Organized Method, and was initially presented in English by the translator William Lise. The blog article describing the system is no longer available on Lise’s site, but it can still be visited via the Internet Archive at https://web.archive.org/web/20031223072329/http://www.lise.jp/honyaku/noguchi.html. Further information comes from Yukio Noguchi, personal interview, December 17, 2013.
The definitive paper on self-organizing lists: Sleator and Tarjan, “Amortized Efficiency of List Update and Paging Rules,” which also provided the clearest results on the theoretical properties of the LRU principle.
“God’s algorithm if you will”: Robert Tarjan, personal interview, December 17, 2013.
if you follow the LRU principle: This application of the LRU principle to self-organizing lists is known as the Move-to-Front algorithm.
not merely efficient. It’s actually optimal: This doesn’t mean you must entirely give up on categorization. If you want to make things a bit more gaudy and speed up the search process, Noguchi suggests putting colored tabs on files that fall into different categories. That way if you know you’re looking for, say, accounts, you can restrict your linear search to just those items. And they will still be sorted according to the Move-to-Front Rule within each category.
the information retrieval systems of university libraries: Anderson’s findings on human memory are published in Anderson and Milson, “Human Memory,” and in the book The Adaptive Character of Thought. This book has been influential for laying out a strategy for ana
lyzing everyday cognition in terms of ideal solutions, used by Tom and many others in their research. Anderson and Milson, “Human Memory,” in turn, draws from a statistical study of library borrowing that appears in Burrell, “A Simple Stochastic Model for Library Loans.”
the missing piece in the study of the mind: Anderson’s initial exploration of connections between information retrieval by computers and the organization of human memory was conducted in an era when most people had never interacted with an information retrieval system, and the systems in use were quite primitive. As search engine research has pushed the boundaries of what information retrieval systems can do, it’s created new opportunities for discovering parallels between minds and machines. For example, Tom and his colleagues have shown how ideas behind Google’s PageRank algorithm are relevant to understanding human semantic memory. See Griffiths, Steyvers, and Firl, “Google and the Mind.”
“I saw that framework laid out before me”: Anderson, The Adaptive Character of Thought.
analyzed three human environments: The analysis of the environment of human memory is presented in Anderson and Schooler, “Reflections of the Environment in Memory.”
reality itself has a statistical structure: “Human memory mirrors, with a remarkable degree of fidelity, the structure that exists in the environment.” Ibid.
“fail to appreciate the task before human memory”: Ibid.
“A big book is a big nuisance”: The quotation in Greek is “μέγα βιβλίον μέγα κακόν” (mega biblion, mega kakon), which has also been translated as “Big book, big evil.” The original reference is intended as a disparagement of epic poetry, but presumably being a scholar at a time when books were in the form of scrolls dozens of feet long meant that big books were a nuisance in more ways than aesthetic. There’s a reason why the practice of citation and quotation didn’t properly begin until books came in codices with numbered pages. For an excellent recounting of this history, see Boorstin, The Discoverers.
“If you make a city bigger”: John Hennessy, personal interview, January 9, 2014.
an unavoidable consequence of the amount of information: Ramscar et al., “The Myth of Cognitive Decline.”
“minds are natural information processing devices:” Michael Ramscar, “Provider Exclusive: Michael Ramscar on the ‘Myth’ of Cognitive Decline,” interview with Bill Myers, February 19, 2014. http://www.providermagazine.com/news/Pages/0214/Provider-Exclusive-Michael-Ramscar-On-The-Myth-Of-Cognitive-Decline.aspx.
5. SCHEDULING
“How we spend our days”: Dillard, The Writing Life.
“Book-writing, like war-making”: Lawler, “Old Stories.”
“We are what we repeatedly do”: In fact, this phrase, frequently attributed to Aristotle himself, originated with scholar Will Durant, as a summary (in Durant’s words) of Aristotle’s thinking. See Durant, The Story of Philosophy.
any task of two minutes or less: Allen, Getting Things Done.
beginning with the most difficult task: Tracy, Eat That Frog! The book ascribes its titular quotation—“Eat a live frog first thing in the morning and nothing worse will happen to you the rest of the day”—to Mark Twain, although this attribution may be apocryphal. The Quote Investigator website cites eighteenth-century French writer Nicolas Chamfort as the more likely source. See http://quoteinvestigator.com/2013/04/03/eat-frog/ for more.
first scheduling one’s social engagements: Fiore, The Now Habit.
“the eternal hanging on of an uncompleted task”: William James, in a letter to Carl Stumpf, January 1, 1886.
deliberately not doing things right away: Partnoy, Wait.
developed the Gantt charts: The role of Taylor and Gantt in the history of scheduling is summarized in Herrmann, “The Perspectives of Taylor, Gantt, and Johnson.” Additional biographical details on Taylor are from Kanigel, The One Best Way.
firms like Amazon, IKEA, and SpaceX: Gantt chart software company LiquidPlanner boasts Amazon, IKEA, and SpaceX among its clients at the (counterintuitive) URL http://www.liquidplanner.com/death-to-gantt-charts/.
first hint that this problem even could be solved: Johnson’s seminal result (on what is now called “flowshop” scheduling, where jobs flow from one machine to another) appears in “Optimal Two- and Three-Stage Production Schedules with Setup Times Included.”
start with the task due soonest: Earliest Due Date (EDD), also known as Jackson’s Rule, was derived in Jackson, Scheduling a Production Line to Minimize Maximum Tardiness. James R. Jackson grew up in Los Angeles in the 1930s, and through his work with UCLA’s Logistics Research Project spent time visiting machine shops run by various aerospace companies in the area. His thinking about how jobs moved from one machine to another ultimately led him to develop a mathematics for analyzing “network flows”—work that would later be used in the design of algorithms for routing the flow of traffic on the Internet. A brief biography appears in Production and Operations Management Society, “James R. Jackson.”
Moore’s Algorithm: Presented in Moore, “An N Job, One Machine Sequencing Algorithm for Minimizing the Number of Late Jobs.” In the paper, Moore acknowledged a simplification and optimization that had been suggested to him by Thom J. Hodgson. Today the terms “Moore’s Algorithm,” “Hodgson’s Algorithm,” and the “Moore–Hodgson Algorithm” are sometimes used interchangeably.
do the quickest task you can: Shortest Processing Time (SPT), or Smith’s Rule, was shown to minimize the sum of completion times in Smith, “Various Optimizers for Single-Stage Production.”
shows up in studies of animal foraging: Stephens and Krebs, Foraging Theory.
known as the “debt snowball”: In the popular sphere, author and speaker Dave Ramsey is perhaps the best-known popularizer and advocate of the “debt snowball” strategy, and has garnered many supporters and detractors alike. On the academic side, a 2012 paper by business school researchers at Northwestern, Gal and McShane, “Can Small Victories Help Win the War?” and a 2014 paper by economists at Texas A&M Brown and Lahey, Small Victories, for instance, have looked at the impact of “small victories” in helping people get out of consumer debt.
an obsessive-compulsive vampire: This episode is Season 5, Episode 12, “Bad Blood,” which originally aired February 22, 1998.
“a tendency to pre-crastinate”: Rosenbaum, Gong, and Potts, “Pre-Crastination.”
Reeves would blame the bug on “deadline pressures”: This comes from an email dated December 15, 1997, from Glenn Reeves to his colleagues, subject line “What really happened on Mars?,” available online at http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html.
“If you’re flammable and have legs”: Hedberg’s tale can be found on his 1999 comedy album Strategic Grill Locations.
“Things which matter most”: The first appearance of this quotation in English seems to be in Covey, How to Succeed with People, where it’s attributed to Goethe without citation.
“That’s how I get things done every day”: Laura Albert McLay, personal interview, September 16, 2014.
“Gene was postponing something”: Jan Karel Lenstra, personal interview, September 2, 2014; and personal correspondence.
Lawler took an intriguingly circuitous route: Lawler’s biography is drawn from Lawler, “Old Stories,” and Lenstra, “The Mystical Power of Twoness.”
“the social conscience” of the computer science department: Richard Karp, “A Personal View of Computer Science at Berkeley,” EECS Department, University of California, Berkeley, http://www.eecs.berkeley.edu/BEARS/CS_Anniversary/karp-talk.html.
an award in Lawler’s name: See http://awards.acm.org/lawler/.
build the schedule back to front: Lawler’s analysis of precedence constraints for the maximum lateness problem is in Lawler, “Optimal Sequencing of a Single Machine Subject to Precedence Constraints.”
it’s what the field calls “intractable”: This analysis is in Lawler, “Sequencing Jobs t
o Minimize Total Weighted Completion Time Subject to Precedence Constraints.” More precisely, the problem is “NP-hard,” meaning it has no known efficient solution, and might never have one.
a quest to map the entire landscape of scheduling theory: The quest emerged one afternoon in 1975, as Lawler, Lenstra, and their colleagues Richard Karp and Ben Lageweg sat around talking scheduling theory in the Mathematisch Centrum in Amsterdam. Perhaps it was the “pungent odors of malt and hops” in the air from the Amstel brewery next door, but something inspired the group to decide that a book containing a list of all scheduling problems and whether they had been solved would make a nice gift for their friend and colleague Alexander Rinnooy Kan, who was about to defend his thesis. (This story appears in Lawler, “Old Stories,” and Lenstra, “The Mystical Power of Twoness.”) Rinnooy Kan would go on to make important contributions not just to academia but also to the Dutch economy, sitting on the board of directors at ING and being named by the newspaper De Volkskrant as the most influential person in the Netherlands—three years in a row. See “Rinnooy Kan weer invloedrijkste Nederlander,” De Volkskrant, December 4, 2009, http://nos.nl/artikel/112743-rinnooy-kan-weer-invloedrijkste-nederlander.html.
Lageweg wrote a computer program that generated the list, enumerating some 4,536 different permutations of the scheduling problem: every possible combination of metrics (maximum lateness, number of late jobs, sum of completion times, etc.) and constraints (weights, precedence, start times, and so on) that they could think of. Over a series of enthralling days, the group “had the pleasure of knocking off one obscure problem type after another in rapid succession.”
Their organizational schema for describing the zoo of scheduling problems was a language “laced with shorthand,” which they called “Schedulese” (Graham et al., “Optimization and Approximation in Deterministic Sequencing”). The basic idea is that scheduling problems are described by three variables: the nature of the machines involved, the nature of the jobs, and the goal of scheduling. These three variables are specified in that order, with standard codes describing factors such as precedence constraints, preemption, release times, and the goal. For example, 1|rj|∑Cj (pronounced “one-arejay-sum-ceejay”) represents a single machine, release times, and the goal of minimizing the sum of completion times. As Eugene Lawler recounts: