Finding the shortest path of rotating a crystal, the most economical way of positioning a telescope, the shortest wiring route between transistors—these are all different flavors of the same mathematical problem.4
But even more important than these applications of the routing problem is that the problem harbors profound lessons about the very nature of hard problems everywhere—the kind that require the most creative solutions. It is another stepping-stone to understanding creativity in general, because solving it requires the same tricks that nature has used to create bucky-balls and buttercups. They are the tricks needed to traverse landscapes vast beyond comprehension.
Take the diagram in Figure 6.1, which stands for a route that a delivery truck takes from a depot to visit each one of ten customers. The customers are arbitrarily labeled with the numbers one through ten. Underneath the diagram, a string of numbers stands for the order in which each customer is visited. This string encodes the route itself.
Figure 6.1.
There are more than three million different possible routes through a mere ten customers, and each route can be written as one string like this. Each of these routes has a cost, expressed in distance traveled, time expended, fuel combusted, or—for the environmentally conscious—carbon dioxide exhaled by the engine.5 Just like every conceivable DNA string of a gene encodes a solution—good or bad—to one of life’s many problems, every sequence of customers to be visited encodes a solution to the routing problem. Most of these solutions are poor, excessively long routes, but a few of them are good, short, efficient routes. And just like a DNA sequence—a genotype—specifies a location in an adaptive landscape, each route corresponds to a place in yet another landscape, that of all solutions to the routing problem. The elevation in that location corresponds to the length of the route. It is the quality of the solution, the analogue to an organism’s fitness.
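The correspondence between a customer string and its cost can be made concrete in a few lines of Python. The depot and customer coordinates below are invented for illustration; they are not the layout in the figure.

```python
import math

# Invented coordinates, for illustration only; not the layout in Figure 6.1.
DEPOT = (0.0, 0.0)
CUSTOMERS = {1: (2, 1), 2: (4, 0), 3: (5, 3), 4: (3, 5), 5: (1, 4)}

def route_length(route):
    """Cost of a route: depot -> customers in the given order -> depot."""
    stops = [DEPOT] + [CUSTOMERS[c] for c in route] + [DEPOT]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))

circle = route_length([1, 2, 3, 4, 5])  # visit the customers in a rough loop
zigzag = route_length([1, 4, 2, 5, 3])  # criss-cross the map instead
```

Both strings visit the same five customers; only the order differs, yet the zigzag route is far longer than the loop. With ten customers there are already millions of such orderings to compare.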
This cost landscape is yet another abstract and high-dimensional landscape, not the three-dimensional landscape in which trucks deliver goods. The landscape’s profile of undulating dunes and cragged mountains, shallow dips and deep ravines, is the cost profile of different routes. This cost profile differs from evolution’s landscapes in the same way as the energy landscapes of atoms and molecules do: its peaks are the worst places to be because that’s where the longest routes dwell.
Any algorithm to route vehicles or salespeople is a prescription to scour the cost landscape for the deepest valleys—those corresponding to the best and shortest routes. The algorithm is a bit like a molecule seeking an especially stable configuration of atoms, a deep valley in its energy landscape. It explores the cost landscape of the routing problem, just like vibrating molecules explore their energy landscape, and just like evolving organisms explore their adaptive landscape.
In evolution’s landscapes, the smallest steps correspond to point mutations, those DNA mutations that cause single-letter changes. Unfortunately, these same kinds of steps don’t work in the cost landscape of the routing problem. Imagine you replace a single customer, say customer five, in the sequence underneath Figure 6.1a, with an arbitrary other customer, say customer ten. That would be the closest analogue to how point mutations change DNA. But it creates two problems. First, because you have eliminated customer five, the resulting route would not visit this customer at all. Second, the route would visit customer ten twice. In other words, mimicking point mutation violates two basic rules for vehicle routing: all customers must be visited, and each customer can be visited only once. Neighboring routes cannot just differ in a single customer. We need a different kind of “mutation.”
What we need is to permute or swap the order in which two customers are visited, as Figure 6.1b illustrates for customers five and six. Swapping ensures that each customer is visited exactly once. More than that, any route connecting any number of customers can be created with the right sequence of such swaps. In other words, starting anywhere on this landscape, you can travel anywhere else by customer swaps, which are the smallest possible steps on this landscape.6
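In code, such a swap is a three-line operation. The Python sketch below counts positions from zero, so positions 4 and 5 hold customers five and six:

```python
def swap(route, i, j):
    """Return a copy of the route with the customers at positions i and j exchanged."""
    new = list(route)
    new[i], new[j] = new[j], new[i]
    return new

route = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
neighbor = swap(route, 4, 5)  # exchange the visits to customers five and six
# neighbor is [1, 2, 3, 4, 6, 5, 7, 8, 9, 10]
```

Unlike the point-mutation analogue, the result is always a valid route: the neighbor contains exactly the same customers as the original, each visited once.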
Such travel ultimately has one goal: to find the deepest valley. Computer scientists have invented many algorithms—some better, some worse—to meet this goal. Let’s look at an especially simple one.
Choose an arbitrary route—no matter how long and inefficient—and change it through a random swap of two customers. If the swap reduces the length of the route, keep it. If it does not, go back to the starting point. Then repeat. Swap another two randomly chosen customers, keeping the new route only if it is shorter. Keep repeating. Each successful swap changes your place in the solution landscape. Step by step—swap by successful swap—you will eventually reach a route that cannot be shortened further by a swap.
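A minimal Python sketch of this greedy procedure follows. The ten hypothetical customers are placed on a circle purely for illustration; any distance function would do.

```python
import math
import random

# Hypothetical customers placed on a circle; any coordinates would do.
POINTS = {c: (math.cos(2 * math.pi * c / 10), math.sin(2 * math.pi * c / 10))
          for c in range(1, 11)}

def length(route):
    """Length of the closed tour visiting the customers in the given order."""
    stops = [POINTS[c] for c in route]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:] + stops[:1]))

def greedy_descent(route, tries=20_000, seed=0):
    """Swap two random customers; keep the swap only if it shortens the route."""
    rng = random.Random(seed)
    route = list(route)
    best = length(route)
    for _ in range(tries):
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]      # try a random swap
        cost = length(route)
        if cost < best:
            best = cost                              # shorter: keep it
        else:
            route[i], route[j] = route[j], route[i]  # longer: undo it
    return route, best

start = list(range(1, 11))
random.Random(1).shuffle(start)
final, cost = greedy_descent(start)
```

Because every accepted swap shortens the route, the final cost can never exceed the starting cost; nothing, however, prevents the procedure from halting in a shallow local minimum.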
This algorithm will always descend in the landscape, aiming to find the lowest point. It is the analogue to natural selection’s uphill movement in a fitness landscape. Where natural selection permits only steps that increase fitness, the algorithm permits only steps that decrease a route’s length. If the starting solution was a marble on that landscape, the algorithm would act like gravity on the marble, pulling it downward toward the nearest resting point.
Algorithms like this are important because they are simple. Computer scientists also call them “greedy.” And their greed need not be bad. If the cost landscape is like a meteor crater weathered and smoothed by the ages, then any sequence of downhill steps will get you to the bottom.7
But that’s a big if.
Vehicle routing belongs to a huge class of problems that computer scientists call combinatorial optimization problems. That’s because each solution consists of various building blocks—customers, in this case—and searching for a best or optimal solution requires combining these building blocks in different ways. When routing trucks, you combine the same customers in different orders to find the shortest route. When designing an electric power grid, you combine multiple power plants in different locations to ensure that customers get enough electricity. When scheduling nurses in a hospital, you assign different combinations of nurses to different shifts. When planning a battle, you maximize your enemy’s hurt by attacking different targets with different combinations of weapons. And so on.
Not all combinatorial optimization problems are hard, but all of them have a landscape of solutions. And after exploring and mapping and studying these landscapes for over half a century, computer scientists have learned a key lesson about what makes a problem—any kind of problem—hard.8
A problem is not hard simply because it has many possible solutions—all combinatorial optimization problems do. It is hard for a reason already familiar from looking at self-assembling molecules and adaptive landscapes: the solutions form a rugged moonscape, with myriad shallow craters, fewer deep craters, and no more than a handful of deepest craters, which can be nearly impossible to find among all the others.
The cost landscape of vehicle routing is just like that, with countless craters big and small. Chances are that an algorithm that can only descend will climb down a small crater—there are so many of them—and quickly get stuck at a bad solution from which there is no escape. Even if by dumb luck it descended into a large crater, it might not get very far. That’s because big craters are pockmarked by smaller craters, which themselves contain even smaller craters. The algorithm might end up in a tiny crater close to the rim of the big one, with a route that’s far longer than it could be and no way to escape.
Hard problems have another nasty property: they get even harder—much harder—as they grow in size. As the number of customers, power plants, nurses, or weapons increases, the number of craters—the technical term is local minima—increases exponentially. And just like those minima in a molecule’s energy landscape, their numbers quickly overwhelm even the most powerful computers. Whereas the vehicle-routing landscape for ten customers has some one hundred local minima, the same landscape for fifteen customers can harbor more than one thousand local minima,9 and for realistic numbers—a hundred or more customers—the number of minima is too large to count. Only one of them is the best route, or global minimum, and finding it for large problems is like finding a drop of oil in an ocean. In practice, the best one can hope for is a reasonably good solution, one of multiple local minima that are deep, but not quite as deep as the global one. They are the snowflakes of difficult problem solving.
Any algorithm scouring a landscape like this for a good solution faces an already familiar problem: to get out of a shallow valley and explore a deeper one, it needs to overcome the relentless downward pull of gravity. It’s the same kind of problem evolution faces with its own greedy algorithm—natural selection.
Fortunately, where greedy algorithms fail, others can succeed. One such algorithm is called simulated annealing.
The word annealing comes from metallurgy and refers to a treatment that renders steel and other metals less brittle and more workable. Bladesmiths, for example, use it when forging swords. When a piece of steel is annealed, it is first heated to temperatures at which the vibrations of its iron atoms become so powerful that these atoms begin to change their location and drift through the metal. The piece is then cooled down very slowly, which allows atoms to form small crystals and renders the steel more malleable. The treatment’s central ingredient is heat—the same that allows molecules to explore their energy landscapes and discover bucky-balls, gold clusters, and diamonds. And that’s not a coincidence. It reflects the deep connections between different realms of creation.
Simulated annealing—the name says it all—simulates the process of annealing with a computer algorithm. The algorithm explores the solution landscape of a problem and journeys across it step-by-step. (If the problem is how to route trucks, each step might swap two customers.) After each step, the algorithm computes whether the step leads to a better solution, such as a shorter route. If so, it accepts the step. This much is identical to a greedy algorithm. But the crucial difference is this: even if the step leads to a worse solution, the algorithm accepts the step with some probability. Early in the algorithm’s journey, this probability is high—the algorithm might be just as likely to accept uphill steps as downhill steps. As time goes on, while the algorithm explores the landscape, “cooling” begins—the algorithm steadily lowers the probability of accepting uphill steps. After a thousand steps, the algorithm might accept only half of all uphill steps; after ten thousand steps, it might accept only one in ten; after one hundred thousand steps, only one in one hundred, and so on, until eventually it accepts only downhill steps.
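In the same sketch-level Python, the procedure just described might look like the following. The starting temperature, the geometric cooling schedule, and the invented coordinates are all arbitrary illustrative choices.

```python
import math
import random

rnd = random.Random(42)
POINTS = {c: (rnd.random(), rnd.random()) for c in range(1, 11)}  # invented map

def length(route):
    stops = [POINTS[c] for c in route]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:] + stops[:1]))

def simulated_annealing(route, t=10.0, cooling=0.999, steps=50_000, seed=0):
    """Always accept improving swaps; accept a worsening swap with
    probability exp(-delta / t), and let the temperature t fall over time."""
    rng = random.Random(seed)
    route = list(route)
    cost = length(route)
    best_route, best_cost = list(route), cost
    for _ in range(steps):
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
        delta = length(route) - cost
        if delta < 0 or rng.random() < math.exp(-delta / t):
            cost += delta                            # accept, possibly uphill
            if cost < best_cost:
                best_route, best_cost = list(route), cost
        else:
            route[i], route[j] = route[j], route[i]  # reject: undo the swap
        t *= cooling                                 # "cooling": uphill moves get rarer
    return best_route, best_cost

best_route, best_cost = simulated_annealing(list(range(1, 11)))
```

Early on, when t is large, exp(-delta / t) is close to one even for badly worsening swaps, so the search roams freely; as t shrinks, uphill acceptances die out and the search settles into a valley.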
To see how this procedure resembles the cooling of a material, compare the energy landscape of a bucky-ball or snowflake from Chapter 5 with the solution landscape of a problem. If a greedy algorithm is like gravity pulling down a metaphorical marble—a configuration of carbon atoms or a solution to the routing problem—then simulated heat acts like the jitters that allow the marble to escape that pull. Just like the strong jitters of particles in a hot material allow uphill motion through an energy landscape (to more unstable carbon molecules), so too does the early phase of simulated annealing allow moves to much worse solutions (to much longer routes). Whenever that marble lands in a valley early on, the jitters will quickly jerk it out of there, no matter how deep the valley—how stable the molecule or how good the solution. During this early stage, the marble can explore wide swaths of the landscape, however rough the terrain. As cooling progresses, the jitters in the landscape become weaker. The marble is more likely to move downhill, toward more-stable molecules or shorter routes, and become locked into the current valley. Escape is especially unlikely if that valley is deep, because it would require many consecutive uphill steps. But the deepest valleys in a rugged landscape often contain shallow valleys, and even if the landscape’s jitters have become weak, the marble can still climb out of a shallow valley and roll into the next one. Only at the very end will the marble become confined to a valley. It will travel to the bottom of this valley, to its final resting place, which corresponds to the most stable molecule that nature can discover or to the best solution that an algorithm can find.
The appeal of simulated annealing lies in a simple mathematical fact: one can prove that the deepest valley—the global minimum or shortest delivery route—can be found with certainty if cooling is slow enough; that is, if the tremors’ intensity declines slowly enough.10 These are the same kinds of tremors that enable genetic drift to shake up populations that evolve on an adaptive landscape. Evolution in a small population, where drift is strong, is like the initial stage of simulated annealing. And simulated annealing itself is a bit like evolution in a population that starts out small and grows, such that genetic drift becomes weaker and weaker over time.11
Simulated annealing, however, has one advantage over biological evolution. Because it is a computer algorithm, we can control every detail about it. In contrast, no mastermind of biological evolution manipulates evolving populations for its maximum benefit. Population sizes, for example, depend on many environmental vagaries, including climate, food, and competitors, whereas temperature, which is crucial for the success of simulated annealing, can be controlled with great accuracy via a computer algorithm. Because of this ability, simulated annealing and other human algorithms may be able not only to solve hard problems, but to solve them better than even nature could.
But couldn’t we apply the same idea to evolution? Couldn’t we make evolution more controllable by simulating it inside a computer? This is not so far-fetched, because Darwinian evolution is itself a kind of algorithm, a sequence of simpler steps: mutate, select, repeat. Since this algorithm transforms not bits and bytes but living matter, it could be applied to simulated DNA, simulated phenotypes, and simulated organisms. And in such simulations, we could control every little detail—how often mutations occur, how rigorously we select the best variants, whether we allow sex and recombination, and so on. This level of control could again help us improve the algorithm’s problem-solving ability. What is more, engineers could not only modify the algorithm, but also exploit it to solve problems they care about, problems that may be as alien to nature as routing trucks.
The dream of harnessing the algorithms of evolution for our benefit is not new. It goes back at least to computer science pioneer Alan Turing. In a seminal 1950 article entitled “Computing Machinery and Intelligence,” Turing envisioned machines that do not just think but also learn, and he argued that a “random element” analogous to DNA mutation may help them do so.12
The realization of Turing’s vision had to wait until the 1960s and 1970s, when computers became powerful enough to simulate evolution. That’s when a new research field called evolutionary computation emerged. One of its pioneers was John Holland, an engineering and computer science professor at the University of Michigan. Gifted with an ebullient and cheerful personality, Holland exuded joy about science from every pore of his body. His enthusiasm for new ideas was contagious, and his contempt for established dogma profound. These traits helped him become a pioneer of evolutionary computation in the 1970s. That’s when he developed a class of evolution-mimicking computer algorithms—he called them genetic algorithms—that remain among the most powerful in evolutionary computation.13 Through his work and that of others, evolutionary computation ballooned into a new discipline of science that today employs thousands of researchers, whose goal is to improve on nature’s algorithmic handiwork. And improve it they have.
Unlike simulated annealing, which drives a single marble across the solution landscape of a problem, a genetic algorithm uses multiple marbles, an entire population of them. Inside a computer, each member of this population contains a “chromosome,” a string like that in Figure 6.1a that encodes a solution—good or poor—much like DNA encodes a well-adapted or ill-adapted organism. Each string can “mutate” to a new one, for example, by swapping two random customers in a chromosome encoding a solution to vehicle routing. The computer then selects the best solutions in the population to form a new and improved population. Mutate, select, repeat—just like biological evolution.
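Stripped to its essentials, such a genetic algorithm can be sketched in a few lines of Python. The population size, the swap mutation, and the truncation-style selection below are all arbitrary illustrative choices, not the design of any particular study.

```python
import math
import random

rng = random.Random(0)
POINTS = {c: (rng.random(), rng.random()) for c in range(1, 11)}  # invented map

def length(route):
    stops = [POINTS[c] for c in route]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:] + stops[:1]))

def mutate(route):
    """Mutation analogue: swap two random customers in a chromosome."""
    i, j = rng.sample(range(len(route)), 2)
    child = list(route)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(pop_size=30, generations=200):
    # Start from a population of random routes ("chromosomes").
    population = [rng.sample(range(1, 11), 10) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(p) for p in population for _ in range(2)]    # mutate
        population = sorted(population + offspring, key=length)[:pop_size]  # select
    return population[0]  # the shortest route found

best = evolve()
```

Each generation produces mutant offspring and keeps only the shortest routes—mutate, select, repeat, exactly as in the text above.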
Because genetic algorithms follow biological evolution’s time-honored principles, they also face the same hurdle in solving tough problems: the unforgiving nature of selection. Much like selection in the jungles of Borneo, selection inside a silicon chip is a poor guide to finding the highest adaptive peaks or the lowest-cost valleys on a rugged landscape. Selection is shortsighted, so it exterminates imperfect solutions. Even worse, it prohibits the kinds of failures that can be essential for eventual success.
Fortunately, genetic drift can help genetic algorithms avoid this trap, just as it does for biological evolution. Most genetic algorithms evolve populations that comprise anywhere from ten to a few thousand individuals—simulating much larger populations simply requires too much computer memory and time.14 These populations are much smaller than the billion-strong populations of microbes and are more like those of large animals and plants, where drift is strong enough to overpower some of selection’s aversion to downhill steps. In other words, genetic algorithms overcome the trap of relentless selection for free, courtesy of their modest populations, themselves a consequence of limited computing power.
Genetic algorithms also simulate sex. In their populations, some individuals “mate” and produce “children”—new solutions to a problem—by swapping chunks of their simulated chromosomes, just like recombination swaps chunks of real DNA chromosomes inside organisms.15 This artificial recombination can also teleport individuals through a solution landscape, exploring new solutions that would take forever to reach with smaller steps. And in a development that mirrors the spectacular success of sex in biological evolution, recombination turns out to be so important in genetic algorithms that some scientists program the algorithms to have recombination cause 90 percent of all chromosome changes, leaving only the remaining 10 percent to mutations.16
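Ordinary crossover would break the visit-each-customer-once rule, just as point mutation did, so genetic algorithms for routing use permutation-preserving operators. One classic choice—shown here as an illustrative Python sketch, not the scheme of any particular study—is ordered crossover:

```python
import random

def ordered_crossover(mom, dad, rng=random.Random(0)):
    """Copy a random slice from one parent, then fill the remaining
    positions with the missing customers in the other parent's order."""
    n = len(mom)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b] = mom[a:b]                      # inherit a chunk from one parent
    rest = [c for c in dad if c not in child]  # the other parent's order for the rest
    for k in range(n):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child

child = ordered_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                          [8, 6, 4, 2, 7, 5, 3, 1])
```

The child inherits a contiguous stretch of one parent’s route and the relative order of the other’s, yet still visits every customer exactly once—a long-range leap through the solution landscape that no single swap could achieve.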
Life Finds a Way Page 11