
Arrival of the Fittest: Solving Evolution's Greatest Puzzle


by Andreas Wagner


  FIGURE 8. Hypercubes

  This trick no longer works in five dimensions, much less higher ones. But although it is hopeless to imagine higher-dimensional spaces, they follow the same laws as our three-dimensional space: The edges of a hypercube are equally long, adjacent edges are at right angles to one another, and each corner corresponds to a possible metabolism. And such cubes in high-dimensional space turn out to have curious properties well suited to house the metabolic library.

  The number of corners in a square is four, in a cube it doubles to eight, and in a four-dimensional hypercube it doubles again to sixteen. With every added dimension, it doubles, and by the time you have reached 5,000 dimensions, this number has become the hyperastronomical 2^5000, the size of the metabolic library. In other words, we can arrange the library’s metabolic texts on the corners of a hypercube in a 5,000-dimensional space. This is why off-the-shelf shelving would not work. You cannot cram the metabolic library into three puny dimensions. It needs thousands of dimensions to breathe.

  A hypercube is also well suited to accommodate the thousands of neighbors near each of the library’s texts. In a simple universe of three reactions, each of the library’s texts—the corner of a cube—has three adjacent corners as its neighbors. Take one of these texts, such as the string 100 in figure 8c, and you reach its neighbors via the edges leading from 100 to the adjacent corners. We get to them either by adding the third reaction to 100, which yields 101, or by adding the second reaction (110), or by eliminating the first reaction (000). All three neighbors—101, 110, and 000—differ from 100 in exactly one character. And what holds for one corner of the cube holds for any other corner: It has three neighbors. Likewise, in a 5,000-dimensional cube, each and every metabolism has as many neighbors as there are dimensions, five thousand in all. You can walk from each metabolic text in five thousand different directions, to find one of its five thousand neighbors in a single step. Each of these neighbors differs from the text in exactly one reaction. Either the neighbor has an additional reaction—in this case one entry of the string changes from 0 to 1—or it has one fewer reaction—one entry changes from 1 to 0.
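  As a small illustration of the neighbor rule described above, the sketch below flips one character of a reaction string at a time. The function name `neighbors` is a hypothetical choice made for this example; only the strings 100, 101, 110, and 000 and the count of five thousand come from the text.

```python
def neighbors(text):
    """Return every string that differs from `text` in exactly one character."""
    flip = {"0": "1", "1": "0"}
    return [text[:i] + flip[c] + text[i + 1:] for i, c in enumerate(text)]


# The corner 100 of the three-dimensional cube has three neighbors.
print(neighbors("100"))  # ['000', '110', '101']

# A cube in n dimensions has 2**n corners; three reactions give eight.
print(2 ** 3)  # 8

# In 5,000 dimensions every text has 5,000 neighbors, one per reaction.
print(len(neighbors("0" * 5000)))  # 5000
```

  Each neighbor either gains a reaction (a 0 becomes 1) or loses one (a 1 becomes 0), so the number of neighbors always equals the number of dimensions.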

  Evolving organisms are like visitors to the metabolic library. Gene deletions and gene transfer allow them to walk through the library, to step from one metabolic text to another, often an immediate neighbor. All of a text’s neighbors form a neighborhood in this library, and such neighborhoods are as important for evolution as a city neighborhood is for people’s lives. City neighborhoods are useful because of proximity—everything is reachable within a few easy steps—and neighborhoods in the metabolic library are important for the same reason. Evolution can reach them in a few small steps, minor edits in a genotype. But residents of a city’s neighborhoods can walk in only four cardinal directions—north, south, east, or west—whereas evolution can head in five thousand directions. (Don’t even bother trying to visualize that.) And therefore the neighborhood of a metabolic text may be vastly more interesting, surprising, and diverse. This diversity will be crucial to understanding innovability, as we shall see shortly.

  Over time, as alterations in an organism’s genotypic text accumulate, it walks farther and farther, to more and more distant shelves in the library. To gauge how far, we must be able to measure distance. Without that ability, we would be lost, and the library would become a useless maze of stacks—we could not find our way from one shelf to another.38 Fortunately, the distance D that I had used to study the diversity of known metabolic texts does the job. It tells us how far apart in the library two metabolic texts are, and it already told us that some viable texts are very distant indeed. The next insight it provides is the real bombshell, though: We can travel enormous distances through the library and encounter very different stories with the same moral, everywhere.
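  This passage relies on the distance D without restating its formula. As a hedged sketch only, one plausible way to score how far apart two metabolisms are is the fraction of reactions that occur in just one of them. That definition, the function name `distance`, and the reaction labels below are assumptions made for illustration and may differ from the measure actually used in the research.

```python
def distance(a, b):
    """Assumed distance: share of reactions found in exactly one of two metabolisms."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty metabolisms are treated as identical
    return len(a ^ b) / len(union)


# Hypothetical reaction labels, not real enzyme names.
m1 = {"r1", "r2", "r3", "r4"}
m2 = {"r1", "r5", "r6", "r7"}

print(distance(m1, m1))  # 0.0, identical texts
print(distance(m1, m2))  # 6 of the 7 reactions involved are not shared
```

  Under this assumed definition, D runs from 0 for identical metabolisms to 1 for metabolisms with no reaction in common.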

  One day we may know millions of metabolic texts, but even that number would be a tiny fraction of the hyperastronomical metabolic library, less than a few specks of dust in the universe, because the library contains many more metabolisms than the number of organisms that have existed on earth since life began. Even after 3.8 billion years of evolution, life has explored only a tiny fraction of the library.

  For all of those billions of years, nature did not need to know what was around the next corner of the library for evolution to proceed. But if we humans want to understand the library, rather than simply live in it, we need to have some way to grasp where new and meaningful texts are. And we need a catalog that classifies texts, like the Dewey Decimal System, or the Library of Congress Classification, grouping books according to subject categories—Art History, Economics, Linguistics—with smaller subcategories such as Romance, Germanic, Slavic languages nested within them. Metabolic phenotypes, the possible meanings of a metabolic text, are the natural subject categories of this library. Their number is larger than those in a library of books, but that’s simply because the library itself is so vast.

  A catalog is like a map for this library—it is a genotype-phenotype map that tells us where to find the genotypes with any one phenotype. Without this map, we do not know whether texts with the same subject are scattered or grouped—as they would be in a human library—whether the same shelf houses texts on different subjects, and so on. And because no librarian is in sight, we need to create this map ourselves, roam the library and explore it, like the ancient voyagers who mapped the earth and its continents on their journeys. The library’s huge size will prevent us from mapping every single text, but we can draw the contours of the continents, mountain ranges, rivers, lakes, and deserts, and hope that we can grasp the shape of the whole from their hazy outlines.

  But where to start, and how to travel?

  Here is a puzzle that will point the way. Take a metabolism with any one phenotype, such as viability on glucose, and ask, What if only one text in our library of more than 10^1500 metabolisms expressed its meaning? As many as five nonillion (5 × 10^30) bacteria exist on earth today. This number is vast, a 1 with more than 30 zeroes. But even if each of these bacteria had tried a new enzyme combination every second since life began almost four billion years ago, they would have tried only about 10^48 such combinations.39 Their chances of having found the one and only working combination would be vanishingly small, smaller than one in 10^1450. This number—so small as to be effectively meaningless—means that it would be utterly impossible to find this text through a blind search.
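  The arithmetic behind this puzzle can be checked on a logarithmic scale, since the numbers themselves are far too large for ordinary floating-point values. The sketch below takes the figures from the text (5,000 reactions, five nonillion bacteria, one trial per second, four billion years, a library of more than 10 to the power 1,500) and assumes a year of 365.25 days; the variable names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length
bacteria = 5e30                        # five nonillion cells
years = 4e9                            # roughly the age of life

# Size of the library: 2 to the power 5,000, expressed as a power of ten.
log10_library = 5000 * math.log10(2)

# Total trials if every cell tests one combination per second.
log10_trials = math.log10(bacteria) + math.log10(years * SECONDS_PER_YEAR)

# Chance of hitting a single target among 10**1500 texts.
log10_odds = log10_trials - 1500

print(round(log10_library))  # 1505
print(round(log10_trials))   # 48
print(round(log10_odds))     # -1452
```

  The trial count rounds to 10 to the power 48, and the resulting odds fall below one in 10 to the power 1,450, in line with the figures quoted in the paragraph.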

  On the one hand, the odds against finding just one useful metabolism are vast. On the other hand, life’s diversity shows that evolution had no problem finding it. This means that our premise must be wrong: There has to be more than one metabolism—perhaps even many—that solves the problem of surviving on glucose.

  To find them, let’s do what evolution does: journey through the library and edit genomes—through a series of gene transfer or deletion events that add or eliminate at least one gene, enzyme, or reaction. The starting point for such a journey isn’t terribly important. It could be any text in the library, any text that encodes a metabolism viable on glucose or on any other fuel.

  So let’s start with a metabolism viable on glucose, and either delete a randomly chosen reaction or add a randomly chosen reaction from the known reaction universe. Nature would make a simple and brutal evaluation of the new text: life or death. But we scientist travelers are privileged, because we can retrace our steps. We compute the meaning of the altered text, and, if it turns out not to be viable on glucose, return to the starting text, and add or delete another random reaction—remember, there are five thousand ways of doing that. But if the neighbor is viable on glucose, the journey continues. We add or delete a second reaction, compute the phenotype, and repeat, more or less ad infinitum.

  In other words, step from a starting text to its neighbor, to the neighbor’s neighbor, to the neighbor’s neighbor’s neighbor, and see how far you could walk without ever changing its chemical meaning, viability on glucose. Because each step alters a text at random, this walk is a random walk through the metabolic library, similar to how a drunkard might stagger home from a night out at the bar, with one difference: Each step in our random walk must encounter a text with the same meaning, the same phenotype.
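  The walk just described can be sketched in a few lines. This is a toy version under stated assumptions, not the computation performed in the research: the real phenotype calculation is replaced by a hypothetical stand-in, `toy_is_viable`, which calls a text viable when each of six imaginary products can be made by at least one of two alternative reactions. The twelve-reaction universe and all function names are invented for the example.

```python
import random


def toy_is_viable(genotype):
    """Hypothetical viability rule: every pair of alternative reactions keeps one member."""
    return all(genotype[i] or genotype[i + 1] for i in range(0, len(genotype), 2))


def random_viable_walk(start, is_viable, steps, rng):
    """Add or delete one random reaction per step, retracing any step that kills viability."""
    current = list(start)
    for _ in range(steps):
        i = rng.randrange(len(current))
        current[i] ^= 1          # flip 0 to 1 (add) or 1 to 0 (delete)
        if not is_viable(current):
            current[i] ^= 1      # not viable: return to the previous text
    return tuple(current)


def differing_reactions(a, b):
    """Count the positions at which two texts differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)           # fixed seed so the walk is repeatable
start = (1,) * 12                # toy starting text with all twelve reactions
end = random_viable_walk(start, toy_is_viable, 500, rng)

print(toy_is_viable(end))        # True, because only viable steps are kept
print(differing_reactions(start, end))
```

  Because a rejected step is undone before the next one is tried, every text visited along the way has the same phenotype as the starting text, which is the one constraint that separates this walk from the drunkard's.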

  If there were only one metabolism viable on glucose, this random walk would lead literally nowhere, because the starting text would have no viable neighbors. We would be rooted to the spot. The same would be true if there were a few such texts scattered widely through the library—we could not reach them without destroying viability on the way. And even if they were close together the random walk might not lead far. A few neighbors of the starting text might be viable, but their neighbors might not be.

  Only if many such texts existed could we roam the library. But in that case we would face a different problem altogether: computing power. To compute one text’s meaning is a breeze, but what if this random walk had thousands of steps, and each could lead in thousands of different directions? This is the sort of problem that could take an off-the-shelf desktop computer years or decades to solve. An entire network of computers—a computing cluster—is required to speed up that computation. And that costs money.

  While I was slowly advancing from a Ph.D. student to a postdoctoral researcher, and eventually to a tenured professor at a U.S. research university, funding for the kind of basic research that addresses the problem of evolutionary innovation began drying up. This drought combined with the ailing health of my European family, so when a job offer arrived from Switzerland, I was ready to take a leap across the Atlantic, back to my European roots.

  I knew that Switzerland was a world leader in science, enormously productive, and technologically sophisticated.40 Its world-class system of public education, generous support for academic research, and attractive living conditions are behind this success. I would be sad to leave many dear academic colleagues behind, but the opportunity to join the Swiss scientific community was a privilege both humbling and enticing. Most important, the offer was good enough to finance not only a computing cluster but also a state-of-the-art experimental laboratory. Even better, it would allow me to recruit multiple like-minded researchers from all over the world. It was an offer I could not refuse.

  On a crisp fall day in 2006 at the University of Zürich, I was sitting in my newly furnished office, inside an austerely elegant building whose simple geometric contours are drawn in a gleaming blend of glass and metal, when a young Portuguese man walked in. Handsome, soft-spoken, with curious deep brown eyes and a quick smile, he introduced himself as João Rodrigues.

  João had studied physics, but he had heard that there were many exciting problems waiting to be solved in biology. He was looking for a new challenge, a difficult problem to crack that would get him a Ph.D. He did not know much biology at the time, but he had assets that many biologists lacked: He was good at mathematics, knew how to program computers, and had already performed large and complicated computations. When I first saw his résumé, I could hardly contain my excitement. João had exactly the skills needed to navigate the vast metabolic library. During his job interview, I shared my passion about learning how nature creates. Fortunately for me, we connected. His eyes lit up. He signed on.

  João’s background is typical for researchers in my lab. They hail from a dozen different countries in the Americas, Europe, Asia, and Australia, and from many disciplines, including biology, chemistry, physics, and mathematics. This is not a coincidence, because the problems we tackle require new skill combinations, so much so that I like to compare our work to that of evolution: Studying innovations, like creating them, benefits enormously from novel combinations—not of enzymatic but of intellectual skills.

  I soon became impressed with João’s computing wizardry, even though I remained worried that the cluster of more than one hundred computers we had built would still be too slow, that we would never leave the first shelf of the library. But João tricked the machines into working faster, accelerated their computations many times, and eventually launched us far into the library’s vast stacks.

  João’s exploration started with a single well-studied metabolism, that of the bacterium E. coli and its viability on glucose—the ability to synthesize all its sixty-odd essential biomass molecules from this single sugar.41 To find out whether only one metabolism with this ability existed, João first created more than a thousand of E. coli’s neighbors, each of them a metabolism that differs in a single chemical reaction from E. coli. If E. coli’s metabolism is an instruction manual to make all essential biomass molecules, then these neighbors are minor variations on the manual. The first question: Do any of them contain sufficient information to produce all sixty biomass building blocks from glucose?

  João computed the answer and quickly found that not one, not two, not three, but hundreds of E. coli’s neighbors are viable on glucose. This discovery contained a simple but vital lesson: The uniqueness of this phenotype is but a deeply flawed prejudice.42 The neighborhood of any one text contains many other viable texts like it. But nothing had prepared us for what came next, when we began to venture further.

  João used E. coli as a starting point for deep probes of the metabolic library that led further and further away from the starting text. The objective was to learn how far we could travel—hopping from one viable text to a viable neighbor, to the neighbor’s neighbor, and so on—without losing viability on glucose. How radically could a metabolic text be edited without losing its meaning? When João showed me the answer, my first reaction was disbelief. The furthest viable metabolism he found—the one with the highest D—shared only 20 percent of its reactions with E. coli. We had walked, computationally speaking, almost all the way through the library—80 percent of the distance that separates the furthest volumes—before we were finally unable to find a glucose-viable text by taking a single step.

  Worried that this might be a fluke, I asked João for many more random walks, a thousand more, each preserving metabolic meaning, each leading as far as possible, each leaving in a different direction—possible only because the library has so many dimensions. When the answer came back, I was stunned once again. These random walks had led just as far away as the first one. Each of them led to a metabolism that differed in almost 80 percent of its reactions from E. coli. They had found a thousand metabolic texts that shared very little with E. coli, except that all of them could produce everything a cell needs from the carbon and energy stored in glucose. If we had kept on walking, we would have found even more texts, too many even to count, although later we were able to estimate their number in parts of the library.43 For example, the number of metabolisms with two thousand reactions that are viable on glucose exceeds 10^750.

  The number of texts with the same meaning is itself hyperastronomical. The metabolic library is packed to its rafters with books that tell the same story in different ways.

  While we surely had not expected this, our explorations had revealed an even more bizarre feature of the library. The thousand random walks did not end in a few stacks of the library, where texts with similar meanings might huddle in small groups—groups of metabolisms with similar sets of reactions. These texts were just as different from each other as they were from that of E. coli—they encoded metabolisms with very different sets of chemical reactions. The library does not have clearly distinct sections, like rooms that separate all texts on history from those on science.44

  FIGURE 9. A genotype network

  And even more surprising was what we found when we started new random walks from these texts—as we had done from E. coli—and walked toward other texts without ever changing viability. We always succeeded in reaching them, no matter how far away they were from our starting point. Every single time. This taught us that a connected network of paths linking texts with the same meaning extends throughout the library. I call this network a genotype network. It might look a bit like the network of straight lines in figure 9, where the large rectangle stands for the metabolic library, and the lines connect neighboring texts (circles) with the same meaning. Pictures like this are wobbly visual crutches—two dimensions instead of five thousand, a handful of texts instead of unimaginably many—but they are all we have to visualize places as strange as this.

  In an ordinary public library, you might find biographical information about Charles Darwin in one text on a shelf in the history section, and another in biography. In a large research library using the Library of Congress’s classification system, you might find some such texts in section QH (for “Science: Natural History, Biology”), but others in sections DA (“World History, Great Britain”), GN (“Anthropology”), PR (“English Literature”), and even BL (“Religion, Mythology, Rationalism”). But you would find nothing that resembles the organizing principles of the metabolic library. You would not find a network of meaning-preserving paths connecting the Darwin biography in HM (“Sociology, General”) with another in BT (“Doctrinal Theology”). You would not be able to walk from one book to its neighbor, to the neighbor’s neighbor, and so on, almost all the way through the library, without ever being farther than one book away from another that told Darwin’s life story in different words.

 
