Arrival of the Fittest: Solving Evolution's Greatest Puzzle
If all this horizontal gene transfer went on unchecked, the size of a genome would constantly increase over time and become grotesquely bloated. But excessively long DNA strings break more easily, and copying them wastes energy and materials—a mortal sin that nature won’t tolerate.19 Fortunately, such bloating does not occur, because gene transfer is balanced by gene deletions. These are the by-products of errors that happen when cells cut or splice DNA molecules as they repair and copy their DNA. Unlike DNA mutations that alter one letter at a time, deletions can strike out thousands of letters and many genes. As long as a deletion affects no essential genes, the cell can live with it. Such survivable deletions occur all the time. They ensure that only useful genes stay around in the long run, and they keep genomes lean.
In another difference from sex as we know it, gene transfer occurs not just between similar organisms but also between baker’s yeast and fruit flies, between microbes and plants, and especially among bacteria, which can be as different from one another as humans are from oak trees.20 This is why gene transfer is so powerful, and the most important reason why bacteria are masters of metabolic innovation. Very different organisms harbor very different metabolic texts, and gene transfer can edit one text with borrowed passages that are very different yet meaningful within another text, the microbial equivalent of a musical mash-up that combines a Baroque instrumental track with a pop vocal. Only some edits will improve a text, because the recipient cannot pick and choose which new genes it gets—they are a random subset of the donor’s genome. But because gene transfer is incredibly frequent, the odds for innovation aren’t bad: Even though many edits lack luster, the shelves of life’s universal library contain a virtually infinite number of masterpieces waiting to be found.
An example of nature’s editorial prowess is our friend E. coli and its multiple varieties, E. coli strains that were long thought to be like closely related ethnic groups.21 At the beginning of the twenty-first century, biologists first deciphered the genomes of many such strains, expecting them to be very similar. False. Two E. coli strains can differ in more than one million letters, or one-quarter of their DNA, such that one strain can harbor a thousand genes that the other strain lacks.22 Every million years—a blip of evolutionary time, 20 percent of the time since humans diverged from chimpanzees—an E. coli genome acquires some sixty new genes, all of them through horizontal gene transfer.23 And these are only the successful edits—many others have gone out of print and left no descendants.
FIGURE 5. Genotype distance
We already know the DNA sequences of more than a thousand bacterial species, and they testify that E. coli is not an exception, but the rule.24 Most bacterial genomes are just as packed with genes trafficked from other sources, many with an unknown origin, though this is scarcely surprising. Trying to discover the provenance of a particular gene is a bit like trying to trace the literary influences revealed in a single paragraph of a novel by reading a small and random selection of the Library of Congress. A thousand species—or even a hundred thousand—would still be a drop in the ocean of bacterial diversity with countless millions of bacterial species, most of them unknown, all of them potential gene donors.
Not all of this genomic change causes metabolic change, because only a third of a genome is devoted to metabolism—the proteins it encodes also have a lot of other business, helping a cell move around, transporting building materials, and so on.25 So what if gene transfer mostly shuffles the nonmetabolic parts of the genome? Then evolution’s journey through the metabolic library might not take it far, and most metabolisms would therefore be very similar.
Are they? A few years ago, I asked this question of hundreds of bacterial species with known genomic DNA sequences, in a study that relied on decades of research done before my time. This research had discovered thousands of genes that code for particular enzymes, and allowed us to draw maps that connect genes with enzymes, and enzymes with chemical reactions.26 In other words, we could translate a genome sequence into a metabolic genotype, and compare these genotypes among organisms.27 And that is what I did.
Figure 5 shows how easy it is to compare two metabolic texts, using the simple example of two short snippets corresponding to ten enzymes in two organisms. Four of the ten enzymes cannot be made by either organism (gray zeroes), six are encoded by the first organism (its genotype string has six ones), and five are encoded by the second organism. We take the number of enzymes (six) made by at least one of the two organisms, and the number of enzymes made by only one but not the other organism (one enzyme), and divide the second number by the first to obtain a ratio (1/6). If this ratio were zero, then two organisms would encode exactly the same enzymes. If it were 1/2, then half of the enzymes that at least one of the organisms can produce would be produced by both. If the ratio were equal to 1, then the first organism wouldn’t be able to produce a single enzyme that the second organism can produce; their metabolisms would be maximally different. Those ratios, ranging from 0 to 1, reflect the difference between the enzymatic portfolios of two organisms, but that’s a little unwieldy to write over and over again, so it is better to replace it with the symbol D for difference or distance.28
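The ratio just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name genotype_distance and the two ten-letter strings are hypothetical, chosen to match the counts given for figure 5 (six ones, five ones, four shared zeroes), not copied from the figure itself.

```python
from fractions import Fraction

def genotype_distance(g1, g2):
    """D = (enzymes made by exactly one organism) / (enzymes made by at least one)."""
    if len(g1) != len(g2):
        raise ValueError("genotype strings must have the same length")
    # Enzymes encoded by at least one of the two organisms
    either = sum(1 for a, b in zip(g1, g2) if a == "1" or b == "1")
    # Enzymes encoded by one organism but not the other
    only_one = sum(1 for a, b in zip(g1, g2) if a != b)
    if either == 0:
        return Fraction(0)  # assumed convention: two empty genotypes are identical
    return Fraction(only_one, either)

# Hypothetical letter arrangement matching the counts in the text
first = "1101101010"   # six ones
second = "1101100010"  # five ones, all of them shared with the first organism
D = genotype_distance(first, second)
print(D)  # 1/6
```

Using exact fractions keeps the result readable as 1/6 instead of a rounded decimal. Two genotypes with no enzyme in common give a D of 1, and identical genotypes give 0.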
Unspeakably tedious as comparing genotypes for hundreds of bacteria—each encoding more than a thousand reactions—would be on pencil and paper, my trusted computer can finish it in the blink of an eye. When I asked it to calculate D for hundreds of pairs of bacteria, I was surprised to see—although their highly diverse genomes should have warned me—that even closely related organisms had highly diverse metabolic texts. Thirteen different strains of E. coli differed in more than 20 percent of their enzymes.29 An average pair of microbes differed in more than half of them.30 I had also suspected that bacteria living in the same environment—the soil, for example, or the ocean—might encounter similar nutrients and thus have similar metabolic texts. Wrong. Their metabolic texts were just as diverse, with a D just as different as that from bacteria living in different environments.
This exercise underscores the staggering scale at which nature experiments through gene shuffling. Everywhere on this planet, a relentless shuffling and mixing and recombining of genes takes place. Wherever microbial life occurs, in the depth of the oceans and on arid mountaintops, in scalding hot springs and on frigid glaciers, in fertile soils and desiccated deserts, inside and around our bodies, life is experimenting with every conceivable combination of new genes, rereading, editing, and rejuggling its metabolic texts without pause, yielding an enormous and still growing diversity of metabolisms.
Without readers, a book is a bundle of cellulose sheets with meaningless ink stains. Likewise, a text in the metabolic library needs to be read to reveal its meaning: the metabolic phenotype that determines which fuels an organism can use, and which molecules it can manufacture. We think of a phenotype as something we can see, and many metabolic phenotypes are plain as daylight. They include the melanins that protect our skin against radiation, that camouflage a lion’s fur, and that color the ink of an octopus. All of them are molecules synthesized by metabolism. And so are the various pigments that dye tree leaves, lobsters, flowers, and chameleons, whether for defense, courtship, or sometimes for no good reason at all.31 But metabolic phenotypes do not end at this visible surface. They extend to depths that are hidden from our eyes yet visible to chemical instruments—and to natural selection. Their most important role is to ensure viability itself, which boils down to the ability to synthesize sixty-odd molecules very different from those pretty pigments—they are the essential biomass molecules I mentioned in chapter 2. Viability, viewed as the phenotypic meaning of a genotypic text, is like the simple moral of a complex story, or like a brutally straightforward court judgment: If you can’t make all essential biomass molecules, your sentence is death, and it is carried out immediately. Organisms with a mutation that has compromised the ability to synthesize essential molecules don’t just fail to live long enough to reproduce. They don’t live at all.
To grasp this phenotypic meaning (viability or death), we need to read an organism’s metabolic genotype. This is a tall order, not only because the meaning of a text is so much more complex than the text itself (to understand the moral, we have to understand the whole story), but also because our brains are not well practiced in reading chemical language. Fortunately, we can program the artificial intelligence of computers to assist us.
A genotype tells us which reactions a metabolism can catalyze, the molecules these reactions consume, and the molecules they produce. To decipher its meaning, we would first need to know which nutrients are available—without the right ingredients, you cannot bake a cake—and whether the metabolism can use them to build an essential biomass molecule such as tryptophan. This is easiest for the austere minimal environments where survivalists like E. coli can thrive, because they contain so few nutrients, sometimes only a single sugar that provides all the carbon and energy the organism needs.
Starting from the available nutrients, we would then write a list of all the molecules the metabolism’s reactions produce from the available nutrients, find the reactions in the genotype that consume these product molecules, and list their products, iterating in this way until we find one or more reactions whose products include tryptophan. If no such reaction exists, then the metabolism cannot produce tryptophan.
FIGURE 6. Metabolic phenotypes
We would then move on to another biomass molecule, perhaps another amino acid, or one of the four DNA building blocks, repeating the entire procedure for each of the building blocks to find out whether the metabolism can manufacture it. Only when all essential biomass molecules can be produced is it viable.32
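The step-by-step procedure above can be sketched as a simple reachability search. This is a deliberately simplified illustration, not necessarily the computation behind the results cited in this chapter. It assumes that a reaction can run as soon as all of its substrates are available, and it ignores quantities, energy, and reaction rates. The function names and the toy reactions (molecules A, B, C, X) are hypothetical placeholders, not real biochemistry.

```python
def producible_molecules(nutrients, reactions):
    """Return every molecule reachable from the nutrients.

    Each reaction is a (substrates, products) pair of sets.
    Assumption: a reaction runs once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:  # keep iterating until no reaction adds anything new
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def is_viable(nutrients, reactions, biomass):
    """Viable only if every essential biomass molecule can be produced."""
    return set(biomass) <= producible_molecules(nutrients, reactions)

# Toy network with placeholder molecule names
reactions = [
    ({"glucose"}, {"A"}),
    ({"A"}, {"B", "C"}),
    ({"B", "C"}, {"tryptophan"}),
    ({"X"}, {"thymine"}),  # X never becomes available on glucose alone
]
print(is_viable({"glucose"}, reactions, {"tryptophan"}))             # True
print(is_viable({"glucose"}, reactions, {"tryptophan", "thymine"}))  # False
```

The second call returns False because one essential molecule is out of reach, which mirrors the rule that all essential biomass molecules must be producible.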
All of this is done on computers, because computation—done right—is faster, cheaper, and can even be better than experimentation. But as the saying goes, the map is not the territory, and we biologists do not fully trust any computation until we can check it. So like a factory that spot-checks its output randomly, we expose organisms with known metabolic genotypes to known chemical environments, and wait, somewhat ghoulishly, for them to grow or die. This has been done, for example, to several hundred mutant E. coli strains, each of them engineered to lack one enzyme, and it shows that their computed viability is highly accurate—it is correct for more than 90 percent of strains.33
Most biologists who know about this computation think of it as ordinary and do not dwell on how remarkable it is. But more than just remarkable, the capacity to compute viability is profound and revolutionary, a legacy of a hundred years of research in biology and computer science. Darwin and generations of biologists after him could not even dream of it, yet it is crucial to understanding metabolic innovability—nature’s ability to create new metabolic phenotypes.
This computation works for any organism whose metabolism we know, and for any known chemical environment, whether Arctic soil, tropical rain forest, oceanic abyss, or a mountain meadow. It also applies to any aspect of a metabolic phenotype—to any molecule a metabolism could make. But among all these aspects, viability is the most fundamental, and new methods of making biomass and using chemical fuels are by far the most important innovations. They are also the most far-reaching, opening new territories to life and its metabolic engines.
The reason for the importance of fuel innovations is simple: The world changes all the time, and no matter how successful a metabolism is today, it will almost certainly become unsuccessful at some point in the future, like an economy that depends on exhaustible fossil fuels. Chemical environments always change as consumed nutrients ebb and new foods flow. Organisms that depend on a single, specific combination of nutrients are evolutionary dead ends, and ongoing innovation is needed to survive.34 Fortunately, many different kinds of molecules can provide energy and chemical elements like carbon. Some are as familiar as glucose and sucrose, others less so, like the poison pentachlorophenol.
Even a modest number of potential fuel molecules gives rise to an astounding number of fuel combinations on which a metabolism may or may not be viable—an astounding number of metabolic phenotypes. To see how many, imagine a list like that shown in figure 6, comprising a hundred or so potential fuels. Then compute whether the known metabolism of your favorite animal, plant, or bacterium is viable on a specific fuel molecule, such as glucose. If it can synthesize all biomass molecules from the carbon in glucose, write a “1” next to glucose, otherwise write a “0.” Then repeat this computation for the next fuel molecule, the next one after that, and so on, until each fuel has either a “0” or a “1” next to it. Every single “1” in this list means that the metabolism can synthesize the complete suite of biomass molecules from that particular fuel.
The resulting string of a hundred ones and zeroes encapsulates the fuel molecules that a given metabolism can use to sustain life. It is an extremely compact way of summarizing a metabolic phenotype. Metabolic generalists like E. coli can survive on dozens of carbon sources, and their phenotype string contains many ones.35 Metabolic specialists can live on only a few carbon sources, and their phenotypes contain mostly zeroes.
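Writing down such a phenotype string is easy to sketch in code. The example below reuses the simplified assumption that a reaction runs once all of its substrates are present. The four-fuel list, the reactions, and the two stand-in biomass molecules are toy placeholders, far smaller than the hundred-odd fuels and sixty-odd biomass molecules discussed in the text.

```python
def can_live_on(fuel, reactions, biomass):
    """True if every biomass molecule is reachable from a single fuel."""
    available = {fuel}
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return biomass <= available

def fuel_phenotype(fuels, reactions, biomass):
    """One letter per fuel: '1' if viable on it, '0' otherwise."""
    return "".join("1" if can_live_on(f, reactions, biomass) else "0" for f in fuels)

# Toy placeholders, not real stoichiometry
fuels = ["glucose", "ethanol", "acetate", "sucrose"]
reactions = [
    ({"glucose"}, {"pyruvate"}),
    ({"sucrose"}, {"glucose"}),
    ({"acetate"}, {"pyruvate"}),
    ({"pyruvate"}, {"biomass_1", "biomass_2"}),
]
biomass = {"biomass_1", "biomass_2"}
phenotype = fuel_phenotype(fuels, reactions, biomass)
print(phenotype)  # 1011
```

The zero in the second position says that this toy metabolism has no way to use ethanol. A generalist would show mostly ones, a specialist mostly zeroes.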
To count how many such phenotypes exist, the different combinations of a hundred-odd fuels on which an organism could be viable, we just need to keep in mind that an organism may (1) or may not (0) be able to live on each fuel; these two and no other possibilities exist. To calculate the total number of possible phenotypes, multiply 2 by itself a hundred times, which yields 2¹⁰⁰. This number is greater than 10³⁰, or a 1 with 30 zeroes added, not quite as large as the number of possible metabolisms, but still a very large number, much larger than, say, the number of stars in our galaxy, approximately 10¹¹, or 100 billion.
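The arithmetic is quick to check, since Python handles integers of any size exactly. The variable names are illustrative, and the figure of 100 fuels is the round number used in the text.

```python
n_fuels = 100
n_phenotypes = 2 ** n_fuels  # two possibilities per fuel, multiplied together

print(n_phenotypes)              # 1267650600228229401496703205376
print(n_phenotypes > 10 ** 30)   # True
# Ratio to the roughly 10**11 stars in our galaxy quoted in the text
print(n_phenotypes // 10 ** 11)
```

The printed number has 31 digits, so it does exceed a 1 followed by 30 zeroes, though only slightly.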
I was not kidding when I told you that phenotypes are more complex than the modern synthesis would have you believe.
This huge number of phenotypes implies an equally huge number of metabolic innovations. Figure 7 shows one example. The figure’s left side displays the fuel phenotype of a metabolism that can survive on some carbon sources, but not on ethanol, hence the zero next to ethanol. New genes—acquired through gene transfer or otherwise—can change the genotype that brings forth this phenotype. If this change allows the mutant to metabolize ethanol, we replace the “0” next to ethanol with a “1.” Because every conceivable metabolic innovation can be written like this, by replacing a “0” with a “1” in a metabolic phenotype, there are about as many possible metabolic innovations as there are phenotypes.36
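In code, spotting such an innovation amounts to comparing two phenotype strings letter by letter. The sketch below is illustrative only: the function name gained_fuels and the three-fuel list are hypothetical, and the strings are made up to mirror the ethanol example.

```python
def gained_fuels(fuels, before, after):
    """List the fuels whose letter changed from '0' to '1'."""
    return [f for f, b, a in zip(fuels, before, after) if b == "0" and a == "1"]

fuels = ["glucose", "ethanol", "acetate"]
before = "101"  # cannot live on ethanol
after = "111"   # the mutant can
print(gained_fuels(fuels, before, after))  # ['ethanol']
```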
Designing a space to house the library of all possible metabolisms would be challenging, in part because its volumes exceed the number of hydrogen atoms in the universe. To allow us to find specific volumes fast, the library would also have to be supremely well organized. It would take me only seconds to find my copy of Darwin’s Origin in the small library of my office, but searching for any one book while grazing through the stacks of an average university’s library would be a bad idea. And if somebody had reshelved the Origin in the wrong place, it might be lost forever. The problem is much worse in a hyperastronomical library. The universal library might well contain the secret to immortality—or at least the perfect recipe for turkey stuffing—yet the library is so large that we would never find it unless we knew where to look.
FIGURE 7. A metabolic innovation
An especially simple way to organize the library is to place the most similar texts next to each other. Human librarians do exactly that when they shelve different editions of the same book together. If the metabolic library were organized along these lines, the most similar texts would be immediate neighbors. But there is a problem: To buy or build shelving for this library would be a real pain.
In a human library, every book has two immediate neighbors, one to the left and one to the right, or maximally four, if you want to count the volumes on the shelf above and below as well. How many neighbors would any one text in the metabolic library have? Recall that a string of five-thousand-odd ones and zeroes describes a metabolic genotype. Any neighbor would differ in exactly one of these letters, one chemical reaction that may be either present or absent. (It cannot possibly differ in less than that, and if it differed in more, it would no longer be a neighbor.) There is one neighbor that differs in the first letter of this string, another that differs in the second letter, one that differs in the third letter, and so on, until the very last of these letters. In other words, each metabolic text has not two, not four, but thousands of neighbors, as many as there are biochemical reactions, each of these neighbors differing in a single letter and reaction. Shelves that can hold this sort of inventory aren’t easy to find.
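Counting neighbors this way takes only a few lines. In the sketch below, the function name neighbors is illustrative, and the length of 5,000 is an assumed round stand-in for the "five-thousand-odd" reactions mentioned in the text.

```python
def neighbors(genotype):
    """Yield every genotype that differs from `genotype` in exactly one letter."""
    for i, letter in enumerate(genotype):
        flipped = "1" if letter == "0" else "0"
        yield genotype[:i] + flipped + genotype[i + 1:]

print(list(neighbors("101")))  # ['001', '111', '100']

n_reactions = 5000  # assumed round number for illustration
text = "0" * n_reactions
print(sum(1 for _ in neighbors(text)))  # 5000
```

A text of length n always has exactly n neighbors, one for each position that can be flipped.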
To see how peculiar they would have to be, imagine a much simpler world than ours, the simplest possible chemical world with only one chemical reaction. In this world the metabolic library has only two texts. One of them consists of the letter 1, containing the one and only reaction, the other of the letter 0—it lacks this reaction. Figure 8a shows these texts as the endpoints of a straight line.
A slightly larger universe with two reactions would be big enough for 2 × 2 = 4 possible metabolic texts. One of them has both of these reactions (11), two of them have one reaction but not the other (10, 01), and the fourth metabolism has no reaction (00). Figure 8b shows these metabolisms as the corners of a square.
You may already see where this is going. The next larger reaction universe would have three reactions and 2 × 2 × 2 = 8 possible metabolisms that form the corners of a cube (figure 8c). For a universe with four reactions, we have 2 × 2 × 2 × 2 = 16 possible metabolisms. But which geometric object would correspond to it? As our reaction universe increased from one to two to three reactions, its metabolic texts occupied the endpoints of a line, a square, or a cube, which exist in a one-, two-, and three-dimensional space. Taking it one step further, we need an object in a four-dimensional space. Spaces with four or more dimensions are hard to visualize, but mathematicians routinely work with them, because we can extend our geometrical laws to them.37 Just as in a square and a cube, the edges of the object we are looking for have to be equally long, and adjacent edges would have right angles to one another. Such an object is a four-dimensional hypercube. Figure 8d uses a geometric trick to show this hypercube on paper. It has sixteen corners, each one corresponding to one metabolic text—from 0000 to 1111—that is no longer shown in the figure.
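The progression from line to square to cube to hypercube can be checked by enumeration. In this sketch, corners are binary strings and an edge joins two corners that differ in exactly one letter. The function name hypercube is illustrative.

```python
from itertools import product

def hypercube(n):
    """Return the corners and edges of an n-dimensional hypercube."""
    corners = ["".join(bits) for bits in product("01", repeat=n)]
    # Flip each '0' to '1' once, so that every edge is listed exactly one time
    edges = [
        (a, a[:i] + "1" + a[i + 1:])
        for a in corners
        for i in range(n)
        if a[i] == "0"
    ]
    return corners, edges

for n in (1, 2, 3, 4):
    corners, edges = hypercube(n)
    print(n, len(corners), len(edges))
# 1 2 1
# 2 4 4
# 3 8 12
# 4 16 32
```

Each added reaction doubles the number of corners, and every corner in n dimensions touches exactly n edges, one per neighbor.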