Darwin's Doubt

by Stephen C. Meyer

Protein Folds: Plausible but Irrelevant Scenarios

There is a second and closely related difficulty associated with the scenarios cited by Long. Typically, they do not even try to explain the origin of new protein folds, and few of them analyze genes different enough from each other that their protein products could, even conceivably, exemplify different folds. Instead, they usually attempt to explain the origin of homologous genes—genes that produce proteins with the same folded structure performing the same function or a closely related one.

For example, Long cites one study comparing the two genes RNASE1 and RNASE1B, which code for homologous digestive enzymes.32 The two proteins perform nearly the same function: breaking down RNA molecules in the digestive tracts of colobine leaf-eating monkeys, though each does its job at a slightly different optimal chemical pH. More important, given that the amino-acid sequences of the two enzymes are 93 percent identical, structural biologists would expect both enzymes to utilize the same protein fold to accomplish their closely related tasks.

Long also references a study of a gene that codes for a histone protein, Cid, in two closely related species of fruit flies—Drosophila melanogaster and Drosophila simulans. The study didn’t try to explain the “origin” of the gene—it merely compared the gene in the two species, catalogued some minor differences between them, and asked how those differences arose. The study identified some two dozen nucleotide differences between the genes for Cid in the two species—only 17 of which might have changed an amino acid in the sequence out of 226 total amino acids in the Cid protein.33 Such a slight (7.5 percent) difference would be extremely unlikely to translate into different protein folds. Indeed, natural sequences known to have different folds do not have anything like the correspondingly high degree (92.5 percent) of sequence identity. Instead, known natural sequences with this high level of sequence identity have the same fold.
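
The percentages quoted for Cid follow from simple division. Here is a minimal sketch of that arithmetic in Python, assuming (as the passage does) that all 17 potentially amino-acid-changing differences are counted against the 226-residue length of the protein. The function and constant names are illustrative and do not come from the study.

```python
def percent_difference(changed: int, total: int) -> float:
    """Percentage of positions that differ between two aligned sequences."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * changed / total


def percent_identity(changed: int, total: int) -> float:
    """Percentage of positions that are identical between the two sequences."""
    return 100.0 - percent_difference(changed, total)


# Figures from the text: at most 17 amino-acid changes out of 226 residues.
CID_CHANGED = 17
CID_LENGTH = 226

print(f"difference: {percent_difference(CID_CHANGED, CID_LENGTH):.1f} percent")
print(f"identity:   {percent_identity(CID_CHANGED, CID_LENGTH):.1f} percent")
```

Rounded to one decimal place, the two values are 7.5 and 92.5 percent, the figures given in the text.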

Long also cites two studies of FOXP2, a gene involved in regulating gene expression in humans, chimps, other primates, and mammals. In humans and other mammals, this gene is involved in brain development.34 Nevertheless, according to one study, the protein coded for by this gene in humans acquired just “two amino-acid changes on the human lineage”35 during the entire course of its evolution from a common chimp-human ancestor. Again, that is not likely a sufficient change to generate a new protein fold.

The Long review essay cites numerous scenarios of this type—scenarios attempting to explain the evolution of slight gene variants (and their similar proteins), not the origin of new protein folds. This is an important distinction because, as we saw in Chapter 10, new protein folds represent the smallest unit of selectable structural innovation, and much larger structural innovations in the history of life depend upon them. Explaining the origin of structural innovation requires more than just explaining the origin of variant versions of the same gene and protein or even the origin of new genes capable of coding for new protein functions. It requires producing enough genetic information—truly novel genes—to produce new protein folds.

Thus, even where these scenarios are plausible, they are not relevant to explaining the origin of the genetic information necessary to produce the kind of structural innovation that occurs in the Cambrian explosion (or many other events in the history of life).

Protein Folds: Relevant but Implausible Scenarios

In a few cases, the evolutionary scenarios cited in the Long paper appear to be attempts to explain genes that are different enough from each other that they could conceivably code for proteins with different folds. For example, Long discusses several papers that equate exon shuffling with the shuffling of protein domains. Recall that a protein domain is a stable “tertiary” protein structure or fold made of many smaller “secondary” structures such as alpha helices or beta strands (see Fig. 10.2). Many complex proteins have numerous domains, each exhibiting a unique fold or tertiary structure. One version of the exon-shuffling hypothesis assumes that each exon codes for a specific protein domain. It envisions random cutting and splicing—excising, shuffling, and recombination—of the exon portions of the genome, resulting in the modular rearrangement of genetic information. The resulting composite gene will then code for a new composite protein structure. As Long proposes, “Exon shuffling, which is also known as domain shuffling, often recombines sequences that encode various protein domains to create mosaic proteins.”36

Of the mechanisms that Long discusses, exon shuffling (and the closely related idea of gene fusion) provides perhaps the most plausible means of generating new (composite) proteins.37 Nevertheless, the idea that exon shuffling can explain the origin of the genetic information necessary to produce new protein folds or whole composite proteins is problematic for several reasons.

First, the exon-shuffling hypothesis seems to assume that each exon involved in the process codes for a protein domain that folds into a distinctive tertiary structure. To a protein scientist, a protein domain is equivalent to a protein fold, though distinct protein structures (folds) may be composed of several smaller domains (or folds). Thus, at the very least, the exon-shuffling hypothesis presupposes the prior existence of a significant amount of genetic information—enough information to build an independent protein domain or fold. As such, it fails as an explanation for the origin of protein folds and the information necessary to produce them.

Some advocates of exon shuffling, however, may be using the term “protein domain” in a slightly fuzzier way. They might be equating domains with smaller structural units such as fragments of a fold made of several units of secondary structure such as alpha helices or beta strands. Conceived this way, the exon-shuffling hypothesis would then entail the construction of a new protein structure by combining these smaller “fragments.”

But in most cases, if the amino-acid chain that forms a domain is chopped into fragments, then the resulting isolated pieces would cease to retain their original shapes. Why? Because the three-dimensional shape of one small section of a protein is heavily dependent upon the overall structure and shape of the rest of the protein. Snip out a section or fragment, or synthesize a fragment in isolation from the rest of the protein, and a floppy amino-acid chain will result—one that has entirely lost its original shape, or ability to form a stable structure. Thus, this version of the exon-shuffling hypothesis lacks credibility because it incorrectly assumes that shapeless protein fragments can be mixed and matched in a modular fashion to form new stable, functional protein folds. Moreover, even if such shuffling were physically plausible, this version of the hypothesis would have another problem. It still presupposes unexplained functional information—in particular, the information necessary to specify, not just the smaller fragments, but also the information required to arrange these smaller units into stable folds, and ultimately functional proteins.

Second, since the exon-shuffling hypothesis assumes that each exon involved in the shuffling codes for a specific protein domain, it also assumes that exon boundaries correspond to the boundaries of protein domains or folds. In existing genes, however, exon boundaries do not typically correspond to the boundaries of folded domains within the larger proteins.38 If the shuffling of exons explained how actual proteins had come into existence, then there should be a clear correspondence or correlation between exon boundaries within genes and the corresponding protein domains within larger composite structures (i.e., whole proteins). The absence of such a correspondence suggests that exon shuffling does not account for the origin of known compound protein structures.

Third, relying on exon shuffling to cobble together a new protein fold from smaller units of protein structure is physically implausible for another reason. To see why, we need to examine what a “side chain” is. All twenty protein-forming amino acids have a common backbone (made of nitrogen, carbon, and oxygen), but each one has a different chemical group called a side chain sticking out at roughly right angles from that backbone. The interactions between side chains determine whether secondary units of protein structure made from chains of amino acids will fold into larger stable three-dimensional folds.39 Though many different sequences will generate secondary structures (alpha helices and beta strands), generating stable folds is much more difficult and requires much more specificity in the arrangement of amino acids and their side chains. Specifically, since the elements in smaller secondary structural units in proteins are surrounded by side chains, they cannot be combined into new folds unless the elements have the sequence specificity required for the side chains to complement one another.40 That means smaller secondary structural units will rarely41 fuse together to form stable tertiary structures or folds. Instead, attempts to form new folds from smaller units of structure repeatedly encounter adverse interactions between the side chains of the amino acids within units of secondary structure.

The need for extreme specificity in the sequential arrangements of amino acids, discussed in the previous chapter, means that the overwhelming majority of amino-acid sequences in units of secondary structure will not result in stable folds as these units of structure come into contact with each other. As discussed in Chapter 10, the extreme rarity of functional proteins (with stable folds) in sequence space ensures that the probability of finding a correct fold-stabilizing sequence will be astonishingly small. For this reason, even skilled protein scientists have struggled to design sequences that will produce stable protein folds.42 Almost invariably the units of secondary structure that they attempt to combine or otherwise place into stable composite structures will not fold because of the interactions of their amino-acid side chains.43 As molecular biologist Ann Gauger explains, “Thus, [alpha] helices and [beta] sheets are sequence-dependent structural elements within protein folds. You can’t swap them around like Lego bricks.”44

Nor is it an easy matter to simply find different sequences of amino acids that will stabilize folds from smaller secondary units of structure, again, because of the extreme rarity of functional (and folding) sequences within amino-acid sequence space. Generating specific sequences that will fold into stable structures, whether in the lab or during the history of life, requires solving the combinatorial inflation problem. Even small folds will require five or six units of secondary structure with 10 or so amino acids in each unit, that is, 60 or more precisely sequenced amino acids. Modest-size folds will require a dozen or more units of secondary structure and 150 to 200 specifically arranged amino acids in order to stabilize a fold. Larger protein folds will require many more secondary units and specifically arranged amino acids. Since, however, many mission-critical functions within even the simplest cell require many folds (of at least 150 amino acids) working in close coordination, the need to produce proteins of at least this length numerous times through the history of life cannot be avoided.

All this requires searching for a functional needle in a vast haystack of combinatorial possibilities. Recall that Douglas Axe estimated the ratio of needles (functional sequences) to strands of straw in the haystack (nonfunctional sequences) to be 1 to 10^77 for sequences of modest length (150 amino acids).
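
The scale of the haystack can be made concrete with a short calculation. The sketch below, in Python, assumes an alphabet of 20 amino acids (the figure given earlier in the text) and treats every position in the chain as free to vary. It reports the base-10 logarithm of the number of possible sequences at the lengths mentioned above, alongside Axe's estimated ratio. The function and constant names are illustrative only.

```python
import math

AMINO_ACIDS = 20      # protein-forming amino acids, as stated in the text
AXE_LOG10_RATIO = 77  # Axe's estimate as quoted: 1 functional sequence per 10^77


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Base-10 logarithm of the number of distinct sequences of a given length."""
    if length < 0 or alphabet < 1:
        raise ValueError("length must be >= 0 and alphabet must be >= 1")
    return length * math.log10(alphabet)


# Lengths mentioned in the text: small folds, modest folds, larger folds.
for length in (60, 150, 200):
    exponent = log10_sequence_space(length)
    print(f"length {length}: about 10^{exponent:.0f} possible sequences")

print(f"Axe's ratio for length 150: 1 functional in 10^{AXE_LOG10_RATIO}")
```

Each added residue multiplies the number of possible sequences by 20, which is the combinatorial inflation the text refers to.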

Of course, in naturally occurring proteins, the interactions between side chains in units of secondary structure do maintain stable folds. But these proteins, with their stable three-dimensional folded structures, depend upon exceedingly rare and precisely arranged sequences of amino acids. The question is not whether the combinatorial search problem necessary to produce stable protein folds has ever been solved, but whether a neo-Darwinian mechanism relying on random mutations (in this case random shuffling of exons) provides a plausible explanation for how it might have been solved.

The papers that Long cites give no reason to think that exon shuffling (or any other mutational mechanism) has solved this problem. The exon-shuffling hypothesis ignores the need for side-chain specificity, though the need for such specificity has repeatedly defeated attempts in the laboratory to build new proteins from units of secondary structure in the manner required.

But advocates of exon shuffling make no attempt to show how random rearrangements of protein domains—whether the domains are conceived of as fragments of a fold or whole folds—would solve the combinatorial problem. Nor do they challenge Sauer’s or Axe’s experimentally derived quantitative estimates of the rarity of functional genes or proteins. They do not challenge the probability calculations based on these estimates. And they do not show that a mechanism exists that can search amino-acid sequence spaces more effectively or efficiently than random mutation and selection. Nor do they demonstrate the efficacy of exon shuffling in a model system in the laboratory. Instead, basic considerations of protein structure imply the implausibility of exon shuffling as a means of generating the genetic information necessary to produce a new protein fold.45 So, in the end, with few words and with apparent confidence, advocates of exon shuffling simply assert, as the Long paper does, that “exon shuffling often recombines sequences that encode various protein domains to create mosaic proteins.”

Word Salad

The assertion of Long and his colleagues about exon shuffling, like many other statements about postulated mutational mechanisms, blurs the distinction between theory and evidence. Despite the authoritative tone of such statements, evolutionary biologists rarely directly observe the mutational processes they envision. Instead, they see patterns of similarities and differences in genes and then attribute them to the processes they postulate. Yet the papers that Long cites offer neither mathematical demonstration, nor experimental evidence, of the power of these mechanisms to produce significant gains in biological information.

In the absence of such demonstrations, evolutionary biologists have taken to offering what one biologist I know calls “word salad”—jargon-laced descriptions of unobserved past events—some possible, perhaps, but none with the demonstrated capacity to generate the information necessary to produce novel forms of life. This genre of evolutionary literature envisions exons being “recruited”46 and/or “donated”47 from other genes or from an “unknown source”48; it appeals to “extensive refashioning”49 of genes; it attributes “fortuitous juxtaposition of suitable sequences”50 to mutations or “fortuitous acquisition”51 of promoter elements; it assumes that “radical change in the structure” of a gene is due to “rapid, adaptive evolution”;52 it asserts that “positive selection has played an important role in the evolution”53 of genes, even in cases when the function of the gene under study (and thus the trait being selected) is completely unknown;54 it imagines genes being “cobbled together from DNA of no related function (or no function at all)”;55 it assumes the “creation” of new exons “from a unique noncoding genomic sequence that fortuitously evolved”;56 it invokes “the chimeric fusion of two genes”;57 it explains “near-identical”58 proteins in disparate lineages as “a striking case of convergent evolution”;59 and when no source material for the evolution of a new gene can be identified, it asserts that “genes emerge and evolve very rapidly, generating copies that bear little similarity to their ancestral precursors” because they are apparently “hypermutable.”60 Finally, when all else fails, scenarios invoke the “de novo origination” of new genes, as if that phrase—any more than the others just mentioned—constitutes a scientific demonstration of the power of mutational mechanisms to produce significant amounts of new genetic information.61

These vague narratives resemble nothing so much as the naming games of scholastic philosophers in the Middle Ages. Why does opium put people to sleep? Because it has a “dormative” virtue. What causes new genes to evolve so rapidly? Their “hypermutability” or perhaps their ability to undergo “rapid, adaptive evolution.” How do we explain the origin of two similar genes in two separate, but otherwise widely disparate lineages? Convergent evolution, of course. What is convergent evolution? The presence of two similar genes in two separate, but otherwise widely disparate lineages. How does convergent evolution occur, given the improbability of finding even one functional gene in sequence space, let alone the same gene arising twice independently? No one knows exactly, but perhaps it was a “fortuitous juxtaposition of suitable sequences,” or “positive selection,” or “de novo origination.” Need to explain two similar genes in more closely related lineages? Try “gene duplication,” or “chimerical gene fusion,” or “retropositioning,” or “extensive refashioning of the genome,” or some other scientific-sounding combination of words.

The vagueness of these scenarios raises serious questions about how scientists could regard them as decisive demonstrations or refutations of anything—let alone refutations of the kind of experimentally based, mathematically precise challenges to mutation and selection described in the previous chapter.

So despite the official pronouncement of a federal judge and claims of extensive “scientific literature documenting the origin of new genes,” evolutionary biologists have not demonstrated how new genetic information arises, at least not in amounts sufficient to build protein folds, the crucial units of biological innovation. Biologists have not solved the problem of combinatorial inflation or refuted the precise quantitative argument against the creative power of the selection and mutation mechanism presented in the previous chapter (or in my 2004 article). Nor has anyone provided a compelling refutation of Douglas Axe’s assessment of the rarity of genes and proteins on which that argument is based.
