Darwin's Doubt

by Stephen C. Meyer

Upon closer examination, however, none of these papers demonstrate how mutations and natural selection could find truly novel genes or proteins in sequence space in the first place; nor do they show that it is reasonably probable (or plausible) that these mechanisms would do so in the time available. These papers assume the existence of significant amounts of preexisting genetic information (indeed, many whole and unique genes) and then suggest various mechanisms that might have slightly altered or fused these genes together into larger composites. At best, these scenarios “trace” the history of preexisting genes, rather than explain the origin of the original genes themselves (see Fig. 11.2).

This kind of scenario building can suggest potentially fruitful avenues of research. But an obvious error comes in mistaking a hypothetical scenario for either a demonstration of fact or an adequate explanation. None of the scenarios that the Long paper cites demonstrate the mathematical or experimental plausibility of the mutational mechanisms they assert as explanations for the origin of genes. Nor do they directly observe the presumed mutational processes in action. At best, they provide hypothetical, after-the-fact reconstructions of a few events out of a sequence of many supposed events, starting with the existence of a presumed common ancestor gene. But that gene itself does not represent a hard data point. It is inferred to have existed on the basis of the similarity of two or more other existing genes, which are the only actual pieces of observational evidence upon which these often elaborate scenarios are based.

FIGURE 11.1

Various types of mutations that are alleged to result in the modification of genes: exon shuffling, retropositioning, lateral gene transfer, and gene fusion.

FIGURE 11.2

Depiction of how gene duplication and subsequent gene evolution might take place. While the gene on the bottom remains under selective pressure and cannot experience many mutations without loss of fitness or function (see Figure 10.3), the duplicated gene at the top can in theory vary without deleterious consequences to the organism.

That these scenarios depend on various inferences and postulations doesn’t, by itself, disqualify them from consideration. Nevertheless, whether they adequately explain the origin of genetic information depends upon the evidence for the existence of the entities they infer (the ancestral genes) and the plausibility of the mutational mechanisms they postulate. Let’s look at both parts of these scenarios.

Common Ancestor Genes?

Nearly all of the scenarios developed in the papers that Long cites start with an inferred common ancestral gene from which two or more modern genes diverged and developed. These scenarios treat the similarity of sequence (the information) in two or more genes as unequivocal evidence for a common ancestral gene (see Fig. 11.2). As I noted in Chapters 5 and 6, standard methods of phylogenetic reconstruction presuppose, rather than demonstrate, that biological similarity results from shared ancestry. Yet, as we saw in Chapter 6, similarity of sequence by itself is not always an unequivocal indicator of common ancestry. Sometimes similarity appears between species where it cannot be explained by inheritance from a common ancestor (e.g., the similar forelimbs on moles and mole crickets) and, at the very least, there are other possible explanations for sequence similarity.

In the first place, similar gene sequences might have evolved independently on two parallel lines of descent starting with two different genes, as the hypothesis of convergent evolution asserts. Recent examples of convergent genetic evolution now abound in the literature of molecular and evolutionary biology.13 For example, molecular biologists have discovered that both whales and bats use similar systems—involving similar genes and proteins—for echolocation. The striking similarity of these systems used in two otherwise disparate mammalian species has led biologists to posit the parallel evolution of echolocation, including the gene sequences and proteins that make it possible, from a common ancestor that did not possess this system.14

In addition, it is possible that similar genes might have been separately designed to meet similar functional needs in different organismal contexts. Viewed this way, similarity of sequence does not necessarily reflect descent with modification from a common ancestor, but could reflect design in accord with common functional considerations, constraints, or goals. I recognize, of course, that to this point I have not given any independent reasons for considering the design hypothesis, and that, as a hypothesis for sequence similarity by itself, intelligent design may not seem compelling. (For more compelling reasons to consider intelligent design, see Chapters 17 through 19.) Nevertheless, I mention both these other possible explanations for the similarity of gene sequences in order to demonstrate that sequence similarity does not necessarily indicate, or derive from, a common ancestral gene.

ORFan Genes

Some genes, and the information-rich sequences they contain, most certainly cannot be explained by reference to the kind of scenarios that Long cites. All of these scenarios attempt to explain the origin of two similar genes by reference to descent with modification (via mutation) from common ancestral genes. Yet genomic studies are now turning up hundreds of thousands of genes in many diverse organisms that exhibit no significant similarity in sequence to any other known gene.15 These “taxonomically restricted genes” or “ORFans” (for “open reading frames of unknown origin”) now dot the phylogenetic landscape. ORFans have turned up in every major group of organisms, including plants and animals as well as both eukaryotic and prokaryotic one-celled organisms. In some organisms, as much as one-half of the entire genome comprises ORFan genes.16

Thus, even if it could be assumed that similar gene sequences always point to a common ancestor gene, these ORFan genes cannot be explained using the kind of scenarios that Long’s article cites. Since ORFans lack sequence similarity to any known gene—that is, they have no known homologs in even distantly related species—it is impossible to posit a common ancestral gene from which a particular ORFan and its homolog might have evolved. Remember: ORFans, by definition, have no homologs. These genes are unique—one of a kind—a fact tacitly acknowledged by the increasing number of evolutionary biologists who attempt to “explain” the origin of such genes through de novo (“out of nowhere”) origination.

Some might argue that as biologists map the sequence of more genomes and add more gene sequences to protein databases, homologs of these ORFans will eventually turn up, thus gradually eliminating the mystery surrounding the ORFan phenomenon. Yet to date the trend has gone in the opposite direction. As scientists have explored and sequenced more genomes, they have discovered more and more ORFans without finding anything like a corresponding number of homologs. Instead, the number of “unpaired” ORFan genes continues to grow with no sign of the trend reversing itself.17

The Plausibility of the Mutational Processes

Even if evolutionary biologists could establish the existence of the common ancestral genes from which their scenarios begin, that would not establish the plausibility of a neo-Darwinian mechanism for generating genetic information from that ancestor. Moreover, the term “plausibility” in this context has a specific scientific and methodologically significant meaning. Studies in the philosophy of science show that successful explanations in historical sciences such as evolutionary biology need to provide “causally adequate” explanations—that is, explanations that cite a cause or mechanism capable of producing the effect in question. In On the Origin of Species, Darwin repeatedly attempted to show that his theory satisfied this criterion, which was then called the vera causa (or “true cause”) criterion. In the third chapter of the Origin, for example, he sought to demonstrate the causal adequacy of natural selection by drawing analogies between it and the power of animal breeding and by extrapolating from observed instances of small-scale evolutionary change over short periods of time.

In this, Darwin hewed to a principle of scientific reasoning that one of his scientific role models, the great geologist Charles Lyell, used as a guide for reasoning about events in the remote past. Lyell insisted that good explanations for the origin of geological features should cite “causes now in operation”—causes known from present experience to have the capacity to produce the effects under study.18

Do the scenarios developed by various evolutionary biologists cited in the Long review essay meet this criterion? Duplication mutations and various other modes of random mutational change along with natural selection clearly constitute “causes now in operation.” No one disputes that. But have these processes demonstrated the capacity to produce the effect in question, namely, the genetic information necessary to structural innovation in the history of life? There are several good reasons to think that they have not.

Begging Questions

First, most of the mutational processes that evolutionary biologists invoke in the scenarios cited in the Long essay presuppose significant amounts of preexisting genetic information on preexisting genes or modular sections of DNA or RNA. The Long essay highlights seven main mutational mechanisms at work in the sculpting of new genes: (1) exon shuffling, (2) gene duplication, (3) retropositioning of messenger RNA transcripts, (4) lateral gene transfer, (5) transfer of mobile genetic units or elements, (6) gene fission or fusion, and (7) de novo origination (see Fig. 11.1). Yet each of these mechanisms, with the exception of de novo generation, begins with preexisting genes or extensive sections of genetic text. This preexisting functionally specified information is in some cases enough to code for the construction of an entire protein or a distinct protein fold. Moreover, these scenarios not only assume unexplained preexisting sources of biological information, they do so without explaining or even attempting to explain how any of the mechanisms they envision could have solved the combinatorial search problem described in Chapters 9 and 10.

A closer look at each of these mechanisms will show why scenarios that rely on them beg important questions about the origin of genetic information.

Advocates of exon shuffling envision modular sections of a genome randomly arranging and rearranging themselves to generate entirely new genes, not unlike rearranging whole paragraphs in an essay to generate a new article. In genomes that have regions that code for the production of proteins interspersed with regions that do not code for proteins, the term “exon” refers to a protein-coding region of the genome. These protein-coding regions of the genome are often interrupted by nonprotein-coding sections of the genome (called introns) that serve other functions, such as coding for the production of regulatory RNAs. In any case, exons store significant quantities of preexisting functionally specified information.

Though most proteins are encoded by multiple exons, a single exon may encode a substantial unit of protein structure, such as a functional protein fold—a fact that advocates of exon shuffling count on in their own attempts to explain novel proteins. They assume that exons can be blindly shuffled and mixed around to form genes. Nevertheless, this mechanism cannot produce new protein folds. Either an exon is large enough that it already encodes a protein fold—in which case it’s not creating a new fold—or it’s too small, small enough that multiple exons must be combined in order to form a stable protein fold. In this latter case, other problems—in particular, something called adverse side-chain interaction—will preclude success, as we will see.

Evolutionary scenarios envisioning other mutational mechanisms also presuppose important sources of preexistent genetic information. Gene duplication, as the name implies, involves the production of a duplicate copy of a preexisting gene, already rich in functionally specified information. Retropositioning of messenger RNA transcripts occurs when an enzyme called reverse transcriptase takes a preexisting strand of messenger RNA and inserts its corresponding DNA sequence into a genome, also producing a duplicate of the coding portion of a preexisting gene. Lateral gene transfer involves transferring a preexisting gene from one organism (usually a bacterium) into the genome of another. The transfer of mobile genetic elements likewise occurs when preexisting genes enclosed in circular strands of DNA called plasmids enter one organism from another and eventually find themselves incorporated into a new genome. This process also mainly occurs in single-celled organisms. A similar process can occur in eukaryotes, where mobile genetic elements called transposons—often called “jumping genes”19—can hop from place to place in the genome. Gene fusion occurs when two adjacent preexisting genes, each rich with specified genetic information, link together after the deletion of intervening genetic material.20

Each of these six mutational mechanisms presupposes preexisting modules of specified genetic information. Some of these mutational mechanisms also depend upon sophisticated preexistent molecular machines such as the reverse transcriptase enzyme used in retropositioning or other complex cellular machinery involved in DNA replication. Since building these machines requires other sources of genetic information, scenarios that presuppose the availability of such molecular machines to assist in the cutting, splicing, or positioning of modular sections of genetic information clearly beg the question.

Overall, what evolutionary biologists have in mind is something like trying to produce a new book by copying the pages of an existing book (gene duplication, lateral gene transfer, and transfer of mobile genetic elements), rearranging blocks of text on each page (exon shuffling, retropositioning, and gene fusion), making random spelling changes to words in each block of text (point mutations), and then randomly rearranging the new pages. Clearly, such random rearrangements and changes will have no realistic chance of generating a literary masterpiece, let alone a coherent read. That is to say, these processes will not likely generate specificity of arrangement and sequence and, therefore, do not solve the combinatorial search problem. In any case, all such scenarios also beg the question. There is a big difference between shuffling and slightly altering preexisting sequence-specific modules of functional information and explaining how those modules came to possess information-rich sequences in the first place.

Evolution Ex Nihilo?

Long does cite at least one type of mutation that does not presuppose existing genetic information, the de novo origination of new genes. For example, one paper he discusses sought to explain the origin of a promoter region for a gene (the part of the gene that helps initiate the transcription of the gene’s instructions) and found that “this unusual regulatory region did not really ‘evolve.’ ” Instead, it somehow snapped into being: “It was aboriginal, created de novo by the fortuitous juxtaposition of suitable sequences.”21

Many other papers invoke de novo origination of genes. Long mentions, for example, a study seeking to explain the origin of an antifreeze protein in an Antarctic fish that cites “de novo amplification of a short DNA sequence to spawn a novel protein with a new function.”22 Likewise, Long cites an article in Science to explain the origin of two human genes involved in neurodevelopment that appealed to “de novo generation of building blocks—single genes or gene segments coding for protein domains,” where an exon spontaneously “originated from a unique noncoding sequence.”23 Other papers make similar appeals. A paper in 2009 reported “the de novo origin of at least three human protein-coding genes since the divergence with chimp[s],” where each of them “has no protein-coding homologs in any other genome.”24 An even more recent paper in PLoS Genetics reported “60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee,”25 a finding that was called “a lot higher than a previous, admittedly conservative, estimate.”26

Another 2009 paper in the journal Genome Research was appropriately titled “Darwinian Alchemy: Human Genes from Noncoding RNA.” It investigated the de novo origin of genes and acknowledged, “The emergence of complete, functional genes—with promoters, open reading frames (ORFs), and functional proteins—from ‘junk’ DNA would seem highly improbable, almost like the elusive transmutation of lead into gold that was sought by medieval alchemists.”27 Nonetheless, the article asserted, without saying how, that “evolution by natural selection can forge completely new functional elements from apparently nonfunctional DNA—the process by which molecular evolution turns lead into gold.”28

The presence of unique gene sequences forces researchers to invoke de novo origin of genes more often than they would like. After one study of fruit flies reported that “as many as ~12% of newly emerged genes in the Drosophila melanogaster subgroup may have arisen de novo from noncoding DNA,”29 the author went on to acknowledge that invoking this “mechanism” poses a severe problem for evolutionary theory, since it doesn’t really explain the origin of any of its “nontrivial requirements for functionality.”30 The author proposes that “preadaptation” might have played some role. But that adds nothing by way of explanation, since it only specifies when (before selection played a role) and where (in noncoding DNA), not how the genes in question first arose. Details about how the gene became “preadapted” for some future function are never explained. Indeed, evolutionary biologists typically use the term “de novo origination” to describe unexplained increases in genetic information; it does not refer to any known mutational process.

Taking stock, then, many of the mutational processes that Long cites either: (1) beg the question as to the origin of the specified information contained in genes or parts of genes, or (2) invoke completely unexplained de novo jumps—essentially evolutionary creation ex nihilo (“from nothing”).

Thus, ultimately, the scenarios featured in Long’s review essay do not explain the origin of the specified information in either genes or sections of genes. That would require a cause capable of solving the combinatorial inflation problem discussed in the previous chapters. But none of the scenarios discussed in Long’s article even addresses this problem, let alone demonstrates the mathematical plausibility of the mechanisms they cite. Yet Gishlick, Matzke, and Elsberry originally cited Long as a definitive refutation of my article, the one in which I argued that the rarity of genes and proteins in sequence space cast doubt on the power of selection and mutations to generate novel genetic information. Professor Miller, in his testimony at the celebrated Dover trial, even convinced a federal judge to affirm in the resulting legal ruling that Long had succeeded in demonstrating how genetic information originates. Clearly, one cannot solve a problem or refute an argument by failing to address it.31
