frequent duplications of individual genes themselves. Gene survival is highly
selective, and regulatory genes, which influence the expression of other genes, are retained in preference to genes regulating metabolism. In fact, when a whole
genome duplicates, all of the genetic material is replicated, not just the genes, and plants have mechanisms for eliminating chunks of unnecessary DNA with varying degrees of efficiency. For example, the genome of the bladderwort, Utricularia gibba, a small carnivorous plant that lives in freshwater and damp soils, is compact,
Genomes decoded a 57
despite having undergone three rounds of whole genome duplication. The efficient removal of large amounts of repetitive DNA has compressed the genome, leaving the remainder to encode roughly the same number of genes as Arabidopsis.60
At the opposite end of the spectrum sits the huge genome of the distinctive
gymnosperm tree Ginkgo, with its beautiful fan-shaped leaves.61 Ginkgo’s genome is 80 times larger than that of Arabidopsis. Some of its 40 000 genes may explain Ginkgo’s extraordinary resilience to insects. It has genes for synthesizing chemicals that fight insect attack directly and others for attracting the
enemies of plant-eating insects by synthesizing and releasing volatile organic
compounds. The Ginkgo genome is bigger than the notoriously large maize
genome, but only half the size of the enormous genome of the Norway spruce,
Picea abies, which has been inflated by the accumulation of large numbers of repetitive mobile elements. The huge size of the Ginkgo genome also reflects a very high proportion (more than 75%) of repetitive sequences, resulting both from gradual accumulation over deep time and from two ancient whole genome duplication events.
Whole genome duplications in the distant past gained a new significance
when Claude dePamphilis and his team published a landmark investigation into
ancient whole genome duplications in plants in the journal Nature.62 They concluded that all extant flowering plants share a very old whole genome duplication (dating to around 234 million years ago) and that all extant seed plants share an even older one (dating to approximately 349 million years ago). In
other words, the appearance of seed plants and the origin of flowering plants
are coincident with whole genome duplications (Figure 5). Both duplications
are marked by ‘sudden’ bursts of gene proliferation, with thousands of new
genes appearing simultaneously. These surviving genes may have formed new
genetic circuitry to regulate new developmental processes important for building the major innovations like seeds and flowers that contributed to the rise of
seed plants and flowering plants. But here is the rub. Not everyone is convinced
by this apparently elegant story. In the best traditions of science, a different
team re-analysed the data sets and felt it was premature to reach a definitive
conclusion on the exact number and timing of ancient duplications.63
Figure 5 Controversial evidence suggesting whole genome duplication (WGD) events in the ancestors of seed plants and flowering plants. Eudicots are a clade of flowering plants. [Phylogeny plotted against estimated divergence time (millions of years ago), with WGD events marked in the ancestral seed plant and the ancestral flowering plants. Taxa shown: eudicots (Arabidopsis thaliana, Carica papaya, Populus trichocarpa, Cucumis sativus, Vitis vinifera); grasses (Oryza sativa, Sorghum bicolor); basal flowering plants (Aristolochia fimbriata, Liriodendron tulipifera, Nuphar advena, Amborella trichopoda); gymnosperms; lycophyte (Selaginella moellendorffii); moss (Physcomitrella patens).]

As ever, there is a need for more high-quality data on additional flowering plant and gymnosperm species, and for better computer algorithms, to understand better the genomic driving force of plant evolution. Nevertheless, the fingerprints of
gene duplication in driving innovation are found in the details of flower evolution. Enrico Coen and Elliot Meyerowitz proposed the elegant ABC genetic model of floral identity over twenty-five years ago.64 In their model, several genes acting together specify the formation of the component parts of flowers: sepals, petals, stamens, and carpels (Figure 6). Moving from the outside of the flower inwards, the sequence of gene actions is as follows. A function genes alone specify sepals, A and B function genes together specify petals, B and C function genes specify stamens, and C function alone specifies carpels. Finally, E function genes are required, in conjunction with the other floral regulators, for correct organ specification.
Today’s flowering plants all share more or less the same core ABC genetic circuitry to produce the enormously wide variation in flowers, and many of the genes controlling the different functions are duplicates of each other.65 The startling implication is that the earliest flowering plants possessed a full complement of the basic genetic information needed to assemble a modern flower. Increasingly elaborate flowers may have evolved by refashioning the developmental circuitry that originally controlled sterile leaves. In fact, the origins of flowers go further back still, because a similar gene system already seems to have existed in the ancestral seed plant, given what we know about the role of B and C class genes in gymnosperms. C class genes specify a female cone, whereas B plus C classes specify male cones. Creating a flower, then, requires the evolutionary assembly of these ‘cones’ in the same shoot tip (rather than in separate organs, as in gymnosperms) to create a bisexual shoot, adding sterile whorls, specified by A class genes, around the sexual organs, and finally enclosing the ovules within fused leaf-like organs, that is, a carpel.66

Figure 6 The ABC genetic model of flower evolution. [Diagram of a flower from the outside inwards: sepals, petals, stamens, and carpel, with the A, B, C, and E gene functions marked across the whorls.]
Insights from genome biology go further by offering an original take on
Darwin’s ‘abominable mystery’—what he perceived to be the abrupt origin and
dramatic radiation of flowering plants. In Darwin’s time, flowering plants were
thought to enter the fossil record very suddenly and apparently ‘fully evolved’ in the mid-Cretaceous, about 100 million years ago. This prompted him to speculate
about a long pre-Cretaceous history for flowering plants, perhaps in remote areas that left no fossil signals for palaeobotanists to unearth. With the benefit of continued exploration and discovery of the fossil record since Darwin’s day, the date of the first flowering plants has been pushed back to the early Cretaceous, 140 million years ago.67 In a sense, then, Darwin’s mystery is solved. Flowering plants originated far earlier than he thought, and flowers, as we have seen, arose partly because genome duplications provided the genetic material for flower evolution. But how do we explain their explosive diversification within a few tens of millions of years in the Cretaceous?
One answer, provided by genome biology, links diversification to duplication. Whole genome duplications have been reported for the Brassicaceae (3700 species), Asteraceae (25 000 species), Fabaceae (19 400 species), and the Solanaceae (over 3000 species), to name but a few.68 In these families, genome duplication seems to go hand in hand with species richness. The common
ancestor of the Asteraceae (daisies), the largest family of flowering plants, for example, has a well-dated genome duplication event tightly linked to sharp
increases in diversification rates.69 The origin of the monocotyledons, which
gave rise to some 11 000 species of grasses, is also associated with a whole
genome duplication event.70 The origin of grasses with the C4 photosynthetic pathway in tropical savannas, which expanded at the expense of low-latitude forests during the Miocene (11–5 million years ago), involved grasses of the Andropogoneae tribe. Grass species in this tribe experienced more than 30 genome duplications in a comparatively narrow interval of geological time, tempting us to speculate that this helped them cope with the changing climate and atmospheric conditions of the time by giving them new ecological tolerances.71
Later, from about 10 000 years ago, domestication and cultivation of wild
grasses began in the Fertile Crescent of the Middle East, and saw hunter-gatherer societies replaced by farming communities. Five grasses—rice, maize, wheat,
barley, and sorghum—emerged from the process, and today feed the world. The
ancestor of these grasses underwent a whole genome duplication event around
90 million years ago and again 70–50 million years ago72 to create the ancestral
genome found in modern cereals. The great insight leading to this picture came from the simple observation that the order of blocks of genes on the chromosomes of grasses is the same. It is the same in rice and wheat, for instance, despite hugely different genome sizes. Clever detective work showed how each cereal genome derives from the cleavage of a single structure, the hypothetical ‘ancestral’ genome.73 The genomes of maize, sorghum, and rice record their millennia-long dialogue with humans during domestication, telling us how farmers on the American, African, and Asian continents all independently selected for similar traits: higher yields and greater resistance to disease. The story is written in the genomes of the cereals we consume without a second thought.
We can also tie specific innovations in flowering plants to genome duplication
events that ignited an arms race between plants and insects and contributed to the explosive diversification that puzzled Darwin. Consider the brassicas, the genus that includes mustards and cabbages, which defend themselves against herbivores by synthesizing protective chemical compounds. Nearly 90 million years ago, the ancestors of brassicas developed a particular set of chemical defence compounds (called glucosinolates) that are toxic to most insects, and as a consequence triggered an evolutionary arms race with the caterpillars of the butterflies that fed on the plants.74 Each time the brassicas evolved more complex chemical defences, a burst of plant diversification followed. In response, the brassica-feeding butterflies (Pierinae) evolved the ability to detoxify the compounds. Freed from the toxicity of their food plants, the butterflies underwent their own burst of diversification until the plants escalated the situation by evolving another chemical elaboration to defend themselves.
Modern DNA sequencing has shown that genome duplication events enabled advances in the sophistication of the brassicas’ chemical weapons, and that the arms race between plants and butterflies that started 90 million years ago continues; modern brassicas are now able to synthesize more than 120 different varieties of
glucosinolates. The biologists Paul Ehrlich and Peter H. Raven first proposed the idea that plant–insect interactions generate biodiversity nearly half a century
ago.75 Written in the DNA code, we later discovered, is evidence supporting their hypothesis, with repeated escalation between chemical innovations and bursts of
diversification in brassicas and Pierinae butterflies. The parsley family (Apiaceae) and the butterflies (Papilionidae) and moths (Oecophoridae) that attack it are
probably also engaged in a similar arms race; the details are yet to be deciphered.
None of this is to deny the traditional co-evolution of flowers and their pollinators, or fruits and their dispersers, as drivers of biological diversification, but rather to show how genome biology offers fresh explanatory insights into modern
biological diversity.
Figure 7 The Korean-born geneticist Susumu Ohno (1928–2000).
Over half a century ago, Susumu Ohno (1928–2000) (Figure 7), a prominent and
influential biologist, proposed an extraordinary idea. In his 1970 book, Evolution by Gene Duplication, he presciently postulated that two rounds of whole genome duplication accompanied great transitions in the evolutionary history of life on
Earth.76 The mainstay of evolutionary theory (at that time) was that random
mutations in existing genes can lead to organisms being better adapted to their
circumstances, thereby making them more likely to survive and reproduce. In
this way, natural selection could gradually improve the ‘fitness’ of a lineage in the long run. But in the preface to his book, he wrote:
Had evolution been entirely dependent upon natural selection, from a bacterium
only numerous forms of bacteria would have emerged. The creation of metazoans,
vertebrates, and finally mammals from unicellular organisms would have been
quite impossible, for such big leaps in evolution required the creation of new gene loci with previously nonexistent function.
The basic idea is that the extra gene copies resulting from a genome duplication might, in the right circumstances, evolve new functions, underpinning major innovations in body plans as well as other modifications, whilst the original copies maintain their existing functions. Ohno’s radical idea is distinguished from what went before because he proposed evolution through the duplication of genomes, rather than alteration to individual genes. He insisted organisms could deal with extra genomes more readily than with duplicated copies of individual genes. His idea was that changes in proteins coded for by an extra copy of a particular gene could upset the metabolism of an organism more than the duplication of the entire genome, which keeps every gene in the same relative proportion. The explanation is down to a phenomenon called dosage compensation, a process by which organisms equalize the expression of genes to compensate for the change in the normal number of chromosomes present.
By proposing this explanation, he was resurrecting an earlier idea originally advanced by the evolutionary biologist J.B.S. Haldane (1892–1964). In what became known as his 2R hypothesis (R for Round of duplication), Ohno suggested a first round of duplication made the transition from invertebrates to vertebrates possible, and a second led to the diversification of vertebrates. Of course, the naysayers were understandably critical right from the off. Indeed, back in the 1970s, the idea was widely regarded as ‘outrageous’, not least because there was too little genomic information to test it.
With the explosion of molecular genetics and genome sequencing data in modern biology, Ohno’s radical suggestion could at last be critically evaluated, and the evidence suggests he may have been right all along.
Over the course of 500–600 million years of vertebrate evolution, two rounds
of whole genome duplication have since been detected, both near the base of the
vertebrate tree of life, and both followed by substantial increases in complexity.77
Rapid innovation, leading to more elaborate nervous, endocrine, and circulatory systems, enhanced sensory organs, complex brains, skulls, vertebrae, an endoskeleton, and teeth, followed the first duplication deep in the Cambrian.78 These changes were followed in vertebrate lineages by innovations such as paired limbs/fins, hinged jaws, and improved immune systems. Later, teleost fishes, the largest group of bony fishes, were found to share a third whole genome duplication, dated to 316–226 million years ago and straddling the end-Permian mass extinction; this fits with Ohno’s proposed third duplication, which he added later in his career. Long before all of this evolutionary action, an exceptionally ancient gene duplication may have happened before plants split from animals and fungi79 (~2.7 billion years ago). Today, it seems more than a coincidence that whole genome duplications are followed by substantial increases in the complexity of major groups of organisms throughout the history of life on Earth, including plants, fish, vertebrates, and fungi. Not even Ohno foresaw this radical possibility. Towards the latter part of his career, he famously married genetics and musical composition, turning DNA sequences into musical pieces. To the delight of many (others were appalled), he had the results recorded by a violinist and a pianist, turning the genetic code of primitive animals into passages of sound.80
As the genomic revolution progresses, we are discovering that most species of
animals and plants are descended from ancestors that underwent whole genome
Making Eden Page 10