Life's Greatest Secret

by Matthew Cobb


  Some parts of our genome seem to be pure selfish DNA – sequences that apparently have no function beyond their own survival.91 Some of these genetic elements, which riddle our genome, are the remnants of what are effectively genetic parasites – transposons. Transposons are sequences of DNA that can move about the genome, jumping from one location to another. They probably originated as RNA retroviruses that copied themselves into DNA and then became trapped in our genomes. They no longer produce viral RNA but retain the ability to move from place to place in our genome by producing an enzyme called transposase, which effectively unglues them from the DNA strand. Occasionally, the transposon may land in or next to a protein-encoding gene that then hijacks the transposon and converts its activity into a product that is delivered into the cell, thereby leading to the evolution of a new gene.92 Transposons, and their potential regulatory functions, were first identified by Barbara McClintock at Cold Spring Harbor in the 1940s, to widespread disbelief in the scientific community. Not only was she right, in 1983 she was awarded the Nobel Prize in Physiology or Medicine for her discovery – the only woman to be the sole recipient of this prize.

  Over evolutionary time, transposon sequences accumulate mutations in the part of their genome that codes for transposase, and cease to be able to move. They become frozen in our DNA, recognisable but immobile. The remnants of these invasive DNA sequences make up an astonishing 45 per cent of the human genome, with one element, known as Alu, leaving genetic traces that make up to 10 per cent of your DNA. Sometimes bits of these pseudogenes can even be transcribed, producing short bits of RNA that can regulate the activity of genes.93 Apart from their potential transformation into transposons, retroviruses can play a direct evolutionary role – it is probable that the origin of the placenta in mammals was due to our distant ancestors being infected with a retrovirus that produced a protein called syncytin that is now essential for the development of the placenta. This infection seems to have occurred several times in the mammalian lineage, perhaps explaining the varying forms of this organ in different mammals.94

  The long stretches of non-coding DNA lie at the heart of the most mysterious result that has been discovered since the beginning of widespread genomic sequencing. Different species can have substantial differences in the size of their genomes, which do not seem to be related to anything in their ecology or degree of apparent physiological complexity. For example, the genome of the ‘primitive’ lungfish is 350 times larger than that of the pufferfish. No one has been able to come up with any explanation for why this might be. This problem is called the ‘C-value paradox’ or ‘C-value enigma’ – ‘C’ is the amount of DNA in a genome.95 Some of these differences may be due to a well-known phenomenon: chunks of genomes can be duplicated during evolution, particularly in plants, which can double their genome size in one generation when chromosome duplication goes slightly awry. Because of factors such as duplication, the variation in genomic size that we see between species resists any overall functional explanation. This is highlighted by what is known jocularly as the onion test: the onion genome contains around 16 billion base pairs, or five times that of a human. It is hard to explain this in terms of the contrasting physiology and behaviour of the two organisms, or to imagine that every one of these bases is necessary to the onion.96

  In the late 1950s and early 1960s, researchers began to use the term ‘junk DNA’ to describe DNA that had no apparent function.97 In 1972, Susumu Ohno defined junk DNA as a sequence that cannot be affected by a deleterious mutation. According to this definition, junk DNA is a sequence that, if it were changed, would have no effect on the organism’s fitness (that is, on its success in passing its genes on to the next generation). Both pseudogenes and the remnants of transposon activity would seem to be junk DNA, but scientists argue about this term, and some dispute whether any DNA can truly be considered junk.

  In September 2012 this rather arcane debate erupted onto the pages of the press and on the Internet, focused on the question of what the human genome actually does. This was prompted by the publication of the findings of a large-scale project to study the cellular activity of the whole of the human genome, called ENCODE (Encyclopaedia of DNA Elements). The results of the ENCODE project were published in an unprecedented wave of thirty papers, signed by 442 authors, backed up by a web site and an iPad app. The leaders of the project claimed that 80 per cent of the human genome could be assigned a ‘biochemical function’; the coordinator, Ewan Birney, went on to claim that the final figure would ‘likely go to 100%’.98 This led to great excitement in the press: Science proclaimed that ENCODE had written the ‘eulogy’ for junk DNA, the New York Times stated that ENCODE had shown that 80 per cent of the human genome was ‘critical’ and ‘needed’, while The Guardian trumpeted ‘Breakthrough study overturns theory of “junk DNA” in genome’.99 This hyperbole led to a backlash on the Internet and in scientific publications as scientists who had not been involved in the project disagreed with the suggestion that there was no ‘junk DNA’, or that 80 per cent of our genome is ‘functional’.100

  The argument turned on the meaning of the word ‘function’. The ENCODE project deliberately cast its net wide by looking for a ‘reproducible biochemical signature’, which they defined as any consistent biochemical reaction induced by a given stretch of DNA, from mRNA production to protein binding.101 That was where the 80 per cent figure came from. The computational biologist Sean Eddy pointed out that the ENCODE study lacked what scientists call a ‘negative control’ – a set of DNA sequences that did not have any function, by any definition, and should therefore have not been identified as functional by the biochemical criteria used by ENCODE.102 Shortly afterwards, a paper appeared in which researchers carried out this experiment: they randomly generated 1,300 DNA sequences and found that most of these artificial sequences were ‘functional’ according to the ENCODE criteria. This suggested that the ENCODE definition could not systematically discriminate between random bases and DNA that has some kind of biochemical role in the cell.103 The lead author of the study, Mike White, wrote:

  most DNA will look functional at the biochemical level. The inside of a cell nucleus is a chemically active place. The real puzzle is this: how does functional DNA manage to distinguish itself from the vast excess of dead transposable elements, pseudogenes, and other accumulated junk?104

  That question remains unanswered.

  In 2014, the ENCODE consortium published a second wave of papers and seemed to back away from their earlier headline claim of 80 per cent function, admitting that ‘it is not at all simple to establish what fraction of the biochemically annotated genome should be regarded as functional’. Instead, they emphasised their indisputable finding that an important part of the human genome seems to induce reliable biochemical activity of some kind:

  The major contribution of ENCODE to date has been high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions. We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.105

  For the moment, despite the initial claims of ENCODE, and despite the fact that much of the genome seems to be transcribed into RNA in one form or another, a substantial proportion of our DNA, and that of other organisms, seems to have no discernible role in our existence and could be deleted without causing any selective disadvantage. Future discoveries may change this view, but in those strict terms, much of our DNA still appears to be ‘junk’.

  * In 1988 the historian Jan Witkowski pointed out that there had been no historical study of this period. Although that might not have been unusual in 1988, it is surprising that nearly 40 years after the discovery there has still been no detailed historical analysis of the revolution and its implications.

  * In his Nobel Prize address, Mullis said that when he realised what he had dreamt up, he stopped his car at mile marker 46.7 on Highway 128 and scribbled down the essential elements of the technique.

  * The answer, surprisingly, is whale.

  * In 2004, the remains of another archaic human, Homo floresiensis, were discovered in a tropical cave on the island of Flores in Indonesia (Brown et al., 2004; Morwood et al., 2004). H. floresiensis, popularly known as ‘the hobbit’, inhabited the cave until about 18,000 years ago. For the moment, it is not possible to extract DNA from bones preserved in such bacteria-rich conditions, but this may change (Callaway, 2014a).

  –THIRTEEN–

  THE CENTRAL DOGMA REVISITED

  In his 1957 lecture, Francis Crick outlined what he called the central dogma of molecular genetics:

  once information (meaning here the determination of a sequence of units) has been passed into a protein molecule it cannot get out again, either to form a copy of the molecule or to affect the blueprint of a nucleic acid.1

  Information can get out of DNA into RNA to determine the structure of a protein, but proteins cannot specify the sequence of new proteins, and the information in proteins cannot make the reverse journey back into your genes – your DNA cannot be rewritten by a protein. The central dogma has been the focus of repeated criticism over the past sixty years, partly because of the discovery of new facts, and partly because the unfortunate term ‘dogma’ tends to be a lightning rod for debate.

  In 1970, Nature magazine trumpeted ‘Central dogma reversed’ when it was discovered that information can flow from RNA into DNA. Nature’s claim was prompted by a discovery that explained how RNA viruses can infect healthy cells and transform them into cancerous cells that produce viruses. In 1964, Howard Temin, a 30-year-old cancer researcher at the University of Wisconsin-Madison, had boldly suggested that the basis of this effect was that RNA viruses turned their RNA code back into DNA, which was then integrated into the host’s chromosome, where it de-regulated cell growth and produced more virus RNA. There was no known mechanism whereby such a ‘reverse transcription’ could take place, so Temin was forced to hypothesise the existence of an enzyme that could carry out that task, transcribing the RNA virus into DNA. In 1970, Temin was proved right, when he, along with 32-year-old David Baltimore at MIT, reported the existence of an enzyme in RNA viruses that copies RNA into DNA. This enzyme, now called reverse transcriptase, enables information to flow from RNA back to DNA. Nature magazine, which published Baltimore’s paper, editorialised somewhat pompously:

  The central dogma, enunciated by Crick in 1958 and the keystone of molecular biology ever since, is likely to prove a considerable over-simplification.2

  Piqued by the tone of the editorial, Crick replied in the pages of the journal, graciously acknowledging Temin and Baltimore’s ‘very important work’ and setting the record straight with regard to what he had argued thirteen years earlier. Crick’s original hypothesis explored all the possible transfers of information between nucleic acids and proteins, and prohibited only those, such as protein → DNA, that either had been excluded experimentally or for which there was no conceivable mechanism. In subsequent years, this rich view tended to be replaced by the cruder DNA → RNA → protein, as summarised in Jim Watson’s influential 1965 textbook, Molecular Biology of the Gene.3 In 1957, Crick considered that the RNA → DNA step was ‘rare or absent’, but not impossible. As Crick pointed out in 1970, there was no ‘good theoretical reason why the transfer RNA → DNA should not sometimes be used. I have never suggested that it cannot occur, nor, as far as I know, have any of my colleagues.’4

  Crick’s view of the significance of Temin and Baltimore’s discovery was that the RNA → DNA transfer probably did not occur in most cells but might take place in special circumstances such as some viral infections. Temin was not so restrained, and within a year he was arguing that RNA → DNA information transfers were a fundamental part of normal development in the somatic cell line (that is, in all cells except the egg and sperm). As a result, he claimed, ‘new DNA sequences are formed by this process during the lifetime of a single organism’.5 According to Temin, reverse transcription was an everyday process, helping to shape how our cells develop – except that it is not, and reverse transcriptase is only ever found in cells infected by a particular class of RNA viruses called retroviruses. In this respect, Crick was right, and Temin – and the editorial writers at Nature – were wrong.

  Although Temin’s more extreme claims were misplaced, his discovery of reverse transcriptase was significant because it showed how viruses could cause cancer by altering the DNA of the cells they infect and by deregulating genes, leading to uncontrolled growth. The enzyme also went on to play an important role in the development of molecular genetics and of our ability to genetically modify organisms by introducing new sequences into DNA – it is used by scientists to make complementary DNA (cDNA) from mature mRNA. In 1975, Temin and Baltimore, along with Temin’s PhD supervisor, Renato Dulbecco, won the Nobel Prize in Physiology or Medicine for their work on how cancer viruses affect our genes.

  *

  In his 1970 clarification of exactly what he meant by the central dogma, Crick highlighted three kinds of information transfer that he postulated would never occur: protein → protein, protein → DNA and protein → RNA. However, even as he made such a clear prediction, Crick was cautious, underlining our ignorance and the fragility of the evidence upon which he based his slightly revised ‘dogma’:

  our knowledge of molecular biology, even in one cell – let alone for all organisms in nature – is still far too incomplete to allow us to assert dogmatically that it is correct.6

  And in the very next sentence he highlighted a potential exception:

  There is, for example, the problem of the chemical nature of the agent of the disease scrapie.

  Scrapie is a neurodegenerative disease affecting sheep and goats that has been known for hundreds of years. In 1970, its cause was mysterious – the disease-causing agent was known to be resistant to heat, formalin, ultraviolet radiation and ionising radiation (all of which destroy nucleic acids and inactivate viruses) and it left no sign of infection in the animal’s immune system. This curious set of facts led some scientists to argue that scrapie was in fact a genetic disorder rather than an infectious disease. Others daringly suggested that the scrapie infectious agent was a protein – this was what lay behind Crick’s remark in 1970. At this time, all known infectious agents were based on nucleic acids and were either organisms or viruses. A protein-based infectious agent would be a truly radical discovery, and this suggestion was therefore treated with some scepticism.7

  In 1982, Stanley Prusiner’s group discovered that scrapie could be detected by the presence of a protein that was also a potential infectious agent – they called it the prion protein, and it seemed to act by altering the shape of non-infectious proteins that were otherwise identical to the prion.8 This was utterly novel, both because it suggested that a protein could transmit a disease and because it implied that the prion might breach the central dogma by allowing the transmission of information from protein → protein. In the 1940s, Mirsky had suggested that there might be minute levels of protein contamination in Avery’s purified DNA; in the 1980s, some of Prusiner’s critics argued that there must be small amounts of nucleic acid in the apparently pure prion protein extracts. The prejudices that prevented some scientists from accepting Avery’s discovery that DNA is the hereditary material reappeared in the case of this infectious agent that was apparently not based on nucleic acids.9

  Interest in scrapie grew in the late 1980s and the 1990s with the horrific outbreak of variant Creutzfeldt–Jakob Disease (vCJD) in humans and its equivalent in animals, ‘mad cow’ disease (bovine spongiform encephalopathy, or BSE). These diseases infected millions of cattle and caused the deaths of hundreds of people, most of them teenagers and young adults. Both BSE and vCJD showed similarities to scrapie, and again the evidence suggested that an infectious protein was involved. It was eventually shown that the same prion protein causes all three diseases. Although it is still unclear how the BSE outbreak began, it is possible that cows initially got the disease from scrapie-infected sheep, the remains of which were fed to cows as meat and bonemeal. Whatever the original source of BSE, people caught the disease by eating diseased sections of the bovine nervous system that had been included in processed meat such as burgers.

  It is now accepted that the aberrant prion protein alters the conformation of normal prion proteins, thereby producing the brain pathology in sheep, goats, cows and humans.10 In 1997, Prusiner was awarded the Nobel Prize in Physiology or Medicine for his discovery, but despite the widespread acceptance of the prion hypothesis, a few scientists continue to argue that virus-like particles are involved in scrapie and similar diseases.11 Although it is known that yeast prion transmission involves only proteins, there remains the slim possibility that unknown nucleic acid-based cofactors may be involved in mammalian prion diseases.12

  In 1982, Prusiner suggested that the prion codes directly for the synthesis of another prion, not merely for its shape. This would have completely destroyed one fundamental point of the central dogma, that protein does not code for protein. Prusiner was wrong. Prion proteins are produced by the action of the prion gene, encoded in DNA and transcribed into RNA and then translated into a chain of amino acids – the normal prion protein plays a role in producing myelin, which protects nerves.13 In both the benign and the pathogenic forms of the prion, the amino acid sequence remains the same, so there is no transfer of information as defined by the central dogma, which referred solely to the sequence, not the structure. Although it can be argued that three-dimensional conformation is a form of information – indeed, Crick accepted as much – the change induced by the prion protein is probably more similar to the action of a crystal growing by assembling identical copies of itself than it is to that of a DNA molecule, which can produce a correspondence with a sequence in a different kind of molecule.14 Despite the highly unusual and pathological conditions that produce prion disease, the central dogma remains fundamentally intact.15

 
