DNA: The Secret of Life
One element of Gamow's 1954 scheme had the virtue of being testable: because it involved overlapping DNA triplets, it predicted that many pairs of amino acids would in fact never be found adjacent in proteins. So Gamow eagerly awaited the sequencing of additional proteins. To his disappointment, more and more amino acids began to be found next to each other, and his scheme became increasingly untenable. The coup de grâce for all Gamow-type codes came in 1956 when Sydney Brenner (VAL – valine) analyzed every amino acid sequence then available.
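Why overlapping triplets forbid so many neighbors can be seen in a minimal sketch. It assumes a deliberately simplified scheme in which each triplet is shifted by exactly one base from the previous one, so that neighboring triplets share two letters; this is an illustration of the overlap constraint only, not a reconstruction of Gamow's actual proposal, and the names `TRIPLETS` and `successors` are invented here.

```python
from itertools import product

BASES = "ATGC"

# All 64 possible three-letter words over the four DNA letters.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def successors(triplet):
    """Triplets that can follow `triplet` when the reading window shifts by one base.

    The next triplet must begin with the last two letters of the current one,
    so only its final letter is free to vary.
    """
    return [triplet[1:] + base for base in BASES]


# Ordered pairs of neighboring triplets that the overlap permits.
allowed_pairs = {(t, s) for t in TRIPLETS for s in successors(t)}

# Ordered pairs that a non-overlapping code would permit.
total_pairs = len(TRIPLETS) ** 2

print(len(allowed_pairs), "of", total_pairs, "neighboring triplet pairs are allowed")
```

Under this assumption each triplet can be followed by only 4 of the 64 triplets, which is why a sufficiently large collection of real protein sequences could test, and in the end rule out, codes of this kind.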
Brenner had been raised in a small town outside Johannesburg, South Africa, in two rooms at the back of his father's cobbler's shop. Though the elder Brenner, a Lithuanian immigrant, was illiterate, his precocious son discovered a love of reading at the age of four and, led by this passion, would be turned on to biology by a textbook called The Science of Life. Though he was one day to admit having stolen the book from the public library, neither larceny nor poverty could slow Brenner's progress: he entered the University of Witwatersrand's undergraduate medical program at fourteen, and was working on his Ph.D. at Oxford when he came to Cambridge a month after our discovery of the double helix. He recalls his reaction to our model: "That's when I saw that this was it. And in a flash you just knew that this was very fundamental."
Gamow was not the only one whose theories were biting the dust: I had my own share of disappointments. Having gone to Caltech in the immediate aftermath of the double helix, I wanted to find the structure of RNA. To my despair, Alexander Rich (ARG – arginine) and I soon discovered that X-ray diffraction of RNA yielded uninterpretable patterns: the molecule's structure was evidently not as beautifully regular as that of DNA. Equally depressing, in a note sent out early in 1955 to all Tie Club members, Francis Crick (TYR – tyrosine) predicted that the structure of RNA would not, as I supposed, hold the secret of the DNA → protein transformation. Rather, he suggested that amino acids were likely ferried to the actual site of protein synthesis by what he called "adaptor molecules," of which there existed one specific to every amino acid. He speculated that these adaptors themselves might be very small RNA molecules. For two years I resisted his reasoning. Then a most unexpected biochemical finding proved that his novel idea was right on the mark.
It came from work at the Massachusetts General Hospital in Boston, where Paul Zamecnik had for several years been developing cell-free systems for studying protein synthesis. Cells are highly compartmentalized bodies, and Zamecnik correctly saw the need to study what was going on inside them without the complications posed by their various membranes. Using material derived from rat liver tissue, he and his collaborators were able to re-create in a test tube a simplified version of the cell interior in which they could track radioactively tagged amino acids as they were assembled into proteins. In this way Zamecnik was able to identify the ribosome as the site of protein synthesis, a fact that George Gamow did not accept initially.
Soon, with his colleague Mahlon Hoagland, Zamecnik made the even more unexpected discovery that amino acids, prior to being incorporated into polypeptide chains, were bound to small RNA molecules. This result puzzled them until they heard from me of Crick's adaptor theory. They then quickly confirmed Crick's suggestion that a specific RNA adaptor (called transfer RNA) existed for each amino acid. And each of these transfer RNA molecules also had on its surface a specific sequence of bases that permitted it to bind to a corresponding segment of the RNA template, thereby lining up the amino acids for protein synthesis.
Until the discovery of transfer RNA, all cellular RNA was thought to have a template role. Now we realized RNA could come in several different forms, though the two major RNA chains that comprised the ribosomes predominated. Puzzling at the time was the observation that these two RNA chains were of constant sizes. If these chains were the actual templates for protein synthesis, we would have expected them to vary in length in relation to the different sizes of their protein products. Equally disturbing, these chains proved very stable metabolically: once synthesized they did not break down. Yet experiments at the Institut Pasteur in Paris suggested that many templates for bacterial protein synthesis were short-lived. Even stranger, the sequences of the bases in the two ribosomal RNA chains showed no correlation to sequences of bases along the respective chromosomal DNA molecules.
Resolution of these paradoxes came in 1960 with discovery of a third form of RNA, messenger RNA. This was to prove the true template for protein synthesis. Experiments done in my lab at Harvard and at both Caltech and Cambridge by Matt Meselson, François Jacob, and Sydney Brenner showed that ribosomes were, in effect, molecular factories. Messenger RNA passed between the two ribosomal subunits like ticker tape being fed into an old-fashioned computer. Transfer RNAs, each with its amino acid, attached to the messenger RNA in the ribosome so that the amino acids were appropriately ordered before being chemically linked to form polypeptide chains.
Still unclear was the genetic code, the rules for translating a nucleic acid sequence into an ordered polypeptide sequence. In a 1956 RNA Tie Club manuscript, Sydney Brenner laid out the theoretical issues. In essence they boiled down to this: how could the code specify which one of 20 amino acids was to be incorporated into a protein chain at a particular point when there are only four DNA letters, A, T, G, C? Obviously a single nucleotide, with only four possible identities, was insufficient, and even two – which would allow for 16 (4 x 4) possible permutations – wouldn't work. It would take at minimum three nucleotides, a triplet, to code for a single amino acid. But this also supposed a puzzling redundant capacity. With a triplet, there could exist 64 permutations (4 x 4 x 4); since the code needed only 20, was it the case that most amino acids could be encoded by more than one triplet? If that were so, in principle, a "quadruplet" code (4 x 4 x 4 x 4) yielding 256 permutations was also perfectly feasible, though it implied even greater redundancy.
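The counting argument can be checked by brute force. The short sketch below enumerates every word of a given length over the four letters and finds the shortest length that offers at least 20 distinct words; the function names are invented for this illustration.

```python
from itertools import product

BASES = "ATGC"
AMINO_ACIDS = 20


def count_words(length):
    """Number of distinct words of the given length over the four DNA letters."""
    return len(list(product(BASES, repeat=length)))


def minimum_word_length(needed=AMINO_ACIDS):
    """Shortest word length that provides at least `needed` distinct words."""
    length = 1
    while count_words(length) < needed:
        length += 1
    return length


counts = {n: count_words(n) for n in range(1, 5)}
print(counts)                 # word counts for lengths 1 through 4
print(minimum_word_length())  # shortest length that covers 20 amino acids
```

The counts come out as 4, 16, 64, and 256 for lengths one through four, so three is the smallest word size that can cover all 20 amino acids.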
In 1961 at Cambridge University, Brenner and Crick did the definitive experiment that demonstrated that the code was triplet-based. By a clever use of chemical mutagens they were able to delete or insert DNA base pairs. They found that inserting or deleting a single base pair results in a harmful "frameshift" because the entire code beyond the site of the mutation is scrambled. Imagine a three-letter word code as follows: JIM ATE THE FAT CAT. Now imagine that the first "T" is deleted. If we are to preserve the three-letter word structure of the sentence, we have JIM AET HEF ATC AT – gibberish beyond the site of the deletion. The same thing happens when two base pairs are deleted or inserted: removing the first "T" and "E," we get JIM ATH EFA TCA T – more gibberish. Now what happens if we delete (or insert) three letters? Removing the first "A," "T," and "E," we get JIM THE FAT CAT; although we have lost one "word" – ATE – we have nevertheless retained the sense of the rest of the sentence. And even if our deletion straddles "words" – say we delete the first "T" and "E," and the second "T" – we still lose only those two words, and are again able to recover the intended sentence beyond them: JIM AHE FAT CAT. So it is with DNA sequence: a single insertion/deletion massively disrupts the protein because of the frameshift effect, which changes every single amino acid beyond the insertion/deletion point; so does a double insertion/deletion. But a triple insertion/deletion along a DNA molecule will not necessarily have a catastrophic effect; it will add or eliminate one amino acid, but this does not necessarily disrupt all biological activity.
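The sentence analogy can be replayed mechanically. The sketch below strips the spaces, deletes letters at chosen positions, and then re-reads what remains in groups of three; the helper names are invented for this illustration, and positions are counted from zero over the letters alone.

```python
def read_in_frame(letters, word_size=3):
    """Split a run of letters into consecutive words of `word_size` letters."""
    words = [letters[i:i + word_size] for i in range(0, len(letters), word_size)]
    return " ".join(words)


def delete_positions(sentence, positions):
    """Delete the letters at the given 0-based positions, then re-read in threes."""
    letters = sentence.replace(" ", "")
    kept = "".join(ch for i, ch in enumerate(letters) if i not in positions)
    return read_in_frame(kept)


SENTENCE = "JIM ATE THE FAT CAT"
# Letter positions: J0 I1 M2 A3 T4 E5 T6 H7 E8 F9 A10 T11 C12 A13 T14

single = delete_positions(SENTENCE, {4})          # first "T" removed
double = delete_positions(SENTENCE, {4, 5})       # first "T" and "E" removed
triple = delete_positions(SENTENCE, {3, 4, 5})    # the whole word ATE removed
straddle = delete_positions(SENTENCE, {4, 5, 6})  # first "T", "E", and second "T"

for label, result in [("single", single), ("double", double),
                      ("triple", triple), ("straddle", straddle)]:
    print(f"{label:9s}{result}")
```

One or two deletions leave every later word out of register, while any three deletions restore the reading frame beyond the damaged stretch.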
Crick came into the lab late one night with his colleague Leslie Barnett to check on the final result of the triple-deletion experiment, and realized at once the significance of the result, telling Barnett, "We're the only two who know it's a triplet code!" With me, Crick had been the first to glimpse the double helical secret of life; now he was the first to know for sure that the secret is written in three-letter words.
So the code came in threes, and the links from DNA to protein were RNA-mediated. But we still had to crack the code. What pair of amino acids was specified by a stretch of DNA with, say, sequence ATA TAT or GGT CAT? The first glimpse of the solution came in a talk given by Marshall Nirenberg at the International Congress of Biochemistry in Moscow in 1961.
After hearing about the discovery of messenger RNA, Nirenberg, working at the U.S. National Institutes of Health, wondered whether RNA synthesized in vitro would work as well as the naturally occurring messenger form when it came to protein synthesis in cell-free systems. To find out, he used RNA tailored according to procedures developed at New York University six years earlier by the French biochemist Marianne Grunberg-Manago. She had discovered an RNA-specific enzyme that could produce strings like AAAAAA or GGGGGG. And because one key chemical difference between RNA and DNA is RNA's substitution of uracil, "U," for thymine, "T," this enzyme would also produce strings of U, UUUUU . . . – poly-U, in the biochemical jargon. It was poly-U that Nirenberg and his German collaborator, Heinrich Matthaei, added to their cell-free system on May 22, 1961. The result was striking: the ribosomes started to pump out a simple protein, one consisting of a string of a single amino acid, phenylalanine. They had discovered that poly-U encodes polyphenylalanine. Therefore, one of the three-letter words by which the genetic code specified phenylalanine had to be UUU.
The International Congress that summer of 1961 brought together all the major players in molecular biology. Nirenberg, then a young scientist nobody had heard of, was slated to speak for just ten minutes, and hardly anyone, including myself, attended his talk. But when news of his bombshell began to spread, Crick promptly inserted him into a later session of the conference so that Nirenberg could make his announcement to a now-expectant capacity audience. It was an extraordinary moment. A quiet, self-effacing young no-name speaking before a who's who crowd of molecular biology had shown the way toward finding the complete genetic code.
Practically speaking, Nirenberg and Matthaei had solved but one sixty-fourth of the problem – all we now knew was that UUU codes for phenylalanine. There remained sixty-three other triplets (codons) to figure out, and the following years would see a frenzy of research as we labored to discover what amino acids these other codons represented. The tricky part was synthesizing the various permutations of RNA: poly-U was relatively straightforward to produce, but what about AGG? A lot of ingenious chemistry went into solving these problems, much of it done at the University of Wisconsin by Gobind Khorana. By 1966, what each of the sixty-four codons specifies (in other words, the genetic code itself) had been established (see Plate 22); Khorana and Nirenberg received the Nobel Prize for Physiology or Medicine in 1968.
Let's now put the whole story together and look at how a particular protein, hemoglobin, is produced.
Red blood cells are specialized as oxygen transporters: they use hemoglobin to transport oxygen from the lungs to the tissues where it is needed. Red blood cells are produced in the bone marrow by stem cells – at a rate of about two and a half million per second.
When the need arises to produce hemoglobin, the relevant segment of the bone-marrow DNA – the hemoglobin gene – unzips just as DNA unzips when it is replicating (see Plate 23). This time, instead of copying both strands, only one is copied or, to use the technical term, transcribed; and rather than a new strand of DNA, the product created with the help of the enzyme RNA polymerase is a new single strand of messenger RNA, which corresponds to the hemoglobin gene. The DNA from which the RNA has been derived now zips itself up again.
The messenger RNA is transported out of the nucleus and delivered to a ribosome, itself composed of RNA and proteins, where the information in the sequence of the messenger RNA will be used to generate a new protein molecule. This process is known as translation. Amino acids are delivered to the scene attached to transfer RNA. At one end of the transfer RNA is a particular triplet (in the case given in the diagram, CAA) that recognizes its opposite corresponding triplet in the messenger RNA, GUU. At its other end the transfer RNA is towing its matching amino acid, in this case valine. At the next triplet along the messenger RNA, because the DNA sequence is TTC (which specifies lysine), we have a lysine transfer RNA. All that remains now is to glue the two amino acids together biochemically. Do that 100 times, and you have a protein chain 100 amino acids long; the order of the amino acids has been specified by the order of As, Ts, Gs, and Cs in the DNA from which the messenger RNA was created. The two kinds of hemoglobin chains are 141 and 146 amino acids in length.
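The two steps just described, transcription and translation, can be sketched in a few lines. This is a simplified illustration, not a model of the cell: it assumes the DNA given is the strand being copied, pairs bases position by position without regard to strand direction, and uses a codon table containing only the three codons mentioned in this chapter (UUU, GUU, AAG). The function and variable names are invented here.

```python
# Base pairing used when the DNA strand is copied into messenger RNA:
# RNA carries uracil (U) where DNA would carry thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the three codons discussed in the text.
CODON_TABLE = {
    "UUU": "phenylalanine",
    "GUU": "valine",
    "AAG": "lysine",
}


def transcribe(copied_dna_strand):
    """Build the messenger RNA complementary to the DNA strand being copied."""
    return "".join(DNA_TO_RNA[base] for base in copied_dna_strand)


def translate(mrna):
    """Read the messenger RNA three letters at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore any incomplete final codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


dna = "CAATTC"                      # CAA pairs with GUU; TTC specifies lysine
mrna = transcribe(dna)              # messenger RNA built from the DNA
chain = translate(mrna)             # amino acids in the order they are joined
poly_u_chain = translate("UUUUUUUUU")  # Nirenberg and Matthaei's poly-U case

print(mrna)
print(chain)
print(poly_u_chain)
```

With these inputs the DNA stretch yields valine followed by lysine, and a run of U's yields nothing but phenylalanine, matching the poly-U result. A fuller version would need all sixty-four codons, including the signals that start and stop a chain.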
Proteins, however, are more than just linear chains of amino acids. Once the chain has been made, proteins fold into complex configurations, sometimes by themselves, sometimes assisted by "helper" molecules. It is only once they assume this configuration that they become biologically active. In the case of hemoglobin, it takes four chains, two of one kind and two of a slightly different kind, before the molecule is in business. And loaded into the center of each twisted chain is the key to oxygen transport, an iron atom.
It has been possible to use today's molecular biological tricks to go back and reconsider some of the classic examples of early genetics. For Mendel, the mechanism that caused some peas to be wrinkled and others round was mysterious; as far as he was concerned, these were merely characteristics that obeyed the laws of inheritance he had worked out. Now, however, we understand the difference in molecular detail.
In 1990, scientists in England found that wrinkled peas lack a certain enzyme involved in the processing of starch, the carbohydrate that is stored in seeds. It turns out that the gene for that enzyme in wrinkled-pea plants is nonfunctional owing to a mutation (in this case an intrusion of irrelevant DNA into the middle of the gene). Because wrinkled peas contain, as a result of this mutation, less starch and more sugar, they tend to lose more water as they are maturing. The outside seed coat of the pea, however, fails to shrink as the water escapes (and the volume of the pea decreases), and the result is the characteristic wrinkling – the contents being too little to fill out the coat.
Archibald Garrod's alkaptonuria has also entered the molecular era. In 1995, Spanish scientists working with fungi found a mutated gene that resulted in the accumulation of the same substance that Garrod had noted in the urine of alkaptonurics. The gene in question ordinarily produces an enzyme that turns out to be a basic feature of many living systems, and is present in humans. By comparing the sequence of the fungal gene to human sequences, it was possible to find the human gene, which encodes an enzyme called homogentisate dioxygenase. The next step was to compare the gene in normal individuals with the one in alkaptonurics. Lo and behold, the alkaptonurics' gene was nonfunctional, courtesy of single base pair mutations. Garrod's "inborn error in metabolism" is caused by a single difference in DNA sequence.
At the 1966 Cold Spring Harbor Symposium on the genetic code, there was a sense that we had done it all. The code was cracked, and we knew in outline how DNA exerted control of living processes through the proteins it specifies. Some of the old hands decided that it was time to move beyond the study of the gene per se. Francis Crick decided to move into neurobiology; never one to shy away from big problems, he was particularly interested in figuring out how the human brain works. Sydney Brenner turned to developmental biology, choosing to concentrate on a simple nematode worm in the belief that precisely so simple a creature would most readily permit scientists to unravel the connections between genes and development. Today, the worm, as it is known in the trade, is indeed the source of many of our insights into how organisms are put together. The worm's contribution was recognized by the Nobel Committee in 2002 when Brenner and two longstanding worm stalwarts, John Sulston at Cambridge and Bob Horvitz at MIT, were awarded the Nobel Prize in Physiology or Medicine.
Most of the early pioneers in the DNA game, however, chose to remain focused on the basic mechanisms of gene function. Why are some proteins much more abundant than others? Many genes are switched on only in specific cells or only at particular times in the life of a cell; how is that switching achieved? A muscle cell is hugely different from a liver cell, both in its function and in its appearance under the microscope. Changes in gene expression create this cellular diversity and differentiation: in essence, muscle cells and liver cells produce different sets of proteins. The simplest way to produce different proteins is to regulate which genes are transcribed in each cell. Thus some so-called housekeeping proteins – the ones essential for the functioning of the cell, such as those involved in the replication of DNA – are produced by all cells. Beyond that, particular genes are switched on at particular moments in particular cells to produce appropriate proteins. It is also possible to think of development – the process of growth from a single fertilized egg into a staggeringly complex adult human – as an enormous exercise in gene-switching: as tissues arise through development, so whole suites of genes must be switched on and off.
The first important advances in our understanding of how genes are switched on and off came from experiments in the 1960s by François Jacob and Jacques Monod at the Institut Pasteur in Paris. Monod had started slowly in science because, poor fellow, he was talented in so many fields that he had difficulty focusing. During the thirties, he spent time at Caltech's biology department under T. H. Morgan, father of fruit fly genetics, but not even daily exposure to Morgan's no-longer-so-boyish "boys" could turn Monod into a fruit fly convert. He preferred conducting Bach concerts at the university – which later offered him a job teaching undergraduate music appreciation – and in the lavish homes of local millionaires. Not until 1940 did he complete his Ph.D. at the Sorbonne in Paris, by which time he was already heavily involved in the French Resistance. In one of the few instances of biology's complicity in espionage, Monod was able to conceal vital secret papers in the hollow leg bones of a giraffe skeleton on display outside his lab. As the war progressed, so did his importance to the Resistance (and with it his vulnerability to the Nazis). By D-day he was playing a major role in facilitating the Allied advance and harrying the German retreat.