The Gene
On June 10, 1992, fed up with the interminable squabbles, Venter left the NIH to launch his own private gene-sequencing institute. At first the organization was called the Institute for Genome Research, but Venter cannily spotted the flaw in the name: its acronym, IGOR, carried the unfortunate association with a cross-eyed Gothic butler apprenticed to Frankenstein. Venter changed it to The Institute for Genomic Research—or TIGR, for short.
On paper—or, at least, on scientific papers—TIGR was a phenomenal success. Venter collaborated with scientific luminaries, such as Bert Vogelstein and Ken Kinzler, to discover new genes associated with cancer. More important, Venter kept battering at the technological frontiers of genome sequencing. Uniquely sensitive to his critics, he was also uniquely responsive to them: in 1993, he expanded his sequencing efforts beyond gene fragments to full genes, and to genomes. Working with a new ally, Hamilton Smith, the Nobel Prize–winning bacteriologist, Venter decided that he would sequence the entire genome of a bacterium that causes lethal human pneumonias—Haemophilus influenzae.
Venter’s strategy was an expansion of the gene-fragment approach he had used with the brain—except with an important twist. This time he would shatter the bacterial genome into a million pieces using a shotgunlike device. He would then sequence hundreds of thousands of fragments at random, then use their overlapping segments to assemble them to solve the entire genome. To return to our sentence analogy, imagine trying to assemble a word using the following word fragments: stru, uctu, ucture, structu, and ucture. A computer can use the overlapping segments to assemble the full word: structure.
The solution depends on the presence of the overlapping sequences: if an overlap does not exist, or some part of the word gets omitted, it becomes impossible to assemble the correct word. But Venter was confident that he could use this approach to shatter and reassemble most of the genome. It was a Humpty Dumpty strategy: all the king’s men would solve the jigsaw puzzle by fitting the parts together again. The technique, called “shotgun” sequencing, had been used by Fred Sanger, the inventor of gene sequencing, in the 1980s—but Venter’s attack on the Haemophilus genome was the most ambitious application of this method in its history.
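The word puzzle can be sketched in a few lines of Python. This is only an illustration of the overlap idea, not the software Venter's team actually used: it greedily merges the two fragments that share the longest overlap until nothing more can be joined. The function names and the min_len threshold are assumptions made up for this example.

```python
def overlap(a, b, min_len=2):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len=2):
    """Greedy overlap merge (illustrative only; hypothetical helper)."""
    # Drop exact duplicates, then fragments wholly contained in another.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        # Find the ordered pair with the longest suffix/prefix overlap.
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


word = assemble(["stru", "uctu", "ucture", "structu", "ucture"])
gapped = assemble(["stru", "ture"])
print(word)    # a single assembled piece
print(gapped)  # two pieces that cannot be joined
```

With the five fragments from the text, the sketch returns the single word structure. With only stru and ture there is no shared overlap of two or more letters, so the pieces stay separate, which is the failure the paragraph describes.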
Venter and Smith launched the Haemophilus project in the winter of 1993. By July 1995, it was complete. “The final [paper] took forty drafts,” Venter later wrote. “We knew this paper was going to be historic, and I was insistent that it be as near perfect as possible.”
It was a marvel: the Stanford geneticist Lucy Shapiro wrote about how members of her lab had stayed up all night reading the H. flu genome, “thrilled by the first glimpse at the complete gene content of a living species.” There were genes to generate energy, genes to make coat proteins, genes to manufacture proteins, to regulate food, to evade the immune system. Sanger himself wrote to Venter to describe the work as “magnificent.”
While Venter was sequencing bacterial genomes at TIGR, the Human Genome Project was undergoing drastic internal changes. In 1993, after a series of quarrels with the head of the NIH, Watson stepped down as the project’s director. He was swiftly replaced by Francis Collins, the Michigan geneticist known for cloning the cystic fibrosis gene in 1989.
If the Genome Project had not found Collins in 1993, it might have found it necessary to invent him: he was almost preternaturally matched to its peculiar challenges. A devout Christian from Virginia, an able communicator and administrator, a first-rate scientist, Collins was measured, cautious, and diplomatic; to Venter’s furious little yacht constantly tilting against the winds, Collins was a transoceanic liner, barely registering the tumult around him. By 1995, as TIGR had roared forward with the Haemophilus genome, the Genome Project had concentrated its efforts on refining the basic technologies for gene sequencing. In contrast to TIGR’s strategy, which was to shred the genome to pieces, sequence at random, and reassemble the data post hoc, the Genome Project had chosen a more orderly approach—assembling and organizing the genomic fragments into a physical map (“Who is next to whom?”), confirming the identity and the overlaps of the clones, and then sequencing the clones in order.
To Lander, Collins, and Sulston, this clone-by-clone assembly was the only strategy that made any sense. A mathematician-turned-biologist-turned-gene-sequencer, Lander, whose opposition to shotgun sequencing could almost be described as an aesthetic revulsion, liked the idea of sequencing the complete genome piece by piece, as if solving an algebra problem. Venter’s strategy, he worried, would inevitably leave potholes in the genome. “What if you took a word, broke it apart, and tried to reconstruct it from the parts?” Lander asked. “That might work if you can find every piece of the word, or if every fragment overlaps. But what if some letters of the word are missing?” The word you might construct out of the available alphabets might convey precisely the opposite meaning to the real word; what if you found just the letters “p . . . u . . . n . . . y” in “profundity”?
Lander also feared the false intoxication of a half-finished genome: if gene sequencers left 10 percent of the genome incomplete, the full sequence would never be completed. “The real challenge of the Human Genome Project wasn’t starting the sequence. It was finishing the sequence of the genome. . . . If you left holes in the genome, but gave yourself the impression of completion, then no one would have the patience to finish the full sequence. Scientists would clap, dust their hands, pat their backs and move on. The draft would always remain a draft.”
The clone-by-clone approach required more money, deeper investments in infrastructure, and the one factor that seemed to have gone missing among genome researchers: patience. At MIT, Lander had assembled a formidable team of young scientists—mathematicians, chemists, engineers, and a group of caffeinated twentysomething computer hackers—who were developing algorithms to plod their way methodically through the genome. Lander was not working alone; the British team, funded by the Wellcome Trust, was developing its own platforms for analysis and assembly.
In May 1998, the ever-moving Venter tacked sharply windward yet again. Although TIGR’s shotgun-sequencing efforts had been undeniably successful, Venter still chafed under the organizational structure of the institute. TIGR had been set up as a strange hybrid—a nonprofit institute nestled inside a for-profit company, called Human Genome Sciences (HGS). Venter found this Russian-doll organizational system ridiculous. He argued relentlessly with his bosses. Venter decided to sever his ties with TIGR. He formed yet another new company, which would focus entirely on human genome sequencing. Venter called the new company Celera, a contraction of “accelerate.”
A week before a pivotal Human Genome Project meeting at Cold Spring Harbor, Venter met Collins, between flights, at the Red Carpet Lounge at Dulles Airport. Celera was about to launch an unprecedented push to sequence the human genome using shotgun sequencing, Venter announced matter-of-factly. It had bought two hundred of the most sophisticated sequencing machines, and was prepared to run them to the ground to finish the sequence in record time. Venter agreed to make much of the information available as a public resource—but with a menacing clause: Celera would seek to patent the three hundred most important genes that might act as targets for drugs for diseases such as breast cancer, schizophrenia, and diabetes. He laid out an ambitious timeline. Celera hoped to have the whole human genome assembled by 2001, beating the projected deadline for the publicly funded Human Genome Project by four years. He got up abruptly and caught the next flight to California.
Stung into action, Collins and Lander rapidly reorganized the public effort. They threw open the sluices of federal funding, sending $60 million in sequencing grants to seven American centers. Maynard Olson, a yeast geneticist from Berkeley, and Robert Waterston, a former worm biologist and now a gene-sequencing expert from Washington University, provided key strategic advice. Losing the genome to a private company would be a monumental embarrassment for the Genome Project. As knowledge of the looming public-private rivalry spread, newspapers were awash with speculation. On May 12, 1998, the Washington Post announced, “Private Firm Aims to Beat Government to Gene Map.”
In December 1998, the Worm Genome Project scored a decisive victory. From the gene-sequencing facility in Hinxton, near Cambridge, England, John Sulston brought news that the worm (C. elegans) genome had been completely sequenced using the clone-by-clone approach favored by proponents of the Human Genome Project.
If the Haemophilus genome had nearly brought geneticists to their knees with amazement and wonder in 1995, then the worm genome—the first complete sequence of a multicellular organism—demanded a full-fledged genuflection. Worms are vastly more complex than Haemophilus—and vastly more similar to humans. They have mouths, guts, muscles, a nervous system—and even a rudimentary brain. They touch; they feel; they move. They turn their heads away from noxious stimuli. They socialize. Perhaps they register something akin to worm anxiety when their food runs out. Perhaps they feel a fleeting pulse of joy when they mate.
C. elegans was found to have 18,891 genes. Thirty-six percent of the encoded proteins were similar to proteins found in humans. The rest—about 10,000—had no known similarities to known human genes; these 10,000 genes were either unique to worms, or, much more likely, a potent reminder of how little humans knew of human genes (many of these genes would, indeed, later be found to have human counterparts). Notably, only 10 percent of the encoded genes were similar to genes found in bacteria. Ninety percent of the nematode genome was dedicated to the unique complexities of organism building—demonstrating, yet again, the fierce starburst of evolutionary innovation that had forged multicellular creatures out of single-celled ancestors several million years ago.
As was the case with human genes, a single worm gene could have multiple functions. A gene called ceh-13, for instance, organizes the location of cells in the developing nervous system, allows the cells to migrate to the anterior parts of the worm’s anatomy, and ensures that the vulva of the worm is appropriately created. And conversely, a single “function” might be specified by multiple genes: the creation of a mouth in worms requires the coordinated function of multiple genes.
The discovery of ten thousand new proteins, with more than ten thousand new functions, would have amply justified the novelty of the project—yet the most surprising feature of the worm genome was not protein-encoding genes, but the number of genes that made RNA messages, but no protein. These genes—called “noncoding” (because they do not encode proteins)—were scattered through the genome, but they clustered on certain chromosomes. There were hundreds of them, perhaps thousands. Some noncoding genes were of known function: the ribosome, the giant intracellular machine that makes proteins, contains specialized RNA molecules that assist in the manufacture of proteins. Other noncoding genes were eventually found to encode small RNAs—called micro-RNAs—which regulate genes with incredible specificity. But many of these genes were mysterious and ill defined. They were not dark matter, but shadow matter, of the genome—visible to geneticists, yet unknown in function or significance.
What is a gene, then? When Mendel discovered the “gene” in 1865, he knew it only as an abstract phenomenon: a discrete determinant, transmitted intact across generations, that specified a single visible property or phenotype, such as flower color or seed texture in peas. Morgan and Muller deepened this understanding by demonstrating that genes were physical—material—structures carried on chromosomes. Avery advanced this understanding of genes by identifying the chemical form of that material: genetic information was carried in DNA. Watson, Crick, Wilkins, and Franklin solved its molecular structure as a double helix, with two paired, complementary strands.
In the 1930s, Beadle and Tatum solved the mechanism of gene action by discovering that a gene “worked” by specifying the structure of a protein. Brenner and Jacob identified a messenger intermediate—an RNA copy—that is required for the translation of genetic information into a protein. Monod and Jacob added to the dynamic conception of genes by demonstrating that genes can be turned on and off by increasing or decreasing this RNA message, using regulatory switches appended to each gene.
The comprehensive sequencing of the worm genome extended and modified these insights on the concept of a gene. A gene specifies a function in an organism, yes—but a single gene can specify more than a single function. A gene need not provide instructions to build a protein: it can be used to encode RNA alone, and no proteins. It need not be a contiguous piece of DNA: it can be split into parts. It has regulatory sequences appended to it, but these sequences need not be immediately adjacent to a gene.
Already, comprehensive genome sequencing had opened the door to an unexplored universe in organismal biology. Like an infinitely recursive encyclopedia—whose entry under encyclopedia has to be updated constantly—the sequencing of a genome had shifted our conception of genes, and therefore, of the genome itself.
The C. elegans genome—published to universal scientific acclaim in a special issue of Science magazine, with a picture of the subcentimeter nematode emblazoned on its cover in December 1998—was a powerful vindication for the Human Genome Project. Days after the worm genome announcement, Lander had exciting news of his own: the Human Genome Project had completed one-quarter of the sequence of the human genome. In a dark, dry, vaultlike warehouse on an industrial lot near Kendall Square in Cambridge, Massachusetts, 125 semiautomated sequencing machines, shaped like enormous gray boxes, were reading about two hundred DNA letters every second (Sanger’s virus, which had taken him three years to sequence, would have been completed in twenty-five seconds). The sequence of an entire human chromosome—chromosome twenty-two—had been fully assembled and was awaiting final confirmation. In about a month, the project would cross a memorable sequencing landmark: its one-billionth human base pair (a G-C, as it turned out), of the total 3 billion.
Celera, meanwhile, had no intention of lagging behind in this arms race. Flush with funds from private investors, it had doubled its output of gene sequences. On September 17, 1999, barely nine months after the publication of the worm genome, Celera opened a vast genome conference at the Fontainebleau Hotel in Miami with its own strategic counterpunch: it had sequenced the genome of the fruit fly, Drosophila melanogaster. Working with the fruit fly geneticist Gerry Rubin and a team of geneticists from Berkeley and Europe, Venter’s team had assembled the fly genome in a record spurt of eleven months—faster than any prior gene-sequencing project. As Venter, Rubin, and Mark Adams rose to the podium to give talks, the leap of the advance became clear: in the nine decades since Thomas Morgan had begun his work on fruit flies, geneticists had identified about 2,500 genes. Celera’s draft sequence contained all 2,500 known genes—and, in a single swoop, had added 10,500 new ones. In the hushed, reverential minute that followed the end of the presentations, Venter did not hesitate to drive a switchblade through his competitors’ spines: “Oh, and by the way, we [have] just started sequencing human DNA, and it looks as if [the technical hurdles are] going to be less of a problem than they had been with the fly.”
In March 2000, Science published the sequence of the fruit fly genome in yet another special issue of the journal, this time with a 1934 engraving of a male and a female fruit fly on its cover. Even the most strident critics of shotgun sequencing were sobered by the quality and depth of the data. Celera’s shotgun strategy had left some important gaps in the sequence—but significant sections of the fly genome were complete. Comparisons between human, worm, and fly genes revealed several provocative patterns. Of the 289 human genes known to be involved with a disease, 177 genes—more than 60 percent—had a related counterpart in the fly. There were no genes for sickle-cell anemia or hemophilia—flies do not have red blood cells or form clots—but genes involved in colon cancer, breast cancer, Tay-Sachs disease, muscular dystrophy, cystic fibrosis, Alzheimer’s disease, Parkinson’s disease, and diabetes, or close counterparts of those genes, were present. Although separated by four legs, two wings, and several million years of evolution, flies and humans shared core pathways and genetic networks. As William Blake had suggested in 1794, the diminutive fly had turned out to be “a man like me.”
The most bewildering feature of the fly genome was also a matter of size. Or more accurately, it was the proverbial revelation that size does not matter. Contrary to the expectations of even the most seasoned fly biologists, the fly was found to have just 13,601 genes—5,000 fewer genes than a worm. Less had been used to build more: out of just 13,000 genes was created an organism that mates, grows old, gets drunk, gives birth, experiences pain, has smell, sight, taste, and touch, and shares our insatiable desire for ripe summer fruit. “The lesson is that the complexity apparent [in flies] is not achieved by the sheer number of genes,” Rubin said. “The human genome . . . is likely to be an amplified version of a fly genome. . . . The evolution of additional complex attributes is essentially an organizational one: a matter of novel interactions that derive from the temporal and spatial segregation of fairly similar components.”