The Violinist's Thumb: And Other Lost Tales of Love, War, and Genius, as Written by Our Genetic Code

by Sam Kean


  Both the linguistic and mathematical properties of DNA contribute to its ultimate purpose: managing data. Cells store, call up, and transmit messages through DNA and RNA, and scientists routinely speak of nucleic acids encoding and processing information, as if genetics were a branch of cryptography or computer science.

  As a matter of fact, modern cryptography has some roots in genetics. After studying at Cornell University, a young geneticist named William Friedman joined an eccentric scientific think tank in rural Illinois in 1915. (It boasted a Dutch windmill, a pet bear named Hamlet, and a lighthouse, despite being 750 miles from the coast.) As Friedman’s first assignment, his boss asked him to study the effects of moonlight on wheat genes. But Friedman’s statistical background soon got him drawn into another of his boss’s lunatic projects*—proving that Francis Bacon not only wrote Shakespeare’s plays but left clues throughout the First Folio that trumpeted his authorship. (The clues involved changing the shapes of certain letters.) Although enthused—he’d loved code breaking ever since he’d read Edgar Allan Poe’s “The Gold-Bug” as a child—Friedman determined the supposed references to Bacon were bunkum. Someone could use the same deciphering schemes, he noted, to “prove” that Teddy Roosevelt wrote Julius Caesar. Nevertheless Friedman had envisioned genetics as biological code breaking, and after his taste of real code breaking, he took a job in cryptography with the U.S. government. Building on the statistical expertise he’d gained in genetics, he soon cracked the secret telegrams that broke the Teapot Dome bribery scandal open in 1923. In the early 1940s he began deciphering Japanese diplomatic codes, including a dozen infamous cables, intercepted on December 6, 1941, from Japan to its embassy in Washington, D.C., that foreshadowed imminent threat.

  Friedman had abandoned genetics because genetics in the first decades of the century (at least on farms) involved too much sitting around and waiting for dumb beasts to breed; it was more animal husbandry than data analysis. Had he been born a generation or two later, Friedman would have seen things differently. By the 1950s biologists regularly referred to A-C-G-T base pairs as biological “bits” and to genetics as a “code” to crack. Genetics became data analysis, and continued to develop along those lines thanks in part to the work of a younger contemporary of Friedman, an engineer whose work encompassed both cryptography and genetics, Claude Shannon.

  Scientists routinely cite Shannon’s thesis at MIT, written in 1937 when he was twenty-one years old, as the most important master’s thesis ever. In it Shannon outlined a method to combine electronic circuits and elementary logic to do mathematical operations. As a result, he could now design circuits to perform complex calculations—the basis of all digital circuitry. A decade later, Shannon wrote a paper on using digital circuits to encode messages and transmit them more efficiently. It’s only barely hyperbole to say that these two discoveries created modern digital communications from scratch.

  Amid these seminal discoveries, Shannon indulged his other interests. At the office he loved juggling, and riding unicycles, and juggling while riding unicycles down the hall. At home he tinkered endlessly with junk in his basement; his lifetime inventions include rocket-powered Frisbees, motorized pogo sticks, machines to solve Rubik’s Cubes, a mechanical mouse (named Theseus) to solve mazes, a program (named THROBAC) to calculate in Roman numerals, and a cigarette pack–sized “wearable computer” to rip off casinos at roulette.*

  Shannon also pursued genetics in his Ph.D. thesis in 1940. At the time, biologists were firming up the connection between genes and natural selection, but the heavy statistics involved frightened many. Though he later admitted he knew squat about genetics, Shannon dived right in and tried to do for genetics what he’d done for electronic circuits: reduce the complexities into simple algebra, so that, given any input (genes in a population), anyone could quickly calculate the output (what genes would thrive or disappear). Shannon spent all of a few months on the paper and, after earning his Ph.D., was seduced by electronics and never got back around to genetics. It didn’t matter. His new work became the basis of information theory, a field so widely applicable that it wound its way back to genetics without him.

  With information theory, Shannon determined how to transmit messages with as few mistakes as possible—a goal biologists have since realized is equivalent to designing the best genetic code for minimizing mistakes in a cell. Biologists also adopted Shannon’s work on efficiency and redundancy in languages. English, Shannon once calculated, was at least 50 percent redundant. (A pulp novel he investigated, by Raymond Chandler, approached 75 percent.) Biologists studied efficiency too because, per natural selection, efficient creatures should be fitter creatures. The less redundancy in DNA, they reasoned, the more information cells could store, and the faster they could process it, a big advantage. But as the Tie Club knew, DNA is sub-suboptimal in this regard. Up to six A-C-G-T triplets code for just one amino acid: totally superfluous redundancy. If cells economized and used fewer triplets per amino acid, they could incorporate more than just the canonical twenty, which would open up new realms of molecular evolution. Scientists have in fact shown that, if coached, cells in the lab can use fifty amino acids.
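  The redundancy described above can be counted directly. The sketch below is an illustration, not anything from Shannon or the Tie Club: it assumes the standard genetic code written in the compact 64-letter layout used by common codon tables (bases ordered T, C, A, G, with `*` marking a stop signal). The names `CODON_TABLE` and `codons_per_amino_acid` are made up for this example. The final figure is only a crude estimate that treats every triplet as equally likely.

```python
from collections import Counter
from itertools import product
from math import log2

BASES = "TCAG"
# Assumed layout: standard genetic code, one letter per triplet, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet (TTT, TTC, TTA, ...) to the amino acid it encodes.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid(table):
    """Count how many triplets encode each amino acid (stops excluded)."""
    return Counter(aa for aa in table.values() if aa != "*")


counts = codons_per_amino_acid(CODON_TABLE)

# Amino acids that are spelled by six different triplets.
six_fold = sorted(aa for aa, n in counts.items() if n == 6)

# Capacity of one triplet versus what 20 amino acids plus "stop" require.
bits_available = log2(len(CODON_TABLE))
bits_needed = log2(len(counts) + 1)
crude_redundancy = 1 - bits_needed / bits_available

print(len(CODON_TABLE), "triplets encode", len(counts), "amino acids")
print("six-fold redundant:", six_fold)
print(f"crude redundancy estimate: {crude_redundancy:.0%}")
```

  Under these assumptions, 64 triplets carry only twenty amino acids plus a stop signal, and three amino acids (leucine, arginine, and serine) each claim six triplets apiece.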

  But if redundancy has costs, it also, as Shannon pointed out, has benefits. A little redundancy in language ensures that we can follow a conversation even if some syllables or words get garbled. Mst ppl hv lttl trbl rdng sntncs wth lttrs mssng. In other words, while too much redundancy wastes time and energy, a little hedges against mistakes. Applied to DNA, we can now see the point of redundancy: it makes a mutation less likely to introduce a wrong amino acid. Furthermore, biologists have calculated that even if a mutation does substitute the wrong amino acid, Mother Nature has rigged things so that, no matter the change, the odds are good that the new amino acid will have similar chemical and physical traits and will therefore get folded properly. It’s an amino acid synonym, so cells can still make out the sentence’s meaning.
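  A minimal sketch of how that hedge works, assuming the standard genetic code in its compact 64-letter layout (bases ordered T, C, A, G, `*` for stop): it tries every possible single-letter change to every amino-acid triplet and counts how often the amino acid stays the same. The helper names are hypothetical, and the sketch ignores the second safeguard in the paragraph above, the chemical similarity of substituted amino acids.

```python
from itertools import product

BASES = "TCAG"
# Assumed layout: standard genetic code, one letter per triplet, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_mutants(codon):
    """Yield the nine triplets that differ from `codon` by exactly one letter."""
    for position, old_base in enumerate(codon):
        for new_base in BASES:
            if new_base != old_base:
                yield codon[:position] + new_base + codon[position + 1:]


def synonymous_fraction(table):
    """Fraction of single-letter mutations that leave the amino acid unchanged."""
    total = unchanged = 0
    for codon, amino_acid in table.items():
        if amino_acid == "*":
            continue  # only mutate triplets that encode an amino acid
        for mutant in single_base_mutants(codon):
            total += 1
            unchanged += table[mutant] == amino_acid
    return unchanged / total


fraction = synonymous_fraction(CODON_TABLE)
print(f"silent single-letter mutations: {fraction:.1%}")
```

  For instance, CTT and CTC both spell leucine, so a mutation in the third letter there changes nothing in the finished protein.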

  (Redundancy might have uses outside of genes, too. Noncoding DNA—the long expanses of DNA between genes—contains some tediously redundant stretches of letters, places where it looks like someone held his fingers down on nature’s keyboard. While these and other stretches look like junk, scientists don’t know if they really are expendable. As one scientist mused, “Is the genome a trash novel, from which you can remove a hundred pages and it doesn’t matter, or is it more like a Hemingway, where if you remove a page, the story line is lost?” But studies that have applied Shannon’s theorems to junk DNA have found that its redundancy looks a lot like the redundancy in languages—which may mean that noncoding DNA has still-undiscovered linguistic properties.)

  All of this would have wowed Shannon and Friedman. But perhaps the most fascinating aspect is that, beyond its other clever features, DNA also scooped us on our most powerful information-processing tools. In the 1920s the influential mathematician David Hilbert was trying to determine if there existed any mechanical, turn-the-crank process (an algorithm) that could solve theorems automatically, almost without thought. Hilbert envisioned humans going through this process with pencils and paper. But in 1936 mathematician (and amateur knotologist) Alan Turing sketched out a machine to do the work instead. Turing’s machine looked simplistic—just a long recording tape, and a device to move the tape and mark it—but in principle could compute the answer to any solvable problem, no matter how complex, by breaking it down into small, logical steps. The Turing machine inspired many thinkers, Shannon among them. Engineers soon built working models—we call them computers—with long magnetic tapes and recording heads, much as Turing envisioned.

  Biologists know, however, that Turing machines resemble nothing so much as the machinery that cells use to copy, mark, and read long strands of DNA and RNA. These Turing biomachines run every living cell, solving all sorts of intricate problems every second. In fact, DNA goes one better than Turing machines: computer hardware still needs software to run; DNA acts as both hardware and software, both storing information and executing commands. It even contains instructions to make more of itself.

  And that’s not all. If DNA could do only the things we’ve seen so far—copy itself perfectly over and over, spin out RNA and proteins, withstand the damage of nuclear bombs, encode words and phrases, even whistle a few choice tunes—it would still stand out as an amazing molecule, one of the finest. But what sets DNA apart is its ability to build things billions of times larger than itself—and set them in motion across the globe. DNA has even kept travelogues of everything its creations have seen and done in all that time, and a few lucky creatures can now, finally, after mastering the basics of how DNA works, read these tales for themselves.

  PART II

  Our Animal Past

  Making Things That Crawl and Frolic and Kill

  5

  DNA Vindication

  Why Did Life Evolve So Slowly—Then Explode in Complexity?

  Almost immediately upon reading the paper, Sister Miriam Michael Stimson must have known that a decade of labor, her life’s work, had collapsed. Throughout the 1940s, this Dominican nun—she wore a black-and-white habit (complete with hood) at all times—had carved out a productive, even thriving research career for herself. At small religious colleges in Michigan and Ohio, she had experimented on wound-healing hormones and even helped create a noted hemorrhoid cream (Preparation H) before finding her avocation in studying the shapes of DNA bases.

  She progressed quickly in the field and published evidence that DNA bases were changelings—shape-shifters—that could look quite different one moment to the next. The idea was pleasingly simple but had profound consequences for how DNA worked. In 1951, however, two rival scientists obliterated her theory in a single paper, dismissing her work as “slight” and misguided. It was a mortifying moment. As a woman scientist, Sister Miriam carried a heavy burden; she often had to endure patronizing lectures from male colleagues even on her own research topics. And with such a public dismissal, her hard-won reputation was unraveling as quickly and thoroughly as two strands of DNA.

  It couldn’t have been much consolation to realize, over the next few years, that her repudiation was actually a necessary step in making the most important biological discovery of the century, Watson and Crick’s double helix. James Watson and Francis Crick were unusual biologists for their time in that they merely synthesized other people’s work and rarely bothered doing experiments. (Even the über-theorist Darwin had run an experimental nursery and trained himself as an expert in barnacles, including barnacle sex.) This habit of “borrowing” got Watson and Crick in trouble sometimes, most notably with Rosalind Franklin, who took the crucial x-rays that illuminated the double helix. But Watson and Crick built on the foundational work of dozens of other, lesser-known scientists as well, Sister Miriam among them. Admittedly, her work wasn’t the most important in the field. In fact, her mistakes perpetuated much of the early confusion about DNA. But as with Thomas Hunt Morgan, following along to see how someone faced her errors has its value. And unlike many vanquished scientists, Sister Miriam had the humility, or gumption, to drag herself back into the lab and contribute something to the double helix story in the end.

  In many ways biologists in the mid–twentieth century were struggling with the same basic problem—what did DNA look like?—as in Friedrich Miescher’s day, when they’d first uncovered its anomalous mix of sugars, phosphates, and ringed bases. Most vexing of all, no one could figure out how the long strands of DNA could possibly mesh and snuggle up together. Today we know that DNA strands mesh automatically because A fits with T, and C with G, but no one knew that in 1950. Everyone assumed letter pairing was random. So scientists had to accommodate every ungainly combination of letters inside their models of DNA: bulky A and G had to fit together sometimes, as did slender C and T. Scientists quickly realized that no matter how these ill-fitting pairs of bases were rotated or jammed together, they would produce dents and bulges, not the sleek DNA shape they expected. At one point Watson and Crick even said to hell with this biomolecular Tetris and wasted months tinkering with an inside-out (and triple-stranded*) DNA model, with the bases facing outward just to get them out of the way.

  Sister Miriam tackled an important subproblem of DNA’s structure, the precise shapes of the bases. A nun working in a technical field like this might seem strange today, but Miriam recalled later that most women scientists she encountered at conferences and meetings were fellow sisters. Women at the time usually had to relinquish their careers upon marrying, while unmarried women (like Franklin) provoked suspicion or derision and sometimes earned such low pay they couldn’t make ends meet. Catholic sisters, meanwhile, respectably unmarried and living in church-run convents, had the financial support and independence to pursue science.

  Not that being a sister didn’t complicate things, professionally and personally. Like Mendel—born Johann, but Gregor at his monastery—Miriam Stimson and her fellow novices received new names upon entering their convent in Michigan in 1934. Miriam selected Mary, but during the christening ceremony, their archbishop and his assistant skipped an entry on the list, so most women in line were blessed with the wrong handle. No one spoke up, and because no names remained for Miriam, last in line, the clever archbishop used the first name that popped into his head, a man’s. The sisterhood was considered a marriage to Christ, and because what God (or archbishops) join together mere humans cannot put asunder, the wrong names became permanent.

  These demands for obedience grew more onerous as Sister Miriam started working, and crimped her scientific career. Instead of a full laboratory, her superiors at her small Catholic college spared only a converted bathroom to run experiments in. Not that she had much time to run them: she had to serve as a “wing-nun,” responsible for a student dorm, and she had full teaching loads to boot. She also had to wear a habit with an enormous cobra hood even in the lab, which couldn’t have made running complicated experiments easy. (She couldn’t drive either because the hood obscured her peripheral vision.) Nevertheless Miriam was sincerely clever—friends nicknamed her “M2”—and as with Mendel, M2’s order promoted and encouraged her love of science. Admittedly, they did so partly to combat godless commies in Asia, but partly also to understand God’s creation and care for his creatures. Indeed, Miriam and her colleagues contributed to many areas of medicinal chemistry (hence the Preparation H study). The DNA work was a natural extension of this, and she seemed to be making headway in the late 1940s on the shape of DNA bases, by studying their constituent parts.

  Sister Miriam Michael Stimson, a DNA pioneer, wore her enormous hooded habit even in the laboratory. (Archives: Siena Heights University)

  Carbon, nitrogen, and oxygen atoms make up the core of A, C, G, and T, but the bases also contain hydrogen, which complicates things. Hydrogen hangs out on the periphery of molecules, and as the lightest and most easily peer-pressured element, hydrogen atoms can get yanked around to different positions, giving the bases slightly different shapes. These shifts aren’t a big deal—each is pretty much the same molecule before and after—except the position of hydrogen is critical for holding double-stranded DNA together.

  Hydrogen atoms consist of one electron circling one proton. But hydrogen usually shares that negative electron with the inner, ringed part of the DNA base. That leaves its positively charged proton derriere exposed. DNA strands bond together by aligning positive hydrogen patches on one strand’s bases with negative patches on the other strand’s bases. (Negative patches usually center on oxygen and nitrogen, which hoard electrons.) These hydrogen bonds aren’t as strong as normal chemical bonds, but that’s actually perfect, because cells can unzip DNA when need be.

  Though common in nature, hydrogen bonding seemed impossible in DNA in the early 1950s. Hydrogen bonding requires the positive and negative patches to align perfectly—as they do in A and T, and C and G. But again, no one knew that certain letters paired up—and in other letter combinations the charges don’t line up so smartly. Research by Sister M2 and others further fouled up this picture. Her work involved dissolving DNA bases in solutions with high or low acidity. (High acidity raises the population of hydrogen ions in a solution; low acidity depresses it.) Miriam knew that the dissolved bases and the hydrogen were interacting somehow in her solution: when she shined ultraviolet light through it, the bases absorbed the light differently, a common sign of something changing shape. But she assumed (always risky) that the change involved hydrogens shifting around, and she suggested that this happens naturally in all DNA. If that was true, DNA scientists now had to contemplate hydrogen bonds not only for mismatched bases, but for multiple forms of each mismatched base. Watson and Crick later recalled with exasperation that even textbooks of the time showed bases with hydrogen atoms in different positions, depending on the author’s whims and prejudices. This made building models nearly impossible.

  As Sister Miriam published papers on this changeling theory of DNA in the late 1940s, she watched her scientific status climb. Pride goeth before the fall. In 1951 two scientists in London determined that acidic and nonacidic solutions did not shift hydrogens around on DNA bases. Instead those solutions either clamped extra hydrogens onto them in odd spots or stripped vulnerable hydrogen off. In other words, Miriam’s experiments created artificial, nonnatural bases. Her work was useless for determining anything about DNA, and the shape of DNA bases therefore remained enigmatic.

  However faulty Miriam’s conclusions, though, some experimental techniques she had introduced with this research proved devilishly useful. In 1949 the DNA biologist Erwin Chargaff adapted a method of ultraviolet analysis that Miriam had pioneered. With this technique, Chargaff determined that DNA contains equal amounts of A and T and of C and G. Chargaff never capitalized on the clue, but did blab about it to every scientist he could corner. Chargaff tried to relay this finding to Linus Pauling—Watson and Crick’s main rival—while on a cruise, but Pauling, annoyed at having his holiday interrupted, blew Chargaff off. The cagier Watson and Crick heeded Chargaff (even though he thought them young fools), and from his insight they determined, finally, that A pairs with T, C with G. It was the last clue they needed, and a few degrees of separation from Sister Miriam, the double helix was born.
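  The link between Chargaff's measurement and base pairing can be sketched in a few lines. Chargaff worked from chemistry, not code, and the sketch runs the logic in the opposite direction from history: it assumes A pairs with T and C with G, builds the partner strand, and shows that equal amounts then follow automatically. The example sequence and the function names are invented for illustration.

```python
from collections import Counter

# Assumed pairing rule: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Build the partner strand, read in the opposite direction."""
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_base_counts(strand):
    """Count each base across both strands of the double helix."""
    return Counter(strand) + Counter(reverse_complement(strand))


counts = duplex_base_counts("ATGCGGATTTACCGA")  # arbitrary example sequence
print(counts)
print("A equals T:", counts["A"] == counts["T"])
print("C equals G:", counts["C"] == counts["G"])
```

  Any starting strand gives the same result, however lopsided its own mix of letters, because every base on one strand is matched by its partner on the other.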

 
