Life's Greatest Secret

by Matthew Cobb

Two other possibilities highlight the problem associated with using Shannon’s measure of information on data from molecular genetics. First, imagine two stretches of DNA of identical lengths, containing the same proportions of the four bases but in different orders. According to Shannon, the information content of those stretches of DNA, if calculated using each base, would be identical, and yet they would almost certainly have differing gene products that would affect the fitness of the organism in various ways – the biological content of their information would not be alike. Second, there is no agreed answer as to whether most of the DNA sequences in our genome, which have no apparent function and do not seem to be subject to natural selection, contain information or not. Most biologists would probably say not, because they would link information with function, whereas a mathematician would probably argue that they do. Although from Shannon’s point of view a sequence of junk DNA contains as much information as a sequence of codons from a protein-encoding gene, that is clearly not the case from the point of view of the cell, the organism or natural selection. Despite these obstacles, some scientists and philosophers continue to claim that DNA does contain Shannon information and have applied information theory to data from molecular genetics.39 None of these attempts has yet convinced the scientific community as a whole.
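The first point can be checked with a short calculation. The sketch below is an illustration rather than anything taken from the studies cited: it assumes the simplest per-base reading of Shannon's measure (the entropy of the base frequencies, in bits per base), and the two sequences and the function name are invented for the example.

```python
import math
from collections import Counter


def shannon_entropy_per_base(seq: str) -> float:
    """Entropy of the base frequencies in seq, in bits per base (hypothetical helper)."""
    n = len(seq)
    # Sort the counts so the sum is taken in the same order for any sequence
    # with the same composition.
    counts = sorted(Counter(seq).values())
    return -sum((c / n) * math.log2(c / n) for c in counts)


# Two made-up stretches: same length, same base proportions, different order.
seq_a = "ATGCATGCATGC"
seq_b = "AAATTTGGGCCC"

h_a = shannon_entropy_per_base(seq_a)
h_b = shannon_entropy_per_base(seq_b)

print(f"{seq_a}: {h_a:.3f} bits per base")
print(f"{seq_b}: {h_b:.3f} bits per base")
```

Because this measure depends only on how often each base occurs, the two stretches receive the same score even though a cell would read them quite differently.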

Towards the end of his life, the theoretical population geneticist John Maynard Smith (1920–2004) began to explore the role of information in biology. In 1997 he wrote a book with Eörs Szathmáry entitled The Major Transitions in Evolution, in which they described the evolution of life as a set of changes in the way in which information is stored and transmitted.40 For example, the evolution of multicellular organisms altered how information is transmitted and stored, with the appearance of differentiation between cells, underpinned by spatially and temporally modulated gene regulation. The most recent evolution of an information transfer system is the one we are using at this very moment – the appearance of language in humans.

In 2000, Maynard Smith wrote the first of a series of articles in which he explored the nature of genetic information and exchanged views with philosophers of biology.41 Maynard Smith put evolution at the heart of his view of genetic information and where it comes from: ‘DNA contains information that has been programmed by natural selection’, he stated, and as a consequence the quantity and quality of genetic information has increased over the past 3.8 billion years.42 From this point of view, natural selection is the coder that has given the DNA sequence meaning: ‘genomic information is “meaningful” in that it generates an organism able to survive in the environment in which selection has acted’, he wrote.43 In other words, genes provide the cell with instructions that have been encoded through natural selection: that is the nature of genetic information.

Genetic information is not like the effect of the environment – most scientists and philosophers consider that environmental factors, although they have shaped genetic information through natural selection and form the conditions that allow genes to be expressed, do not themselves contain information (some philosophers disagree).44 For example, although changes in temperature can alter the expression of sex-determining genes in crocodiles thereby changing the sex-ratio of a population, the meaning of increased temperature is not the production of more male crocodiles. ‘It is for this reason that we speak of genes carrying information during development, and of environmental fluctuations not doing so’, argued Maynard Smith.45

For Ulrich Stegmann, a biologist turned philosopher at the University of Aberdeen, DNA contains information that is conditionally expressed. One way of thinking about this is that DNA sequences act a bit like a recipe. Protein synthesis proceeds in a step-by-step fashion, where each step depends on an external factor (codons in DNA and then in mRNA), in the same way that a recipe determines the order in which a cook puts together the ingredients and uses the utensils.46 The idea of a gene as a computer program is another popular metaphor, according to which the program responds to input conditions in various ways and, depending on those inputs, produces various consistent outputs.47 However, these are only metaphors. Genes are not programs or recipes, and organisms are not computers or cakes.

The first systematic critic of the concept of genetic information was the philosopher of biology Sahotra Sarkar, of the University of Texas at Austin. For Sarkar, like Wolpert and Apter in 1965, genetic information is ‘little more than a metaphor that masquerades as a theoretical concept’.48 Sarkar’s critique rests partly on the fact that in eukaryotes, with their complex system of gene splicing, the DNA sequence does not correspond to the amino acid sequence. Strictly speaking, our genes therefore do not correspond to Crick’s definition of genetic information, because the DNA sequence has to be processed and mediated before it appears as an amino acid sequence. Sarkar also points out that genetic information differs from artificial codes because it is impossible to back-translate a protein sequence into a DNA sequence, owing to the redundancy of the genetic code, the presence of introns in eukaryotes and the existence of multiple splicing. For Sarkar, genetic information therefore fails what he calls the test of reverse differential specificity, and, he argues, the concept has ceased to be a useful tool for discovery.49 To my mind, Sarkar’s critique does not invalidate the use of the term information when discussing the content of genes. Instead, it underlines that genetic information is not like other kinds of information. Neither does this critique undermine the existence of a genetic code: a particular codon will produce a particular amino acid – the triplet of bases represents and encodes that amino acid. That is a code. The fact that you cannot reliably back-translate from amino acid into DNA may disqualify the use of the word code for a philosopher, but it does not for a scientist, or for a member of the public.
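The asymmetry between translation and back-translation can be made concrete. The sketch below is only an illustration, under stated assumptions: it uses a small subset of the standard genetic code (written as DNA codons for five amino acids), ignores introns and splicing altogether, and the four-residue peptide and the function names are invented for the example.

```python
from itertools import product

# Subset of the standard genetic code, as DNA codons (assumption: only the
# five amino acids needed for this illustration are listed).
CODONS_FOR = {
    "M": ["ATG"],                                      # methionine
    "W": ["TGG"],                                      # tryptophan
    "K": ["AAA", "AAG"],                               # lysine
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],   # leucine
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],   # serine
}

# Forward direction: each codon specifies exactly one amino acid.
AMINO_ACID_FOR = {
    codon: aa for aa, codons in CODONS_FOR.items() for codon in codons
}


def translate(dna: str) -> str:
    """Read the DNA three bases at a time and return the peptide."""
    return "".join(AMINO_ACID_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translations(peptide: str) -> list[str]:
    """List every DNA sequence that would translate to the given peptide."""
    choices = (CODONS_FOR[aa] for aa in peptide)
    return ["".join(codons) for codons in product(*choices)]


candidates = back_translations("MLSK")  # made-up peptide
print(len(candidates), "DNA sequences encode MLSK")
print(translate("ATGCTGAGCAAA"))
```

Going from DNA to protein gives a single answer, whereas going from protein to DNA gives a whole set of equally valid candidates, and that is before introns and alternative splicing are taken into account.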

As the philosopher Peter Godfrey-Smith has pointed out, part of the problem flows from the fact that the meaning of the word code as used to describe the content of genes is not strictly identical to the word code as used in other contexts (Godfrey-Smith nevertheless thinks that it is legitimate to use the term code in molecular genetics).50 The genetic code is not an artificially designed system, it is a phrase that describes the sixty-four ways in which a part of one molecule (messenger RNA) binds with part of another (a tRNA), which in turn binds with another (an amino acid), the detail of which can only be fully understood in an evolutionary context. Sarkar put it pithily: ‘DNA is, ultimately, a molecule and not a language’.51 DNA is a replicating molecule that, in the right context, leads to the production of certain chemical sequences through the information it contains.

Despite these philosophical clarifications, at first glance the genetic code does indeed look like an artificial code, and the initial assumption was that it therefore came with the associated baggage of such an artefact, such as strictly logical rules and the ability to back-translate. This apparent similarity between the genetic code and artificial codes beguiled many scientists in the 1950s as they tried to crack the code using mathematical principles. Interpreting the genetic code in terms of precise analogies, strict definitions and exact parallels to artificial systems will almost certainly fail, because the genetic code, like every other aspect of biology, has not been designed. It is part of life, and has evolved. It can be properly understood only in its historical, biological context. That was the lesson of the doomed attempts to break the code in the 1950s, and it should guide us today in trying to understand what is in our genes.

For some philosophers, describing the content of genes as information suggests that DNA determines all the characters of an organism in an absolute and unmediated fashion. This critique is misplaced, because in reality few, if any, scientists hold such extreme views. There is a rule of thumb in reading popular science reporting (or, indeed, a scientific paper): if an article describes ‘the gene for’ something, you are almost certainly reading an over-simplistic account. Genes rarely do just one thing; even if a gene produces only one kind of protein, that protein can have different consequences in different contexts.

The gene that got me interested in studying the effects of genes on behaviour, back in 1976, was a Drosophila gene called dunce that was identified in Seymour Benzer’s lab – flies with a mutation in this gene show defects in learning and memory.52 Dunce might seem to be a gene ‘for’ learning or memory, and it primarily codes for an enzyme that affects the level of an intracellular signalling molecule called cAMP, which has been implicated in learning in a wide range of organisms. But through multiple splicing dunce can produce seventeen separate proteins, varying in length from 521 to 1,209 amino acids. Mutations in this gene can affect a wide range of characters apart from learning and memory, including female fertility and the insect’s responses to organophosphates.53 In the light of this knowledge, what exactly dunce is ‘for’ escapes easy definition. Although we know what it does under some circumstances, and what happens when specific parts of the gene are mutated, that does not mean that the gene has a single function. And remember, dunce is nothing special, it is just one gene out of billions that exist throughout nature.

Many of those philosophers who criticise the idea that genes contain information rightly point out that DNA can do nothing on its own, emphasising the role that proteins play in life.54 This is hardly a major criticism – it is true of all representations, codes or languages. The printed symbols that you are looking at represent words and ultimately concepts that I have encoded onto paper, but they mean nothing until they are read. That does not stop them from being part of a language, and does not undermine their fundamental importance in communication. As to the essential role of proteins, Crick said basically the same thing in his 1957 lecture:

the main function of the genetic material is to control (not necessarily directly) the synthesis of proteins. … Once the central and unique role of proteins is admitted there seems little point in genes doing anything else.55

Some of these critics argue that DNA is merely one of many factors, including the environment, that equally determine the life-cycles of organisms – this is called the parity thesis.56 There are a handful of scientists who agree with this extreme position and argue that proteins, the environment, or the cell’s metabolism, play a role that is equal to, or greater than, DNA in determining the characteristics of organisms.57 These scientists remain in a very small minority, because the overwhelming evidence is against their view. It does not correspond to what happens in our laboratories, where DNA is manipulated, altered and transferred according to gene-centred experimental protocols, and where the expected outcome occurs. When students in my laboratory take genes from three separate organisms and combine them, using a regulatory gene from yeast to drive the expression of a jellyfish gene that encodes fluorescent protein so that a single cell in a maggot’s nose glows, the determining causal factor is the genes. The environment, the cell, the maggot, and the ingenious humans who designed the experiment are all permissive factors that had to be in the correct state for the genes to produce their desired effect, but the contribution of these peripheral conditions to the outcome is qualitatively unlike the contribution of the DNA. In this case, the genes function exactly as if they contained information that determines the outcome, because they do.

Although philosophers tend to be interested in the majority of cases where genes are not destiny, it is worth remembering that in some situations they most definitely are. If you have two copies of the sickling version of the haemoglobin gene, you will suffer from terrible anaemia and other debilitating symptoms. Nothing in your environment or upbringing seems to be able to alter that. Even more tragically, if you carry a single copy of the Huntingtin gene containing a CAG trinucleotide that is repeated potentially sixty times over, then you will eventually suffer from Huntington’s disease, a neurodegenerative disorder. This genetic disease shows varying symptoms in different individuals, partly as a result of differences in the number of CAG repeats, but it is always fatal.58

In many cases, however, genes are not the ultimate determiner or cause of biological phenomena. In the example of sex determination in crocodiles given earlier, the genes are constant, and the proportion of male and female crocodiles is determined not directly by the genes but by the way in which the temperature affects the activity of those genes. In that case, the decisive causal factor is temperature, but it does not act alone. Temperature exerts its effects by altering the activity of sex-determining genes, through the production of proteins and RNA molecules that are themselves the product of other genes. Genes need cells, which they create, to realise the conditional instructions that they contain, and the environment has to be permissive. However, in similar conditions, similar effects will tend to be produced. The way in which those effects percolate out into the anatomy, physiology and behaviour of a whole organism can be unpredictable, making it hard to draw a direct line between a particular gene and a particular character.

The behaviour geneticists Doug Wahlsten and John Crabbe explored this problem in 1999 when they got separate laboratories to carry out the same behavioural experiments on the same inbred strains of mice. There were systematic differences in the behaviour of the mice in different laboratories, indicating that the route from gene to behaviour depends on many complex factors, including the experimental set-up and the immediate environment.59 That does not mean that it is impossible to test reliably for genetic effects on behaviour: in 2006, Wahlsten and Crabbe reported that inbred mice strains can show very high levels of behavioural consistency over time (for example in locomotor activity or in preference for ethanol), even when the experiments were conducted with a gap of fifty years.60

These results are not particularly surprising to anyone who has done an experiment on the genetics of behaviour. Organisms are not robots, and their continual interaction with the environment throughout their development and during the experiment creates genetic, physiological and behavioural noise that can affect the results. That does not mean to say that genes are not involved in determining anatomy, physiology and behaviour; it simply means that it is sometimes extremely hard to study these effects.

Attempts to detect genetic factors underlying intelligence have proved particularly problematic. There are clear genetic effects on cognitive ability: no chimpanzee will ever be able to act, speak and think like the average human. That flows from the relatively small differences in our DNA – our genes produce two species with different levels of intellectual ability. The problems begin when it comes to studying the differences in intelligence (whatever that might be) that can be observed between humans: pinning down what part is due to our slightly different sets of genes is very difficult. In 2014, a study of more than 100,000 people sought to correlate genetic variability with variations in cognitive ability and educational attainment.61 The authors found just three genetic variations across the whole genome that might be implicated in the cognitive differences they were measuring, and these all had extremely small effects. There are undoubtedly genetic differences between humans that affect our intelligence, but it seems probable there are very many such genetic factors, each contributing a tiny amount, with any individual having a mixture of a wide range of these genes. The lesson of such studies is that if the character that is being investigated is largely determined by the environment, as seems reasonable to imagine is the case for educational attainment, then it will be difficult to detect genetic effects.

If it turns out that there are important genetic factors underlying individual differences in human cognitive ability, the challenge would be for society to decide how to use – or not – that information. However, I would be very surprised if this were the case. The fact that genes contain information that determines the sequence of nucleic acids and proteins does not imply that all characters are genetically determined. Some are, many are not. Biology is complicated.

*

Describing the content of genes as information, and viewing the activity of cells and organisms as involving the movement of information, puts all levels of life into a single framework. As Crick put it, life is characterised by the flow of energy, the flow of matter and the flow of information. Information flow involves the activity of specific molecules and, at the level of a whole organism, of cells or groups of cells. Conceptualising the whole process as having an underlying unity in terms of information provides a context that helps explain how molecules, cells, organs and organisms interact and are coordinated.

This reflects one of the central conceptual approaches in the history of the genetic code and of gene function, the cybernetic vision. Cybernetics – the study of control and apparently purposive behaviour in animals and machines – exerted tremendous influence in the late 1940s and throughout the 1950s, because it appeared that it would form a new science, providing a way of uniting all levels of biology with engineering and mathematics. That did not happen, and the tide of enthusiasm for cybernetics gradually ebbed when it became evident that, beyond its emphasis on control and the existence of negative feedback loops to produce apparently purposive behaviour, cybernetics did not provide a predictive framework for future discoveries.
