
Programming the Universe


by Seth Lloyd


  Algorithmic Information

  Algorithmic information is a measure of how hard it is to represent a text or a bit string using a computer. The algorithmic information content of a text or a bit string is equal to the length, in bits, of the shortest computer program that produces that text or bit string as output.

  In chapter 2, we saw that computer languages provide a method for assigning meaning to strings of bits. Using such a language, these strings can be interpreted as instructions that tell a computer to produce a particular output. However, for any desired output, there are many languages to choose from, and typically many computer programs able to produce the sought-after result. For example, there are many different programs that produce as their output the first million digits of π. Significantly, though, they are not all of the same length. One program simply says “PRINT 3.1415926 . . .” (where the “ . . .” is a list of the remaining 999,992 digits). This program is straightforward, but long. A shorter, more sophisticated program would specify a particular method of calculating those million digits. For example, such a program could follow the example of the ancient Greeks and approximate a circle’s circumference ever more closely by polygons with more and more sides, tracing the circle as a sequence of short straight lines. A program to calculate the digits of π by this procedure might be no more than a few hundred instructions long.
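
  To make the contrast concrete, here is a minimal sketch in Python of the second kind of program. It uses Gibbons’ unbounded spigot algorithm rather than the Greeks’ polygon method, but it makes the same point: a few hundred characters of code can generate as many digits of π as you care to wait for.

    def pi_digits():
        """Yield the decimal digits of pi one at a time (Gibbons' spigot algorithm)."""
        q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < m * t:
                yield m  # the next digit is settled; emit it
                q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
            else:
                q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                    (q * (7 * k + 2) + r * x) // (t * x), x + 2)

    digits = pi_digits()
    print("".join(str(next(digits)) for _ in range(20)))  # 31415926535897932384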

  For any desired output, the shortest of the available programs that produce it is particularly interesting. For any number, the “algorithmic information content” is defined as the length in bits of the shortest computer program enabling the computer to print out that number. This shortest computer program can be thought of as the most compact representation of the number in the given computer language.

  Algorithmic information content was discovered independently at the beginning of the 1960s by the Cambridge, Massachusetts, scientist Ray Solomonoff, the Russian mathematician Andrei Nikolaevich Kolmogorov, and Gregory Chaitin, then an eighteen-year-old student at the City College of New York. These three researchers noted that algorithmic information content provided in some ways a more satisfying measure of information than the length of a number in bits (which is another way of describing the number’s information content), because algorithmic information respects the intrinsic mathematical regularities of a number in a way that the length in bits fails to capture.

  For most numbers, the algorithmic information content is close to the length of the number in bits. It can’t be much longer than the length of the number, since any number—e.g., 01110110101110111011101—can be produced by a program that says “PRINT 01110110101110111011101.” And for most numbers, the algorithmic information content can’t be much shorter than the number, either, simply because there are far fewer short programs than there are long numbers. We can ask, for example, how many twenty-bit numbers could be produced by ten-bit programs. There are 2²⁰ (that is, 1,048,576) twenty-bit numbers but only 2¹⁰ = 1,024 possible ten-bit programs. So, at most, one out of 1,024 twenty-bit numbers could be produced by a ten-bit program.
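
  A toy sketch of the counting argument (my illustration, not the book’s): however a hypothetical interpreter maps ten-bit programs to outputs, running all 2¹⁰ of them can cover at most 1,024 of the 2²⁰ twenty-bit numbers.

    import random

    def toy_interpreter(program: int) -> int:
        """A stand-in for any fixed rule mapping a ten-bit program to a twenty-bit output."""
        random.seed(program)           # deterministic: the same program always gives the same output
        return random.getrandbits(20)

    outputs = {toy_interpreter(p) for p in range(2**10)}  # run all 1,024 ten-bit programs
    print(f"{len(outputs)} outputs covered, out of {2**20} twenty-bit numbers")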

  The numbers that can be produced by short programs are those that have mathematical regularities. Pi is one such number, and so is the number consisting of a billion 1’s, which can be produced by a program that says “PRINT ‘1’ one billion times.” But, as noted, most numbers don’t have significant mathematical regularities. Most numbers are effectively random.
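
  In Python, for instance, the billion-1’s number takes only a line to specify, while a typical random number of the same length admits no description shorter than itself:

    ones = "1" * 10**9  # a program of a few dozen characters; its output is a billion 1's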

  The shortest computer program that will produce a number is always defined relative to a given computer language—Java, C, Fortran, BASIC. But its length does not depend strongly on which language is used. Most languages can produce the first million digits of π using only a few hundred instructions. In fact, a program to produce a number written in Fortran can be translated into a program to produce the same number written in Java by supplying a translating program. As a result, the shortest Java program for producing the first million digits of π is no longer than the shortest Fortran program plus the length of the translating program. As the number to be produced gets longer and longer, the length of the translating program becomes, relatively, smaller and smaller, adding comparatively little length to the algorithmic information content.
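
  In symbols, this is the standard result known as the invariance theorem, which the passage above describes in words: K_Java(x) ≤ K_Fortran(x) + c, where K is the length of the shortest program producing x in each language, and the constant c, the length of the Fortran-to-Java translator, does not grow with x.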

  This translatability of computer programs is a central feature of computation. A program written in Fortran can always be translated into another program written in Java. This translatability is an aspect of the universal nature of computation. Another familiar universal aspect of computation is that the same programs—Microsoft Word, for example—can run on computers of different architecture. The wiring diagrams of a Macintosh and a PC are very different, as is the way in which a particular instruction is represented and executed on either machine. But both computers can run Word. When Word is loaded into a Macintosh, the program is translated (or “compiled”) into a set of instructions the Macintosh understands, and the same is true of a PC. Despite these underlying differences, someone writing a Word document on a Mac will tap the same keys to write the same sentence as someone writing the same document on a PC.13

  Algorithmic information is an appealing concept relying on the universality and translatability of computer languages. It allows bit strings with mathematical regularity to be expressed in a compact form. The shortest program to produce a bit string can be thought of as a compressed representation of the bit string.

  Many bit strings in the real world have mathematical regularities and so can be compressed. For example, in English, the letters occur with different frequency: E is most common, followed by T, A, I, O, N, S, H, R, D, L, U (the top row of the bins of movable type used by typesetters). A program that recodes English so that E corresponds to a shorter code and Q to a longer one can compress English texts by a factor of about two.
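
  A minimal sketch of this recoding idea, using nothing beyond Python’s standard library: build a variable-length code from symbol frequencies (a Huffman code, the systematic version of the E-short, Q-long scheme) and compare its cost with a fixed eight-bit encoding.

    import heapq
    from collections import Counter

    def huffman_code(text):
        """Build a code from symbol frequencies: common symbols get short codewords."""
        heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)   # the two rarest subtrees...
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))  # ...are merged
            tiebreak += 1
        return heap[0][2]

    text = "the quick brown fox jumps over the lazy dog " * 20
    code = huffman_code(text)
    coded_bits = sum(len(code[ch]) for ch in text)
    print(f"fixed 8-bit encoding: {8 * len(text)} bits")
    print(f"frequency-based code: {coded_bits} bits")  # roughly a factor of two smaller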

  Algorithmic Probability

  Ray Solomonoff originally defined algorithmic information while looking for a formal mathematical theory of Occam’s razor. The medieval philosopher William of Occam was interested in finding the simplest explanation for observed phenomena. Pluralitas non est ponenda sine necessitate, he declared: “Plurality should not be posited without necessity.” Occam urged us to accept simple explanations for phenomena over complex ones. Occam would have scoffed, for example, at people who have postulated the existence of Martians in order to explain the regular lines observable on the surface of Mars. Geological faults or optical illusion would explain these lines just as well, without requiring the presence of Martians. To declare the existence of Martians in order to explain the lines on Mars is to “multiply beings beyond necessity,” or, simply put, to make things unnecessarily complicated. Occam’s razor cuts away complex explanations by declaring simple ones to be a priori more plausible.

  Solomonoff used algorithmic information content to make Occam’s razor mathematically precise. Suppose we are given a set of data, expressed as a string of bits. We are looking for mechanisms that might plausibly have produced this bit string. Phrased in terms of computation, we are looking for computer programs that produce this bit string as output. Among these computer programs, Solomonoff declared, the shortest program is intrinsically the most plausible guess as to which one generated the string.

  Just how much more plausible? In the 1970s, Gregory Chaitin and his colleague Charles Bennett at IBM began to talk about algorithmic information in terms of monkeys. Suppose a monkey types random strings of bits as input into a computer. The computer interprets the strings as programs written in a suitable language—Java, say. What is the probability that the computer outputs the first million digits of π? It’s the same as the probability that the random string typed by the monkey reproduces a Java program for calculating the first million digits of π. The probability of a monkey typing out the very first bit of such a program correctly is, of course, 50-50, or one-half. The probability of getting the first two bits right is one in four, or one-quarter. The probability of getting the first 1,000 bits right is 1⁄2 multiplied by itself 1,000 times, or 1⁄2¹⁰⁰⁰, a very small number. Obviously, the longer the program, the more unlikely it is that a monkey will type it out correctly.
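
  A quick Monte Carlo sketch (my illustration, not Bennett and Chaitin’s) shows this falloff: the fraction of random bit strings that begin with a given n-bit program halves with each added bit.

    import random

    def monkey_hit_rate(program_bits: str, trials: int = 200_000) -> float:
        """Fraction of random bit strings that start with the given bit pattern."""
        n = len(program_bits)
        hits = sum(
            format(random.getrandbits(n), f"0{n}b") == program_bits
            for _ in range(trials)
        )
        return hits / trials

    program = "1011010011"  # a hypothetical ten-bit program
    print(f"observed {monkey_hit_rate(program):.6f}  vs  exact {0.5**len(program):.6f}")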

  The probability that the random program the monkey inputs into the computer will give the first million digits of π as output is called the “algorithmic probability” of π. Since long programs are so much less likely to be typed correctly than short programs, the algorithmic probability is greatest for the shortest programs. The shortest program that can output a particular number is the most plausible explanation for how that number was produced.
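
  In symbols (the standard definition underlying this passage): the algorithmic probability of a number x is P(x) = Σ 2^(−length(p)), where the sum runs over every program p that makes the computer output x. Each extra bit of program length halves a term, so the sum is dominated by the shortest program.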

  To look at this from another angle, numbers produced by short programs are more likely to appear as outputs of the monkey’s computer than numbers that can be produced only by long programs. Many beautiful and intricate mathematical patterns—regular geometric shapes, fractal patterns, the laws of quantum mechanics, elementary particles, the laws of chemistry—can be produced by short computer programs. Believe it or not, a monkey has a good shot at producing everything we see.

  Algorithmically probable things are just those things that exhibit large amounts of regularity, structure, and order. In other words, whereas the typewriter monkey universe is just gibberish, the computer monkey universe contains, along with plenty of gibberish, some interesting features. Large parts of the computer monkey universe consist of structures that can be generated from simple mathematical formulae and concise computer programs. If the monkeys type into computers instead of typewriters, they produce a universe with a mix of order and randomness, in which complex systems arise naturally from simple origins—that is, they produce a universe suspiciously like our own. Simple programs together with lots of information processing give rise to complex outputs. Can this be an explanation for the complexity of our universe?

  What is required to make this explanation testable? For the computational explanation of complexity to work, two ingredients are necessary: (a) a computer, and (b) monkeys. The laws of quantum mechanics themselves supply our computer. But where are the monkeys? What physical mechanism is injecting information into our universe, programming it with a string of random bits? We need, again, look no further than the laws of quantum mechanics, which are constantly injecting new information into the universe in the form of quantum fluctuations. In the early universe, for example, galaxies formed around seeds—places where the density of matter was a tiny bit higher than elsewhere. The seeds of galaxy formation were provided by quantum fluctuations: the average density of matter was everywhere the same, but quantum mechanics added random fluctuations that allowed galaxies to coalesce.

  Quantum fluctuations are ubiquitous, and they tend to insert themselves at the points where the universe is most sensitive. Take biology, for instance. You get your DNA from your mother and father, but your exact sequence of DNA is produced by a process of recombination after the sperm enters the egg and deposits its genetic material. Just which genes from your mother get combined with which genes from your father depends sensitively on chemical and thermal fluctuations during the recombination process, and these chemical and thermal fluctuations can be traced back to quantum mechanics. Quantum accidents programmed your DNA to be different from that of your brother or sister. You and I, and the differences between us, came from quantum accidents. And so, from quantum seeds, came the universe itself. Quantum fluctuations are the monkeys that program the universe.

  Randomness arises in the computational universe because the initial state of the universe is a superposition of different program states, each one of which sets the universe down a different computational path and some of which result in complex and interesting behavior. The quantum-computational universe follows all of these computational paths simultaneously, in quantum parallel, and these computational paths correspond to the decoherent histories described earlier. Since the computational histories decohere, we can talk about them at the dinner table; one or another of these histories has actually occurred. One of these decoherent histories corresponds to the universe we see around us.

  What Is Complexity?

  The computational universe spontaneously gives rise to every possible form of computable behavior; anything it can be programmed to do, it does. Some of this behavior is orderly, some is random; some is simple, some is complex. But what is complexity, anyway?

  Halfway through my Ph.D. in physics at Rockefeller University, I was almost expelled. I had come to Rockefeller because of its reputation for supporting independent work. After passing the qualifying exams, I was pursuing research on the role of information in quantum-mechanical systems and how quantum information processing might be related to a variety of fundamental processes in physics, including quantum gravity. In other words, I was doing just what I do now, some twenty years later. I had no immediate supervisor. One day in 1986 two professors and Heinz Pagels, the executive director of the New York Academy of Sciences, walked into my office. “Lloyd,” they said, “you must stop working on this crazy stuff and switch to a topic that we understand. If you do not, you must leave Rockefeller.”

  This announcement came as a complete surprise. I knew I was working outside the bounds of the normal topics in the department. Most of the other graduate students were working on string theory, a highly abstract theory that was shaving space into multiple unvisualizable dimensions in an attempt to reconcile quantum theory with general relativity and thus explain all known fundamental physics. I couldn’t for the life of me see why what I was doing was crazier than string theory.

  But many other people were working on string theory, and at that point only a handful of people around the world were working on quantum information. I would later meet these people and work with them, but at the time I didn’t even know who they were. So one immediate result of this meeting was that I knuckled under and agreed to solve two conventional problems in quantum field theory over the following few months. The best result of being on the verge of expulsion was that I got to work with Heinz Pagels. Heinz was a striking figure, fond of double-breasted pin-striped suits and patent leather boots. With his bouffant white hairdo and these Mafia duds, he resembled a slender John Gotti. He was a wild man of physics and he was willing to supervise me.

  After four months, I had solved both of the problems I had been assigned. After eight months, I had convinced Heinz that looking at black-hole evaporation in terms of quantum information processing might not be such a bad idea. After a year, he was taking me on tours of the East Village to meet his even more outré friends, most of them in the world of performance art. He also introduced me to his wife, Elaine, the author of The Gnostic Gospels, a book that completely changed my ideas about the social nature of religion. I might have been headed for the profession of taxi driver, like so many other unemployed physics Ph.D.s, but now, at least, I was having some fun along the way.

  The real turning point of our intellectual relationship came on the day Heinz walked into my office and said, “OK, Seth, how are we going to measure complexity?”

  “We can’t,” I said. “Things are complex exactly when they defy quantification.”

  “Bullshit,” said Heinz. “Let’s try.”

  Asking how to measure complexity is like asking how to measure physics. There are many measurable quantities among the laws of physics—energy, distance, temperature, pressure, electric charge—but “physics” itself is not a measurable quantity. Similarly, in identifying the laws of complexity, we should expect there to be a number of measurable quantities that characterize a complex system. I spent some months reading up on various methods of defining complexity. The first concept I looked at was computational complexity. The complexity of a computation is equal to the number of elementary logical operations that must be performed in the course of the computation. (A related concept, spatial computational complexity, is equal to the number of bits used in the course of the computation.) Computational complexity is not a measure of complexity so much as a measure of effort, or of the resources required to perform a given task. There are plenty of computations that take a long time and use up lots of space but do not produce anything very complex. As will be seen, computational complexity is an important ingredient in a good definition of complexity—but it is not itself a good definition.
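
  A throwaway sketch of the distinction: the loop below performs ten million elementary operations, so its computational complexity is high, yet its output could hardly be simpler.

    x = 0
    for _ in range(10**7):  # ten million elementary operations...
        x = (x + 1) % 2
    print(x)                # ...to produce a single, utterly simple bit: 0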

  Algorithmic information had also been proposed as a measure of complexity—in fact, Chaitin had originally called it “algorithmic complexity.” But bit strings with high algorithmic information content don’t look complex; they look random. Indeed, algorithmic information has also been called algorithmic randomness. Moreover, bit strings with high algorithmic information content are easy to create: just flip a coin 100 times. The resulting bit string will likely have close to the maximum possible algorithmic information content. Heinz and I felt that a complex thing should be intricate, structured, and hard to reproduce. Things with high algorithmic information content may require lots of bits to describe, but most bit strings with high algorithmic information content are unstructured and easy to produce.
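
  A quick way to see this on a real machine (my sketch, with a general-purpose compressor standing in for the shortest program): coin-flip data barely compresses at all, while highly regular data collapses to almost nothing.

    import os
    import zlib

    random_data = os.urandom(10_000)   # 10,000 bytes of coin flips
    regular_data = b"1" * 10_000       # 10,000 highly regular bytes
    print(len(zlib.compress(random_data)))   # close to 10,000: essentially incompressible
    print(len(zlib.compress(regular_data)))  # a few dozen bytes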

  As I looked further and further, I found more and more definitions of complexity, each a variation on the theme of required effort or amount of information. Several years later, I gave a talk on these various measures of complexity at a conference sponsored by the Santa Fe Institute, which had been founded in the mid-1980s by George Cowan, Murray Gell-Mann, and a group of senior scientists at nearby Los Alamos who were interested in examining the rules that underlie and give rise to complex systems. The talk was titled “Thirty-one Measures of Complexity,” the “thirty-one” being a convenient allusion to the number of flavors of ice cream offered by Baskin-Robbins. Although I had not published a paper with this title, word of the talk spread throughout the Internet, and for many years my paper on thirty-one measures of complexity was my most frequently requested work, despite the fact that it didn’t exist. (I finally published the list as a paper a couple of years ago,14 just so that I would no longer have to respond to all the e-mails requesting this nonexistent paper.) As it happened, between the talk and the publication of the paper, the number of measures of complexity on the list grew from thirty-one to forty-two. (The length of the new list prompted the writer John Horgan to claim, in his book The End of Science, that the science of complex systems was bankrupt, since researchers couldn’t even agree on what complexity was, let alone conduct any significant research about it.)

 
