Babbage’s interest in codes and ciphers was born during his schoolboy days; it appealed to his desire to get to the essence and hidden meanings of things, demonstrated even in his infanthood when he would break his toys to find out what was inside. When he was at school, Babbage’s skill at decoding would often get him into trouble: “The bigger boys made ciphers, but if I got hold of a few words, I usually found out the key,” he boasted. “The consequence of this ingenuity was occasionally painful: the owners of the detected ciphers sometimes thrashed me, though the fault lay in their own stupidity.”15 Black eyes and bruised knuckles notwithstanding, Babbage gained the lifelong belief that “deciphering is, in my opinion, one of the most fascinating of arts.”
Like Babbage, Herschel was a schoolboy enamored of decoding. Herschel’s first known letter was sent to his mother from his school when he was seven years old requesting that she send his music and his “ciphering books.”16 Whewell was also drawn to codes and ciphers as a young man. His youthful courtship of a girl was abetted by a cipher; upon the young woman’s request, Whewell wrote a bit of doggerel in an elementary cipher, replacing the word “cipher/sigh for” with the symbol “Ø”:
U Ø a Ø, but I Ø U;
O Ø no Ø but O Ø me;
O let not my Ø a Ø go,
But give Ø a Ø I Ø U so.
(You sigh for a cipher, but I sigh for you; O sigh for no cipher but O sigh for me; O let not my sigh for a cipher go, But give sigh for a sigh, for I sigh for you so.)17 It is not known whether or not the cipher had the intended effect on the young lady.
It is little wonder that the members of the Philosophical Breakfast Club were all intrigued by ciphers. Deciphering is like scientific discovery—confronting an initially impenetrable wall, the cryptanalyst, like the scientist, must slowly chip away until the secrets that lie beneath are revealed. Both Whewell and Herschel explicitly drew this connection in their writings on scientific method, describing scientific discovery as a kind of decoding of nature. Whewell used this metaphor when he argued for the importance of predictive success, noting that such success is evidence that we have cracked nature’s code. And he came to hold that the world of facts is like an alphabet used to encrypt a secret message; when natural philosophers “had deciphered there a comprehensive and substantial truth, they could not believe that the letters had been thrown together by chance.”18 In an article written for the fledgling journal Photographic News, Herschel similarly proposed that finding a method for creating color photographs (which he had very nearly managed to do himself) was akin to breaking a seemingly impenetrable cipher; Herschel appended to the article a text written in a cipher of his own, leaving it for the readers to try to decode the message.19 He had previously tried to stump Babbage, sending him a letter written entirely in cipher (except for “Dear Babbage”). Babbage broke the cipher handily, leading Herschel to exclaim, “You are a real wonder.… I shall never try to trick you again!”20
BABBAGE HAD RETURNED to his childhood interest in ciphers during the 1830s. In 1835, Babbage employed his considerable skills in cryptanalysis to aid his friend and fellow founder of the Astronomical Society, Francis Baily, who was writing a book on John Flamsteed (1646–1719), the first Astronomer Royal. Baily was attempting to establish the accuracy of Flamsteed’s observations from the Royal Observatory at Greenwich, and the cause for some known errors in those observations. In his writings, Flamsteed had suggested that the problems were due to an error in his mural arc, the angle-measuring device built right into a wall lying on the prime meridian at Greenwich, marking the point of zero longitude. If the mural arc was responsible for the errors, the fault would lie with its builder, not with Flamsteed himself. Baily found a letter from Abraham Sharp, Flamsteed’s assistant, written in response to a query from a Mr. Crosthwait about whether such an error existed. Sharp’s letter, however, was written in a cipher, so Baily was unable to learn its contents. He mentioned the problem to Babbage, who, “by a laborious and minute examination and comparison of all the parts,” Baily later explained, was able to decipher the letter. This enabled Baily to discover that the errors arose not from the mural arc, as Flamsteed had rather deceptively implied, but from the refraction table he had used.21 Thanks to Babbage, Baily was able to solve a problem about Flamsteed’s observations that had haunted astronomers for over a century. It also influenced later astronomers’ opinions of the first Astronomer Royal; Herschel told Beaufort that reading Baily’s book had diminished his opinion of Flamsteed as a person and an astronomer.22
At around this time—just when Babbage was most involved with the newly founded Statistical Society of London, and the Statistical Section of the British Association—he began to apply statistics to the problem of deciphering, nearly a century before William F. Friedman, who is generally considered the first to have done so.23 In a letter to Quetelet, which the Belgian translated and had published in a French journal, Babbage listed tables of the relative frequencies of double letters in English, French, Italian, German, and Latin. In English, it turns out, the most frequently doubled letter is l, which occurs 27.8 times in 10,000 letters (16.1 times in the middle of a word, 11.7 times at the end). The next most frequently doubled letter is e, 18.8 times in the middle of a word, and 1.9 times at the end. The rarest doubled letter is g, which is found only 1.5 times in 10,000 letters.24
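A tally of this kind is easy to sketch in Python. The block below is only an illustration under assumed counting rules: a double counts as "at the end" when its second letter closes the word, and rates are scaled to 10,000 letters. The function name `doubled_letter_rates` and the sample sentence are invented here; Babbage's own conventions are not recorded in the text, and a real count would need a far larger sample.

```python
import re
from collections import Counter

def doubled_letter_rates(text):
    """Return {letter: (mid_rate, end_rate)}, scaled per 10,000 letters."""
    # Assumption: only the unaccented letters a-z are counted.
    words = re.findall(r"[a-z]+", text.lower())
    total_letters = sum(len(w) for w in words)
    mid, end = Counter(), Counter()
    for w in words:
        for i in range(len(w) - 1):
            if w[i] == w[i + 1]:
                if i + 2 == len(w):
                    end[w[i]] += 1   # the double closes the word
                else:
                    mid[w[i]] += 1   # the double sits inside the word
    scale = 10_000 / total_letters if total_letters else 0.0
    return {c: (mid[c] * scale, end[c] * scale) for c in set(mid) | set(end)}

# Made-up sample sentence, far too short for meaningful statistics.
sample = "All the bells will toll; the letter fell off the wall."
rates = doubled_letter_rates(sample)
for letter, (m, e) in sorted(rates.items()):
    print(letter, round(m, 1), round(e, 1))
```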
Babbage also began to count the relative frequency of the occurrence of single letters, and made lists of the most common two- and three-letter words, organizing them by their consonant/vowel pattern, such as CCV, VCC, CVC, VCV, CVV, VVC, and VVV. Then he began to order words that end in -ion according to word length.25 He started writing dictionaries of words that began with each of the twenty-six letters of the alphabet, ordered by how many letters were in each word. All of this was to be part of a planned book, called The Philosophy of Decyphering, but Babbage never actually wrote it.
Babbage had realized that these kinds of statistical studies would be invaluable in deciphering any monoalphabetic cipher. In a monoalphabetic cipher, letters in the plain text are substituted by letters in an alphabet defined differently from the standard ordered alphabet. This cipher alphabet can be shifted (e.g., instead of “a,b,c,d” you might have “b,c,d,e” or “c,d,e,f”), inverted (“z,y,x,w”), or ordered by the use of a key word or phrase, in which case the cipher alphabet begins with the key word and then continues with the rest of the alphabet, in order but minus the letters already appearing in the key word (e.g., key word “leopard”: l,e,o,p,a,r,d,b,c,f,g,h,i,j,k,m,n,q,s,t,u,v,w,x,y,z).
In order to encrypt a message using this cipher alphabet, the writer would place the normal alphabet over the cipher alphabet, and replace each letter in the plain text with the matching letter in the cipher alphabet.
a b c d e f g h i j k l m n o p q r s t u v w x y z
l e o p a r d b c f g h i j k m n q s t u v w x y z
So, “attack the enemy fortress” becomes “lttlog tba ajaiy rkqtqass.” To make the deciphering more difficult, the spaces could be omitted, and the cipher text would read “lttlogtbaajaiyrkqtqass.” The recipient of the cipher text would have been provided with the key word, and would use the reverse procedure to decipher the message.
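The key-word construction can be sketched in Python. This is a minimal illustration, assuming lowercase a to z only; the helper names `keyword_alphabet` and `make_tables` are invented for the example.

```python
import string

def keyword_alphabet(keyword):
    """Key-word letters first (no repeats), then the unused letters in order."""
    seen = []
    for ch in keyword.lower() + string.ascii_lowercase:
        if ch in string.ascii_lowercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def make_tables(keyword):
    """Build translation tables for enciphering and deciphering."""
    cipher = keyword_alphabet(keyword)
    enc = str.maketrans(string.ascii_lowercase, cipher)
    dec = str.maketrans(cipher, string.ascii_lowercase)
    return enc, dec

enc, dec = make_tables("leopard")
secret = "attack the enemy fortress".translate(enc)
print(keyword_alphabet("leopard"))  # the cipher alphabet
print(secret)                       # the enciphered message
print(secret.translate(dec))        # the reverse procedure restores the plain text
```

Deciphering simply swaps the two alphabets, which is why the recipient needs nothing more than the key word.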
In a monoalphabetic cipher, the cipher alphabet is fixed for the entire encryption.26 Babbage’s study of letter frequency, especially of the double letters, provides an important and effective method for cryptanalysis. Since we know that l is the most frequently doubled letter in English, we can begin deciphering a text by trying l for the cipher letter that is most often doubled. Similarly, Babbage’s list of the consonant/vowel patterns of two- and three-letter words can help in finding a way into an encrypted message, by seeking those patterns in the cipher text. Further, his study of letter frequency is useful because the most common letters, e, t, and a, will generally stand out, however they are disguised, since each is replaced by the same cipher letter every time. So, in the example above, since the letter a appears most often in the cipher text (tied with t), the cryptanalyst can start by assuming that a is the substitution for e, t, or a (in fact, it is the substitution for e). It was by applying methods such as this that Babbage was able to decipher the Childe letters; his first breakthrough came when he realized that the most common word in the cipher text was “sqj,” which he soon determined stood for the common plain-text word “the.” By replacing all the occurrences of s with t, q with h, and j with e, Babbage was able to start breaking down the wall.
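The letter count behind this first step can be sketched as follows. It is only an illustration: the substitution alphabet is the “leopard” one from the earlier example, written out so that the block stands alone, and `letter_counts` is a name invented here. With a message this short the counts can tie, so a real attack would need much more cipher text.

```python
import string
from collections import Counter

# The "leopard" cipher alphabet from the example, written out in full.
CIPHER_ALPHABET = "leopardbcfghijkmnqstuvwxyz"
ENCRYPT = str.maketrans(string.ascii_lowercase, CIPHER_ALPHABET)

def letter_counts(text):
    """Count only the letters a-z, ignoring spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

cipher_text = "attack the enemy fortress".translate(ENCRYPT)
counts = letter_counts(cipher_text)
top = max(counts.values())
# Every cipher letter that reaches the highest count is a candidate for e, t, or a.
most_common = sorted(ch for ch, n in counts.items() if n == top)
print(most_common, top)
```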
But Babbage knew that these statistical methods were not, by themselves, enough to break the most difficult type of cipher, a polyalphabetic substitution cipher, in which the cipher alphabet changes during the encryption. The beauty of this method for the one sending the message is that the cryptanalyst loses the power of determining the frequency of letters: in a polyalphabetic substitution cipher, the first letter in a double pair is encrypted using a different cipher alphabet from the second letter in the pair, so the double does not survive into the cipher text. One polyalphabetic cipher had been known for centuries as the “undecipherable cipher.” Never one to shrink from a seemingly impossible task, Babbage threw himself into the attempt to crack it, like a man possessed.
AS IN THE Ninth Bridgewater Treatise, Babbage was determined to demonstrate the power of mathematics. This time he was not using statistics to uncover the divine origin of the universe, but rather to uncover secrets hidden by a cipher considered unbreakable. And, unlike in that contentious and curmudgeonly work, Babbage was now returning to one of the original goals of the Philosophical Breakfast Club: to use scientific method for the public good, rather than for promoting his own fame or the merits of his engines. Babbage tackled a cipher that had been used by the French during the Napoleonic Wars: the Vigenère. Babbage knew that if the British had only had the means to decipher the tactical messages being sent with this cipher, their victory could have come sooner, with less loss of life and less disruption to trade between the nations. And this cipher was now being used by a new enemy of Britain: Russia.
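The cipher, known as le chiffre indéchiffrable (the indecipherable cipher), had been invented in 1553 by Giovanni Battista Bellaso, and publicized in 1586 by a young French diplomat named Blaise de Vigenère, by whose name the cipher became known. Vigenère’s cipher was a polyalphabetic substitution cipher utilizing twenty-six cipher alphabets. These are arranged in a “Vigenère square,” a plain-text alphabet followed by the twenty-six different cipher alphabets. These cipher alphabets are each shifted from the previous alphabet by one letter, as shown in the table below:
Plain  a b c d e f g h i j k l m n o p q r s t u v w x y z
 1     b c d e f g h i j k l m n o p q r s t u v w x y z a
 2     c d e f g h i j k l m n o p q r s t u v w x y z a b
 3     d e f g h i j k l m n o p q r s t u v w x y z a b c
 4     e f g h i j k l m n o p q r s t u v w x y z a b c d
 5     f g h i j k l m n o p q r s t u v w x y z a b c d e
 6     g h i j k l m n o p q r s t u v w x y z a b c d e f
 7     h i j k l m n o p q r s t u v w x y z a b c d e f g
 8     i j k l m n o p q r s t u v w x y z a b c d e f g h
 9     j k l m n o p q r s t u v w x y z a b c d e f g h i
10     k l m n o p q r s t u v w x y z a b c d e f g h i j
11     l m n o p q r s t u v w x y z a b c d e f g h i j k
12     m n o p q r s t u v w x y z a b c d e f g h i j k l
13     n o p q r s t u v w x y z a b c d e f g h i j k l m
14     o p q r s t u v w x y z a b c d e f g h i j k l m n
15     p q r s t u v w x y z a b c d e f g h i j k l m n o
16     q r s t u v w x y z a b c d e f g h i j k l m n o p
17     r s t u v w x y z a b c d e f g h i j k l m n o p q
18     s t u v w x y z a b c d e f g h i j k l m n o p q r
19     t u v w x y z a b c d e f g h i j k l m n o p q r s
20     u v w x y z a b c d e f g h i j k l m n o p q r s t
21     v w x y z a b c d e f g h i j k l m n o p q r s t u
22     w x y z a b c d e f g h i j k l m n o p q r s t u v
23     x y z a b c d e f g h i j k l m n o p q r s t u v w
24     y z a b c d e f g h i j k l m n o p q r s t u v w x
25     z a b c d e f g h i j k l m n o p q r s t u v w x y
26     a b c d e f g h i j k l m n o p q r s t u v w x y z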
In the Vigenère cipher, a different line is used to encipher each letter of a plain-text message. So, the first letter of a plain-text three-letter word might be enciphered using row 1 of the Vigenère square, the second letter might be enciphered using row 11, and the third letter enciphered using row 26.
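The rows of the square can be generated rather than written out. The sketch below assumes the numbering used in this chapter, in which row 1 begins with b and row 26 is the unshifted alphabet; the function names `square_row` and `encipher_with_row` are invented for the illustration.

```python
import string

ALPHABET = string.ascii_lowercase

def square_row(n):
    """Row n of the square: the alphabet shifted n places (row 26 is unshifted)."""
    shift = n % 26
    return ALPHABET[shift:] + ALPHABET[:shift]

def encipher_with_row(letter, n):
    """Find the plain-text letter's column and read off the letter in row n."""
    return square_row(n)[ALPHABET.index(letter)]

# The three rows mentioned for the three-letter word.
for n in (1, 11, 26):
    print(n, square_row(n))
```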
In order to encrypt a message that can be deciphered, there must be an agreed-upon system of switching between rows. This is achieved using a key word. To encrypt a message such as “attack the enemy fortress,” using the key word bacon, the first letter would be encrypted using the alphabet that begins with b (line 1 of the Vigenère square), the second letter would be encrypted using the alphabet that begins with an a (line 26), the third letter would be encrypted using the alphabet that begins with a c (line 2), the fourth letter using the alphabet that begins with o (line 14), and the fifth with the alphabet that begins with n (line 13), repeating the order of the cipher alphabet on the pattern baconbaconbaconbacon. So, in the case of “attack the enemy fortress,” you would have the following:
Key          b a c o n b a c o n b a c o n b a c o n b a
Plain text   a t t a c k t h e e n e m y f o r t r e s s
Cipher text  b t v o p l t j s r o e o m s p r v f r t s
It is easy to see the challenges for the cryptanalyst. In the plain text there are three occurrences of double letters (tt, ee, and ss, once the spaces are removed), while in the cipher text there are none, because of the use of different alphabets. For example, the first double, tt, is represented in the cipher text by tv. Nor does frequency analysis on the most common letters work, because in the cipher text the most common letters are t, o, r, and s, and a single cipher letter can stand for several different plain-text letters: s, for example, stands for e, f, and s. The Vigenère seems, indeed, to be undecipherable.
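The encryption itself can be sketched in Python. This is a minimal illustration of the standard Vigenère procedure, assuming letters are numbered a = 0 through z = 25 and that spaces and punctuation are dropped; the function name `vigenere` and its `decrypt` flag are invented for the example.

```python
import string

ALPHABET = string.ascii_lowercase

def vigenere(text, key, decrypt=False):
    """Shift each letter by the matching key letter; non-letters are dropped."""
    letters = [ch for ch in text.lower() if ch in ALPHABET]
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(letters):
        # The key repeats, so position i uses key letter i mod len(key).
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)

cipher_text = vigenere("attack the enemy fortress", "bacon")
print(cipher_text)
print(vigenere(cipher_text, "bacon", decrypt=True))
```

Running the same function with `decrypt=True` subtracts the shifts instead of adding them, which is all the recipient has to do.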
ON AUGUST 10, 1854, mere weeks after Babbage’s testimony in the Childe case, the Journal of the Society of the Arts published a letter by John Thwaites, a Bristol dentist, who claimed to have invented a new, unbreakable cipher. “Its uses must be obvious to all,” crowed Thwaites, since there was “not a chance of [its] discovery.” Recognizing right away that Thwaites had “reinvented” the Vigenère cipher, Babbage scolded, in a published letter signed only “C,” that “the cypher in the Journal is a very old one, and to be found in most of the books.”
Thwaites, who had applied for a patent for his “new” cipher, responded indignantly. Within the journal’s pages, Thwaites and Babbage faced off. Thwaites issued a challenge: he gave both the plain text and the cipher text, demanding that “C” find the key to the cipher. Babbage set to work, alongside his son Henry Prevost Babbage, who was home on leave from the Indian army.27 They discovered that Thwaites had doubly encrypted the passage, using two keys successively. Babbage issued a challenge of his own to Thwaites, daring him to find the key used in Babbage’s encryption of the same passage using the same cipher. Thwaites refused further comment. At around this time he abandoned the attempt to patent the cipher; the patent application bears the comment “void by reason of the patentee having neglected to file a specification in pursuance of the conditions of the letters patent.”28 Thwaites had apparently been convinced, by “C”’s letter or by someone else, that his cipher was identical to the long-known Vigenère.
In the process of finding the key to Thwaites’s message, Babbage invented a general method for deciphering any text encoded by the Vigenère. This has only recently been discovered by a careful examination of notes scattered throughout the collection of Babbage papers held at the British Library. His notes show pages of equations expressing the mathematical relations between the letters of the plain text, the cipher text, and the key text. All the mathematical relations are spelled out, in very elementary terms, as if Babbage were writing not only for himself but for eventual publication in his deciphering book—or for explaining it to another interested party. But Babbage never published this method in full.
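In modern notation, which is an assumption of this illustration and not a transcription of Babbage’s notes, the relations between the three texts can be written by numbering the letters a = 0 through z = 25. Here P_i, C_i, and K_j stand for the numbers of the i-th plain-text letter, the i-th cipher-text letter, and the j-th key letter, and m is the length of the key:

```latex
\begin{align*}
C_i &\equiv P_i + K_{i \bmod m} \pmod{26} && \text{(enciphering)} \\
P_i &\equiv C_i - K_{i \bmod m} \pmod{26} && \text{(deciphering)} \\
K_{i \bmod m} &\equiv C_i - P_i \pmod{26} && \text{(finding the key from both texts)}
\end{align*}
```

The third relation is the one that matters for a challenge like Thwaites’s: when both the plain text and the cipher text are given, each key letter follows by subtraction.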
Babbage saw that the code was easy to break once the length of the key word was determined, even if the key word itself had not been discovered. Because the key keeps repeating itself, if the periodicity of the key is known, then the cryptanalyst can treat the cipher text as separate occurrences of a simple monoalphabetic code. His method involves looking for sequences of letters that appear more than once in the cipher text. This will always occur when the cipher text is long enough. For example, if the key word is bacon, which has five letters, there are only five possible ways that the word the can be encoded: uhg, tjs, vvr, huf, gie. Since the is a very common word, chances are that in a message several sentences long there will be at least one repeated occurrence of one of the five possible ways of encoding it. When a repeated sequence of letters is found, there are two possible explanations. The most likely is that the same sequence of letters in the plain text has been enciphered using the same parts of the key. It is possible, though much less probable, that two different sequences of letters in the plain text have been enciphered using different parts of the key, leading to the same sequence in the cipher text only by coincidence.
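Both ideas in this paragraph can be sketched in Python: listing every way a word can be enciphered under a given key, and scanning a cipher text for repeated sequences. The function names are invented here, and the short cipher text passed to `repeated_sequences` is a made-up string chosen only so that one sequence repeats.

```python
from collections import defaultdict

A = "abcdefghijklmnopqrstuvwxyz"

def encodings_of(word, key):
    """Every way `word` can be enciphered, one per starting position in the key."""
    results = []
    for start in range(len(key)):
        shifted = [
            A[(A.index(ch) + A.index(key[(start + i) % len(key)])) % 26]
            for i, ch in enumerate(word)
        ]
        results.append("".join(shifted))
    return results

def repeated_sequences(cipher_text, length=3):
    """Map each repeated substring to the distances between successive occurrences."""
    positions = defaultdict(list)
    for i in range(len(cipher_text) - length + 1):
        positions[cipher_text[i:i + length]].append(i)
    return {
        seq: [b - a for a, b in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }

print(encodings_of("the", "bacon"))
# Made-up cipher text in which one sequence appears three times.
print(repeated_sequences("uhgabcdefzuhgkluhg"))
```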
To determine the length of the key, the cryptanalyst looks for all repeated sequences of letters, and notes the distance, counted in letters, from the start of one occurrence to the start of the next. He or she can use that to determine the possible length of the key, which would be a factor of that distance. If a sequence is repeated after twenty letters, there are six possibilities: (1) the key is one letter long and recycled twenty times (but then the cipher would be monoalphabetic); (2) the key is two letters long and is recycled ten times in the course of the encryption; (3) the key is four letters long and is recycled five times; (4) the key is five letters long and is recycled four times; (5) the key is ten letters long and is recycled two times; (6) the key is twenty letters long and is used only once.
In a long enough piece of cipher text, there will be more than one repeated sequence of letters. In this case, the cryptanalyst would be able to compare each of the different repeated sequences, in order to find the one possible key length that is shared by all of them. That is, he or she would look for a factor common to the distances between occurrences of all the repeated sequences. If, for example, another sequence of letters is repeated after thirty letters, that would rule out the possibility that the key is four letters long or twenty letters long, because four and twenty are not factors of thirty. That leaves open the possibilities that the key is two letters, five letters, or ten letters long. If yet another sequence is repeated after twenty-five letters, that would rule out a key of two or ten letters, leaving only the possibility of a five-letter key.
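The elimination in this example amounts to finding the divisors shared by all the distances. A minimal sketch in Python follows; the function names are invented here, and the code assumes that every repeat is genuine, since a coincidental repeat would have to be set aside by hand before taking common divisors.

```python
from functools import reduce
from math import gcd

def divisors(n):
    """All positive whole numbers that divide n exactly."""
    return {d for d in range(1, n + 1) if n % d == 0}

def candidate_key_lengths(distances):
    """Key lengths that divide every observed distance between repeats."""
    common = reduce(gcd, distances)
    # A one-letter key is just a monoalphabetic cipher, so it is dropped.
    return sorted(divisors(common) - {1})

print(sorted(divisors(20)))                  # the six possibilities for one repeat
print(candidate_key_lengths([20, 30]))       # after a second repeat
print(candidate_key_lengths([20, 30, 25]))   # after a third repeat
```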
Once the cryptanalyst knows the length of the key, it is possible to break the cipher using frequency analysis. If the key is five letters long, then there are basically five monoalphabetic substitution ciphers at work. (For example, if the key word is bacon, there is one monoalphabetic cipher that uses the alphabet beginning with b, one that uses the alphabet beginning with a, one beginning with c, one with o, and one with n, repeating every five letters.) Grouping every fifth letter together, the analyst has five “messages,” each encrypted using a one-alphabet substitution, and each piece can then be solved using frequency analysis, by looking for the most frequent letters. As these patterns emerge, the cryptanalyst can begin to make guesses about what the key word is, and use this to solve the rest of the message. Once the whole message is deciphered, the cryptanalyst can very easily determine what each letter of the key is, and then apply the key to any future messages that were encoded using it.29
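The grouping step and the final recovery of the key can be sketched in Python. This is an illustration only: the function names are invented, the cipher text is “attack the enemy fortress” enciphered with the key word bacon under the standard Vigenère rule, and the message is far too short for the frequency analysis of each column to succeed in practice.

```python
A = "abcdefghijklmnopqrstuvwxyz"

def split_columns(cipher_text, key_length):
    """Column j holds every letter enciphered with key letter j."""
    return [cipher_text[j::key_length] for j in range(key_length)]

def recover_key(plain_text, cipher_text, key_length):
    """Key letter = cipher letter minus plain letter, modulo 26."""
    shifts = [
        (A.index(c) - A.index(p)) % 26
        for p, c in zip(plain_text, cipher_text)
    ]
    # The key repeats, so its first key_length letters are the whole key.
    return "".join(A[s] for s in shifts[:key_length])

# "attack the enemy fortress" enciphered with "bacon", spaces removed.
cipher_text = "btvopltjsroeomsprvfrts"
print(split_columns(cipher_text, 5))
print(recover_key("attacktheenemyfortress", cipher_text, 5))
```

Each column is a simple shifted alphabet, so in a long message the most frequent letter of a column points to that column's key letter.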