by DAVID KAHN
If one hears the fragment “It’s not hard for you to …,” the redundant elements say that a verb is likely to follow, although the free-will portion makes it impossible to know which one. This same prior knowledge, or, in other words, the redundant elements, detects and corrects errors that arise during the transmission of messages. This is why language tolerates so heavy a burden of redundancy. For example, if a dot is dropped in a telegraphed message in English, so that an i (..) becomes an e (.) and “individual” becomes “endividual,” the recipient will know that an error was made because English lacks the sequence “endividual.” But if the language used were the hypothetical four-letter language, in which all sequences of four letters were used and therefore all were potentially acceptable in the message, the same dropping of a dot would go undetected. “Xfim,” meaning perhaps “come,” would be changed to “xfem,” maybe meaning “go” and, without redundancy, no alarm bells would ring. (There is, of course, a higher order of redundancy—that mandated by context—which might sound the alarm. If “xfem” meant “green,” it would not fit the context. A perfectly nonredundant language can therefore probably not exist, since at least a few basic agreements that a few recurring experiences of the real world will be represented by the same verbal symbols appear to be essential for communication.)
Where the language has no redundancy—as with telephone numbers, where a single wrong digit can lead to a wrong connection—people put in their own redundancy. They repeat the number in giving it to someone. Or, in spelling out names, they say “B as in baby, not v as in Victor.” For the greater the redundancy, the easier it becomes to detect mistakes. If a language consisted only of alternations of consonants and vowels, any deviation from that pattern would flag an error.
This detection of errors is the first step toward their correction. And in this correction redundancy again plays the central role. After the recipient of “endividual” has hunted through his memory and his dictionary and found that it does not exist in English, he brings up the sequence “individual,” which does exist, from his store of prior information about English, and corrects his message. If the reader of a business letter sees the sequence “rhe company,” he will recognize “rhe” as a nonword, will remember that the rules of English often call for a similar-appearing group of letters, “the,” before a noun like “company,” will perhaps consider that r is near t on the typewriter keyboard, and then will conclude that “rhe” should be “the.”
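The mechanics of that correction can be sketched as a toy program. In this minimal sketch (not anything from the text), the keyboard-adjacency table and the three-word lexicon are illustrative stand-ins:

```python
# Toy model of correcting "rhe" to "the": generate candidates by
# substituting keyboard neighbors, then keep only real words.
KEYBOARD_NEIGHBORS = {          # partial QWERTY adjacency, for illustration
    "r": "et45fd",
    "t": "ry56gf",
    "e": "wr34ds",
}
LEXICON = {"the", "company", "individual"}   # stand-in dictionary

def correct(nonword: str) -> set[str]:
    """Return every lexicon word reachable by one neighbor-key substitution."""
    candidates = set()
    for i, ch in enumerate(nonword):
        for neighbor in KEYBOARD_NEIGHBORS.get(ch, ""):
            candidates.add(nonword[:i] + neighbor + nonword[i + 1:])
    return candidates & LEXICON

print(correct("rhe"))  # -> {'the'}
```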
This process is a first cousin to cryptanalysis.
For cryptanalysts bring to bear in their solutions the same prior knowledge of rules and spelling and phonetic preferences (that is, redundancy) that the ordinary reader does to correct a typographical error. What laymen do with accidental errors, cryptanalysts do with deliberate deformations. Of course a cryptogram is immensely more involved and obscure than an isolated misprint, but it has an underlying regularity that the single random error does not, and this structure assists and confirms the successive “corrections” that constitute a cryptanalysis.
But how does the cryptanalyst begin in the first place? In correcting a typographical error, all the redundant elements lie in plain view, ready for use. With a cryptogram, they are obscured. The cryptanalyst begins by breaking these elements down to their atomic form—letters. He then compares them to the redundant elements of a language that have been reduced to the same common denominator. In other words, he takes a frequency count of the letters of the cryptogram and matches it against a frequency count of the letters of the assumed plaintext language. (These counts must sometimes be modified by the conditions of the cipher. In polyalphabetics, a count must be made for each alphabet; in digraphics, the count must be of pairs. If the cryptogram is in code, the atomic forms are words, but the same principle applies.)
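In code, that opening move comes to only a few lines. In this minimal sketch (again, not anything from the text), the English percentages are one commonly published approximation, and the matching score is a simple chi-squared-style distance:

```python
from collections import Counter

# Typical English letter percentages; one commonly published
# approximation, used here only for illustration.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
    "q": 0.1, "z": 0.07,
}

def frequency_count(cryptogram: str) -> dict[str, float]:
    """Percentage frequency of each letter in the cryptogram."""
    letters = [c for c in cryptogram.lower() if c.isalpha()]
    counts = Counter(letters)
    return {c: 100 * n / len(letters) for c, n in counts.most_common()}

def distance_from_english(observed: dict[str, float]) -> float:
    """Chi-squared-style distance between the cryptogram's count and the
    English table; a low score means the plaintext frequencies survive
    intact (as in a transposition), a high one means they are disturbed."""
    return sum((observed.get(c, 0.0) - f) ** 2 / f
               for c, f in ENGLISH_FREQ.items())
```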
Having done this, how can the cryptanalyst be confident that the cryptogram’s plaintext will have approximately the same frequencies as those of plaintext in general? Why won’t the differences in subjects of discussion, in vocabulary, in expression, upset the frequencies? Because the redundant elements of language far outweigh the variable ones. The 75 per cent redundancy in English overwhelms the 25 per cent of “free will”—though this 25 per cent does keep frequency counts from matching one another exactly. The redundant elements in any text converge to make its frequency table. The need in any English text to use “the” frequently ensures that h will be a high-frequency letter. English’s preference for alveolar consonants will make n, t, r, s, d, and l all high- or medium-frequency letters. The language’s aversion to p and k keeps their frequencies low. These redundant elements are fixed and predetermined—necessarily so, if communication is to take place—and hence they stabilize the frequency tables that reflect them. The enormous preponderance of redundancy manifests itself in the closely equal proportions of e in the nine separate German frequency counts. And of course it manifests itself in the daily successes of cryptanalysts.
Shannon’s insight, his great contribution to cryptology, lay in pointing out that redundancy furnishes the ground for cryptanalysis. “In … the majority of ciphers,” he wrote, “it is only the existence of redundancy in the original messages that makes a solution possible.” This is the very basis of codebreaking. Shannon has here given an explanation for the constancy of letter frequency, and hence for the phenomena that depend on it, such as cryptanalysis. He has thus made possible, for the first time, a fundamental understanding of the process of cryptogram solution.
From this insight flow several corollaries. It follows that the lower the redundancy, the more difficult it is to solve a cryptogram. Shannon’s own two extremes of redundancy illustrate this. The last few words of Finnegans Wake are these: “End here. Us then. Finn, again! Take. Bussoftlhee, mememormee! Till thousendsthee. Lps. The keys to. Given! A way a lone a last a loved a long the.” This would pose distinctly more difficulty for a cryptanalyst than a portion of the New Testament in Basic English: “And the disciples were full of wonder at his words. But Jesus said to them again, Children, how hard it is for those who put faith in wealth to come into the kingdom of God!”
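One can put a rough number on the difference. The sketch below is an illustration rather than a real measurement: it estimates redundancy from single-letter frequencies alone, as R = 1 - H1/log2(26), which understates the true figure (digraph, word, and grammatical constraints also count), and the two passages are far too short for stable statistics:

```python
import math
from collections import Counter

def unigram_redundancy(text: str) -> float:
    """Crude redundancy estimate from single-letter frequencies only:
    R = 1 - H1 / log2(26).  Longer-range constraints (digraphs, words,
    grammar) would raise the true figure considerably."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    h1 = -sum(n / total * math.log2(n / total)
              for n in Counter(letters).values())
    return 1 - h1 / math.log2(26)

joyce = ("End here. Us then. Finn, again! Take. Bussoftlhee, mememormee! "
         "Till thousendsthee. Lps. The keys to. Given! A way a lone a last "
         "a loved a long the")
basic = ("And the disciples were full of wonder at his words. But Jesus "
         "said to them again, Children, how hard it is for those who put "
         "faith in wealth to come into the kingdom of God!")
print(unigram_redundancy(joyce), unigram_redundancy(basic))
```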
Puzzle cryptograms achieve their goal of being as hard as possible to solve by using archaic and esoteric words dredged from the far corners of the dictionary and combined in almost meaningless texts. Their redundancy is relatively low. One such cryptogram gives a self-description: “Tough cryptos contain traps snaring unwary solvers: abnormal frequencies, consonantal combinations unthinkable, terminals freakish, quaint twisters like ‘myrrh.’” But even here the redundant elements win out. Though a few may be suppressed, others remain, and these permit solution. The interesting question of whether the differences in redundancy among natural languages make cryptograms in some languages inherently more difficult to solve seems never to have been put to a test.
The problem of low redundancy arises in practice with a vengeance when the cryptanalyst is faced with enciphered code. To strip the encipherment from encicode, the cryptanalyst must solve a cryptogram whose plaintext consists of codewords and which may look like IXKDYWUKJTPLKJE…. This is of very low redundancy because of the more even use of letters, the greater freedom in combining them, the suppression of frequencies by the use of homophones, and so on. But the unavoidable repetitions of orders and reports, the pressure of the redundancy of the language pent within the vessel of the code, and the engineering of codewords so that garbles can be corrected—all these give the underlying codetext a fibrous enough texture for the cryptanalyst to grasp it for solution.
These considerations suggest that reducing the redundancy will hinder cryptanalysis. Shannon himself prescribes operating on the plaintext “with a transducer which removes all redundancies…. The fact that the vowels in a passage can be omitted without essential loss suggests a simple way of greatly improving almost any ciphering system. First delete all vowels, or as much of the message as possible without running the risk of multiple reconstructions, and then encipher the residue.” Experts who have attacked cryptograms from whose plaintexts only the letter e has been eliminated have found that the difficulty of solution increased noticeably. Reducing redundancy is especially effective because it robs the cryptanalyst of one of his chief tools for attack instead of just bolstering the wall of secrecy. Cryptographers of the Italian Renaissance did this when they ordered cipher clerks to drop the second letter of a doublet, as the second l in sigillo.
Such techniques rely upon the cipher clerks’ knowledge of their language to supply the suppressed elements of redundancy. Abbreviations, such as bn for battalion, likewise may have such low redundancy, and may require so extensive a furnishing of prior information, that they not only make plaintexts harder to solve but may themselves function as a rough form of cryptography. Two gossips, for example, may refer to a third party by her initials. They hope that no one within hearing will have sufficient knowledge of the contextual situation to restore the eliminated portion of the name. Much of the Masonic ritual is printed in that form: “Do u declr, upn ur honr, tt u r promptd to….”
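A redundancy-removing transducer of the kind Shannon prescribes, and of which the Masonic text is a handmade example, is easy to sketch. The function below is a minimal illustration, not a fielded system:

```python
VOWELS = set("aeiouAEIOU")

def strip_vowels(plaintext: str) -> str:
    """Delete the vowels (the most dispensable letters) before
    encipherment; the clerk's knowledge of the language supplies
    them again on receipt."""
    return "".join(c for c in plaintext if c not in VOWELS)

print(strip_vowels("attack at dawn"))  # -> "ttck t dwn"
```

A practical version would also have to guard against what Shannon calls “multiple reconstructions,” in which two different plaintexts collapse to the same residue.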
Another corollary is that more text is needed to solve a low-redundancy cryptogram than one with a high-redundancy plaintext. Shannon has managed to quantify the amount of material needed to achieve a unique and unambiguous solution when the plaintext has a known degree of redundancy. He calls the number of letters the “unicity distance” (or “unicity point”), and he calculates it by means of a rather complicated formula. This formula naturally differs for different ciphers, but it always includes the redundancy as one of its terms. In his original paper, in which he considered the redundancy of English at only 50 per cent, Shannon found the unicity point for monalphabetic substitution at 27 letters, for polyalphabetics with known alphabets at twice the period length, for those with unknown alphabets at 53 times the period length, for transposition at the keylength times the logarithm of the keylength factorial.
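In its simplest and best-known form, Shannon’s formula is the key entropy divided by the per-letter redundancy, U = H(K)/D. The sketch below assumes that form; the exact answer depends on the redundancy figure chosen, which is why it yields roughly 38 letters at 50 per cent rather than Shannon’s reported 27:

```python
import math

def unicity_distance(n_keys: int, redundancy: float, alphabet: int = 26) -> float:
    """U = H(K) / D: bits of key entropy over redundant bits per letter."""
    h_key = math.log2(n_keys)                 # entropy of the key space
    d = redundancy * math.log2(alphabet)      # redundant bits per plaintext letter
    return h_key / d

keys = math.factorial(26)                     # monalphabetic substitution
print(unicity_distance(keys, 0.50))           # about 38 letters
print(unicity_distance(keys, 0.75))           # about 25 letters
```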
One of the most interesting uses of the unicity-point formula is in determining the validity of an alleged solution to a cryptogram, especially one of the questionable solutions, such as those claimed to be hidden in the Shakespearean plays to prove that Francis Bacon wrote them. “In general,” wrote Shannon, “we may say that if a proposed system and key solves a cryptogram for a length of material considerably greater than the unicity distance the solution is trustworthy. If the material is of the same order or shorter than the unicity distance the solution is highly suspicious.” Shannon’s formula was not applied to most of these “decipherments” because most were published before his work was; furthermore, the formula would ramify to unmanageable terms to account for the many subrules and exceptions in these extremely flexible “systems.” It triumphed in its only known combat action—one that took place in the pages of Life magazine on a solution proposed by Ib Melchior, son of the opera star Lauritz Melchior.
Melchior thought that the decipherment of a cryptogram that he detected on Shakespeare’s tombstone might lead him to an early text of a play. He obtained a numerical ciphertext by counting the number of successive capitals and small letters in the epitaph on Shakespeare’s grave. This he solved to read: elesennrelaledelleemnaamleetedeeasen. But the eleven letters ledelleemna made no sense to Melchior, and, noting that they came from the letters between the two THE ligatures on the tombstone, he concluded that they were change symbols to signal a shift in cipher alphabets. With this change, the new “solution” read: elesennrelaedewedgeeereamleetedeeasen. Taking away the “obvious nulls” and modernizing the Elizabethan spelling, Melchior read: Elsinore laid wedge first Hamlet edition. This was supposed to mean that a first edition of Hamlet was buried in a wedge-shaped cell deep within the castle of Elsinore. But even granting the generously low redundancy of only 50 per cent, a crucial section of the cipher flunks the Shannon unicity test completely, while the remaining letters barely meet the minimum and do not fulfill the requirement for a “length of material considerably greater than the unicity distance.” Despite this implied prediction of failure, Melchior, backed by a Life expedition, went to Elsinore anyway. Cryptologists were not surprised when the team brought back an excellent picture story for the magazine—but no “first Hamlet edition.”
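Melchior’s extraction step, at least, is mechanical and easy to reproduce in outline. In this minimal sketch the input is a made-up string with hypothetical capitalization, not the actual epitaph:

```python
import itertools

def case_runs(inscription: str) -> list[int]:
    """Lengths of successive runs of capital and small letters,
    ignoring spaces and punctuation."""
    letters = [c for c in inscription if c.isalpha()]
    return [len(list(run))
            for _, run in itertools.groupby(letters, key=str.isupper)]

print(case_runs("GOOD frend FOR Iesvs SAKE"))  # -> [4, 5, 4, 4, 4]
```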
The concept of redundancy thus repeatedly demonstrates its power by bringing under a single broad generalization numerous cryptologic phenomena that heretofore had to be given individual explanations. Why are puzzle cryptograms harder to solve than ordinary messages? Previously, cryptanalysts could only say that it was because they used rarer and odder words; today they can invoke the wide-ranging principle of redundancy and point out that such cryptograms have a lower redundancy than normal ones. Why have stereotyped expressions—“Reference your telegram of …”—so often helped cryptanalysts? Because they raise the redundancy to delightfully high levels. On the other hand, the use of codenames for places, operations, and so forth, within a plaintext lowers redundancy. As General Marcel Givierge wrote, “the fact that one expects to find Paris in a text will cause him to search for the letters and syllables of Paris and not those of the codename which replaces Paris.” Similarly, bisection of a message—cutting it in half and tacking the start onto the end—buries the frequently routine start of a message in the middle and brings the middle of a phrase to the head of the message. This substantially lowers the redundancy of that vulnerable point. Shannon’s information theory shows how to make cryptanalysis more difficult and tells how much ciphertext is needed to reach a valid solution. In all these ways it has contributed to a deeper understanding of cryptology.
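Of the devices just named, bisection is the easiest to state in code. A minimal sketch, with an illustrative message (a real system would need some convention for the receiver to find the seam and undo the cut):

```python
def bisect(plaintext: str) -> str:
    """Cut the message near the middle and send the second half first,
    burying the stereotyped opening in the interior of the text."""
    mid = len(plaintext) // 2
    return plaintext[mid:] + plaintext[:mid]

msg = "REFERENCE YOUR TELEGRAM OF TWELFTH SUPPLIES ARRIVE FRIDAY"
print(bisect(msg))  # the routine opening now sits mid-message
```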
Shannon has also viewed cryptology from a couple of other perspectives, which, while not as useful as information theory, are enlightening. The first, in fact, is a kind of corollary to the information-theory view.
“From the point of view of the cryptanalyst,” Shannon wrote, “a secrecy system is almost identical with a noisy communication system.” In information theory, the term “noise” has a special meaning. Noise is any unpredictable disturbance that creates transmission errors in any channel of communication. Examples are static on the radio, “snow” on a television screen, misprints, background chatter at a cocktail party, fog, a bad connection on the telephone, a foreign accent, perhaps even mental preconceptions. Shannon is suggesting that noise is analogous to encipherment. “The chief differences in the two cases,” he wrote, “are: first, that the operation of the enciphering transformation is generally of a more complex nature than the perturbing noise in a channel; and, second, the key for a secrecy system is usually chosen from a finite set of possibilities while the noise in a channel is more often continually introduced, in effect chosen from an infinite set.”
When Carl W. Helstrom, author of Statistical Theory of Signal Detection, was asked whether the techniques of isolating signals from noise had any relevance to cryptanalysis, he replied: “I suspect that the analogy between the enciphering rule, or ‘key,’ and random noise will not prove very fruitful. It seems to me more appropriate to regard the encipherment as a filtering of the original message to produce a transformed version. The ‘filter’ is a definite transformation rule, but the analyst doesn’t know what it is…. The problem is then to discover the transformation rule, or the nature of the filter, when given the statistics of the input and output. It is like finding the structure of an electrical filter by passing random noise through it and measuring the statistical distributions of the input and output voltages.”
Cryptology may also be regarded as a conflict in the sense employed in The Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern. As Shannon, who first made the allusion, puts it: “The situation between the cipher designer and cryptanalyst can be thought of as a ‘game’ of a very simple structure; a zero-sum two-person game with complete information, and just two ‘moves.’ [A zero-sum game is one in which one contestant’s advances are made at the expense of the other.] The cipher designer chooses a system for his ‘move.’ Then the cryptanalyst is informed of this choice and chooses a method of analysis. The ‘value’ of the play is the average work required to break a cryptogram in the system by the method chosen.”
Cryptology is, by definition, a social activity, and so it may be examined from a sociological point of view. It is secret communication, and communication is perhaps man’s most complex and varied activity. It encompasses not just words but gestures, facial expressions, tone of voice, even silence. A glance can express a tale more sweetly than a rhyme. Basically, all forms of communication are sets of agreements that certain sounds or signs or symbols shall stand for certain things. One must be a party to these preconcerted rules if one wants to communicate.
But not all forms of communication are known at all times and in all places. Those who happen to know one system that others around them do not can use it for secret communication. Irish troops sent to the Congo as part of the United Nations force in 1960 spoke Gaelic over the radio, and the U.N. commander, General Carl von Horn of Sweden, called it the best code in the Congo. This is a kind of cryptography by default, depending upon a fortuitous ignorance—a defective cryptography. Effective cryptography deliberately establishes special rules of communication that deny information to those who would otherwise understand the messages.