
Code Warriors


by Stephen Budiansky


  L    E    T    T    E    R    X
  14   3    11   6    17   13   10
  X    B    I    N    N    E    N

  (Binnen is Dutch for “within.”) The break was then extended in each direction by guessing additional words that followed or preceded the ones already discovered, revealing additional plaintext in the paired message.1

  The index of coincidence test (appendix E) offered a trial-and-error method for locating two messages in depth by sliding a set of intercepted messages through every possible relative position. But for a more general solution, it was necessary to reproduce the sequences of key the machine generated so that any message could be broken, not just pairs found to be in depth. The Hagelin used a set of wheels with movable pins to generate the key sequence, and U.S. Army cryptanalysts devised an elaborate procedure to derive the pinwheel settings from the key strings they recovered. (The key strings could be obtained, once the plaintext of a message in depth had been read, simply by adding the recovered plaintext to the cipher text; since cipher = key – plain, therefore key = cipher + plain.) That in turn allowed them to break the indicator system that told the intended recipient of a message what pinwheel settings to use; with the indicator system cracked, the codebreakers could then read every subsequent message directly, without having to rely on finding further depths.2
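  To illustrate that arithmetic (as a minimal sketch only; the cipher and plaintext strings below are invented, not drawn from any actual Hagelin traffic), recovering a stretch of key from a read depth amounts to a letter-by-letter modular addition:

```python
# Sketch of recovering a Hagelin-style key stream from a read depth,
# using the relation key = cipher + plain (mod 26). Strings are invented.

def to_numbers(s):
    """Map A..Z to 0..25, ignoring anything that is not a letter."""
    return [ord(c) - ord('A') for c in s.upper() if c.isalpha()]

def to_letters(nums):
    return ''.join(chr(n % 26 + ord('A')) for n in nums)

def recover_key(cipher, plain):
    """Since cipher = key - plain (mod 26), key = cipher + plain (mod 26)."""
    return to_letters(c + p for c, p in zip(to_numbers(cipher), to_numbers(plain)))

# Hypothetical example: a recovered plaintext fragment added to its cipher text
# reproduces the corresponding stretch of key, which could then be matched
# against the pinwheel patterns.
print(recover_key("QZKMT", "HELLO"))
```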

  APPENDIX D

  Bayesian Probability, Turing, and the Deciban

  The tests that cryptanalysts had traditionally relied upon for placing messages in depth or setting them against known key, such as searching for double hits, subtracting hypothetical key and examining the result for high-frequency code groups, or measuring the index of coincidence between two streams of polyalphabetic cipher text (appendix E), were all based on probability: it was not that coincidences such as double hits could never occur by random chance between two strings of unrelated cipher text, just that they were unlikely to. The problem was that once cryptanalysts began to use machine methods to make vast numbers of comparisons, the number of false alarms rose as well: a common problem in early computer-aided cryptanalytic runs was the absence of more sophisticated statistical tests to determine which hits were real and which were just the product of chance.

  Alan Turing, among his other contributions to cryptanalysis, developed a method of enduring utility for assessing the odds of whether any given discovered coincidence was “causal” or accidental. The immediate problem Turing was working on involved a method he had developed to reduce the number of tests required to recover the daily key of the naval Enigma. The first step required discovering the relative rotor starting positions of a large number of Enigma messages, which the analysts attempted to do by locating repetitions between two messages where it looked like the same frequently occurring letters or words (often the ubiquitous eins) had been enciphered with the same key, allowing them to be placed in proper depth. The method involved sliding punched sheets of messages one against the other and scanning by eye for repetitions of single letters, bigrams, trigrams, tetragrams, and so forth. (The task was “not easy enough to be trivial, but not difficult enough to cause a nervous breakdown,” in the words of Turing’s colleague Jack Good.)1
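  A rough sketch of the underlying comparison (an illustration only, not Bletchley's actual procedure or scoring) is to count, at a given alignment of two messages, how many single letters, bigrams, trigrams, and tetragrams coincide:

```python
# Count coinciding runs of letters between two messages at one fixed alignment.
# Runs longer than max_len are lumped into the max_len bucket.

def count_repeats(msg_a, msg_b, max_len=4):
    length = min(len(msg_a), len(msg_b))
    same = [msg_a[i] == msg_b[i] for i in range(length)]
    counts = {n: 0 for n in range(1, max_len + 1)}
    i = 0
    while i < length:
        if same[i]:
            run = 1
            while i + run < length and same[i + run]:
                run += 1
            counts[min(run, max_len)] += 1
            i += run
        else:
            i += 1
    return counts

# Invented texts; a full search would repeat this at every relative offset.
print(count_repeats("OBERKOMMANDODERWEHRMACHT",
                    "REICHSBAHNOBERKOMMANDOXY"))
```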

  The chance of any given repeat occurring by chance alone is a basic calculation in probability, akin to the odds of drawing the same two or three or four letters from a hat containing all letters of the alphabet on two successive draws. But Turing noted that this information alone was little help in judging whether a trial alignment of the two messages containing such a repetition was correct or not, given the vastly greater number of “wrong” alignments that are possible for every causal one. What one really wanted to know was this: given, say, the discovery of one tetragram, two bigrams, and fifteen single letters lining up between two messages, what was the “weight of evidence” in favor of the hypothesis that the alignment was real rather than a false alarm? Each one of these coincidences was, a priori, unlikely to be the product of random chance, but with thousands of pairs of messages and hundreds of places in each message where a repeat could occur, the odds grew that such a random hit would occur somewhere. (As Good observed, the phenomenon is akin to the famous “birthday problem” of probability theory. Although the odds of any two people having the same birthday are only 1 in 365, if you have 23 people in a room it is odds-on that two will have the same birthday, because of the much larger number of different pairwise comparisons that are possible with that number of people: 22 + 21 + 20 + 19 +…+ 3 + 2 + 1 = 253.)
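  Those birthday figures are easy to check (a quick worked example, not part of the original text):

```python
# Verify the birthday-problem figures: 23 people give 253 pairwise comparisons,
# and the probability of at least one shared birthday just passes one half.
from math import comb

people = 23
print(comb(people, 2))                 # 253 distinct pairs

p_all_distinct = 1.0
for k in range(people):
    p_all_distinct *= (365 - k) / 365  # k-th person avoids all earlier birthdays

print(1 - p_all_distinct)              # about 0.507, i.e. better than even odds
```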

  Turing’s “factor in favor of a hypothesis” was an application of what is more generally known as Bayesian probability, a method to assess from available evidence which of two competing hypotheses is more likely to be supported. By taking into account the greater number of noncausal comparisons that could be made for every correct one, Turing and Good calculated the odds in favor of any given repetition being causal for each kind of repeat. The total odds were obtained by multiplying together the odds of each individual observed coincidence (a basic law of probability: for example, the odds of two coin tosses coming up heads is the odds of each event multiplied together, ½ × ½ = ¼). To simplify the calculation, Turing expressed the odds as logarithms; adding the logarithms of two numbers together is the same as multiplying the numbers themselves. Turing dubbed his unit for the logarithm of weight of evidence the “ban,” an insider-joke reference to the punched paper sheets used in the naval Enigma problem, which were printed in the nearby town of Banbury. One ban equals odds of 10 to 1 in favor; the “deciban,” which turned out to be a more practical unit, is equal to a tenth of a ban. A rule of thumb for the Bletchley codebreakers was that odds of 50 to 1 in favor, which equals 1.7 bans or 17 decibans, was a virtual certainty of a comparison being causal.2
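  As a minimal sketch of that bookkeeping (the factor values below are invented, and this is not Turing's actual scoring table), the same calculation in decibans looks like this:

```python
# Weight of evidence in decibans: 10 * log10 of the factor in favor.
# Independent pieces of evidence are combined by adding their deciban scores.
from math import log10

def decibans(factor_in_favor):
    return 10 * log10(factor_in_favor)

# Three hypothetical coincidences carrying factors of 4, 2.5, and 5 in favor
# of a causal alignment multiply to combined odds of 50 to 1.
factors = [4.0, 2.5, 5.0]
total_db = sum(decibans(f) for f in factors)

print(total_db)                 # about 17 decibans (1.7 bans)
print(10 ** (total_db / 10))    # back to odds: 4 * 2.5 * 5 = 50 to 1
```

  Adding the three deciban scores reaches the 17 decibans that the Bletchley rule of thumb treated as a virtual certainty, the same figure as multiplying the individual factors to odds of 50 to 1.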

  The power of the method was that it is completely general, applicable to an array of problems in cryptanalysis. Max Newman, the mathematician at Bletchley Park who led the attack on the German teleprinter ciphers, considered Turing’s concept “his greatest intellectual contribution during the war”; GCHQ and NSA cryptanalysts still use the ban, an enduring tribute to Turing’s legacy to the statistical analysis of cipher problems.3

  APPENDIX E

  The Index of Coincidence

  In a polyalphabetic cipher, such as that generated by a rotor machine, the substitution alphabet changes with each successive letter of the message. The resulting stream of cipher text has a flat distribution of letter frequencies: each letter of the alphabet has the same 1/26 (or 3.8 percent) chance of appearing at any given position of polyalphabetic cipher text. The standard cryptanalyst’s trick for solving a simple monoalphabetic substitution cipher—counting the frequency of each letter and assigning the most commonly occurring ones to corresponding high-frequency letters of the underlying language (such as E, T, A, O, I, N, and S in English)—is thus of no help against a string of polyalphabetic cipher.
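  As a small illustration of that trick (the cipher text below is an invented monoalphabetic example, not from the book), the count itself is trivial to carry out:

```python
# Tally letter frequencies in a cipher text; against a monoalphabetic
# substitution the top of the list is matched against E, T, A, O, I, N, S.
from collections import Counter

def letter_frequencies(text):
    counts = Counter(c for c in text.upper() if c.isalpha())
    return counts.most_common()

# Invented Caesar-shifted text; against polyalphabetic cipher the same count
# would come out nearly flat and reveal nothing.
print(letter_frequencies("XLIWIGVIXQIWWEKIMWRSXZIVCWIGVIX"))
```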

  William Friedman in 1922 developed one of the most fundamental statistical tools to overcome this obstacle and successfully break a polyalphabetic cipher. If any two randomly chosen strings of letters are placed side by side, the chance that any specified letter (say, A) will appear in the same position in both texts is 1/26 × 1/26, or 0.15 percent; the chance that some pair of identical letters (A and A, or B and B, or C and C, and so forth) will occur is 26 times that, or 3.8 percent.

  Two strings of plaintext, however, will on average show a much greater number of coincidences when placed side by side. This is because the high-frequency letters such as E, T, and A greatly increase the chances of the same letter appearing in the two strings in the same place. For example, the plain-language frequency of E, the most common letter in English, is 12 percent; therefore the odds of an E occurring simultaneously at any given point in two strings of plaintext are 12/100 × 12/100, or 1.4 percent, almost ten times the chance of a given coincidence in random letter strings. The chance of two Z’s occurring in the same position is obviously much lower (Z’s occur 0.3 percent of the time in English plaintext, thus the odds of two of them occurring simultaneously at any given location in two strings of plaintext are 0.3/100 × 0.3/100, or 0.0009 percent), but the high-frequency letters so outweigh the low-frequency letters as to elevate the total odds: summing the plaintext coincidence probabilities for all twenty-six letters of the alphabet yields a total probability of about 6.7 percent for a coincidence, nearly double the 3.8 percent of the random situation.
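  That summation is easy to reproduce (a rough check using a standard approximate English frequency table, not figures taken from the book):

```python
# Sum the squares of the English letter frequencies to get the plaintext
# coincidence probability, and compare with 1/26 for random text.
english_freq = {
    'E': 0.127, 'T': 0.091, 'A': 0.082, 'O': 0.075, 'I': 0.070, 'N': 0.067,
    'S': 0.063, 'H': 0.061, 'R': 0.060, 'D': 0.043, 'L': 0.040, 'C': 0.028,
    'U': 0.028, 'M': 0.024, 'W': 0.024, 'F': 0.022, 'G': 0.020, 'Y': 0.020,
    'P': 0.019, 'B': 0.015, 'V': 0.010, 'K': 0.008, 'J': 0.002, 'X': 0.002,
    'Q': 0.001, 'Z': 0.001,
}

plain_coincidence = sum(f * f for f in english_freq.values())
random_coincidence = 1 / 26

print(plain_coincidence)    # roughly 0.066, in line with the 6.7 percent above
print(random_coincidence)   # about 0.038
```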

  Friedman called this test—the number of such coincidences that occur in two strings of text divided by the total number of letters—the index of coincidence.1 For example, in these two random sixty-letter strings:

  an actual count finds that there are two coincidences, yielding an index of coincidence of 2/60, or 3.3 percent; while in these two sixty-letter strings of plaintext:

  there are four coincidences, for an index value of 4/60, or 6.6 percent.
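  The count itself amounts to a one-line comparison (a sketch using invented strings rather than the sixty-letter examples above):

```python
# Index of coincidence between two equal-length texts: the fraction of
# positions at which the same letter appears in both.

def index_of_coincidence(text_a, text_b):
    pairs = list(zip(text_a.upper(), text_b.upper()))
    hits = sum(1 for a, b in pairs if a == b)
    return hits / len(pairs)

# Invented strings: two plaintexts (or two cipher texts in depth) should score
# near 0.067, unrelated streams near 0.038.
print(index_of_coincidence("ATTACKATDAWNONTHEHARBORX",
                           "THEENEMYWILLATTACKATDAWN"))
```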

  Friedman’s striking insight was that if two texts had been enciphered with the same sequence of polyalphabetic key—that is, if they are in depth—this same nonrandom unevenness in the underlying plaintext persists as a ghostly remnant that can be detected by the index of coincidence test. At any given position of two messages that are in depth, each has been enciphered with the same monoalphabetic substitution; thus if, say, E has been enciphered as B at one position, the chance of a B occurring simultaneously in both cipher texts at that particular spot is the same elevated probability as that of the E’s in the underlying plaintext. To put it another way, any coincidence that occurs in the underlying plaintexts will also occur at the same place in their corresponding polyalphabetic cipher texts if they are in depth: the identities of the letters have been altered, but the coincidences remain.

  To locate a depth, two streams of cipher text can thus be “slid” past each other in every possible relative position and the index of coincidence calculated for each trial alignment; the index will suddenly jump from a value close to the random 3.8 percent to the 6.7 percent of plaintext when the two are properly aligned in a true depth. For example, when the two previous plaintext sequences above are each enciphered by an Enigma machine at the same chosen setting, they yield the following two cipher texts, which preserve the coincidences of the two underlying plaintexts in the same positions:

  thus yielding the same index of coincidence of 6.6 percent. But if they are misaligned, for example shifted one letter off:

  the number of coincidences falls sharply (in this case to one, for an index value of 1.7 percent).
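  In code, the sliding test described above might look something like this (a minimal sketch; a real search would also slide the second message in the other direction and apply a statistical significance test to the scores):

```python
# Slide one cipher text along another and compute the index of coincidence at
# each offset; a true depth shows up as the offset whose score jumps toward
# the plaintext value of about 0.067.

def slide_and_score(cipher_a, cipher_b, min_overlap=20):
    scores = []
    for offset in range(len(cipher_a) - min_overlap + 1):
        overlap = list(zip(cipher_a[offset:], cipher_b))
        hits = sum(1 for x, y in overlap if x == y)
        scores.append((offset, hits / len(overlap)))
    return scores

def best_alignment(cipher_a, cipher_b):
    """Return the (offset, score) pair with the highest index of coincidence."""
    return max(slide_and_score(cipher_a, cipher_b), key=lambda item: item[1])
```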

  This simple test was the principle behind many of the electromechanical, and then electronic, cryptanalytic machines developed in the 1940s and later to locate depths, which could then lead to further exploitation of a target cipher system.

  NOTES

  Full references to the sources cited in shortened form in the notes below are found in the bibliography. Frequently cited works and document collections are identified by the following abbreviations:

  AC Johnson, American Cryptology During the Cold War

  CCH NSA Center for Cryptologic History, Web

  CIAL Central Intelligence Agency Library, Web

  CW Gaddis, Cold War

  DNSArch Digital National Security Archive, George Washington University, Washington, DC

  FRUS Foreign Relations of the United States

  HCR Historic Cryptographic Records, NARA

  HV Benson and Phillips, History of VENONA

  NARA Record Group 457, National Archives and Records Administration, College Park, MD

  NCM National Cryptologic Museum Library, Ft. George G. Meade, MD

  NSAD NSA Declassification and Transparency, Web

  NSA60 NSA 60th Anniversary, Web

  NYT New York Times

  OH oral history

  TArch TICOM Archive, Web

  TNA UK National Archives, Kew, UK

  WM Burke, Wasn’t All Magic

  AUTHOR’S NOTE

  1. “Edward Snowden: The Whistleblower Behind the NSA Surveillance Revelations,” Guardian, June 11, 2013; Fred Kaplan, “Why Snowden Won’t (and Shouldn’t) Get Clemency,” Slate, January 3, 2014; “Snowden Persuaded Other NSA Workers to Give Up Passwords,” Reuters, November 7, 2013.

  2. “Edward Snowden,” Guardian, June 11, 2013.

  3. “A Guide to What We Now Know About the NSA’s Dragnet Searches of Your Communications,” American Civil Liberties Union, August 9, 2013, Web; “Are They Allowed to Do That? A Breakdown of Government Surveillance Programs,” Brennan Center for Justice, New York University School of Law, July 15, 2013, Web; “N.S.A. Said to Search Content of Messages to and from U.S.,” NYT, August 8, 2013.

  4. “SIGINT Enabling Project,” Excerpt from 2013 Intelligence Budget Request, “Secret Documents Reveal N.S.A. Campaign Against Encryption,” NYT, September 5, 2013; “Carnegie Mellon Researcher Warns US Officials of Dangers of Weakening Cybersecurity to Facilitate Government Surveillance,” Press Release, Carnegie Mellon University, November 4, 2013, Web.

  5. Abelson et al., Keys Under Doormats, 2–3.

  6. United States District Court for the District of Columbia, Memorandum Opinion, Klayman et al. v. Obama et al., December 16, 2013. In a separate case, a U.S. appellate court in May 2015 found the bulk calling data collection program to be a violation of the law, holding that its “all-encompassing” nature went far beyond what Congress had authorized under the 2001 USA Patriot Act, which allowed the government to seek secret orders from the FISC for production of records related to an “authorized investigation” of foreign espionage or international terrorism: United States Court of Appeals for the Second Circuit, ACLU v. Clapper, May 7, 2015.

  7. United States Foreign Intelligence Surveillance Court, Opinion and Order, In Re Orders of this Court Interpreting Section 215 of the Patriot Act, September 13, 2013; “House Votes to End N.S.A.’s Bulk Phone Data Collection,” NYT, May 13, 2015; “U.S. Surveillance in Place Since 9/11 Is Sharply Limited,” NYT, June 2, 2015.

  8. “Edward Snowden,” Guardian, June 11, 2013; Greenwald, No Place to Hide, 177, 196, 227.

  9. Testimony of NSA Deputy Director John “Chris” Inglis before the House Intelligence Committee, June 18, 2013, and his speech to the National Cryptologic Museum Foundation, October 15, 2014; “Ex–CIA Director: Snowden Should Be ‘Hanged’ If Convicted for Treason,” Fox News Politics, December 17, 2013, Web.

  10. “Udall, Wyden Call on National Security Agency Director to Clarify Comments on Effectiveness of Phone Data Collection Program,” Press Release, Office of Sen. Ron Wyden, June 13, 2013; Bruce Schneier, “How the NSA Threatens National Security,” January 6, 2014, Web.

  11. McConnell, “Future of SIGINT,” 42.

  PROLOGUE: “A CATALOGUE OF DISASTERS”

  1. Höhne and Zolling, General Was a Spy, 160–63; Bower, Red Web, 130–31; Hess, “Hans Helmut Klose.”

  2. Bower, Red Web, 143–47.

  3. Andrew and Gordievsky, KGB, 385; Bower, Red Web, 154, 161.

  4. Bower, Red Web, 145, 152–53, 214.

  5. Andrew, Secret Service, 343, 438; Macintyre, Spy Among Friends, 52.

  6. Bower, Red Web, 204–5.

  7. Ibid., 205, 213; Bower, Perfect English Spy, 207.

  8. Bower, Perfect English Spy, 204–7; Andrew and Gordievsky, KGB, 387–89; Aid, “Cold War,” 30–31.

  9. Macintyre, Spy Among Friends, 141; Bethell, Great Betrayal, 145, 165, 174–75.

  10. Smith, New Cloak, 114–15; Aid, “Cold War,” 31.

  11. FRUS, Intelligence Community, 1950–55, 695n3.

  12. Kistiakowsky quoted in Aid, “Cold War,” 31.

  1 THE RUSSIAN PROBLEM

  1. At the end of World War II, the Army Signal Intelligence Service had 7,848 men and women working at its headquarters in Arlington, Virginia; the Washington complement of the Navy’s Communications Intelligence Section, Op-20-G, numbered 5,114: Achievements of the Signal Security Agency in World War II, SRH-349, Studies on Cryptology, NARA, 3; Survey of Op-20-G, Naval Inspector General, July 13, 1945, DNSArch, 16.

  2. Budiansky, “Cecil Phillips,” 98–99; HV, 40n16.
  3. HV, 8, 19.

  4. U.S. Army, “Arlington Hall”; Alvarez, Secret Messages, 122–23; HV, 20n34.

  5. Achievements of the Signal Security Agency in World War II, SRH-349, Studies on Cryptology, NARA, 3; Treadwell, Women’s Army Corps, 316; Historical Review of Op-20-G, SRH-152, Studies on Cryptology, NARA.

  6. Budiansky, Battle of Wits, 225, 262.

  7. Standard #530 Bombe, Tentative Brief Descriptions of Cryptanalytic Equipment for Enigma Problems, NR 4645, HCR; Wilcox, Solving Enigma, 32, 54.

  8. For an explanation of Turing’s method and the design of the bombes, see Budiansky, Battle of Wits, 127–31.

  9. Survey of Op-20-G, Naval Inspector General, July 13, 1945, DNSArch, 5–7.

  10. Budiansky, Battle of Wits, 243.

  11. Ibid.; Achievements of the Signal Security Agency in World War II, SRH-349, Studies on Cryptology, NARA, 3; Benson, U.S. Communications Intelligence, 67, 85.

  12. History of Cryptanalysis of Japanese Army Codes, NR 3072, HCR, 13.

  13. Geoffrey Stevens, September 28, 1942, HW 14/53, TNA.

  14. HV, 36.

  15. HV, 17–18, 26, 33; Rowlett, Story of MAGIC, 153; Budiansky, Battle of Wits, 163, 218, 321; “Frank W. Lewis, Master of the Cryptic Crossword, Dies at 98,” NYT, December 3, 2010.

  16. Alvarez, Secret Messages, 152, 154, 156–57; Alvarez, “No Immunity,” 22–23.

  17. Callimahos, “Legendary William Friedman,” 9, 13; Deavours and Kruh, Machine Cryptography, 47.

  18. Alvarez, Secret Messages, 155.

  19. Hinsley et al., British Intelligence, I:199n; HV, 9.

  20. Andrew, Secret Service, 268–69, 293–93, 296.

  21. Geoffrey Stevens to London, Appendix, August 1, 1942, HW 14/47, TNA.

  22. Peterson, “Before BOURBON,” 10.

 
