by DAVID KAHN
New York was the world center of commercial code activity because commercial codes served mainly in cables between Europe and America. English was the language of most codes, not only because it has always been the language of commerce but because most messages went to America. To cross language barriers, some codes, such as Bentley’s and Lieber’s, were translated into other languages; some were bilingual. Marconi’s Wireless Telegraph Company Limited made a supreme effort in this field: its code, compiled by James C. H. Macbeth, a quiet, blue-eyed Scot in his early thirties who had become interested in codes while in business in Malaya, encompassed nine languages—English, Dutch, French, German, Italian, Japanese, Portuguese, Russian, and Spanish. Each of its four massive volumes contained three languages, one of which was always English. The eight other languages had indices referring to the place in the code, which was arranged according to the English word-sequence, where a particular expression would be found. It was to serve as a kind of automatic translator. An American would encode the word a or an as ABABA, and a Frenchman, receiving that codeword, would decode it as un or une. This sort of thing can be done with code because code operates upon linguistic entities. The idea stands in the line of great efforts to create a universal language, and in 1663, in fact, Athanasius Kircher compiled a Marconi-like code of 1,048 words from each of five languages, the coded version to serve as an international language.
The only one of these proposals that seems ever to have worked is the International Code of Signals. The 1857 British Board of Trade code was improved by a conference in Washington in 1889 and distributed to maritime powers in 1897, enabling a ship of one nation to hoist flags which would be read by a ship of another nation in its own language by virtue of a codebook in that language. The International Radiotelegraph Conference of Washington in 1927 agreed that two codes, one visual, one radio, should be compiled. The editorial committee assembled in London in October, 1928, and completed its work in December, 1930. Several nations published the codes in English, French, German, Italian, Japanese, Spanish, and Norwegian editions. The visual code employs colored flags—U is quartered red and white, G has vertical yellow and blue bars—to represent the letters of the codewords. One-letter codewords stand for urgent signals: G = I require a pilot; U = You are standing into danger. The same flags have the same meanings in the other languages. Two-letter signals are for distress and maneuvering (AP = I am aground), three-letter for words, phrases, and sentences, four-letter for geographical expressions and for the signal letters of ships. The radiotelegraph code uses five-letter groups. Both codes are universally employed.
The International Code of Signals has succeeded because it fills a need: mutually intelligible signals among crews speaking different languages are essential on the sea. But it faced no competition. Among the great variety of commercial codes, any of which could have filled the need for cutting cable tolls, why do some succeed and some fail? There appear to be two reasons, one intrinsic, the other extrinsic.
The extrinsic factor is the salesmanship of the compiler, and this often outweighs all else. The Acme Code succeeded commercially because its compiler, William J. Mitchel, was a convincing salesman, whereas the Universal Trade Code of Yardley and Mendelsohn, intrinsically about as good a code, never sold well because its compilers, busy with other matters, never pushed it. The intrinsic factor, or the quality of the code, refers primarily to its condensing power: how many plain-language words are represented by a single five-letter codeword. The later codes average a condensing power of between 5:1 and 10:1, which means that they reduce messages to one fifth or one tenth of their plain-language length. The ratio depends, of course, upon the vocabulary. How, then, is a vocabulary constructed?
“By reading telegrams,” said Mitchel, who has compiled not only the public Acme Code but also many private ones. The code compiler must read thousands of business telegrams to get the most-used phrases, which he writes out on slips of paper. These not only give him specific entries for his code, but also suggest others, in the manner described in the 1930s by John W. Hartfield:
I had a great mass of material accumulated from years past, different codes, and gleanings of suggestions made by different people and so forth. I took these and made notes of them on sheets of paper, writing phrases on sheets of paper. As I wrote phrases, other phrases suggested themselves and I interpolated those. I read the phrases and as I read them, other phrases suggested themselves and I wrote those. Then I rewrote them into alphabetical sequence, and as I rewrote them into alphabetical sequence other phrases suggested themselves, and those I interpolated. Then I went through this different data I had and made further additions, kept on enlarging various subjects. Some people suggested to me that the subjects in my 1905 book were not adequate and should be improved upon. These subjects I enlarged, amplified.
And so the books grew.
Larger codes usually have a greater condensing power than small ones because they can include many long phrases, some with 20 or 30 words. But more important than size is how well a code’s phraseology accords with business usage. Thus Cyrus Tibbals’ Western Union Code, whose 300,000 equivalents make it probably the largest commercial code ever compiled, did not afford as much economy as the 100,000-codeword Acme Code because its vocabulary was not as good. Business firms compared codes by using them in their cables to see which saved more money before investing several hundreds or thousands of dollars in buying scores of copies of a code for their offices around the world. Many companies had private codes compiled in which their products are listed in great detail. Though this may cost a large firm up to $50,000 (including printing), they soon recover that amount in their cable toll savings; they also get a dividend of secrecy, which is sometimes important.
Up to the mid-1920s, the codewords of a code did not affect its quality very much, since all included the two-letter differential. Then Mitchel introduced a new safeguard in his Acme Code: no codeword was included if it could be formed from an already existing codeword by the transposition of two adjacent letters. Thus, if the code included LABED, excluded would be ALBED, LBAED, LAEBD, and LABDE. Since such transpositions are not at all rare in communication, both code and plain, resulting usually from psychological rather than telegraphic slips, Mitchel’s idea spread rapidly.
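Mitchel's safeguard, taken together with the older two-letter differential, can be sketched in a few lines of Python. This is only an illustration of the two rules as the text states them; the function names (`adjacent_transpositions`, `letter_difference`, `admissible`) are hypothetical and do not come from any codebook or compiler's procedure.

```python
def adjacent_transpositions(word):
    """Return every distinct word formed by swapping two adjacent letters."""
    variants = set()
    for i in range(len(word) - 1):
        if word[i] != word[i + 1]:  # swapping identical letters changes nothing
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return variants


def letter_difference(a, b):
    """Count the positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def admissible(candidate, stock):
    """Check a candidate codeword against both rules described in the text.

    Rule 1 (two-letter differential): the candidate must differ from every
    existing codeword in at least two positions.
    Rule 2 (Mitchel's safeguard): the candidate must not be an adjacent-letter
    transposition of any existing codeword.
    """
    for existing in stock:
        if letter_difference(candidate, existing) < 2:
            return False
        if candidate in adjacent_transpositions(existing):
            return False
    return True


stock = {"LABED"}
print(sorted(adjacent_transpositions("LABED")))
print(admissible("ALBED", stock), admissible("LOBID", stock))
```

With LABED already in the stock, the four excluded words are exactly those named in the text, while a word such as LOBID, which differs in two positions and is not a transposition, would still be allowed.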
To generate the enormous quantities of codewords needed, compilers used construction tables. For five-letter codes, these consisted of a square of single letters with two squares of letter-pairs adjoining it, one at the top and one at the side. The letter pairs were so chosen and arranged that all in a given column or row in a square differed by two letters from one another. To keep the codeword stock free of transpositions of adjacent letters, the squares must have an odd number of cells on each side. Since the normal alphabet has 26 letters, this can be done either by dropping one letter or by adding an extra character and then eliminating from the code stock all words formed with it; the latter procedure, which saves the 26th letter, naturally produces more codewords. A miniature codeword construction table, based on the six-letter alphabet, A, B, C, D, E, and F, with † as the extra character, can demonstrate the procedure:
To construct codewords, the compiler takes two elements from the same column and two elements from the same row, with the single letter at the pivot of the column and row. Thus, the codeword series would run AAAAA, AAABB, AAACC, … AAA††, AABBA, AABCB…, AA†F†, ABBAA, ABBBB…, ABB††, ABCBA…, A†FF†, BBBBA, BBBCB, BBBDC…. These words all show a 2-letter difference and exclude alternate-letter transpositions. The number of 5-letter codewords using a 26-letter alphabet showing a simple 2-letter differential is 26⁴, or 456,976. The alternate-letter restriction lowers this to 440,051 codewords constructed with a 27-character alphabet, or 390,625 if constructed with a 25-letter alphabet. These are theoretical maximums, however, and although some cryptologists, notably Friedman, Mendelsohn, and Schauffler, have used mathematics to examine the best ways of constructing stocks of codewords, “most codemakers,” Schauffler has written, “are pure empiricists” and “many an inelegant solution” robs them of usable codewords. But what deprived the code compilers of the greatest number of codewords was the International Telegraph Union rule that the words be pronounceable. This slashed the number available from about 400,000 to about 100,000.
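The figure of 456,976 for a simple two-letter differential is 26 to the fourth power: four letters may be chosen freely and the fifth is then forced. One way to see that this many words can be reached is a checksum construction, sketched below. It is an assumption made for illustration and is not the compilers' table method; in particular it does not enforce the transposition restriction. The names `checksum_stock` and `min_difference` are hypothetical, and the two-letter differential is verified by brute force only on a five-letter alphabet to keep the run short.

```python
from itertools import combinations, product


def checksum_stock(alphabet):
    """Build 5-letter words: four free letters plus one check letter.

    The check letter is the sum of the four letter positions, modulo the
    alphabet size, so changing any single free letter also changes the check.
    """
    n = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    words = []
    for prefix in product(alphabet, repeat=4):
        check = alphabet[sum(index[ch] for ch in prefix) % n]
        words.append("".join(prefix) + check)
    return words


def min_difference(words):
    """Smallest number of differing positions between any two words."""
    return min(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(words, 2)
    )


small = checksum_stock("ABCDE")          # 5-letter alphabet: 5**4 words
print(len(small), min_difference(small))  # stock size and its differential
print(26 ** 4, 25 ** 4)                   # the counts quoted in the text
```

The same arithmetic gives 390,625, or 25 to the fourth power, for the 25-letter alphabet mentioned in the text.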
The pronounceability rule consequently became increasingly unpopular during the code-boom period of the early 1920s. It restricted the size of codes when they were bursting at the seams. It caused innumerable arguments at the telegraph counters. It engendered disputes between the cable companies and the governmental telegraph administrations. So the 1925 Paris conference of the International Telegraph Union sent the entire codeword question to a special 15-delegate committee, which met for a month in 1926 at the resort town of Cortina d’Ampezzo, Italy. It scrutinized the answers to questionnaires it had sent out, read the comments submitted to it by operators and users and code compilers, discussed the problem, and decided (all but the British delegation) to recommend to the next conference that “Code words must be formed of a maximum of five letters, chosen at the will of the sender, without any condition.” But the 1928 Brussels conference ignored this recommendation. It sought instead to quantify pronounceability by requiring that all codewords of ten letters have at least three vowels. The rule ran into strong opposition, and finally, at Madrid in 1932, what had become the International Telecommunications Union at long last abandoned any effort to legislate the nature of codewords and acceded, in effect, to the Cortina proposal. Much of the rationale for pronounceability was dissolving with the introduction of teletypewriters into the cable circuits. The sound of the codewords may have mattered to the Morse-code operators who listened to the signals of the Morse sounder; it did not matter to touch typists. What did matter was that the codewords became five letters long instead of ten. The teletypist could now take in and remember a word at a single glance, which he could not do with the artificial ten-letter words, even if they were pronounceable, without a fair proportion of errors. The new regulations thus speeded transmission and reduced errors.
Simultaneously, the number of permissible codewords bounded upward. This did not mean much in most public codes, where codes of 50,000 to 100,000 elements are the largest practicable, since beyond that size no code clerk takes the time to search out the most precise and economical phrase. But in private codes the many new codewords were very advantageously employed. When Ernest F. Peterson revised a cash register company’s 100,000-word code, he found that 1,000 words in the old code, from KAJAN to KUTAZ, conveyed shipping instructions. Thus, KUBOR meant We are shipping to you, in care of your agent at Shanghai. The description of the machine had to go into the next codeword. Taking advantage of the new wealth of codewords, Peterson combined each of the 1,000 shipping instructions with each of the firm’s 200 models of cash registers, and assigned each a codeword. This used 200,000 words, or twice as many as the old code had had altogether. But it saved a cable word, and when Peterson finished making similar changes elsewhere, the code could express common transactions that had formerly required four five-letter codewords in just two, greatly lowering the firm’s cable bill. Similarly, he expanded a bank’s code from 100,000 to 400,000 words.
Such savings were important in the Depression, and commercial codes were widely used—though the code compilers suffered as much from the economic slump as the rest of the business world. World War II, whose numerous national censorships frowned on codes and limited the number of permitted ones, dealt the code business a severe blow. And after the war, the rising cost of labor dealt it a mortal one. It often cost more to have a clerk code a message than the coding would save in cable tolls. At the same time, the greater ease of international communications militated against the use of codes. Sending a cable message once involved a mystique of writing it out on a blank in telegraphic English and having a messenger take it down to the cable office, a dramatic place where men could touch a key and make something go “click” in Europe, a week away by boat. Codes and coding were part of this mystique. But when business firms installed teletypewriters that could be linked directly to the cablehead, or even to a firm’s European branch, it became simpler just to sit down at the keyboard and type out the message without bothering with the whole rigamarole of coding. Transatlantic telephone calls and letters by jet, which leave London one day and arrive in New York the next, stole business from the cables and reduced the need for codes.
At the same time, the march of progress was making codes less and less useful. For a code once compiled does not retain its value forever. A code reflects the world at a particular instant, and as the world moves on it outmodes the code. New products, new ways of doing things, new political or economic facts begin to make its vocabulary old-fashioned. No codes compiled in the 1920s or 1930s had any phrases referring to transatlantic air travel, yet cable traffic today is replete with such references. Ironically, the better a code is at the moment of its compilation, the more closely its vocabulary fits the business requirements of its time, the more rapidly will it obsolesce. Of course many phrases will remain viable, but the lack of many badly needed phrases renders the code as a whole almost useless. Why bother to encode at all if half the message has to be sent in plain anyway?
Thus the use of code fell off drastically after World War II. Many companies resorted to code only when they needed a modicum of secrecy—a return, at the moment of the commercial code’s death, to the motive advanced as its main reason for being at the moment of its birth. Today only commodity exchanges use commercial codes extensively (for economy, not secrecy). Old codes are still reprinted and sold, but the printings have dropped in size. Only a handful of commercial codes—probably all private—were compiled in the 1950s, and it is almost certain that since 1960 not one has been. There is today not a single practicing code compiler in the United States, and probably not one in the world.
Even an injection of the wonder drug of modern business—the electronic computer—failed to stem the decline. Robert W. Bemer of I.B.M. proposed placing a business vocabulary in a computer memory and assigning digital “codewords” to its words and phrases on the basis of frequency—shorter groups of digits for the common phrases, longer groups for the less used ones. The computer would automatically encode the message. Bemer called the idea “digital shorthand” and found that it would compress a message to one third its normal length, thereby in effect tripling the capacity of a communications link. But though the method was technically feasible, economically it never got off the ground, and the code business remained moribund.
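The principle behind Bemer's proposal, shorter groups for common phrases and longer groups for rare ones, can be illustrated with Huffman coding. That technique is swapped in here as a stand-in: the text does not say what assignment scheme Bemer used, and this sketch emits binary digits rather than whatever digit groups he had in mind. The phrase counts are invented for the example, and `huffman_code` is a hypothetical name.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Assign prefix-free binary codewords, shorter for more frequent phrases.

    Expects at least two phrases. Each heap entry carries a partial code
    table; the two least frequent entries are merged repeatedly.
    """
    tiebreak = count()  # keeps tuple comparison away from the dicts
    heap = [(freq, next(tiebreak), {phrase: ""})
            for phrase, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, table1 = heapq.heappop(heap)
        f2, _, table2 = heapq.heappop(heap)
        merged = {p: "0" + code for p, code in table1.items()}
        merged.update({p: "1" + code for p, code in table2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical phrase counts, invented purely for illustration.
frequencies = {
    "PLEASE SHIP": 50,
    "WE CONFIRM": 30,
    "PAYMENT RECEIVED": 15,
    "COLLIDED WITH AN ICEBERG": 5,
}
codes = huffman_code(frequencies)
for phrase, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(code, phrase)
```

In this toy vocabulary the most common phrase receives a one-digit group and the rarest a three-digit group, which is the kind of frequency-weighted saving the proposal relied on.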
The rise and fall of an industry is not a new story in the history of the world. As a business, the making of nonsecret codes is as dead as armor making or buggy-whip making. Did it have any aftereffect on civilization, after fulfilling its function of helping that civilization advance? Did it leave anything beyond hundreds of dusty tomes filled with outmoded references to ships being coaled and defunct names like St. Petersburg, and some lessons in codeword construction? There is one thing that may be distilled from any human experience because it represents the universal, and that is art. Commercial codemaking stimulated the best humor in cryptology—a small contribution to the world’s store of art, but one that gives lasting pleasure nonetheless. The author, Jack Littlefield, offered some “Melancholy Notes on a Cablegram Code Book” to the readers of the July 28, 1934, issue of The New Yorker—the code in question being the Acme.
Every time I receive a cablegram in code, I have the same feeling of pleasurable excitement. There is the familiar envelope lying on my desk, marked “Cablegram: Urgent.” I rip it open and discover inside the single mysterious word BIINC. The message is from our Venezuela office. Visions at once loom of secret documents, beautiful women, and dark Latin-American intrigue. Then I turn to my code book and find BIINC: What appliances have you for lifting heavy machinery? This sort of thing can be very debilitating.
It is not the fault of the code book, either. That handy volume is full of interesting messages that my correspondents never seem to get around to sending. For years I have been on the watch for wires like NARVO (Do not part with the documents), OBNYX (Escape at once), ARPUK (The person is an adventurer, have nothing to do with him), or BUKSI (Avoid arrest if possible), but they never seem to arrive. And yet, if the code book is to be believed, they are fair samples of the kind of thing with which our telegraph wires are humming daily.
Not all the code-book suggestions, of course, are on this high level of adventure. Our telegraph-users, it would seem, have a wide range of concerns. At this very moment a perplexed customer in some distant part of the globe is inquiring URPXO (For what use was the mixing machine intended?); in the next town, perhaps a ship’s captain is reporting diffidently ELJAZ (Will have to get bottom examined before proceeding); while somewhere a new parent is voicing his elation in the form of AROJD (Please advertise the birth of twins).
The dominating note of the code book, however, is one of resigned melancholy. Its pages are replete with such gloomy sentiments as ZULAR (Unfortunately too true) and CULKE (Bad as possibly can be), expressions that seem only too justified when we consider the extraordinary series of disasters that has been stored up for users of the code. Every possible variety of mishap has been foreseen and embalmed in a group of doleful entries ranging from the comparatively trivial AIBUK, which describes the bursting of a donkey boiler, to the truly cataclysmic PYTUO (Collided with an iceberg). Even the usually trustworthy mail and express services share in the general debacle. Our very letters, it is predicted, will be unreadable, the writing having been obliterated by water (SKAAE); and shipments will inevitably arrive in clammy condition (HEHST). It is all very sad.
Nor is the code book a volume to be recommended for shipboard reading. Never frolicsome, it is at its gloomy best when describing sea accidents. This it does not only with gusto but with an unpleasantly convincing eye for detail. Listings like LYADI, for example (Arrived here with decks swept, boats and funnel carried away, cargo shifted, having encountered a hurricane), are just circumstantial enough to be a trifle discommoding to the ocean-traveller. And when, a few pages farther on, he encounters the still more ominous UZSHY (Body now lies in the mortuary), he cannot help feeling an awful assurance as to the identity of the corpse in question.