by John Simpson
Her doctors told us that even if researchers did find a cure for her condition, it was too late for her—she was already living with it, and cures only worked before the condition became embedded. So she had to live with this, as did we.
In those days a strange thing was happening in the world of lexicography: research was moving online. Since time immemorial we had collected our data on index cards, from things called books. Back in 1989, when the Second Edition of the dictionary was published, we started collecting data (still from printed sources) on computer, and then filing our discoveries in what became a huge electronic database. As the 1990s took shape, things changed again, and we started to notice the possibilities that the World Wide Web and the Internet offered us from the perspective of routine historical dictionary research and publishing.
For a hundred years the dictionary staff had researched words on foot. Editors swept around the departmental library, and researchers padded softly around the wooden floors of national research libraries, in search of historical evidence for the language. But the earth was shifting. The Second Edition of the OED had been available on CD from 1992, and scholars were growing used to being able to search for information in the dictionary other than by “dictionary word.” The OED’s editors were themselves becoming more familiar with this type of research just by having the searchable OED on their desktop computers. We soon discovered that the mass of historical text in the dictionary itself made it a remarkable research resource when we were searching for material on the history of words. To our amazement, we found hundreds of new first uses there, hidden away in other entries and impossible to unlock before digitalisation. Just one example: we found our first reference to militia as a locally organised fighting force of civilians (not professional soldiers) tucked away in a quotation for folkmoot (a local general assembly of the people): “Commanders of the Militia in every County were elected . . . in a full Falkmoth” (1642).
There had been signs of activity in the real world of electronic text in the late 1980s and early 1990s, and the OED’s editors were very early adopters of this trend. In the 1980s, editors had been accessing the Lexis/Nexis databases of American newspaper text and British legal material. The access system was quite primitive at first, and it was expensive to use, but we probed it cautiously. Results had to be printed on continuous-form paper and chopped up for the editor’s perusal.
The Chadwyck-Healey English Poetry Database appeared on CD in 1992. That helped to excite people about the possibilities of text searching for research, but didn’t shake our bones too much. The database offered access to an enormous mass of poetry written in English over the centuries. But adding quotations from poetry was not core business for the OED.
By 1995 we were using the huge Making of America (MoA) database, a collection of early American books and magazines published online—originally from texts held at the University of Michigan. This was one of the first online reference resources we picked up on as reference works began their shift away from the old-style CD format. We searched and searched this database, always surprised at how much new information we found about the OED’s words from the nineteenth and twentieth centuries: one first attestation out of hundreds we discovered (or recovered) was—somewhat surprisingly—for magnetic compass, from Michigan’s Biblical Repertory of 1838. You never know where you will find useful material.
Over the course of the late 1990s, more and more databases became publicly available online, often through huge grant-funded agencies in America. We worried that American money was going to make American text more accessible than British, Australian, and other regional varieties of English, and for a time that was certainly the case. I worried that human library research, too, would become a thing of the past. But our library researchers hung on. It turned out that although we could extract remarkable swaths of historical material from the electronic databases, we sometimes needed the raw brainpower and ingenuity of a researcher to track a problem right down to earth: there was room for both research techniques.
As we became more familiar with this new research technology, we appointed research assistants to work exclusively on finding useful online material, so that it was already in place when an editor came to work on a word. Milestones included the availability of the Chadwyck-Healey Literature Online (LION) database in 1998, and the extraordinary Early English Books Online (EEBO), a collaboration of a consortium of universities including Michigan and Oxford in 1999. EEBO gradually made more and more books and pamphlets from the beginning of printing in the late 1400s until 1700 searchable or readable in facsimile online. It has been a stunning resource for historians. One example amongst thousands, again, was the new first example of radiancy (the fact of being radiant), from Michael Drayton’s Endimion and Phoebe, published in 1595: “Her richest Globe shee gloriously displayes, / Now that the Sun had hid his golden rayes: / Least that his radiencie should her suppresse.” How did the original Victorian readers for the OED miss that? Were they dreaming? But it is far harder to recognise first uses than you might think. Google Books appeared in 2004, offering lexicographers new worlds of words to search through; Britain fought back in 2007 with the British Library Nineteenth-Century Newspaper Collection: our first use of Red Bank oyster (found off County Clare) now comes from the Dublin Freeman’s Journal, thanks to the British Library collection.
Did we trust these sources? That was tricky. Most of the early texts were scanned on to computer, and the scanning software often found old typography hard to understand (you’ll remember that the OED had been keyed on to computer, then proofread, and so was a virtually faultless digitised representation of the original). To overcome scanner problems, databases soon started offering links to facsimile versions of the documents they had scanned. That was good—you could rely on facsimiles you could see. But however good your search software is, you cannot search successfully for something that has been garbled in the first place when scanned. It’s still a problem with some sites.
Did this online material let us work faster? No: the sheer quantity meant that we had to apply pragmatic rules to the number of databases we accessed, or risk being swamped. That paralleled the old rule from the regime of library research, which said that you could spend twice as much time on a word but only improve your research by 1 percent, so at some point you hit steeply diminishing returns and had to call a sharp halt.
What happened to all of that material collected for well over a century on index cards? Well, we operated a hybrid system. Editors had the old cards on their desks alongside the computers as they searched the new online data.
I used to ask myself whether we were in danger of being mesmerised by these treasure troves of historical quotations, to the extent that we would begin to neglect our definitions and etymologies in the search for better and better documentation. But better documentary evidence worked the other way—every aspect of each entry received a boost from this new data. The dictionary’s traditional “readers” have largely survived the transition, too. They find earlier and better uses of terms which computers can’t (in the jargon) disambiguate—distinguish from other similar uses. A human reader can understand nuance in a way that the computer cannot always recognise.
Most of the entries we researched benefited from the data accessible through the Internet. By the time we revised the entry for hotdog, in 2008, those incremental Internet discoveries had changed the face of our understanding of the sausage.
There was no entry for hotdog in the original edition of the OED. If there had been, it would have been published in 1899 in the letter H. In fact, the expression did exist—as we now know—at that time, but only on the streets and in the colleges of North America, a long way from the collection bowls of the OED researchers. And in 1899, hotdog was only just starting to make its presence felt.
It wasn’t until 1976, a couple of months after I joined up with the OED, that it crept into the second volume of the Supplement to the OED (no thanks to me). In those days we, or at least my immediate predecessors, were firmly of the view that the word hotdog had three meanings. The earliest, dating (it was thought) from 1896, was “someone who is skilled or proficient in some activity” (someone, presumably, who was pretty “hot” at it). The second meaning, chronologically, was the sausage in a roll (recorded since 1900). The final sense was some kind of surfboard smaller than a “big gun,” known from the 1960s. Beyond that, the entry didn’t seem to offer much in the way of a coherent explanation for the word’s chronology or semantic development.
When we came to update the hotdog entry in 2008, the Internet caused us to reconsider it from the ground up. We knew (because we subscribed to his journal) that an American scholar had been busy publishing earlier and yet earlier examples of hotdog for several years, and of late most of his discoveries had been made online. These discoveries would have a big effect on our entry. Other online researchers had been at work on the word too. Earlier examples than the ones the OED knew about had been discovered, for the most part, in old issues of local American newspapers of the sort that had been pouring on to the Internet in recent years, for those who knew how to find them.
What became clear was that the OED had got it wrong in 1976. Online historical research into the hotdog revealed that the “sausage” sense was really top dog. Constant research for hotdogs in earlier American newspapers by editors and researchers throughout the world had resulted in an example being found as early as 1884, from an Indiana newspaper called the Evansville Daily Courier. (I wondered whether the people in Evansville knew about this, whether they were proud of it, and whether they were desperately hoping that someone wouldn’t find an earlier example in a newspaper from, say, Des Moines or Milwaukee.)
Worse still, once our lexicographers had looked closely at the earliest recorded examples of hotdog in our possession, they realised that in these early examples the word didn’t mean “a frankfurter” or “a sausage laid to rest in a bread roll,” but “sausage-meat.” Here’s the Evansville reference: “Even the innocent ‘wienerworst’ man will be barred from dispensing hot dog on the street corner.” That’s “sausage-meat,” however it is dressed up, even if it is dressed up as a hotdog. If you don’t see a difference, that’s one of the reasons you’re not working on the OED. One is an individual thing (a sausage) and the other is a type (sausage, or sausage-meat).
Editors’ ears were pricking up, as we would now have official sanction to break the meaning into two. We had identified the collective “sausage-meat” meaning from 1884, and the “frankfurter” meaning from a few years later (not unreasonably) in 1892. For this second meaning, the Internet had provided us with a reference from the Paterson Daily Press in New Jersey: “The ‘hot dog’ was quickly inserted in a gash in a roll, a dash of mustard also splashed on to the ‘dog’ with a piece of flat whittled stick, and the order was fulfilled.” (Surprisingly, I haven’t heard that the inhabitants of Paterson have been out on the streets celebrating, but there’s still time.) One more thing: some jokesters suggested that hotdog first meant “sausage-meat” because it was sarcastically (I hope) first said to have consisted of warmed-up dog meat. You can decide whether you believe that. Pass the sauce.
This earth-shattering news about the sausage-type hotdog is bad news for the “skilled or proficient” meaning. That is bumped down to second place, dating now from 1894. This time, the reference—captured online—is from a University of Michigan “humour magazine” called Wrinkle. I’m sad to report that nothing much has happened to the surfboard sense; it is still bringing up the rear from the 1960s.
Access to the Internet, by dictionary editors and by any other interested party, could have profound effects on any one of the OED’s entries. That made working on each entry even more of a challenge than it had been before, but it also meant that the rewards were far greater. There was such a clatter of text available for analysis that you could almost feel that you were with those eighteenth-century ladies discussing their hair-styles, or inching forward with the Cavalier forces during the English Civil War. You always had additional context to whatever you were defining, and sometimes it seemed as if you were caught in the middle of a novel whose storyline involved the tracking and capturing and unmasking of your word. It was a race against time, too. If you missed a clear antedating, and the entry was published online, then someone else would pick it up and email the dictionary. You didn’t want that to happen if you could help it. On the other hand, that was a safety net: the dictionary was dynamic, and it would change like the language whether you wanted it to or not.
The OED moved online in several stages. It would have been enough of a change if the dictionary had only benefited from the availability of the astounding pools of raw, historical materials that these new databases brought us in the 1990s and beyond. But that was not all. Right at the time when those first databases of online text were becoming available in 1993—and the OED’s new North American reading programme was up and running efficiently—my colleague Jeffery Triggs, in New Jersey, was able to turn his attention to other things, and in this case it was how to put the OED on the Internet.
In Oxford in 1993 the suggestion that the OED might migrate away from being at heart a printed book would have been sacrilege. There are lots of things that Oxford University and the University Press (beyond the dictionary) were extremely good at, but at that time, foreseeing the value of the Internet was not at the top of the list. Jeffery, however, realised that it would be possible to write software to publish the OED as a fully searchable online database. He started coding, and he liked coding. After a few weeks he had thousands of lines of code, and he told me what he was planning. We had the OED on CD, and it was proving a great success with scholars and researchers, not just in the field of language, but across the spectrum of disciplines. What Jeffery wanted to do was to transfer the search and display capabilities of the dictionary on CD to the new Internet, so that it wasn’t just accessible to the CD user, but to anyone across the world. We didn’t really know where it would end, but no one else seemed to have managed to do this with a massive, complex reference text. This was off-plan. No one had agreed to it yet, because no one knew about it. The OED had led the way in the past, and perhaps this was where it had a chance to lead again. At the same time, it could draw a new generation of users into its net.
To start with, we didn’t tell Oxford what we were up to. We wanted to get just a bit further along to see how practical the whole plan might be. So by day I was a regular dictionary editor, as we progressed through M into N (fortunately a very short letter), and then into the vowel O (full of awkward prepositions, adverbs, and prefixes that take ages to edit—try on, off, out, and omni- for size); and in the evenings Jeffery and I were exchanging ideas across the Atlantic on how best to replicate, on the Web, the functionality of the OED on CD, and how to do new things with the dictionary—things that no one had thought of doing before—such as linking the OED quotations to the texts they came from.
This seemed to me the most important thing I could do for the OED, after hammering out its editorial policy for the future. We were battling to keep the OED relevant by updating it editorially, but we saw that the updated dictionary would be twice as powerful if it could be accessible to a new and wider audience online. Jeffery and I didn’t keep our experimentation to ourselves for long. After a while I let Ed in on what was happening, and he, too, was excited by the opportunities. We trooped along to the dictionary’s business director, the Admiral, who could also—fortunately—see the possibilities of this new vision for reference works. A bit later, in 1995, once we had a prototype of the OED online and running smoothly, we showed it privately to some decision makers in the University and the University Press; sadly, they didn’t seem to be able to imagine how an online dictionary fitted into a traditional book-publishing outfit like the University Press (the inability to see beyond the past is known to lovers of punctuation as the “Oxford coma”).
Naturally, we thought we were right all the same, so we carried on. Soon we had an online prototype that anyone around the world with online access could visit and use for their research. The only problem was that it was in secret development; no one knew it existed except a handful of us in Oxford and a few friends and acquaintances in computer science departments in America. It was one of the first five hundred sites on the Web, and in reality not many people had the faintest idea how this sort of technology worked in the first place. Nowadays there are over 3 billion Internet users; when work started on the OED Online there were only 14 million, and they were mostly using email, discussion forums, and bulletin boards, with a bit of e-commerce.