by Eric Flint
Keyboarding text into a computer is no faster than typing it at a typewriter. The advantage is that the text can be stored electronically, and that it can be sent to a computer printer which can make multiple copies a lot faster than could be done with a typewriter alone (in other words, the transcription speeds are the same, but the computer system has a higher publication speed).
Book scanning is something of a nuisance. The original must be correctly positioned on the scanner window (books have a tendency to shift), and then you do a preview scan. If you are happy with the result, you do the real scan. You can scan 2–3 pages/minute (ppm) (own testing; Cohen; Project Gutenberg). Double that if you scan two pages simultaneously.
It can be tricky to get a book to lie flat enough so that you can scan the entire page of text, especially if it has a narrow "gutter." In fact, you might have to unbind the book (e.g., with a guillotine cutter) in order to make a successful scan.
In theory, unbinding also allows you to use a scanner with an automatic document feeder (if any are available in Grantville), and thereby increase scanning efficiency (to perhaps 25 ppm). The catch is that books often have pages which are too small, large, thick or thin to make the ADF happy, and it jams up. Another problem with thin paper is "bleed through." (Adams)
If the book page is scanned in, then you either need optical character recognition software to convert the image into text, or a graphics quality printer. Printing graphics may be slower than printing text. However, OCR processing will require another 12–60 seconds/page.
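The combined effect of manual scanning and OCR on throughput can be sketched as follows. This is only a rough illustration: it assumes that scanning and OCR happen one after the other on the same machine (no overlap), and the function name is invented for this example.

```python
def pages_per_minute(scan_ppm, ocr_seconds_per_page):
    """Throughput when each page is scanned and then OCR'd in sequence.

    Assumption (not from the article): the two steps do not overlap.
    """
    scan_seconds_per_page = 60.0 / scan_ppm
    return 60.0 / (scan_seconds_per_page + ocr_seconds_per_page)

# Slow case: 2 ppm scanning, 60 seconds of OCR per page.
slow = pages_per_minute(2, 60)
# Fast case: 3 ppm scanning, 12 seconds of OCR per page.
fast = pages_per_minute(3, 12)

print(f"scan + OCR: {slow:.2f} to {fast:.2f} pages/minute")
```

On these figures, adding OCR drops a 2–3 ppm manual scan to somewhere between roughly two-thirds of a page and just under two pages per minute.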
Some books, because of their fonts and layout, will be better suited to OCR than others, so it may pay to limit OCR to the works best suited for it. Sometimes, OCR is used just to generate a searchable database to supplement a scanned image (Cohen). OCR works best with unbound sheets, avoiding page curvature, gutter shadow, etc.
A final possibility is to use speech recognition software. Achievable dictation speeds are variously reported as 40–160 wpm (Johnson, Kramer, Devine, Griffith, Patterson) (punctuation and digits count as words). Errors are common and, when correction time is taken into account, typing is faster even for slow typists (Hah).
OCR and speech recognition software were commercially available as of RoF, but we don't know whether they existed in Grantville. There was almost certainly OCR software bundled with the scanners, but those were most likely "limited" editions. We don't know if anyone upgraded. Speech recognition software might have been owned by someone with a disability hindering typing.
Computer Systems: Text Output
The principal kinds of computer printers which might be found in Grantville are daisy wheel printers, dot matrix printers, inkjet printers, and laser printers. In 1997, 85.5% of home computers were equipped with printers (Statistical Abstracts 2000, Table 912). (Those which nominally lacked a printer were probably sharing one with another computer.) Grantville probably has over 600 printers.
Daisy wheel printers use a spoked wheel; there is a raised letter at the end of each spoke. The printer controller spins the wheel to the appropriate position and then a pawl strikes the back of the selected spoke. Daisy wheel printers were superseded in the late eighties by dot matrix printers.
Daisy wheel printers can only produce character-based graphics (with an appropriate daisy wheel bearing the characters). The other three printer types can produce dot (pixel)-based graphics and hence are far more suitable for reproducing illustrations.
Dot matrix printers produce characters by using tiny electromagnets to drive pins against a ribbon, causing a dot to appear on the paper. Each character is the result of a particular pattern of dots. DMPs have a very long working life and, in addition, it should be possible to build at least draft-quality DMPs within a few years after RoF.
To take maximum advantage of our DMPs, we need tractor-fed paper. The supply in Grantville is probably pretty small, so the ability of down-time papermakers to duplicate it is going to have a significant effect on the long-term utility of DMPs. Tractor-fed paper has regularly spaced holes on both sides to receive the tractor pins, and ideally has seams for separating the pages and removing the holes when the run is complete.
Dot matrix printers were eclipsed by laser printers for general purpose printing, and the DMPs in Grantville are probably nestled away in attics and basements. They are going to come out of storage because, like the even older daisy wheel printers, and unlike laser or inkjet printers, they can be used to make stencils.
The laser printer is, essentially, a modified photocopy machine, with similar paper requirements. The principal advantage of inkjet printers during the Nineties was that they could be used to produce color prints; laser printers were faster and had a lower page cost. There are undoubtedly inkjet printers in Grantville homes, but the businesses are more likely to have laser printers.
Computer Access
Typing at a computer keyboard, of course, is no faster than typing at a typewriter. There are two advantages to typing at a computer: you obtain computer-searchable text, and you can take advantage of fast computer printers (especially dot matrix and laser) to print multiple copies. The printers aren't necessarily faster than a mimeograph, but the reproduction quality is higher.
Scanning is definitely much faster than typing—especially if you are content to print the scanned image per se. In 1999, 36% of internet households had scanners (InfoTrends). In August 2000, about 81% of computer households had internet access (Census Bureau P3-207). So I would estimate that there are around 200 scanners in Grantville. Scanning is a fairly processor-intensive process, and it is possible that some computers with scanners will be set up as dedicated scanning stations for time-critical duplication jobs.
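The scanner estimate can be reproduced with a few lines of arithmetic. This is a sketch only: the number of computer households is not stated directly, so it is back-derived here from the printer figures given in the printer discussion (about 600 printers, 85.5% of home computers having one), and that derived figure should be treated as an assumption.

```python
# Figures quoted in the article.
PRINTERS_IN_GRANTVILLE = 600      # "probably over 600 printers"
PRINTER_SHARE = 0.855             # share of home computers with printers (1997)
INTERNET_SHARE = 0.81             # share of computer households online (Aug. 2000)
SCANNER_SHARE = 0.36              # share of internet households with scanners (1999)

# Assumption: back-derive the number of computers from the printer estimate.
computers = PRINTERS_IN_GRANTVILLE / PRINTER_SHARE

# Computers -> internet households -> scanner owners.
scanners = computers * INTERNET_SHARE * SCANNER_SHARE

print(f"computers: about {computers:.0f}")
print(f"scanners:  about {scanners:.0f}")
```

The result lands close to the "around 200" quoted above, which suggests the two estimates rest on a common base of roughly 700 computers.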
Computer printers provide a combination of print quality and speed which is exceeded only by photocopy machines. The problem is that the computers have so many uses, other than for simple text input and output, and the scanner, keyboard and printer can't be used without the computer.
It has been argued that the computers usually have plenty of spare processing power, and hence can be used to print queued jobs in the background (or, for that matter, when the computer would otherwise be idle or turned off). Indeed, those jobs could be sent to them over a Grantville network. Hence, the up-time owner could earn some extra cash by participating in a network-based "printing cooperative."
That's true, but there are a few caveats. First, there may be other jobs which can be run the same way but which can earn more money; e.g., engineering modeling, or calculating reference tables. There are hundreds of possible tables; it isn't just a matter of constructing a log table and then you're all set. Second, there is a hidden price. If the computer is kept on all night to run print jobs, then it is going to fail that much sooner. Finally, there is only a limited supply of suitable paper for the computer printers, and it's uncertain whether it can be duplicated well enough with down-time technology. The same, of course, is true of the toner for the laser printers.
Photocopiers
For the purpose of this article, I am going to assume that photocopies will be made for down-timers only in limited quantities. The most prominent example, of course, is that which appears in Viehl, "A Matter of Consultation" (Ring of Fire): William Harvey (the personal physician to King Charles of England) was able to obtain copies of four medical texts and, more ominously, "a few pages" from Trevelyan's History. In agreeing to have the copies made, Ed Piazza comments, "We have to conserve its use these days, but doing the books are no problem. Especially after your generous gift of coffee, and telling us where to find the Turkish traders to buy more."
A Kinko's in 2000 might have had a 100–200 ppm copier, but the photocopy machines in Grantville are not likely to be the ones which were state-of-the-art then. The schools and the public library might have 40 ppm machines. The churches, the banks, the accounting firm and the real estate agents probably made do with 20 ppm copiers. Perhaps another fifty businesses have "personal" (8 ppm) copiers (which may in fact be dual-duty laser printers or fax machines).
In copying bound volumes, one can waste quite a bit of time positioning the book on the window, especially if the book page is larger than the paper size and you're trying to make sure that all of the text gets copied. You might spend five seconds flipping the page, flattening down the book, and making sure it is where it should be. The copier then has to at least partially warm up again. This rigmarole limits the effective speed to 10–20 copies/minute, even if the rated maximum speed of the machine is higher. On the bright side, with a small book, or a copier which can handle large format copy paper or with reduction capability, you might be able to copy two pages at a time.
Assuming 10–20 copies/minute, 300–600 words/page, and 1–2 pages/copy, the effective transcription speed of the photocopier is 3,000–24,000 wpm, making it, by far, the fastest transcription technology in Grantville. If you are willing to unbind the book, so the pages can be fed automatically into the copier, the actual copying will be even faster.
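The range quoted above is simply the product of the low ends and the product of the high ends of the three assumptions. A minimal sketch (the function name is invented for this illustration):

```python
def wpm_range(copies_per_minute, words_per_page, pages_per_copy):
    """Multiply (low, high) ranges to get a (low, high) words-per-minute range."""
    low = copies_per_minute[0] * words_per_page[0] * pages_per_copy[0]
    high = copies_per_minute[1] * words_per_page[1] * pages_per_copy[1]
    return low, high

photocopier_wpm = wpm_range(
    copies_per_minute=(10, 20),   # effective speed when copying a bound book
    words_per_page=(300, 600),
    pages_per_copy=(1, 2),        # two pages at a time for small books
)

print("photocopier: %d to %d wpm" % photocopier_wpm)
```

Even the low end of that range is far above the dictation and typing speeds discussed earlier.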
Let's now look more closely at the issue of how long the photocopiers will be operational.
Paper. Photocopiers are extremely finicky about their paper, and it is doubtful that sufficiently high quality paper can be made down-time in the first few years after RoF. A 20–25 ppm photocopier might be used in an office with an average monthly volume of 8,000–10,000 copies, a 35–45 ppm machine in one making 10,000–20,000 copies/month (Monash). It is unlikely that the office kept more than a month's stock of paper on hand. The total stock of paper for the ten office-grade (20+ ppm) copiers is probably about 80,000–100,000 pages. The personal copiers are likely to have low copy volumes, perhaps 500 pages/month. The average user probably has 500–2,500 pages of paper on hand. That gives us another 25,000–125,000 pages which could be run through the office copiers. So our total photocopy paper supply is probably on the order of 100,000–200,000 pages. Sounds like a lot, but a single encyclopedia is probably 30,000 pages.
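The paper estimate can be tallied explicitly. The sketch below assumes ten office copiers each holding one month's stock, and fifty personal copiers (the "another fifty businesses" mentioned earlier) each with 500–2,500 pages on hand; the variable names are invented for this illustration.

```python
OFFICE_COPIERS = 10
OFFICE_STOCK_PER_COPIER = (8_000, 10_000)   # one month's volume, in pages

PERSONAL_COPIERS = 50
PERSONAL_STOCK_PER_COPIER = (500, 2_500)    # pages on hand

ENCYCLOPEDIA_PAGES = 30_000

office_stock = tuple(OFFICE_COPIERS * n for n in OFFICE_STOCK_PER_COPIER)
personal_stock = tuple(PERSONAL_COPIERS * n for n in PERSONAL_STOCK_PER_COPIER)
total_stock = (office_stock[0] + personal_stock[0],
               office_stock[1] + personal_stock[1])

# How many complete encyclopedia copies would the whole supply cover?
encyclopedia_copies = tuple(t / ENCYCLOPEDIA_PAGES for t in total_stock)

print("office stock:   %d to %d pages" % office_stock)
print("personal stock: %d to %d pages" % personal_stock)
print("total stock:    %d to %d pages" % total_stock)
print("encyclopedia copies: %.1f to %.1f" % encyclopedia_copies)
```

On these assumptions the entire town's copier paper would cover only a handful of copies of one encyclopedia.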
Toner. A toner cartridge might be good for 4000–8000 pages. There are probably two or three replacement cartridges on hand for each office copier, and just one for each personal copier. The toner cartridges aren't interchangeable between models. It isn't going to be easy to manufacture replacement toner; particle size and melt point are critical, and modern formulations contain polymers as well as carbon.
Typesetting and Letterpress Printing
There are several publishers in Grantville. The Grid suggests that the Grantville Times was in business pre-RoF. Times Printing Press is a subsidiary of the Grantville Times, founded in 1631, with the aid of Arnold Selfish, a down-timer printer from Leipzig. Its staff includes six down-timer pressmen and four down-timer translators. There is also the Grantville Free Press, the Grantville Daily News, The Street, and the Grantville University Press, all presumed to have their own presses. It wouldn't surprise me if independent printers set up shop in Grantville.
Publishing was a big business in down-time Europe, and some books will be purchased (or borrowed) in Grantville, and reprinted elsewhere—especially in Germany, the Netherlands, England, France, Venice, and Florence. The work can be typeset in Grantville, and "flongs" (lightweight molds of the set type, made of papier-mache, plastic or rubber) shipped to foreign presses. They in turn can prepare "stereotypes," printing plates made by pouring metal into the flongs. Stereotyping was invented in 1730 and perfected in the nineteenth century.
A typical sixteenth-century print run was 1,250 copies, but runs of 3,000–4,000 were not unusual. The big runs were typically religious (the first edition of Luther's German Bible was 4,000) or legal (the commentaries were good repeat sellers to law students). However, a run of over 3,000 copies of a volume of erotic Latin poetry has been documented. (Jardine, 160–1) What that says for the prospects of Harlequin romances is uncertain. A printing press could use 1,500 sheets a day, and paper was about two-thirds of the cost of production (Jardine, 162).
The bottleneck in seventeenth-century printing is typesetting. Manual typesetting is slow. The winner of a late nineteenth-century hand typesetting championship "produced 2,277 ems of correctly spelled and spaced type per hour." Even production of more than one thousand ems per hour was enough to earn a hand compositor the nickname, "the velocipede." (Sonn, 150). If an average word, including the trailing space, is six ems, 1200 ems/hour equals 3.33 wpm.
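The conversion from ems per hour to words per minute is a one-liner, shown here for the three rates mentioned above. It keeps the article's assumption of six ems per word (including the trailing space); the function name is invented for this illustration.

```python
EMS_PER_WORD = 6  # the article's assumption, including the trailing space

def ems_per_hour_to_wpm(ems_per_hour):
    """Convert a typesetting rate in ems/hour to words/minute."""
    return ems_per_hour / EMS_PER_WORD / 60

for label, rate in [("'velocipede' threshold", 1000),
                    ("example rate", 1200),
                    ("championship winner", 2277)]:
    print(f"{label}: {rate} ems/hour = {ems_per_hour_to_wpm(rate):.2f} wpm")
```

Even the championship rate works out to only a little over 6 wpm, which is why typesetting is the bottleneck.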
I will leave it to someone else to determine how soon the automatic typesetting machine (e.g., Linotype) will be re-invented. I would be very surprised if this happened before 1635. Likewise, I am ignoring photographic printing methods (other than photocopying).
Proofreading and Correction
Scribes, typists, typesetters and text input software (OCR and speech recognition) have one thing in common: they all make mistakes.
Down-time printers were surprisingly tolerant of typos; Erasmus complained that printers would rather set books with thousands of mistakes than hire proofreaders (Jardine, 228). However, I expect at least part of the Grantville corpus to be treated with more respect.
In estimating the time (and expense) needed to make accurate copies of the books of Grantville, we must include proofreading (checking just for faithfulness to the original) and correction time.
Proofreading by inspecting the typed copy, and looking back at the original when something seems wrong or prone to error, can be done at speeds of 120–200 wpm (WikiWPM; Wald, ViaVoice, Weiler). The safest method of proofreading is to have one person read aloud (probably at 100–160 wpm) the original text while another checks the copy. If the transcribed text is computerized, text-to-speech software can be used. If the text has many errors, proofreading speed is reduced.
The correction time will depend on the number of errors (OCR and speech recognition are error-prone) and on the copying method. It takes time to find as well as fix the error in the document. Correction can be as easy as a global search-and-replace, or as hard as repairing an improperly cut stencil.
OCR character accuracy in 2000 was somewhere in the 90–99% range, depending on the quality of the original (ComputerWorld; Rice). What does 90% accuracy mean? Well, if a page is 300 words, and the average word is 5 characters (ignoring spaces), we have 1500 characters, and 150 mistakes among those characters.
In modern typing tests, the standard penalty per error is 2 wpm. However, that assumes that the typist is working at a speed of 50–60 wpm, and thus that each error takes about two seconds to correct. In my opinion, that's on the low side; in the mid-1900s, the penalty was 10 wpm/error (Wikipedia, "Typewriter").
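Still, let's say it takes two seconds to correct each mistake in a 30 second scan-cum-OCR; 300 seconds for 150 mistakes. So, for corrected output, our speed is 300 words/330 seconds (just under 1 word per second, or about 55 wpm), which is no better than the 50–60 wpm typist, and that is before counting the time needed to find the errors. Increase the OCR accuracy to 99%, and the correction time drops to 30 seconds, for 300 words/60 seconds (5 words per second, or 300 wpm). Of course, we could choose to publish uncorrected OCR'd text, or limit checking to numbers.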
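The estimate above can be reproduced with a small sketch. It uses the article's own figures (300 words/page, 5 characters/word, 2 seconds per fix, 30 seconds for scan plus OCR) and counts only the time to fix errors, not the time to find them. Note the units: 300 words in 330 seconds is a little under one word per second, which is about 55 words per minute. The function name and parameters are invented for this illustration.

```python
def corrected_ocr_wpm(accuracy,
                      words_per_page=300,
                      chars_per_word=5,
                      seconds_per_fix=2.0,
                      scan_ocr_seconds=30.0):
    """Effective words/minute for scanned, OCR'd, and corrected text.

    Assumption: every character error costs a flat `seconds_per_fix`;
    proofreading time to locate the errors is not included.
    """
    characters = words_per_page * chars_per_word
    errors = characters * (1.0 - accuracy)
    total_seconds = scan_ocr_seconds + errors * seconds_per_fix
    return words_per_page / total_seconds * 60.0

for accuracy in (0.90, 0.99):
    wpm = corrected_ocr_wpm(accuracy)
    print(f"{accuracy:.0%} accuracy: about {wpm:.0f} wpm ({wpm / 60:.2f} words/second)")
```

The result is very sensitive to the per-error correction time; a figure nearer the 8–9 minutes/page reported for real projects would cut these speeds drastically.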
The total proofreading/correction time on the U. Michigan "Making of America" project (scanned nineteenth-century documents) was 8–9 minutes/page (vs. 2 seconds/page for the OCR processing) (Shaw).
Speech recognition software, in 1998–9, had a word error rate of 5–15% for dictation at 40–160 wpm (Devine, Kramer). Word errors, unlike character errors, destroy the meaning of the sentence so proofreading is critical. (typewell.com)
For computerized text, some improvement in proofreading speed can be achieved by use of spell-checking software. If it is set to replace without user approval, you can increase effective speed, but the software will introduce some errors of its own. If net OCR accuracy were nonetheless raised to 99.9%, effective speed would be 200 wpm.
A big advantage of the photocopier (and other photographic transfer processes) is that you don't have to worry about proofreading; you just check that the page wasn't cut off and the text is legible. The same is true if you are just printing a scanned image.
Choice of Copying Method
I assume that the copying of the corpus is going to be sought mostly by down-timers. The down-timers don't have routine access to photocopiers and computer systems. I suspect that sale of such equipment by the up-timers will be rare, and rentals will be quite expensive. The up-timers are well aware that both supplies and spare parts for that equipment are limited and will tend to reserve them (especially photocopiers) for their own use. There will be exceptions made for people like Gustavus Adolphus and the Abrabanel-Nasi contingent, but most interested down-timers will have to copy material by other means.
Those other means, of course, are hand copying and typewriting. Until September, 1633, the only typewriters available will be those of up-time provenance. It is debatable how willing the up-timers will be to sell or rent these. On the one hand, they will undoubtedly command a great deal of money (at least the manual typewriters, since those can be used outside of Grantville), and some of the up-timers are going to be in straitened circumstances (e.g., they were living on Social Security, or their jobs are irrelevant in the new economy). On the other hand, the up-timer, if able to type, can use the typewriter to make a living. There will also be a tendency to use typewriters, when possible, instead of computers, in order to conserve the latter.
A typewriter, of course, produces more legible copy than what can be done by hand. And a trained typist is at least two or three times faster than a scribe. But if those were the only advantages of the typewriter, they probably wouldn't be enough to justify buying one for use in duplicating books. The early seventeenth century was an era in which complex mechanisms were much more expensive (when available) than labor, even literate labor.
The key point (credit to Gorg Huff) is that the typewriter can be used, in conjunction with a mimeograph machine (or equivalent), as a "micro-press." If you can identify a sufficient number of books for which there is significant demand (say, a hundred copies), and the customers are willing to put up with the limitations of a ditto or mimeo copy, then the capital cost of the typewriter and mimeograph is spread out over all of those titles and customers. Naturally, it would be smart to collect subscriptions before you even started the reprint process.