Where Wizards Stay Up Late
Because of Heart’s insistence on reliability, and Kahn’s early analysis of this area, a large number of error-control mechanisms were designed into the system. Every communications system is prone to errors in transmission caused by noise in the communication circuits. Voices passing through telephones, an analog transmission, can be garbled or ambiguous—as when the sounds of “s” and “f” are confused. Digital transmissions can also be distorted: a “1” can come through as a “0” and vice versa. Errors occur in bursts. If a given bit is in error, the probability of surrounding bits being in error is much higher than normal. Despite these problems, there are good techniques for detecting and even correcting digital errors, and the IMPs would have to rely on them.
Digital error correction rests upon the basic idea of the “checksum,” a relatively small number that is calculated from the data at its source, then transmitted along with the data, and recalculated at the destination. If the original and recalculated numbers do not agree, there has been an error in the transmission, unless perhaps the checking hardware itself failed, a very unlikely proposition.
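The calculate, transmit, recalculate cycle can be sketched in a few lines of Python. This is only an illustration of the general idea, not the function the IMPs used: the simple additive checksum, the names `checksum` and `verify`, and the 16-bit default width are all assumptions chosen to keep the example short.

```python
def checksum(data: bytes, bits: int = 16) -> int:
    """Illustrative additive checksum: the sum of all bytes, kept to `bits` bits."""
    return sum(data) % (1 << bits)


def verify(data: bytes, sent_checksum: int, bits: int = 16) -> bool:
    """Recalculate at the destination and compare with the transmitted value."""
    return checksum(data, bits) == sent_checksum


# Source side: calculate the checksum and send it along with the data.
message = b"LOGIN"
sent = checksum(message)

# Destination side, clean line: the two numbers agree.
clean_ok = verify(message, sent)

# Destination side, noisy line: one bit flipped in the first byte.
garbled = bytes([message[0] ^ 0b00000100]) + message[1:]
garbled_ok = verify(garbled, sent)
```

A plain sum like this one misses some errors (two bytes arriving in swapped order, for example), which is one reason practical systems use stronger functions and wider checksums.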
Checksums appear in data transmissions of all kinds. For instance, every beep you hear at the supermarket checkout counter signifies that a tiny laser has scanned a bar code and transmitted its digits to a computer where the checksum has been calculated and found to be correct. The machine at the checkout counter has done some sophisticated decimal arithmetic along the way by shuffling, multiplying, and adding the scanned digits—all in the blink of an eye. In most supermarket systems the result must end in 0, the single-digit checksum used for all products.
If a product is scanned and the computer fails to beep, it means the arithmetic didn’t check. If the computer had a way of correcting the error, it would beep on every pass and save time. But error-correcting techniques add cost to the system, so the checkout person must pass the item through the scanner again, perhaps two or three times until the code is transmitted without error.
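The checkout arithmetic can be made concrete with a small sketch. The text does not name a bar-code standard; the example below assumes the 12-digit UPC-A format, in which digits in odd positions are multiplied by 3, digits in even positions are added as they are, and the total must end in 0. The function names and the sample misread are hypothetical.

```python
def upc_a_is_valid(code: str) -> bool:
    """Check a 12-digit UPC-A code: the weighted digit sum must end in 0."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Digits in odd positions (1st, 3rd, ...) are multiplied by 3;
    # digits in even positions (including the final check digit) by 1.
    total = sum(d * 3 for d in digits[0::2]) + sum(digits[1::2])
    return total % 10 == 0


def scan_until_valid(reads: list[str]) -> int:
    """Return how many passes over the scanner were needed, or -1 if none checked."""
    for attempt, code in enumerate(reads, start=1):
        if upc_a_is_valid(code):
            return attempt  # this is the "beep"
    return -1


# The first pass misreads the last digit; the second pass reads the code correctly.
passes = scan_until_valid(["036000291453", "036000291452"])
```

Like the scanner in the text, this sketch only detects the bad read. It has no way of working out which digit was wrong, so the only remedy is another pass.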
The IMP Guys faced a similar problem: If a checksum detected a packet error on the network, how should it be handled? Should the transmitting IMP send the packet again, or should the receiving IMP be augmented with hardware to correct the error? In a network, error correction eats up space on the communications circuits and increases hardware expense in the switching equipment. Consequently, the BBN team decided that if an IMP detected an error in a packet, it would discard the packet without acknowledging receipt. The source IMP, still possessing a duplicate packet and still awaiting acknowledgment but not getting it, would then retransmit the duplicate.
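The discard-and-retransmit rule can be sketched as a toy simulation. Nothing here is BBN's code: the additive `checksum`, the `Packet` class, the `line` callable that models a noisy circuit, and the retry limit are all assumptions, and a missing acknowledgment is modelled as a `None` return rather than a real timeout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def checksum(data: bytes) -> int:
    """Stand-in checksum (simple byte sum), not the IMP hardware function."""
    return sum(data) & 0xFFFF


@dataclass
class Packet:
    payload: bytes
    check: int


def receive(packet: Packet) -> Optional[str]:
    """Receiving IMP: acknowledge a good packet, silently discard a bad one."""
    if checksum(packet.payload) == packet.check:
        return "ACK"
    return None  # discarded; no acknowledgment is sent


def send_with_retransmit(payload: bytes,
                         line: Callable[[Packet], Packet],
                         max_tries: int = 5) -> int:
    """Source IMP: keep a duplicate and resend until an acknowledgment arrives.

    `line` is a hypothetical model of the circuit: it takes the Packet sent
    and returns the (possibly damaged) Packet seen by the receiver.
    Returns the number of transmissions used.
    """
    duplicate = Packet(payload, checksum(payload))
    for attempt in range(1, max_tries + 1):
        if receive(line(duplicate)) == "ACK":
            return attempt
    raise RuntimeError("no acknowledgment after max_tries transmissions")


def noisy_then_clean() -> Callable[[Packet], Packet]:
    """Build a line that damages the first transmission only."""
    state = {"sent": 0}

    def line(packet: Packet) -> Packet:
        state["sent"] += 1
        if state["sent"] == 1:
            damaged = bytes([packet.payload[0] ^ 0x01]) + packet.payload[1:]
            return Packet(damaged, packet.check)
        return packet

    return line


tries = send_with_retransmit(b"hello, SRI", noisy_then_clean())
```

The receiver needs no error-correcting logic at all. The cost of an error is one extra transmission, paid only when an error actually occurs.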
Before issuing the request for proposals, Roberts had had to decide on the type of checksum for the IMPs. How many bits should be assigned to it and how sophisticated should it be? The precise requirement, based on an average number of errors in the phone lines, was difficult to determine because there was no hard information available about error rates on the high-speed lines over which the data was to be sent. Still, it was obvious that a 1-bit checksum would never do. Nor would a 2-bit, or even an 8-bit. Even a 16-bit checksum might not be good enough.
Kahn had earlier documented that a 16-bit checksum might not be sufficiently powerful to reach the desired level of reliability in the network, especially given the uncertainty in error performance of the high-speed lines. Kahn shared with Roberts some rough calculations that strongly suggested a 24-bit checksum would be a much better choice, pointing out that the extra 8 bits added very little expense to the hardware. The checksum was one of many technical issues on which Roberts listened to Kahn’s advice, and a 24-bit checksum got written into the RFP. Later, Kahn argued the same case convincingly to Crowther and the others, and the IMP Guys settled on the 24-bit checksum as one vital piece of the error-control system.
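A rough way to see why 8 extra bits matter is a back-of-the-envelope calculation. It rests on a simplifying assumption that is not from the text: a damaged packet slips past an n-bit checksum with a probability of about 1 in 2^n. Real behavior depends on the checksum function and on how errors cluster, and Kahn's own calculations are not reproduced here.

```python
def undetected_odds(bits: int) -> int:
    """Under the uniform assumption, about 1 damaged packet in this many goes unnoticed."""
    return 2 ** bits


odds_16 = undetected_odds(16)      # 1 in 65,536
odds_24 = undetected_odds(24)      # 1 in 16,777,216
improvement = odds_24 // odds_16   # the 24-bit check is this many times stronger
```

Under that assumption, the additional 8 bits cut the rate of undetected errors by a factor of 256.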
The BBN engineers had good intuition for which problems to solve in hardware and which ones to solve in software. It made sense to let the IMP’s hardware calculate the checksum, because a software calculation would be too slow. The final IMP-to-IMP error-detection scheme was a clever mix of known engineering techniques and others of the BBN team’s own invention. As Crowther put it, “We’d steal ideas from anywhere, but most of the time we had to roll our own.”
On Valentine’s Day 1969, Cambridge was socked in by a snowstorm. About two dozen people were in attendance at an all-day meeting at BBN. This was the first meeting between Heart’s team and the researchers and graduate students from the host sites.
Through Heart’s cautious eyes, this crowd of mostly graduate students looked hungry to get their hands on the IMPs. He suspected that when ARPA decided to put the IMPs out at the sites, the researchers expected to have another computer to play with. He imagined they’d want to use the IMPs for all sorts of other things—to play chess or calculate their income taxes. “I took an extraordinarily rigid position,” Heart recalled. “They were not to touch, they weren’t to go near it, they were to barely look at it. It was a closed box with no switches available.”
Kahn was still hard at work on the host-to-IMP interface specification, so it remained unclear to host team members exactly what they’d be required to build. Some people from the host sites asked to see what BBN had in mind, but the IMP Guys hadn’t settled on a plan among themselves. On that issue nothing much was resolved at the meeting.
The graduate students decided to share with BBN a plan they had devised for the hosts to compute an end-to-end checksum. This would provide an extra layer of protection against errors in host-to-host communications. It was designed to catch various imagined errors, including the possible misassembly of message packets by the IMPs.
Heart was distressed to hear this, because it would slow down the hosts and make the entire system appear slow. Nor did the very idea that the IMPs might pass damaged packets up to the hosts sit well with him. The students argued that BBN’s 24-bit checksum didn’t cover the paths from the IMPs to the host computers, and that bits traveled “naked” between the two machines. Heart assured everyone, in no uncertain terms, that the IMP checksum would be reliable. It remained to be seen, and in time the students would be more right than wrong, but with Heart’s confidence on display, the host sites dropped their plans to include a checksum in the host protocols.
More problematic was the idea of connecting multiple host computers to the IMP at each site. When Roberts first designed the network, his idea was to connect one—and only one—host computer to each IMP. However, by the Valentine’s Day meeting, representatives from the sites were making it clear that they wanted to connect more than one host computer to each IMP. Every research center had multiple computers, and it made sense to try to connect more than one machine per site, if possible. Roberts sent word to Cambridge that BBN was to redesign the IMP to handle up to four hosts apiece. Walden, Crowther, and Cosell invented a clever way to do it.
After Valentine’s Day, the IMP Guys really went to work. Their working hours stretched long into the night. Heart, who lived in the rural town of Lincoln, tried to get home in time to have dinner, but often he didn’t make it. It was easier for the others to go home for dinner and return to work, or not go home at all. When he was deep into the project, Crowther would sit at his terminal until he fell asleep.
Now the real pressure was on Kahn. He spent much of the next two months on the phone with people at the host sites, grinding away at the interface specification. Kahn became BBN’s main point of contact with the host research community. Researchers called him regularly to check what was happening, and what the schedule was, or simply to pass along new ideas.
By mid-April, Kahn finished the specification, a thick document describing how a host computer should communicate with a packet switch, or IMP. “It had been written partly keeping in mind what we’d been told the hosts wanted, and quite a lot keeping in mind what was going to be possible to implement, and what made sense to us,” said Walden. A committee of representatives of the host sites reviewed it and told BBN where they didn’t think it would work. The specification was revised until an acceptable design was reached. The host sites had something to build now. The UCLA team, which would be first, had less than five months to get ready for the arrival of its IMP.
Heart had drawn a clear line between what the IMPs would handle and what the hosts would do. “Early on, Frank made a decision, a very wise decision, to make a clean boundary between the host responsibilities and the network responsibilities,” said Crowther. Heart and his team decided to put “maximum logical separation” between the IMP and the host. It made conceptual and design sense for them to draw the line there to avoid cluttering or crowding the IMP’s functions. This also made building the IMPs more manageable. All IMPs could be designed the same, rather than being customized for each site. It also kept BBN from being caught in the middle, having to mediate among the host sites over the network protocols.
BBN had agreed with Roberts that the IMPs wouldn’t perform any host-to-host functions. That was a large technical problem. There were neither language standards nor word-length standards, and so far nothing that would facilitate easy communication between hosts. Even individual manufacturers, such as Digital, built a number of wholly incompatible computers.
The last thing BBN wanted was the additional headache of solving the host-to-host problems. Furthermore, Roberts didn’t want to give BBN or any other contractor that much control over the network design. Roberts was determined to distribute responsibilities evenly. Between Roberts and BBN it was settled: The IMP would be built as a messenger, a sophisticated store-and-forward device, nothing more. Its job would be to carry bits, packets, and messages: To disassemble messages, store packets, check for errors, route the packets, and send acknowledgments for packets arriving error-free; and then to reassemble incoming packets into messages and send them up to the host machines—all in a common language.
The IMPs were designed to read only the first 32 bits of each message. This part of the message, originally called a “leader” (and later changed to “header”), specified either the source or destination, and included some additional control information. The leader contained the minimal data needed to send and process a message. These messages were then broken into packets within the source IMP. The burden of reading the content of the messages would be on the hosts themselves.
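The division of labor can be sketched as follows. Only the 32-bit leader length comes from the text. The packet payload size, the sequence numbers, the choice to carry the leader in every packet, and all function names are hypothetical, and the leader is treated as four opaque bytes because its field layout is not given here.

```python
LEADER_BYTES = 4          # 32 bits, as described in the text
PACKET_PAYLOAD_BYTES = 8  # hypothetical size, kept tiny for the demo


def split_leader(message: bytes) -> tuple[bytes, bytes]:
    """Return (leader, body). The IMP interprets only the leader."""
    if len(message) < LEADER_BYTES:
        raise ValueError("message shorter than its 32-bit leader")
    return message[:LEADER_BYTES], message[LEADER_BYTES:]


def disassemble(message: bytes) -> list[tuple[bytes, int, bytes]]:
    """Source IMP: break a message into (leader, sequence number, chunk) packets."""
    leader, body = split_leader(message)
    chunks = [body[i:i + PACKET_PAYLOAD_BYTES]
              for i in range(0, len(body), PACKET_PAYLOAD_BYTES)] or [b""]
    return [(leader, seq, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[tuple[bytes, int, bytes]]) -> bytes:
    """Destination IMP: put packets back in order and rebuild the message."""
    ordered = sorted(packets, key=lambda p: p[1])
    leader = ordered[0][0]
    return leader + b"".join(chunk for _, _, chunk in ordered)


message = b"\x00\x01\x02\x03" + b"contents only the hosts will read"
packets = disassemble(message)
# Packets may arrive out of order; reassembly sorts them by sequence number.
rebuilt = reassemble(list(reversed(packets)))
```

At no point does the sketch inspect the body of the message. Interpreting those bytes is left entirely to the hosts.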
The host computers spoke many different languages, and the hardest part of making the network useful was going to be getting the hosts to communicate with each other. The host sites would have to get their disparate computers to talk to each other by means of protocols they agreed on in advance. Spurred by ARPA, the host community was making an organized effort to begin resolving those protocol issues, knowing it would be quite a while before anything was settled definitively.
IMP Number 0
One spring day, a delivery truck from Honeywell turned down Moulton Street. Inside was the first 516 machine built to BBN’s specifications. The refrigerator-sized computer was brought off the truck and onto a loading dock at the back of the systems division building and then was rolled into a large room, soon to be known as the IMP room, adjacent to the dock. The team had converted a storeroom into space for testing the IMPs by adding a raised computer floor, bright fluorescent lighting, and air-conditioning. The windowless room was where the youngest man on the team, twenty-two-year-old Ben Barker, would soon spend a lot of time.
Barker was an undergraduate student whose brilliance had caught Ornstein’s attention in one of the classes he taught at Harvard. When BBN was awarded the ARPA contract, Heart had offered Barker a job, and Barker had taken a leave of absence to accept it. Barker, like Ornstein, was a hardware engineer and he showed signs of becoming an ace debugger—someone who could rescue a project when the time came. He was placed in charge of setting up each IMP Honeywell delivered and debugging the hardware before it left BBN’s shop.
This first machine was the prototype (IMP Number 0), a nonruggedized 516 containing Honeywell’s initial implementation of BBN’s interfaces. With the machine in the middle of the room, Barker ran power to the computer, plugged everything in, and turned it on.
Barker had built a tester and had written some debugging code. He was looking forward to working out whatever bugs the machine had. Undoubtedly there would be something that would need fixing, because there always was; bugs were part of the natural process of computer design. Heart and the whole team looked forward to finding out which parts of the IMP design worked and which needed more attention.
Barker tried loading the first IMP diagnostic program into the untested machine. He couldn’t get it to work. So he loaded some other code, and that didn’t work either. Barker tried a few other things and discovered that nothing worked. “The machine didn’t come close to doing anything useful,” he said. So far, the first IMP was a fizzling dud.
Prior to the IMP project, people at BBN and Honeywell had interacted casually and relations were friendly. In the days leading up to the IMP project a sense of teamwork grew. Honeywell had devoted a special systems crew to work on the BBN contract from day one. At BBN’s request, Honeywell had assigned one of its technicians exclusively to the task of shepherding Honeywell’s part of the job to completion.
This was unusual. In general, minicomputer manufacturers like Honeywell didn’t cater much to the special demands of their customers. “Most computer companies won’t build specials at all,” said Heart. “Or if they do, it’s under great duress.” Minicomputer salespeople went after a broad market, while mainframe computer makers were known to treat customers like royalty. Nobody in the minicomputer business did much hand-holding.
In the wake of IMP Number 0’s gross failure, BBN’s hardware chief Ornstein began to go over the design with the Honeywell team. He discovered that no one at Honeywell seemed to understand in much detail how the BBN-designed interfaces were supposed to work. He was surprised to learn that the technicians building the first interface didn’t really understand the drawings. Honeywell hadn’t attempted to develop any diagnostics to test the design; it had simply tried to produce a faithful implementation of the block diagrams that Ornstein had drawn and that BBN had included in its proposal to ARPA. The trouble was that in furnishing Honeywell with a set of fairly generic block diagrams, BBN assumed that Honeywell’s familiarity and expertise with its own machines would enable the computer manufacturer to anticipate any peculiar problems with BBN’s requested modifications to the model 516. Honeywell had its own logic modules, its own design system. But instead of working out the essential details in the blueprints, Honeywell had built BBN’s machine without verifying that the BBN-designed interfaces, as drawn, would work with the 516 base model.
Of course, neither BBN in drawing the block diagrams nor Honeywell in implementing the design actually had all of the necessary tools to create a perfectly working prototype IMP on the first pass. In building new computers, said Barker, the operative assumption is that you design something you think will work, get the prototype ready, start testing, then gradually fix the design errors until the machine passes the test. It would have been an engineering fluke if the machine ran perfectly straight away. But even as a first pass, the condition of this prototype machine was unacceptable.
If Ornstein was concerned about Honeywell’s performance, Barker was downright nervous. As the chief debugger, he was the one responsible for getting the machine to actually work. At this stage in the IMP project, the interfaces on the 516, he said, “wouldn’t have come close to working even if Honeywell had implemented them properly.”
Barker was staring at weeks of concentrated work ahead. He felt the weight of the schedule suddenly grow heavier. If the BBN hardware team intended to hand off to the BBN programming team a working version of the modified Honeywell 516 any time soon, so that Crowther, Walden, and Cosell would have time for final debugging of their operating code, then the hardware specialists would have to hustle. With Ornstein’s help, Barker would have to “take the stuff Honeywell had built,” he said, “and figure out how to make it actually do what it was intended to do.”
The arrival of the prototype IMP in its initial state marked a real setback; correcting the course would take time, and soon there would be precious little of that.
Armed with an oscilloscope, a wire-wrap gun, and an unwrap tool, Barker worked alone on the machine sixteen hours a day. The circuitry of the computer relied on pin blocks, or wire-wrapped boards, that served as the central connection points to which wires, hundreds upon hundreds of wires, were routed. There were numerous blocks of thirty-four pins into which logic boards were plugged and which carried components to form the correct circuits. After figuring out where the wires should actually go, Barker had to unwrap each tightly wound misconnected wire from its pin. The pins in each block were about an inch long, and were closely spaced (1/20th of an inch apart) in a square matrix; each block looked like a miniature bed of nails with wires streaming Medusa-like into and out of it. Once he determined where the correct wires should be reconnected, Barker used the wire-wrap gun to wrap each wire carefully on its correct pin. It was a long, laborious, and delicate process.
To complicate matters, Barker had a slight palsy in his hands. Working with a wire-wrap gun called for a steady hand and good concentration. The close spacing of the pins, the weight of the wire-wrap gun, and the size of the nozzle on the gun that had to be slipped down over a single pin amid a small forest of pins all conspired against him. The risk was in getting the wire on the wrong pin or bending or breaking a pin. You could destroy a lot of fine work if you weren’t careful. So the shake in Barker’s hands caused quite a stir among the other IMP Guys when Barker took his wire-wrap gun to the pin blocks inside the IMP.
Most of the rewiring was done with the power off. When something had to be done with the power on, it was done with little clip-leads that slipped onto the pins. Here, though, there was a very real danger of shorting things out and blowing circuits.