The Soul of a New Machine

by Tracy Kidder


  Like Rasala, Holberger worries sometimes about playing the tough guy. He says he often feels sorry after he's been abrupt with one or another of the Hardy Boys, and he is consciously developing tact for the lab. "Let's see, I'm a little confused here," he tends to say now, when what he means is, "You guys are all wrong."

  Holberger is married but has no children yet. He says he has more than enough money right now. He has also received some company stock. Stock options, he notes, blur issues of salary; "Data General turns people into capitalists," he says. Holberger likes the local atmosphere. The jeans, West's casual dress, remind him, he says, "that we're not at IBM." He likes not having to punch a time clock. But he knows that his freedom from company clocks doesn't stem from corporate altruism. "They don't want us to know how many hours we work. If we did, they'd have to pay us a lot more. But," Holberger says, "I don't work for money."

  For the last two years, he has been involved in projects with the flavor of crisis about them. He worked on the M/600 with Rasala and went without a break into Eagle. He has been saying of late that he doesn't want to take on any more jobs like this one, but he's also been saying that he isn't sure he means it. "It's very challenging and very interesting," he says. "There's a lot of, uh, prestige, I would say. Perhaps I like some of the things I say I don't like. It's consuming. I don't know. Perhaps I don't like it. But jobs like this aren't real common. In other companies people with our experience aren't allowed to do this, I think." He wears a wry smile. "Of course, that's how Data General gets cheap labor." Holberger has noticed that there is almost no one in the basement involved in CPU design who is over thirty-five. What happens to old CPU engineers? Holberger is twenty-six now, and though not exactly on his deathbed, he is curious about what a computer engineer does "afterward." Maybe, he says, efforts like this one can only be conducted by the very young.

  "Like war," I suggest.

  "Yeah, really," he says, laughing.

  Holberger's father was an engineer, and so are three of his four brothers. He went to Clarkson and earned a master's degree at the University of Illinois, a Jerusalem of computer engineering, back then one of the few universities in the world where a student could do research in the hardware of computers. He hung around an old IBM machine in high school, and was taking things apart virtually from his infancy. It's something he still does. Recently, when he bought a digital watch, he took it apart. Same thing with his new programmable calculator. "I usually get things back together, too."

  Significant parts of Eagle are mainly of Holberger's own devising. He thrashed out the plan for the implementation of the memory management system in many long, loud sessions with Wallach. "He's a good play on Wallach," says Rasala, proudly, feeling that he himself could not have held his own with Wallach, as Holberger did. Holberger designed most of the IP in Eagle. He regrets that it isn't two years earlier, for if it were, Eagle's hardware design might represent an advance in the state of the art and not just an example of it. But Holberger feels that he and his colleagues have taken some original approaches. "There were some general ideas out there, but the actual implementation of the IP — with all due modesty — I took some vague specs and thought it up." In fact, he made the IP run faster than those specs envisioned. As for the question of whether the IP will ever work right, he isn't worried about that. The way he looks at it, the whole machine's a crossword puzzle that he and the rest of the designers thought up, and now they just have to solve it. "I'm getting quite good at it," he says. "I can track a problem back into the twilight zone quite well."

  With a touch of regret, Holberger puts the spring morning behind him and makes his way briskly through the basement — still mostly empty at this hour. As expected, when he gets to the lab, he finds Jim Veres there, sitting in front of Gollum. And right away, Holberger's back into it.

  Last night, per Holberger's instructions, both Coke and Gollum were left running the diagnostic program called "Eclipse 21." Some weeks before, the debuggers ran this program and the machine failed sporadically. They didn't closely examine the failure. They decided that it was probably a flakey — most likely a loose connection or "noise" — and they went on to other diagnostic programs. Now, however, the machines have successfully negotiated all of the basic Eclipse diagnostics except for Eclipse 21. So it's time to go back and clean up the problem, whatever it is — the failure that might be noise or a loose connection.

  One characteristic of the diagnostic programs is extreme repetitiousness. Each test contains a number of subtests, each one of which consists of dozens of instructions — ADDs, subtracts, jumps, loops, Skips On Equal and so on. The program has the machine perform one of these subtests dozens of times, each time with different data, and then tells it to go on to the next subtest. When the last of these subtests is completed, the diagnostic program directs the machine to go back and repeat the process all over again. The entire program, including all the repetitions of the subtests, is repeated a large number of times — say, a hundred — before one so-called pass is complete.

  If, during all these calisthenics, the machine fails to execute an instruction properly, the program directs it to confess the act by sending an error message out to its console and then tells it to go on with the exercise. The debuggers can come in after the machines have been running all night and find out right away, by placing an order through the console, how many passes of the program the machine has run and how many times it has failed.

  Veres has already done this. He tells Holberger that Gollum ran 921 passes of Eclipse 21 last night, with only 30 failures. And Holberger makes a face.

  In this context, 921 is a vast number. It means that any given instruction in the diagnostic program may have been executed millions of times. Against 921 passes, 30 failures is a very small number. It tells them the machine is failing only once in a great while, and that's bad news, because it's hard to locate the cause of a failure that crops up only once in a while. As they say, the first step in fixing something is getting it to break. The problem could just be a loose connection or noise, though. But while noise and loose connections can cause sporadic failures, they usually do so erratically, in no discernible pattern. And when Holberger asks Veres how Coke performed running Eclipse 21, Veres tells him that the story is the same as it was with Gollum: "Nine hundred and twenty-one passes, thirty failures."

  "I'm still willing to call it a noise problem," says Veres. But Veres is thinking, "Either that noise is remarkably consistent or we've got a real problem in the logic somewhere."

  Holberger thinks it would be nice if noise was the culprit. This bug in Eagle has the feel of an unpleasant one. So, thinking wishfully, they concoct a few theories about noise. Finally, Holberger says: "Okay. Time to fix it."

  They don't need to say much to each other for a long time after that. Between Holberger and Veres there exists a kind of technical understanding that outruns the powers of speech. Most Hardy Boys share this specialist's ESP to some degree. It's a feeling that some good chess players say they share with worthy opponents, a kind of mind reading — what Holberger calls being "in sync." To a degree, all of the Hardy Boys are loners; all say they usually prefer to work by themselves. But Veres and Holberger have found that working together, they do produce results. To Veres, Holberger is "very quick," and because of his superior knowledge of Eagle's design, Holberger can often "fill in the details" for Veres. Holberger, for his part, is impressed with Veres, and calls him "one of the stars."

  Veres was given responsibility for the IP, and he designed a large part of it with Holberger's assistance. And when the debugging began, Veres quickly picked up technique and found his own style for the lab. Holberger feels that by now Veres, too, can find his way adroitly into the twilight zone of Eagle.

  None of the Hardy Boys hesitates to contradict what he considers to be a wrong technical statement, no matter who utters it. Veres can be abrupt in the lab, too. He is a tall, fairly husky young man with a stern glare. You notice this sometimes when talking to him: he's looking at you and he's really listening; it makes some people nervous. His managers' confidence in him is tempered only by their feeling that he works too hard. That is how they express it.

  Veres owns his own small computing system and sometimes after a long day in the lab he will go home and tinker with it. None of the old hands would dream of doing that, but some of the recruits are hobbyists. When the veterans in the group were growing up, computers were quite rare and expensive, but Veres went to school in the age when anyone with a little money and skill could make up a small personal system. Veres says that what he does at home is different enough from what he does at work to serve as recreation for him. At work he deals with hardware; when he's at home, he focuses on software, reading programming manuals and creating new software for his own computer.

  Veres has no real complaints about the work; on the contrary, his only gripe is that lately his managers have been scheduling the work in the lab in such a way that he can't always get his hands on Gollum for as long as he wants. He calls computers "the ultimate toy." He says: "I like to tinker. I like to build things." In his senior year at Georgia Tech, he got interested in digital clocks. "I built four or five. Then it was computer terminals. I built one. Then I decided I oughta have a computer to hook it up to. So I got a microprocessor and then I figured it was not worth much without an operating system, so I wrote a small operating system. I did a number of all-nighters building this computer junk."

  As it happened, Veres hated the first computer he dealt with. It was a big machine that many people shared, and it just spat out work; it was a cold, distant bureaucrat of a computing system, a machine to which you couldn't talk back. Soon afterward, though, he got to use a small Hewlett-Packard minicomputer; it stood alone and one could deal with it directly. "That made it friendly."

  Holberger and Veres hook the probes of two logic analyzers to various parts of Gollum, and they set the analyzers so that they will snap their pictures when the machine fails. They call this "putting on a trace." They back up the program just a little ways from the point of failure; they run it, and it doesn't fail. Another clue. It suggests that they may be facing "a cache interaction problem." In a machine with accelerators, history is important; often it's some complex combination of previous operations that leads to a failure later on. So now Holberger and Veres start the diagnostic program all the way back at its beginning and go out to the cafeteria for a cup of coffee. About fifteen minutes later, when they have returned to their chairs in front of Gollum, there is a quick flash on the screens of the analyzers. The machine has failed. They have their pictures. They pull up their chairs and start studying snapshots of signals.

  They are trying to figure out exactly what Gollum is doing when it fails. The pictures and the printed "listing" of the steps in the diagnostic program give them the answer.

  "Okay. It's doing a JSR and Return."

  In essence, the diagnostic program is telling the machine to take a short detour off the main road of the program. Gollum is supposed to "jump" away from the stream of instructions it's executing and go get a new instruction. This new instruction should direct the machine to go right back to the place where it was, before it took the jump. This small series of operations is a little hurdle, a trick question, a spot quiz, in the midst of a subtest of the diagnostic program.

  Further study also tells them that the machine did in fact jump to the right instruction, and it did return to the right place; but when it got there, it executed the wrong next instruction. This tends, as they put it, to implicate the memory system, and particularly the IP and System Cache.

  "Is it hitting the I-cache?" says Holberger.

  That's the next question. The IP's small storage compartment is known as the I-cache, and what they want to know is whether or not the instruction that the machine is supposed to return to and execute, after its jump, is residing in the I-cache. The IP saves instructions that it has been executing recently, so if the program has called for this instruction a short time ago it will probably be in the I-cache now, at the time of the failure. They look at more pictures and from them infer that the IP is in fact "hitting" its cache. And they go on, examining with the analyzers the contents of the I-cache. They discover that it has the wrong instruction at the address where the right one should be.

  The conversation that leads them to this conclusion is characterized by alarming brevity; even a skillful computer engineer from another project wouldn't be able to follow it. A rough translation may help. Imagine that Gollum's memory system is organized like a town in which every house has a mailbox. In the computer, there is a large number of mailboxes, each with its own unique address. Inside the Main Memory, there are thousands and thousands of these mailboxes. Identical copies of some of these mailboxes, labeled with the same address numbers and holding the same contents, are also in the System Cache. And a smaller number of mailboxes are in the I-cache. The diagnostic program has directed Gollum to jump to a particular mailbox, to a particular "address." At that address, in that mailbox, is an instruction that tells Gollum to go to another mailbox at another address. The IP looks through its cache and finds that it has a mailbox with this second address. This second address is indeed the right address, but the instruction in the mailbox is the wrong one. In fact, what's there is an "error message," an instruction that causes Gollum to confess failure on the system console, which sits beside Holberger and Veres. It's a postman's nightmare.

  Time dissolves in the lab on cases like this one. When Veres and Holberger look up from their analyzers, it is already two in the afternoon. In a moment, Jim Guyer comes in, puts down his motorcycle helmet, pulls up a chair, and starts asking questions.

  Talking about Guyer some time before, Rasala has outdone himself in the enthusiasm of his speech, as if portraying the person he is describing: "Guyer's stubborn. Oh, is he stubborn! And he has one amazing flaw. Ask him a question about any problem, any problem at all, and he will get down to the most boring, low-level detail." Rasala took a breath and went on: "Guyer's a mechanic, he likes to fix things. Holberger gets an esoteric notion of an idea and then starts implementing it. Holberger gets a thrill out of making it work, but also out of inventing it. Guyer's more of a craftsman. Guyer can build it and refine it and he works for the pleasure of getting the last bug out of it. I identify with Guyer more probably than with anybody in the group. I don't think he thinks he's a computer genius either, but just a damn good engineer."

  Guyer wears a brown beard that makes his face appear to rest inside an oval frame. He is much given to laughter. He abandons himself to his laugh; it's a high-pitched rapid one, the kind that makes the laugher close his eyes and shake his head. He often leaves his shirt unbuttoned down to the breastbone. A bachelor, he likes to go rock-climbing. He was one of those whom West must have had in mind when he said that computer engineering was the sort of thing that appealed to people who liked to climb up mountains. Guyer says, "I was considered wimpy in high school."

  He grew up in a suburb of Boston, not one of the very fancy ones, nor one of the tough ones, but a place in which athletic ability sometimes superseded other virtues. In his high school, he says, there were "the athletes" and "the nobodys," and he was a nobody, partly because he got very good grades and partly because he had asthma. He had trouble running the mile, but, he says, "I used to surprise people in gym class."

  Guyer, too, is the son of an engineer, and was a tinkerer practically from birth. "I took apart clocks and all kinds of stuff. Lawn mowers. I loved taking things apart. Loved putting them back together, too. Just to look inside and see how it works. Hands on, that's what I liked to do." He went to MIT as an undergraduate, determined to learn something and also to have a good time, and he says he accomplished both ambitions. He got mostly A's at MIT and went to Northwestern for a year of graduate school.

  He had taken up with computers in high school; his school had an old IBM machine, too. He was going to study physics in college, but it bored him. He preferred engineering; he liked touching things, especially, by then, electronic ones. During one summer, he worked for a defense contractor in Boston. "I didn't especially appreciate that," he says. But the money was good and, more important, the job gave him a chance to work on something that "had never been done before," on state-of-the-art electronics. He enjoyed this aspect of it immensely. He ignored the ultimate fruits of the project; he didn't bother to get a security clearance, and so afterward he wasn't ever allowed to look at the thing he had helped to make.

  Guyer has been at Data General for three years, ever since graduate school, and he likes the pace. He doesn't care a hoot, he says, about having "the president come down and shake my hand." And he doesn't think much about money, though he won't turn it down. He seems, in his busyness, among the happiest of the group. Amazed at his equanimity, one of his colleagues has said, "The thing about Jim is, you can't make him mad." And he has become a favorite of both West's and Rasala's, mainly because of his attitude toward the debugging. "He started getting into everyone else's boards, and when I saw that, bang!" West has said. It's true. At the moment Guyer appears to be much less interested in the board he helped to design, the IOC, than he is in some of the other boards, particularly the IP. Partly, that's because he knows that right now fixing the IP is more important than fixing his own board, and Guyer, who has staked several years on the Eclipse Group, feels protective toward it. "If we don't succeed, we can kiss the Eclipse Group good-bye," he says. He also likes to work on the IP because he doesn't know how it works. "To me, if I can't do it, it's more of a challenge," says Guyer.

  He has spent entire nights alone in the lab, studying schematics and microcode listings in an attempt to fathom the IP. Guyer takes naturally to the night shift. He keeps microcoders' hours, which is to say peculiar ones, and he has amazed even some of Alsing's midnight programmers. Jon Blau remembers seeing Guyer in the lab at four-thirty one morning, surrounded by several logic analyzers, all of them hooked up to Gollum. Blau was on his way home at this hour, but after a full night of work on the IP, Guyer was evidently still going strong. "I would start sniffing something around ten-thirty at night," Guyer says. "And I just could not let go. I didn't know how to fix the problems that I saw, but I couldn't leave until I had a picture of them."

 
