3.2. Cued discovery
Sometimes a piece of problem-solving behaviour is not so much wrong as plain foolish—foolish, that is, when you come to realise that a far simpler solution was readily available. Occasionally, the world allows you to discover your foolishness by accident, as in the following personal example.
The right front tyre of my car was badly worn, so I decided to exchange the wheel for the spare. I jacked up the car and attempted to loosen the nuts securing the wheel. But these had been tightened by some muscle-bound mechanic and refused to budge. I tried various ways to shift them: brute force, penetrating oil, and more brute force. Then I tried hammering at the end of the spanner. Nothing worked. Not only were the nuts seized tight, but the wheel turned whenever I tried to get a purchase on the spanner. This, I concluded, was the root of the problem. If I could prevent the wheel from turning, the nuts would surely be unable to resist my efforts.
My subsequent thinking went along these lines. Putting on the handbrake won’t help because that only works on the rear wheels, and the same applies to putting the car in gear. So I’ll have to use the brake pedal. But I can’t do that and work the nuts at the same time. What about using something heavy to keep the brake pedal depressed? No, I don’t have anything handy that is heavy enough. I know. I’ll get my wife to sit in the car with her foot on the brake pedal. Ah! She won’t be able to get in through the driver’s door because the jack is preventing it from opening. And if she gets in on the passenger’s side, she might rock the car off the jack. I’ll have to bring the car down and take away the jack so she can get in ... So, I lowered the jack, and it was only when all four wheels were firmly back on the ground that I realised I had accidentally discovered the solution to the problem of the turning wheel. (Reason & Mycielska, 1982, pp. 81-82)
A more proficient tyre changer would never have jacked up the car without first loosening the nuts. But that is not the point. The problem here was not faulty logic. The error lay in following a blinkered line of thought from one difficulty to the next without considering the total ‘problem space’. And then, quite serendipitously, the answer appeared beneath my nose, revealing the earlier foolishness. I did not encounter a forcing function exactly. The environment simply provided an opportunity for rejoining the correct path.
3.3. System responses to error
Lewis and Norman (1986) identified six possible ways in which a system can respond to its operators’ errors. The actual examples are taken from human-computer interactions, but their underlying principles are applicable to a wide range of systems.
3.3.1. ‘Gagging’
A ‘gag’ is a forcing function that prevents users from expressing unrealisable intentions. In a human-computer interaction, this could take the form of locking the keyboard to prevent further typing until the terminal has been reset. Raskin (1974), cited by Lewis and Norman (1986), inserted such ‘gags’ within his tutorial language system FLOW. If a user attempts to key in a character that does not form a legal command, it is not accepted.
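To make the mechanism concrete, here is a minimal sketch of a ‘gag’ (this is not Raskin’s FLOW; the command set and the keystroke loop are invented for illustration): a keystroke is accepted only if the resulting string could still grow into a legal command, and anything else is simply refused.

```python
# Hypothetical illustration of a 'gag': keystrokes that cannot extend any
# legal command are refused outright (FLOW's actual command set differs).
LEGAL_COMMANDS = {"PRINT", "GOTO", "STOP"}   # invented for the example

def accept_keystroke(buffer: str, key: str) -> str:
    """Return the new input buffer, ignoring keys that cannot lead to a legal command."""
    candidate = buffer + key.upper()
    if any(cmd.startswith(candidate) for cmd in LEGAL_COMMANDS):
        return candidate          # keystroke accepted
    return buffer                 # 'gagged': the illegal keystroke never appears

buf = ""
for key in "gqoto":               # the stray 'q' is quietly rejected
    buf = accept_keystroke(buf, key)
print(buf)                        # -> "GOTO"
```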
3.3.2. Warnings
Whereas the ‘gag’ presents a block to anything but appropriate responses from users, warnings simply inform them of potentially dangerous situations. The user is left to decide the correct course of action. Thus, the Macintosh interface provides menus that cover all possible actions, including those that are not legal at that particular time. These illegal actions are distinguished by a grey shading. As Lewis and Norman put it, such warnings are error messages before the fact.
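The same idea can be sketched in a few lines of code (the menu items and state tests below are invented, and this is not the Macintosh implementation): every action remains on the menu, but those that are illegal in the current state are shown greyed out rather than hidden.

```python
# Hypothetical sketch: all actions stay visible on the menu, but actions that
# are illegal in the current state are rendered greyed out rather than removed.
def render_menu(actions, state):
    for name, is_legal in actions:
        if is_legal(state):
            print(f"  {name}")
        else:
            print(f"  {name}   (greyed out)")   # a warning before the fact

actions = [
    ("Cut",   lambda s: s["has_selection"]),
    ("Paste", lambda s: s["clipboard_full"]),
]
render_menu(actions, {"has_selection": False, "clipboard_full": True})
# 'Cut' appears greyed out; 'Paste' remains selectable.
```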
3.3.3. “Do nothing”
As the name implies, the system simply fails to respond to an illegal input. It quite literally does nothing, and the user is left with the task of sorting out what went wrong. Such a device is only helpful when adequate feedback information is available.
3.3.4. Self-correct
Whereas the ‘do nothing’ method is the simplest error-preventing technique, ‘self-correct’ devices can be extremely sophisticated. Here, once an error (usually a programming error) is detected, the system tries to guess some legal action that corresponds to the user’s current intentions. A particularly ‘intelligent’ example is ‘DWIM: Do What I Mean’ (Teitelman & Masinter, 1981), available on the InterLisp system. Teitelman, the designer of the system, gave its rationale as follows: “If you have made an error, you are going to have to correct it anyway. I might as well have DWIM try to correct it. In the best case, it gets it right. In the worst case it gets it wrong and you have to undo it: but you would have had to make a correction anyway, so DWIM can’t make it worse” (quoted by Lewis & Norman, 1986, p. 423).
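The flavour of such self-correction can be conveyed by a small sketch (this is not InterLisp’s DWIM; the command list and matching threshold are assumptions made for illustration): an unknown command is replaced by the closest legal one, and the user is told what has been assumed.

```python
import difflib

# Hypothetical sketch of DWIM-style self-correction (not InterLisp's DWIM):
# an unrecognised command is replaced by the closest-matching legal command.
LEGAL = ["define", "evaluate", "print", "load"]

def dwim(command: str) -> str:
    if command in LEGAL:
        return command
    guesses = difflib.get_close_matches(command, LEGAL, n=1, cutoff=0.6)
    if guesses:
        print(f"'{command}' -> assuming you meant '{guesses[0]}'")
        return guesses[0]         # best case: the guess is right
    raise ValueError(f"cannot interpret '{command}'")

dwim("evalaute")                  # corrected to 'evaluate'
```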
3.3.5. “Let’s talk about it”
Some systems respond to user errors by beginning a dialogue. A useful example cited by Lewis and Norman (1986) is the way that many Lisp systems react to user-induced problems. The user receives a message describing as far as possible the source of the difficulty. He or she is then automatically switched into the ‘Lisp Debugger’, which allows the user to interact directly with the system so as to locate the error.
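Python offers a rough analogue of this pattern, sketched below (it is not the Lisp Debugger, merely an illustration of the principle): when an operation fails, the program reports what it can about the fault and then opens an interactive debugger at the point of failure.

```python
import pdb

# Rough analogue of the "let's talk about it" response (a sketch, not the
# Lisp Debugger): describe the fault, then open an interactive debugger so
# the user can inspect the state in which the error occurred.
def run(operation):
    try:
        return operation()
    except Exception as exc:
        print(f"Error: {exc!r} - dropping into the debugger")
        pdb.post_mortem(exc.__traceback__)   # the dialogue begins here

run(lambda: 1 / 0)    # ZeroDivisionError -> interactive debugging session
```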
3.3.6. “Teach me”
On detecting an unknown or inexact input, the system quizzes the user as to what it was he or she had in mind. In short, the system asks the user to teach it. For instance, when one natural language inquiry system (Clout) encounters a word it does not understand, it asks the user for a definition. If the system fails to understand one of the words making up the definition, that too is queried and so on until the definition is comprehended. The new words or phrases are then stored by the system and are accepted without question in future interactions.
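A toy sketch can convey the recursive character of this exchange (Clout’s internal mechanism is not reproduced here; the lexicon and prompts are invented): each unknown word triggers a request for a definition, and any unknown words within that definition are queried in turn before the original word is stored.

```python
# Hypothetical sketch of a 'teach me' loop (not Clout itself): an unknown
# word triggers a request for a definition, and unknown words inside that
# definition are queried recursively before the original word is stored.
lexicon = {"list": "", "the": "", "largest": "", "records": ""}   # words already known

def understand(word: str) -> None:
    if word in lexicon:
        return                                    # already known: accepted without question
    definition = input(f"I don't know '{word}'. What does it mean? ")
    for part in definition.split():
        understand(part)                          # query any unknown words in the definition
    lexicon[word] = definition                    # remembered for future interactions

for w in "list the biggest records".split():
    understand(w)                                 # only 'biggest' is queried here
```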
4. Error detection by other people
At Three Mile Island (see Chapter 7 for a more detailed discussion of this accident), the operators failed to recognise that the power-operated relief valve (PORV) had not automatically closed, as it was designed to do in the course of recovery from a reactor trip. They were misled by the control panel indications showing that the valve was ‘commanded’ shut; they failed to appreciate that this did not, by itself, signify that the valve was actually closed (see also Chapter 4). This error was further compounded by two other crew failures. First, they twice misread a 285 degrees Fahrenheit temperature as being only 235 degrees. Second, they wrongly assumed that the observed high temperature was due to a chronically leaking valve. As a consequence, they did not identify the true state of the plant for more than two hours. The resulting water loss caused significant damage to the reactor.
The stuck-open PORV was only discovered two and a half hours into the incident when the shift supervisor of the oncoming shift noticed that the PORV discharge temperature was about 25 degrees hotter than the code safety discharge temperature. He correctly interpreted the reading as showing a stuck-open PORV. The associated block valve was shut, thus isolating the malfunctioning PORV. Only at this point did effective recovery actions begin.
At Oyster Creek (a General Electric boiling water reactor) on 2 May 1979, an operator erroneously closed four pump discharge valves instead of two. This effectively shut off all natural circulation in the core area. The error was only discovered 31 minutes later, when the engineering supervisor entered the control room and noticed a precipitous decline in the water level after a discharge valve had been opened. He noted the unintended closure of the B and C discharge valves while walking to check the pump seal display.
That these are not isolated incidents has been shown by Woods (1984), who analysed 99 simulated emergency scenarios, using 23 experienced nuclear power plant crews in 8 different events. He categorised operator errors into two groups: (a) state identification problems (misdiagnoses), and (b) execution failures (slips). Whereas half the execution failures were detected by the crews themselves, none of the diagnostic errors were noticed by the operators who made them. They were only discovered by ‘fresh eyes’ (i.e., by some external agent). Nearly three-fourths of all the errors remained undetected. Woods concluded that the most common cause of the failure to correct system state errors was ‘fixation’ on the part of the operators. Misdiagnoses tended to persist regardless of an accumulation of contradictory evidence.
These observations are very much in keeping with what we know of knowledge-based processing in particular, and of mistakes in general. When the diagnostic hypothesis is incorrect, feedback that is useful for detecting slips is unavailable. There is no discrepancy between action and intention, only between the plan and the true state of affairs.
5. Relative error detection rates
We have now considered a number of studies specifically concerned with error detection. Each examined a different task or activity. In most cases, these studies also differentiated between errors detected at different levels of performance: skill-based, rule-based and knowledge-based. From these data, it is possible to make crude comparisons between the error detection rates associated with (a) various degrees of task difficulty, and (b) different performance levels.
5.1. Detection rates and task complexity
The overall detection rates for these studies (irrespective of error type) are listed below.
Speech errors (Nooteboom/Meringer, 1908): 64 per cent minimum. (Note: This is the correction rate. Detection rate would presumably have been much higher.)
Statistical problem solving (Allwood, 1984): 69 per cent.
Database manipulation (Rizzo et al., 1986): 84 per cent to 92 per cent. (Note: These two values are from the two conditions employed in the study. Only condition 1 was discussed earlier.)
Steel mill production planning (Bagnara et al., 1987): 78 per cent.
Simulated nuclear power plant emergencies (Woods, 1984): 38 per cent.
Thus, the detection rates for all but one of these tasks were relatively high, ranging from just below 70 per cent to over 90 per cent. The clear exception was recovery from simulated nuclear power plant emergencies: 38 per cent. Against this relatively low figure, however, should be set the fact that the overall error rate among these operators was extremely low: only 39 recorded errors in 99 test scenarios (though this may say more about the investigators’ ability to detect errors than their actual commission by the operators).
This comparison by itself does not really provide sufficient grounds to infer that error detection rates decline as the task becomes more complex, though the performance of the nuclear power plant operators provides some hint that it does. The one legitimate conclusion to be drawn from these scanty data is that cognitive detection mechanisms succeed in catching most errors, though by no means all.
5.2. Detection rates and error types
Three of the studies (Allwood, Rizzo and colleagues, and Bagnara and colleagues) allowed a direct comparison between the detection rates for the three basic error types. In the Allwood study, the error types could be inferred from his descriptions; in the Italian studies the distinctions were made by the investigators themselves. Averaging over all three studies (with two conditions for Rizzo et al., 1986), we find the relative proportions of the three error types to be 60.7 per cent for skill-based errors, 27.1 per cent for rule-based errors, and 11.3 per cent for knowledge-based errors. The corresponding overall detection rates are 86.1 per cent (SB), 73.2 per cent (RB) and 70.5 per cent (KB). The detailed analyses of these data are shown in Figures 6.3 to 6.5.
On the face of it, the relative incidence of the three error types appears to conflict with the assertion, made in Chapter 3, that performance at the knowledge-based level was intrinsically more error prone than at the other two levels. It should be recalled, however, that this claim was made in relation to the relative opportunities for making these three types of errors, not to their absolute incidence. The latter will always be greater at the SB level than at the RB level, and likewise greater in RB performance than in KB performance. Not only does the SB level predominate in routine, problem-free environments, it is also extensively employed at both the RB and KB levels. Similarly, RB performance, albeit in a somewhat fragmented form, will also be used during KB processing. It is thus hardly surprising that SB slips occurred on average more than twice as often as RB mistakes, which in turn were almost three times more frequent than KB mistakes.
Although the averaged detection data from the Swedish and Italian studies (see Figures 6.3 to 6.5) differentiate between the error types with regard to their likelihood of discovery, the range of variation is quite small, from 87 per cent for SB slips to 71 per cent for KB mistakes. The differences between them, however, are considerably more marked when it comes to making an effective correction. Allwood’s data showed that the chances of properly recovering from an SB slip were twice those for an RB mistake and three times better than for a KB mistake. Moreover, whereas the correction rates for SB and RB errors were only slightly less than their respective detection rates, only half of the detected KB mistakes (i.e., the higher-level mathematical errors) were satisfactorily corrected.
Interestingly, one kind of slip, what Allwood called ‘skip errors’ (the omission of a necessary step in a calculation), was the most resistant of all to detection. None of the 29 skip errors were discovered by the subjects, and only one of them was associated with an evaluative episode. This suggests we should be cautious in claiming that all skill-based errors are readily detected. Certain omissions stand apart as being largely ‘invisible’ to the detection mechanisms.
6. Cognitive processes that impede error detection
Lewis and Norman (Norman, 1984; Lewis & Norman, 1986; Lewis, 1986) describe three processes that can prevent or impede the discovery of one’s own mistakes: relevance bias, partial explanation and the overlap of the mental model and the world.
6.1. Relevance bias
‘Bounded rationality’ means that human problem solvers often have only a ‘keyhole’ view of all the factors that could lead to an adequate solution (see Chapter 2). As a result, they are forced to select evidence from a problem space that is generally too large to permit an exhaustive exploration. The most immediate guide to this selection is likely to be the problem solver’s current hypothesis about what constitutes a likely solution. Another way of looking at confirmation bias, therefore, is to suggest that it is a selective process that favours items relevant to the presently held view. According to Lewis (1986): “If disconfirming evidence is less likely to seem relevant than confirming evidence, the bias can be explained not as a bias toward confirming evidence, but as a bias toward relevant-appearing evidence.”
Figure 6.3. A comparison of the rates of error detection and correction in statistical problem solving for skill-based, rule-based and knowledge-based errors (data from Allwood, 1984).
Figure 6.4. A comparison of detection rates in a database-handling task when subjects were required to find a given item and report its values (data from Rizzo, Bagnara & Visciola, 1986).
Figure 6.5. A comparison of detection rates in a database-handling task in which subjects were required to find three items and change their values (data from Rizzo, Bagnara & Visciola, 1986).
In complex systems like nuclear power plants, this problem is further exacerbated by the fact that the plant is continually changing its state spontaneously, and these alterations may not be known to the operators. In short, troubleshooting in something like a nuclear power plant presents a multiple dynamic problem configuration (see Chapter 3). Under these circumstances, it is natural that operators should cling tenaciously to their hunches. Even when wrong, they confer order upon chaos and offer a principled way of guiding future action. In a situation as rich in information as a control room, confirming evidence is not hard to find. The more entrenched the hypothesis, the more selective becomes the working definition of what is relevant. But those who enter the situation afresh at some later point are not so theory-bound, at least initially. The nakedness of the emperor is readily seen by those who have not come to believe him clothed.
6.2. Partial explanations
Errors are not detected because people are willing to accept only a rough agreement between the actual state of the world and their current theory about it. Indeed, the forming of partial explanations is an essential part of the learning process. But what promotes learning can delay error detection.
Lewis (1986) described the case of someone learning a word-processing package who misread the instructions for a typing exercise. Instead of entering text, as the exercise intended, she simply executed a series of cursor movements around an empty screen. The learner was surprised to find that no text appeared, but rationalised her actions by telling herself that the cursor movements were defining a screen area into which text would later be entered. As a result of this partial explanation she wrongly concluded that she had done the exercise correctly.
6.3. Overlap of world and mental model
A person’s mental model of a particular problem space is likely to correspond in large part to the reality, even though it may be wrong in some respects. Lewis and Norman (1986) state: “If one’s model were too far off it would have to be adjusted. As a result of this rough agreement, finding points of disagreement is hard. Most of the things that one does will produce the results predicted by one’s model, which is thus supported.” Having expectations frequently confirmed reduces the sensitivity of the error detection mechanisms.