Human Error

by James Reason

  The program was run 100 times to simulate the performance of 100 ‘virtual’ subjects. The number of presidential names generated per run ranged between 2 and 17. The mean output was 9.95 presidential names. This corresponds reasonably closely to the level of output of 128 British psychology students, whose mean output was 10.4 names.

  This group constituted the main comparison sample. As part of an earlier study (see Chapter 4), they had performed the following tasks:

  (a) They were asked to generate exemplars of the category ‘American presidents’.

  (b) One week later they were presented with a list of all 39 presidents in random order and asked (i) to indicate whether or not they recognised each name as belonging to an American president, and (ii) to rate each president on a frequency-of-encounter scale, ranging from 1 (hardly ever encountered) to 7 (encountered very frequently indeed).

  Additional comparisons were made between the model’s output and that of 159 Yale and Purdue students (Roediger & Crowder, 1976). These U.S. students were given 5 minutes to write down, in any order, the names of all the presidents of the United States whom they could recall.

  Overall, these model/subject comparisons yielded the following six measures for each of the 39 presidents: (a) total number of model returns (model), (b) per cent recall, British sample, (c) per cent recall, U.S. sample, (d) per cent recognition, British sample, (e) mean frequency rating, British sample, and (f) actual encounters (10 subjects over 13 weeks). The intercorrelations for these values are shown in Table 5.2.

  The highly significant concordance between the model’s output and the responses of the human subjects indicates that the program has captured some essential features of human category generation. It is also consistent with the existence of a common factor controlling a substantial proportion of the variance across all six measures. Several considerations point to presidential ‘salience-in-the-world’, or relative frequency-of-encounter, as the principal basis for this common variance.

  The first and most obvious fact is that the only systematic parameter influencing the model’s performance from run to run is the CALCFOE value. This, in turn, is constrained and thus shaped by the ACTFOE values.

  Table 5.2. Product-moment correlations between the model’s output and data from U.K. and U.S. students.

  Human subject measure       Correlation with model output
  U.K. recall data            0.93**
  U.S. recall data            0.78**
  U.K. recognition data       0.87**
  U.K. frequency ratings      0.93**

  ** p < .001
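  For readers who want to reproduce this kind of comparison, the product-moment (Pearson) correlation can be computed as in the short Python sketch below (the original programs were written in Prolog). The data arrays here are hypothetical placeholders, not the actual 39-president values.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder data: model returns and per cent recall for five presidents only.
model_returns = [96, 88, 75, 40, 12]
uk_recall     = [98, 91, 70, 45, 10]
print(round(pearson_r(model_returns, uk_recall), 2))
```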

  Interestingly, the ACTFOE measure, by which the model was actually ‘tuned’, correlated less well with the program’s performance than did the rated frequencies of encounter obtained from the British psychology students. One interpretation of this finding is that these actual encounters by 10 individuals over a period of 13 weeks in the autumn of 1986 constituted a less reliable mapping of presidential salience than the more global and impressionistic ratings made by the larger undergraduate sample. Certainly these estimates corresponded more closely to the performance of the U.S. students 10 years earlier (r = 0.94, as opposed to 0.324 between ACTFOE and U.S. recall). This suggests that although relative presidential salience differs slightly from one side of the Atlantic to the other (Americans are more knowledgeable about the early presidents than their British counterparts), there is nevertheless a good deal of commonality across the two cultures.

  The results for both the model and the humans were consistent with the theoretical assumptions spelled out earlier in the chapter. That is, when a retrieval task is minimally specified, as is the case in category generation, the responses will show maximum influence of the frequency-gambling retrieval heuristic.

  8.2. Recognition from limited factual cues

  This section describes a computer simulation (again implemented in Prolog by Philip Marsden) of the ways in which people with varying knowledge of the domain of U.S. presidents respond to the task of identifying an appropriate subset of 20 listed presidents on the basis of one to three biographical facts (retrieval cues). The model is built upon the foundations of the category generation program and is designed specifically to respond to section B of the presidential quiz described in Chapter 4.

  8.2.1. The normative knowledge base

  In the recognition model, the normative knowledge base (NKB) contains a specific number of true facts about each president. These facts are represented as an entity/attribute matrix of the kind shown in Figure 5.5. The entities in this case are the presidents, each of whom has a number of binary attributes relating to selected autobiographical facts. Each presidential frame thus contains the following items of information:

  (a) A president’s name.

  (b) An actual frequency of encounter (ACTFOE) value, derived from either actual observations or ratings by individuals within the common sub-culture.

  (c) True binary values for each of 11 attributes, including whether he completed (or did not complete) his term of office, was a Republican or a Democrat, was (or was not) a lawyer, was (or was not) a vice-president, and did (or did not) have an overseas war during his presidency.

  (d) Salience weightings (0-100) for each attribute value. These reflect empirically-determined co-occurrence ratings between a president and a given fact.
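  By way of illustration, one such presidential frame might be represented as in the minimal Python sketch below (the original implementation was in Prolog). The class name, the particular facts and the weightings are placeholders rather than the empirically derived NKB values, and only a subset of the 11 attributes is shown.

```python
from dataclasses import dataclass, field

@dataclass
class PresidentialFrame:
    """One entity in the normative knowledge base (NKB)."""
    name: str
    actfoe: int                                       # actual frequency-of-encounter value
    attributes: dict = field(default_factory=dict)    # binary biographical facts
    salience: dict = field(default_factory=dict)      # 0-100 weighting per fact

# Illustrative frame; the facts and weights are invented for the example.
lincoln = PresidentialFrame(
    name="Lincoln",
    actfoe=70,
    attributes={"completed_term": False, "democrat": False, "lawyer": True,
                "vice_president": False, "overseas_war": False,
                "assassinated": True},
    salience={"completed_term": 60, "democrat": 40, "lawyer": 35,
              "vice_president": 10, "overseas_war": 20, "assassinated": 95},
)
```

  Over a full NKB there would be one such frame for each of the 39 presidents.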

  Figure 5.5. The basic features of the entity/attribute matrix. This is the knowledge format used in the recognition program. The entities in this case are U.S. presidents (ranked from left to right according to frequency of encounter) and the attributes are biographical facts (ranked from top to bottom according to their salience) relating to one or more of these presidents. Thus, x is the most salient fact about the most frequently encountered president and y is the least salient fact about the least frequently encountered president. The attributes (e.g., Democrat, assassinated, etc.) vary in their diagnosticity, that is, their ability to discriminate between entities (presidents). For example, being a Democrat has relatively low diagnosticity, since it is an attribute shared by approximately half the presidents. Having been assassinated, on the other hand, is highly diagnostic since it is an attribute common to only four presidents.
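  The caption’s notion of diagnosticity can be given a rough numerical reading: the fewer presidents who share an attribute, the better it discriminates. The book gives no formula, so the index below is purely an assumed illustration, reusing the PresidentialFrame sketch above.

```python
def diagnosticity(attribute, frames):
    """Rough illustrative index: 1.0 when exactly one president has the
    attribute, falling towards 0.0 as it becomes common to all of them."""
    holders = sum(1 for f in frames if f.attributes.get(attribute))
    if holders == 0 or len(frames) < 2:
        return 0.0
    return 1.0 - (holders - 1) / (len(frames) - 1)
```

  On this reading, having been assassinated (four presidents) scores far higher than being a Democrat (roughly half of them), matching the caption’s examples.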

  As before, each run of the model represents the performance of one human subject. A different descriptive knowledge base (DKB) is created on each run. Only the DKB has any influence on the model’s output for that run.

  8.2.2. The descriptive knowledge base (DKB)

  Each DKB is an incomplete (but not distorted) version of the NKB, representing a single individual’s domain knowledge. Over many runs, the contents of the DKB will approximate to the NKB, but will never achieve the same completeness.

  The conversion from NKB to DKB always involves quasi-random adjustments of the ‘tuning’ factors: ACTFOE and attribute salience values. These, in turn, determine a fact’s probability of appearing in the DKB. The basic assumption is: the more salient-in-the-world the president, the more will be known about him.

  On each run, the model converts ACTFOEs into calculated frequency-of-encounter values (CALCFOEs). These are random numbers lying between 1 and the ACTFOE for each president. The values differ on each run. Since the presence or absence of facts depends, in part, upon these CALCFOEs, these run-by-run variations will cause changes in the degree of completeness (relative to the NKB) of the DKB.

  The detailed steps in the creation of the DKB on each run are as follows:

  (a) Program generates a CALCFOE for each presidential frame.

  (b) The presidential frame is assigned to one of four knowledge bands on the basis of this CALCFOE value. The higher the band, the more facts will be ‘known’ about each president. Each president has a different probability of band assignment, depending upon his ACTFOE value.

  (c) The program makes a quasi-random adjustment to the salience weights (attached to the attribute values) in each presidential frame. These salience weights take a value between 50 per cent and 100 per cent of the original NKB weighting (derived empirically). The 50 per cent lower cut-off is set to prevent facts from disappearing too readily.

  (d) Each knowledge band has a threshold salience value. If a fact’s salience value exceeds this threshold, it is included in the DKB and could be used in making a recognition match.
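  The four steps above might be rendered as the following Python sketch (the original was in Prolog). The band boundaries and per-band salience thresholds are assumptions for illustration, since the chapter does not list the exact figures; the frame fields follow the earlier PresidentialFrame sketch.

```python
import random

# Placeholder band boundaries and per-band salience thresholds; the chapter
# does not give the exact figures, so these are assumptions for illustration.
BAND_BOUNDARIES = (25, 50, 75)          # CALCFOE cut-offs separating bands 1-4
BAND_THRESHOLDS = {1: 90, 2: 75, 3: 60, 4: 50}   # higher band, lower threshold

def build_dkb(nkb):
    """Create one run's descriptive knowledge base from the normative one."""
    dkb = {}
    for frame in nkb:
        # (a) CALCFOE: a random value between 1 and the president's ACTFOE.
        calcfoe = random.randint(1, frame.actfoe)
        # (b) Assign the frame to one of four knowledge bands on that basis.
        band = 1 + sum(calcfoe > b for b in BAND_BOUNDARIES)
        known_facts = {}
        for fact, weight in frame.salience.items():
            # (c) Quasi-random jitter: 50-100 per cent of the NKB weighting.
            adjusted = weight * random.uniform(0.5, 1.0)
            # (d) Keep the fact only if it exceeds the band's salience threshold.
            if adjusted > BAND_THRESHOLDS[band]:
                known_facts[fact] = frame.attributes[fact]
        dkb[frame.name] = {"calcfoe": calcfoe, "facts": known_facts}
    return dkb
```

  Because CALCFOE is redrawn on every run, the same NKB yields a differently incomplete DKB each time, exactly as described above.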

  8.2.3. The role of working memory

  Working memory (WM) controls the retrieval sequence by executing various procedures (search, comparison and decision—see below). It also collects, collates and holds the products of these processes. WM engages in three types of activity.

  (a) It elicits information from the DKB relating to the current calling conditions. The information from this stage is held in three units: calling conditions, negative instances (contraindications) and number of cues present.

  (b) WM then executes a more detailed search. It requests presidential names with attributes corresponding to the current retrieval cues. Note that WM has no influence over the actual search, but it can reject search products and/or reactivate the search processes.

  (c) On receipt of a suitable candidate, WM begins to accumulate evidence both for and against the returned name. Having done this, it activates the decision-making procedure (see below). This results in either an output or a non-output. In the case of the latter, the search is reactivated.
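  These three activities can be pictured as a simple control loop, sketched below in Python; the helper functions search, compare_evidence and decide are stand-ins for the procedures described in the following subsections, not the author’s actual predicates, and the attempt cap is an assumed convenience.

```python
def retrieve(calling_conditions, dkb, max_attempts=50):
    """Working-memory control loop: request candidates, weigh evidence,
    and either output a name or reactivate the search."""
    # (a) The calling conditions themselves are held in WM as the loop's input.
    outputs = []
    rejected = set()
    for _ in range(max_attempts):          # max_attempts is an assumed cap
        # (b) Ask the search processes for a candidate matching the cues.
        candidate = search(calling_conditions, dkb, exclude=rejected)
        if candidate is None:
            break
        # (c) Accumulate evidence for and against the returned name,
        #     then hand over to the decision-making procedure.
        evidence = compare_evidence(candidate, calling_conditions, dkb)
        verdict = decide(candidate, evidence, calling_conditions)
        if verdict is not None:            # output: name plus confidence value
            outputs.append(verdict)
        rejected.add(candidate)            # non-output: reactivate the search
    return outputs
```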

  8.2.4. The search processes

  WM has no direct access to the DKB. Interactions between WM and the DKB are mediated by the basic search processes: similarity-matching and frequency-gambling.

  Search is invoked by WM following the presentation of a set of calling conditions. Its termination follows acceptance by WM of a search product.

  Similarity-matching seeks high-context (well-matched) exemplars of the calling conditions. Frequency-gambling looks for high-frequency exemplars.
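  A minimal sketch of the two primitives, assuming the DKB layout from the earlier sketches: similarity-matching scores each candidate by the number of calling conditions its known facts match, and frequency-gambling breaks ties in favour of the higher CALCFOE.

```python
def search(calling_conditions, dkb, exclude=()):
    """Return the best candidate name, or None if every frame is excluded."""
    best, best_key = None, None
    for name, frame in dkb.items():
        if name in exclude:
            continue
        # Similarity-matching: count the cues matched by known facts.
        matches = sum(1 for cue, wanted in calling_conditions.items()
                      if frame["facts"].get(cue) == wanted)
        # Frequency-gambling: ties resolved in favour of the higher CALCFOE.
        key = (matches, frame["calcfoe"])
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best
```

  Note that a candidate matching no cues at all can still be returned on frequency alone, which is what leaves room for the guessing strategies described later.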

  8.2.5. Comparative processes

  Once a candidate has been returned to WM (along with its CALCFOE and attribute list), the comparative process is initiated. The program accumulates two kinds of evidence: (a) confirmatory evidence—confirming a match between calling conditions and returned attributes, and (b) contradictory evidence—indicating the presence of disconfirming facts.

  Each calling condition is examined in turn to see whether it can be matched to a known fact. When all possible evidence has been collated, the decision-making phase is initiated.
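  A sketch of this evidence-collation step, again with field names following the earlier sketches rather than the original Prolog: each calling condition is classed as confirmed, contradicted or simply unknown.

```python
def compare_evidence(candidate, calling_conditions, dkb):
    """Count confirmatory, contradictory and unknown items for a candidate."""
    facts = dkb[candidate]["facts"]
    confirmed = contradicted = unknown = 0
    for cue, wanted in calling_conditions.items():
        if cue not in facts:
            unknown += 1                 # no knowledge bearing on this cue
        elif facts[cue] == wanted:
            confirmed += 1               # confirmatory evidence
        else:
            contradicted += 1            # contradictory (disconfirming) fact
    return {"confirmed": confirmed, "contradicted": contradicted,
            "unknown": unknown}
```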

  8.2.6. The decision-making processes

  The function of this procedure is to weigh all the evidence and to select the most appropriate strategy for that particular configuration of calling conditions and recovered facts. Four basic strategies are available.

  (a) Direct identification: This entails the direct matching of attribute values to calling conditions. Defined formally, it is where the degree of confirmatory evidence equals the number of calling conditions. When this occurs, the presidential name currently being considered is output to the list of exemplars with a confidence value of 100 per cent.

  (b) Elimination: This involves the recognition of a mismatch between the attribute values and the calling conditions. It is the converse of direct identification, but comes in two forms: (i) strong elimination, leading to a confident rejection of the president under consideration, and (ii) weak elimination, where the contradictory evidence comprises a single item. Most often, the candidates in this position will be eliminated. Occasionally, however, a president will be returned with a single contradictory fact and a confidence of 0 per cent. The process determining which president will be returned in this manner is a randomiser. A random number is generated in the range of 1 to 100, and the threshold for return is set at greater than or equal to 95.

  (c) Partial identification: Here there is partial matching of attribute values to calling conditions with no contradictory evidence. This, too, functions in a strong and a weak form, the major determinant being the degree of uncertainty. When it is low, the president under consideration will be returned with an associated confidence of 50 per cent. A low degree of uncertainty is defined as an identification count equal to one less than the number of calling conditions. The weak form of partial identification comes into play when the confirmatory evidence is greater than or equal to a single item, but not equal to one less than the number of calling conditions. The criterion for selection is a random number (ranging from 1 to 100) greater than or equal to 40. When successful, this process returns a weak partial identification with a confidence of 10 per cent.

  (d) Guessing: This strategy may be invoked following a major knowledge failure in which no evidence for or against a particular president can be found. Defined formally, it operates when the degree of uncertainty equals the number of calling conditions. Two quite distinct forms of guessing can be used: calculated guessing and frequency-gambling guessing. Calculated guesses produce only ‘unknown’ presidents—presidents of whom the ‘subject’ has no knowledge, but accepts their presidential status because their names appear on the list of presidents. In contrast to this, frequency-gambling guesses produce only high-salience presidents.
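  Putting the four strategies together gives the sketch below. The confirmatory/contradictory/unknown counts come from the previous step; the random thresholds (greater than or equal to 95 for a weak-elimination return, greater than or equal to 40 for a weak partial identification) are the ones stated above, while the guessing branch and its confidence value are simplifications rather than the author’s procedure.

```python
import random

def decide(candidate, evidence, calling_conditions):
    """Select a strategy and return (name, confidence) or None (non-output)."""
    n_cues = len(calling_conditions)
    c, x, u = (evidence["confirmed"], evidence["contradicted"],
               evidence["unknown"])

    # (a) Direct identification: every calling condition is matched.
    if c == n_cues:
        return (candidate, 100)

    # (b) Elimination: a mismatch between facts and calling conditions.
    if x > 1:                                    # strong elimination: reject
        return None
    if x == 1:                                   # weak elimination
        if random.randint(1, 100) >= 95:         # occasional return at 0 per cent
            return (candidate, 0)
        return None

    # (d) Guessing: no evidence for or against (every cue unknown). The
    #     calculated/frequency-gambling distinction is omitted here and the
    #     confidence value is an assumption.
    if u == n_cues:
        return (candidate, 0)

    # (c) Partial identification: some matches, no contradictory evidence.
    if c == n_cues - 1:                          # strong form, low uncertainty
        return (candidate, 50)
    if c >= 1 and random.randint(1, 100) >= 40:  # weak form
        return (candidate, 10)
    return None
```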

  8.2.7. Evaluating the model’s output

  The model was evaluated against the human data presented in Section 5.5 of Chapter 4. The comparison sample comprised 126 U.K. and U.S. students. These subjects were further subdivided into two samples, matched for knowledge of U.S. presidents: sample A (N=90) and sample B (N=91). The model’s output (selected presidents), over 100 runs, for each of the six presidential quiz questions was correlated with the selections made by the evaluation samples A and B. To provide some criterion of how good this match was, the selections of sample A were correlated with those of sample B.

  There was a high degree of correspondence between the model’s output and the responses of samples A and B. The mean correlation coefficients (averaged over all six questions) were 0.85 and 0.87 respectively. This was not significantly less than the degree of correspondence between the two human samples (r = 0.96). These findings suggest that the model has indeed captured some of the fundamental knowledge retrieval processes involved in this particular recognition test. It remains to be seen how well this agreement holds up over other domains of knowledge, both declarative and procedural.

  9. Summary and conclusions

  This chapter has addressed the following question: What kind of information-handling device could produce both the correct performance and the recurrent error forms characteristic of human beings? Its starting point was the general observation, considered at length in Chapter 4: when cognitive operations are underspecified (at the planning, storage or execution phases), they tend to ‘default’ to contextually appropriate, high-frequency responses. Its basic assumption was that these responses were rooted in the processes by which stored knowledge items are retrieved in response to situational calling conditions.

  The first part of the chapter described a notional model of human cognition having two structural components: a resource-limited, but computationally-powerful, serial workspace interacting with an effectively unlimited, parallel knowledge base. This model has three ways by which knowledge structures are brought into play in response to a set of calling conditions (retrieval cues), generated either by the environment or within the workspace. Two of them, similarity-matching (activating knowledge structures on the basis of similarity between calling conditions and stored attributes) and frequency-gambling (resolving conflicts between partially-matched ‘candidates’ in favour of high-frequency items), constitute the computational primitives of the system as a whole and operate automatically within the knowledge base. A third retrieval mechanism, inference, is the exclusive property of the workspace.

  A key feature of the model is the assertion that the workspace can only direct knowledge retrieval through the inferential manipulation of the calling conditions. The actual search within the knowledge base is always performed by the similarity and frequency mechanisms. The only part that working memory can play in the retrieval of the products of knowledge structures (actions, images, words, etc.) is to deliver the calling conditions, to assess whether the search product is appropriate and, if not, to reinstate the search with revised cues. The workspace can reject default search products, but only when it has sufficient resources available to sustain directed inference. Such resources are severely rationed in conditions of high workload and stress. Consequently, the computational primitives will exert a powerful and pervasive influence upon all types of performance, both correct and erroneous.

  The model predicts that, when retrieval operations are underspecified either as the result of impoverished domain knowledge or because of incomplete or ambiguous calling conditions, human subjects will manifest increased frequency-gambling (i.e., they will show a greater tendency to select high-frequency candidates). Thus, the degree of expertise and cue specificity are seen as functionally equivalent; diminution of either will lead to increased frequency-gambling.

  The second part of the chapter described a suite of computer programs (implemented in Prolog) that (a) embodies the cognitive theory set out above, and (b) attempts to simulate the ways in which people with only a partial knowledge of the domain of U.S. presidents respond to the tasks of generating exemplars, and recognising listed presidents who fit supplied facts. The outputs of the category generation and recognition programs were evaluated against the performance of human subjects and were found to have between 60 and 80 per cent shared variance in their respective selections for the domain of U.S. presidents. These findings provide (as yet domain-specific) support for the basic assumptions of the notional model and indicate that it is possible to create an information-processing device capable of simulating both the correct and incorrect choices of human beings.

 
