Title VII of the 1964 Act specifically did not prohibit the use of employment tests, provided that the tests were not “designed, intended or used” to discriminate against people because of their race, color, religion, sex, or national origin. It said nothing about group differences, although it was clear in 1964 that ability tests would result in disproportionately fewer high scores for at least some of the groups of people protected from discrimination by the act. Some of the act’s proponents believed that some of the group differences in test scores were being used as a pretext for unfair discrimination; for that reason the act included a proviso regarding the tests. The hope was that Title VII would promptly eradicate this unfair use of tests. It was left to the EEOC to come up with the means of doing so.
In 1966, the EEOC formulated the first of a series of guidelines. An employment test, it ruled, had to have a proven power to measure a person’s “ability to perform a particular job or class of jobs.”3 It was not enough, said the guideline, that the test be drawn up by professional testers; it also had to have some practical import—some “job relatedness,” in the evolving jargon of the field. Why this particular guideline? The answer is that staff for the newly launched EEOC had quickly become convinced that some employers were, as anticipated, hiding behind the credentials of professional testers to use ability tests that had little bearing on job performance, and that they were doing so to discriminate against blacks.4 The guideline was an attempt to pierce the veneer of professional respectability and thereby correct this violation of law and principle, as the EEOC saw it.
The criterion of job relatedness did not dispel the EEOC’s uneasiness about testing. Ability testing for employment had, after all, become an issue under Title VII because various groups of people get different average scores. This was the heart of the matter, and new guidelines laid down in 1970 addressed it head-on. For the first time, EEOC guidelines took up the disproportionate success of different groups on a given test.5 When a test “adversely affects” members of a protected group (more jargon, along with “disparate impact” and “adverse impact”), said the new guidelines, it had to be shown not only that the test really did predict job performance but that the prediction was strong enough to make a significant economic difference and that no nondiscriminatory alternative was available. An employer, the reasoning went, might have abandoned older and cruder forms of deliberately discriminatory treatment of workers or job applicants (often called “disparate treatment”) yet still violate the intent of the law by using a needlessly discriminatory test. Disparate impact, in other words, was to be the red flag that set the EEOC in motion.
GRIGGS AND AFTERWARD
Soon after, the U.S. Supreme Court entered the fray. Applicants for certain desirable jobs at the Duke Power Company had been required to have a high school diploma or to earn ability test scores above a cutoff. Proportionately fewer blacks than whites were clearing these hurdles, and a suit found its way to the Supreme Court. The Court’s decision in Griggs v. Duke Power Co.6 was instantly recognized as a turning point in the march of affirmative action in the workplace.7 The Supreme Court struck down both the tests and the educational requirement, because the company was unable to satisfy the Court that either a diploma or a high test score had any bearing on the jobs the applicants were being hired for.8
Duke Power Co.’s defense was, among other things, that it was trying to raise the general intellectual level of its work force by imposing educational or ability test score requirements. In the Court’s unanimous decision (which reversed contrary opinions in both the federal district and circuit courts), Chief Justice Warren Burger approved unstintingly of the EEOC’s guidelines: Adverse impact placed a burden of proof on employers to show not just that they were not intentionally discriminating against the protected groups but that their testing procedures could be justified economically, and that no other available hiring procedure was equally useful but less discriminatory. Good (i.e., nondiscriminatory) intentions, said the Court, do not excuse tests “that operate as ‘built-in headwinds’ for minority groups and are unrelated to measuring job capability.”9 There must be both “business necessity” and a “manifest relationship” between the test and the job, as the EEOC had ruled. Employers were being told to be wary of off-the-shelf tests of general ability; if they wanted to use a test at all, they would be well advised to write one for the specific job at hand and to do their own validation studies.
Ordinarily there is some presumption that people will obey guidelines proposed by a federal agency like the EEOC, but not doing so does not violate the law. Indeed, in the legislative record, Congress was assured that the EEOC had no enforcement powers. However, the Court in Griggs said that the EEOC guidelines deserve “great deference,”10 which endowed them with authority verging on the power of law itself. This laying on of the hands of legality is one reason that Griggs has become the landmark case it has turned out to be, for only a defiant or reckless employer would disregard guidelines that the Court embraced so enthusiastically. Beyond that, however, Griggs transformed the very conception of affirmative action in the workplace.
The Court grounded its decision in the 1964 Civil Rights Act itself, although the act said nothing about job relatedness, adverse impact, or the lack of alternative hiring criteria. The act did, however, say that a test must not be “designed, intended or used” to discriminate against people in the protected minority groups. Like the EEOC, the Court considered job relatedness and adverse impact to be reasonable translations of Title VII’s principles into practice. But it can be argued that job relatedness and disparate impact per se go well beyond Title VII, because a test may have disparate impact and not be specifically related to the particular job being filled without the employer’s having designed, intended, or used it for discriminatory purposes.11
The issue hinges on whether each of the three terms—“designed, intended or used”—must signify discriminatory intent (i.e., the guilty mind usually required in cases of liability) or only the first two. The first two terms—“designed, intended”—clearly imply discriminatory intent. Must the third? No, said the Supreme Court, “used” need not. And if it need not, then an employer is violating Title VII even if he is not guilty of discriminatory intent, so long as the test has disparate impact and has not been proved, to the Court’s satisfaction, to be job related.12
After two decades in force, the Court’s interpretation may seem correct to many readers, but both the legislative record and the wording of Title VII belie it.13 Proponents of Title VII, on the floor of Congress and elsewhere, repeatedly assured the opposition that tests administered without discriminatory intent, however adverse their effects, were not being challenged, let alone banned.14 For example, in a memorandum submitted by Senator Clifford Case, one of Title VII’s leading advocates during the legislative debates, we find the following assurance: “No court could read Title VII as requiring an employer to lower or change the occupational qualifications he sets for his employees simply because fewer Negroes than whites are able to meet them.”15 Senator Hubert Humphrey, as we noted in Chapter 20, also assured fellow legislators that Title VII would never be used to impose percentage hiring requirements (disparate impact criteria) on employers.
A year later, in the Equal Employment Opportunity Act of 1972, Congress brought the third branch of government into line with the Court and the EEOC. It disapproved of mere “‘paper’ credentials” (such as cognitive ability test scores) that are of “questionable value.” It warned that such credentials burdened people who were “socioeconomically or educationally disadvantaged” with “artificial qualifications.”16 When it first enacted Title VII in 1964, Congress on the whole trusted general ability tests to serve the purpose of predicting worker quality; by 1972, Congress, echoing Griggs, had become far more skeptical of the predictive power of those tests and suspicious that they were a pretext for illegal discrimination.17 In the words of one legal scholar, “The central rationale of the Court’s decision in Griggs … was based on an assumption that those of different races are inherently equal in ability and intelligence, and on a deep skepticism about the utility of devices traditionally used to select among applicants for employment.”18
With all three branches of government pushing in the same general direction, affirmative action policies evolved toward greater reliance on disparate impact as the touchstone of illegality rather than on discriminatory intent or disparate treatment. As in Griggs, the Supreme Court in 1975, in Albemarle Paper Co. v. Moody,19 considered a case in which an employer used intelligence tests (among other criteria) to select workers for well-paying jobs. Once again, black applicants, who earned lower scores than white applicants, brought suit.20 The Court reaffirmed the general outlines of Griggs, but in filling out the details, it laid out three steps for proving that an employment test violated Title VII (as amended). First, the Court said, a complaining party must show disparate impact. This required statistical evidence that those hired or promoted on the basis of the test included significantly fewer members of a protected group than random selection from the applicant pool would have produced. Given this proof of disparate impact, the burden of proof shifts to the employer, who must then prove that scores on the test have a proven and vital relationship to the specific job being filled. The criterion expressed in Griggs, “business necessity,” was carried forward into Albemarle. If the employer clears this hurdle, the complaining party can offer evidence that the employer could have used a different hiring procedure, one that was as effective in selecting workers but without the disparate impact. If this can be shown, then, the Court ruled, the employer has discriminated illegally by failing to use the alternative procedure.21
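The statistical showing in Albemarle’s first step can be illustrated with a simple binomial model. This is a minimal sketch, not a procedure the Court prescribed: it assumes hires behave like independent random draws from a large applicant pool in which a fraction p belongs to the protected group (for a small pool, a hypergeometric model would be more exact), and it uses a hypothetical 5 percent significance threshold. The function names and the numbers in the example are hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability that k or fewer of n hires come from the protected group
    when hires are drawn at random from a pool in which a fraction p of the
    applicants belongs to that group (binomial model, an assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def shows_disparate_impact(k: int, n: int, p: float, alpha: float = 0.05) -> bool:
    """Hypothetical decision rule: treat the shortfall as statistically
    significant if random selection would produce so few protected-group
    hires less than a fraction alpha of the time."""
    return prob_at_most(k, n, p) < alpha


# Hypothetical example: 30% of applicants are protected-group members,
# 50 people are hired, and only 8 of them are from the protected group
# (random selection would be expected to yield about 15).
print(prob_at_most(8, 50, 0.3), shows_disparate_impact(8, 50, 0.3))
```

In this hypothetical case the chance of so few protected-group hires under random selection is roughly 2 percent, below the assumed threshold; with 15 such hires, the number random selection would typically produce, the shortfall disappears.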
Other federal authorities besides the EEOC were monitoring and promoting affirmative action in the workplace. In the mid-1970s, as inconsistencies began to crop up, pressure mounted to coordinate as broad a slice of the federal involvement in affirmative action as possible. After some false starts, the Uniform Guidelines on Employee Selection Procedures were adopted in 1978 by the EEOC, the Civil Service Commission (later called the Office of Personnel Management), the Department of Justice, the Department of the Treasury, and the Department of Labor.22 At this writing, they are still in force. The Court’s decisions in Griggs and Albemarle set the broader framework for the Uniform Guidelines, but the Guidelines elaborated further details, in some respects increasing the pressure on employers using tests. For example, the Uniform Guidelines held, in contrast to the Court in Albemarle, that the employer has a responsibility to seek out less discriminatory selection procedures, a rather different matter from giving a complaining party the opportunity to propose them, as the Court had decreed.
VALIDATING EMPLOYMENT TESTS
The Uniform Guidelines attempt to define a unified approach to affirmative action in the workplace, but practices still vary, and new laws and new judicial interpretations continue to appear. Still, the Guidelines come as close to a policy consensus as anything does. They also reveal the underlying assumptions about the facts. On the matter of test validation, the Guidelines espouse the stringent “business necessity” requirement held in Griggs and Albemarle, and they spell out detailed requirements for validating tests. Without submerging our readers more deeply in technical minutiae than seems appropriate here, let us say that the Uniform Guidelines lean sharply toward criteria that would be hard and expensive for employers to meet, even when cheaper or easier methods almost certainly would have been more effective.23 General ability tests, readily available and widely standardized, are rarely acceptable to the EEOC or the courts unless the employer restandardizes an established test, a process that is difficult if not impossible and, psychometrically speaking, needless. To validate a test, an employer needs a measure of performance. The government typically rejects measures of training performance and supervisor ratings, although, as Chapter 3 detailed, both may be suitable measures of performance and both are relatively easy to obtain. The measures the government usually requires are all but impossible to obtain, especially for job candidates who are not hired.
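Criterion-related validation is commonly summarized by a validity coefficient: the correlation between test scores and a measure of performance such as supervisor ratings. The minimal sketch below computes a Pearson correlation for that purpose; the function name and the five data points are hypothetical and serve only as illustration. Because it needs a performance measure for every person included, in practice it can use only people who were actually hired.

```python
from math import sqrt
from statistics import mean


def validity_coefficient(test_scores, performance):
    """Pearson correlation between test scores and a performance measure
    (e.g., supervisor ratings) for the same employees."""
    if len(test_scores) != len(performance) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    mx, my = mean(test_scores), mean(performance)
    cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, performance))
    sx = sqrt(sum((x - mx) ** 2 for x in test_scores))
    sy = sqrt(sum((y - my) ** 2 for y in performance))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return cov / (sx * sy)


# Hypothetical illustrative data for five hired employees (not from the text).
scores = [95, 102, 110, 118, 125]
ratings = [3.1, 3.0, 3.6, 3.9, 4.2]
print(round(validity_coefficient(scores, ratings), 2))
```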
Despite an air of rigor and precision in discussing validation, neither the EEOC nor any other branch of government involved in administering affirmative action policies has shown any interest in evaluating whether tests validated by the stringent and costly procedures it demands actually predict worker performance any better, or whether those procedures add any predictive power at all. The thrust continues to be, as it has been from the beginning, to increase the numbers hired or promoted from the protected groups, based on the underlying assumption that, but for discrimination or the legacy of past discrimination, the protected groups would be equally represented across the occupational spectrum.
DISPARATE IMPACT
According to the Guidelines, an employer that comes under their jurisdiction can expect to be required to validate a test (that is, to prove its business necessity) if there is disparate impact. And, the Guidelines further say, disparate impact is assumed if selecting employees by the test violates the 80 percent rule, explained in Chapter 20. As helpful as it may be to employers and regulators to have a fixed standard for disparate impact, the 80 percent rule is psychometrically unsound precisely because the standard is fixed. Given two groups with differing average scores and a cutoff for hiring or promotion, the ratio of those selected from the lower-scoring group to those selected from the higher-scoring group shrinks as the cutoff rises, even under a fair hiring process.
Suppose that you are an employer faced with two groups that are of equal size in the applicant pool. The higher group averages one standard deviation above the lower on an IQ test, and the scores in each group are normally distributed with the same variability. The 80 percent rule fixes the ratio at eighty hired from the lower group (if it is protected by affirmative action) per hundred hired from the higher group. But if you set a minimum IQ of 100 as the cutoff for hiring, only slightly more than thirty applicants from the lower group would be selected for every hundred from the higher. Suppose that you need a work force with above-average IQs, so you raise the cutoff to 110. In that case, a fair hiring process could be expected to select only about twenty of the lower group for each hundred selected from the higher group. If you need a work force with a minimum IQ of 120, the ratio drops to about ten per hundred. The ratio keeps shrinking toward zero as the cutoff moves upward. In other words, the 80 percent rule has drastically different effects for an employer hiring people for janitorial jobs than for one hiring lawyers or accountants. Even for those who favor the goal of avoiding “disparate impact,” the 80 percent rule is an extremely unrealistic way of pursuing it.
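The arithmetic behind these figures can be checked with the normal distribution. This is a minimal sketch under assumptions the text leaves implicit: the higher group’s mean is 100 and the lower group’s is 85, both with a standard deviation of 15 (so the groups are one standard deviation apart), the groups are equal in size, and everyone at or above the cutoff is hired. With equal-sized groups, the ratio of hires equals the ratio of selection rates, which is what the 80 percent rule compares. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumed parameters (not stated in the text): means one SD (15 points) apart.
HIGHER = NormalDist(mu=100, sigma=15)
LOWER = NormalDist(mu=85, sigma=15)


def selected_per_hundred(cutoff: float) -> float:
    """Lower-group hires per 100 higher-group hires when the two groups are
    equal in size and everyone scoring at or above the cutoff is hired."""
    pass_lower = 1 - LOWER.cdf(cutoff)    # share of lower group above cutoff
    pass_higher = 1 - HIGHER.cdf(cutoff)  # share of higher group above cutoff
    return 100 * pass_lower / pass_higher


def meets_80_percent_rule(cutoff: float) -> bool:
    """True if the lower group's selection rate is at least 80% of the
    higher group's, i.e., at least 80 hired per 100."""
    return selected_per_hundred(cutoff) >= 80


for cutoff in (70, 100, 110, 120):
    ratio = selected_per_hundred(cutoff)
    print(f"cutoff {cutoff}: {ratio:.1f} per 100, "
          f"meets 80 percent rule: {meets_80_percent_rule(cutoff)}")
```

Under these assumptions the ratio comes out near 32, 19, and 11 per hundred at cutoffs of 100, 110, and 120, in line with the figures above, while a low cutoff such as 70 yields about 86 per hundred and passes the rule.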
A REVERSAL IN THE AFFIRMATIVE ACTION TREND LINE, OR A BLIP?
The Supreme Court in 1989 backed off from its most demanding requirements for employment testing. In Wards Cove Packing Co., Inc. v. Atonio,24 it softened the employer’s obligation to justify a test’s disparate impact. “Business necessity,” the Court said, is an unreasonably stringent criterion, virtually impossible for most ordinary businesses to meet. The result of so extreme a requirement, warned the Court, would be “a host of evils.”25 It was now enough, the Court said, to show that the test serves legitimate business goals. It looked as though Duke Power Co.’s defense in Griggs, that it sought to improve the general intellectual quality of its employees, would have met this new standard. Soon thereafter, however, Congress retaliated. The Civil Rights Act of 1991 repudiated Wards Cove and returned to the standards of Griggs and Albemarle: to business necessity, job relatedness, and disparate impact as those earlier decisions had defined them. Once again, employers evidently must satisfy a criterion for employment testing that the Court, two years before, had judged impossibly demanding. The new law is fraught with ambiguity and will doubtless send lawyers, their clients, and courts back to work to figure out what it requires.26 But the best guess is that the trend line had blipped, not reversed.
Notes
Abbreviations
DES. National Center for Education Statistics, Digest of Education Statistics. Published annually, Washington, D.C.: Government Printing Office.
NLSY. National Longitudinal Survey of Youth. Center for Human Resource Research, Ohio State University, Columbus, Ohio.
SAUS. U.S. Bureau of the Census. Statistical Abstract of the United States. Published annually, Washington, D.C.: Government Printing Office. For each cite in the text, we have added the year of the edition and table numbers to the abbreviation; e.g., DES, 19xx, Table xx.
Introduction
1 Galton 1869.
2 Forrest 1974.
3 For a brief history of testing from Galton on, see Herrnstein and Boring 1965.
4 In China, civil service examinations that functioned de facto as intelligence tests (though overweighted with pure memory questions) had been in use for more than a thousand years.
5 Spearman 1904.
6 Galton 1888; Stigler 1986.
7 A correlation matrix is the set of all pairs of correlations. For example, in a 20-item test, each item will have 19 unique correlations with the other items, and the total matrix will contain 190 unique correlations (of Item 1 with Item 2, Item 1 with Item 3, etc.).
8 We are glossing over many complexities, including the effects of varying reliabilities for the items or tests. Spearman understood, and took account of, the contribution of reliability variations.
9 Buck v. Bell, 1927.
The Bell Curve: Intelligence and Class Structure in American Life Page 77