The Death and Life of the Great American School System

by Diane Ravitch


  Testing experts know that tests have their limitations, and the testing companies themselves have publicly stated that the results of their exams should never be used as the sole metric by which important decisions are made. When students take the SAT for college admissions, their score on any given day is an estimate of their developed ability. If a student gets a 580 on his verbal SAT, that score is an approximation of his skills and knowledge; if he took the SAT a week later, he might get a 560 or a 600, or a score that is even higher or lower. The College Board reminds students, teachers, guidance counselors, and college admissions officers that the SAT score is not exact and that it might differ on another day or in response to coaching.2
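
  In psychometric terms, this is the familiar point that an observed score combines a “true” ability level with random error. A minimal sketch of the idea, with the size of the error band chosen to match the 560-to-600 example above rather than the College Board’s published figures:

```latex
\[
X_{\text{observed}} = T_{\text{true}} + E
\]
\[
\text{if SEM} \approx 20 \text{ points:}\quad 580 \pm 20 \;\Rightarrow\; \text{a plausible band of roughly } 560 \text{ to } 600
\]
```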

  The Committee on Appropriate Test Use of the National Research Council stated in an authoritative report in 1999 that “tests are not perfect” and “a test score is not an exact measure of a student’s knowledge or skills.” Because test scores are not an infallible measure, the committee warned, “an educational decision that will have a major impact on a test taker should not be made solely or automatically on the basis of a single test score.”3 This expert panel could not have dreamed that only two years later, a law would be passed that established harsh consequences not for test takers, but for educators and schools. Or that only ten years later, the president of the United States would urge states and school districts to evaluate teachers on the basis of their students’ test scores.

  Psychometricians are less enthusiastic than elected officials about using tests to make consequential judgments, because they know that test scores may vary in unpredictable ways. Year-to-year changes in test scores for individuals or entire classes may be due to random variation. Student performance may be affected by the weather, the student’s state of mind, distractions outside the classroom, or conditions inside the classroom. Tests may also become invalid if too much time is spent preparing students to take them.
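
  A small simulation makes the point about random variation concrete. All numbers below are invented for illustration: both “schools” draw from exactly the same underlying score distribution, yet a single annual test still produces visible gaps between them.

```python
import random

random.seed(2)

def class_average(n_students=25, true_mean=500, spread=100):
    """Average score for one class drawn from the SAME underlying
    distribution -- any gap between two calls is pure chance."""
    return sum(random.gauss(true_mean, spread) for _ in range(n_students)) / n_students

# Two statistically identical schools, three "years" of one annual test each.
for year in range(1, 4):
    a, b = class_average(), class_average()
    print(f"Year {year}: school A = {a:.0f}, school B = {b:.0f}, gap = {a - b:+.0f}")
```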

  Robert Linn of the University of Colorado, a leading psychometrician, maintains that there are many reasons why one school might get better test scores than another. NCLB, he says, assumes that if school A gets better results than school B, it must be due to differences in school quality. But school A may have students who were higher achieving in earlier years than those in the other school. Or school A might have fewer students who are English-language learners or fewer students with disabilities than school B. School A, which is presumably more successful, may have a homogeneous student body, while the less successful school B may have a diverse student body with several subgroups, each of which must meet a proficiency target. Linn concludes, “The fact that the school that has fewer challenges makes AYP [adequate yearly progress] while the school with greater challenges fails to make AYP does not justify the conclusion that the first school is more effective than the second school. The first school might very well fail to make AYP if it had a student body that was comparable to the one in the second school.”4
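
  The arithmetic behind the subgroup point is worth spelling out. Suppose, purely for illustration, that each reported group clears its proficiency target in a given year with probability 0.9, independently of the others. A school reported as a single homogeneous group makes AYP 90 percent of the time; a school reported as five subgroups must have every one of them clear the bar:

```latex
\[
P(\text{makes AYP}) = p^{k}: \qquad 0.9^{1} = 0.90 \qquad \text{vs.} \qquad 0.9^{5} \approx 0.59
\]
```

  The independence assumption is a simplification, but the direction of the effect is the point: the more subgroups a school must report, the more ways it can miss.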

  State testing systems usually test only once each year, which increases the possibility of random variation. It would help, Linn says, to administer tests at the start of the school year and then again at the end of the school year, to identify the effectiveness of the school. Even then, there would be confounding variables: “For example, students at the school with the higher scores on the state assessment might have received more educational support at home than students at school B. The student bodies attending different schools can differ in many ways that are related to performance on tests, including language background, socioeconomic status, and prior achievement.”5 The professional organizations that set the standards for testing—such as the American Psychological Association and the American Educational Research Association—agree that test results reflect not only what happens in school, but also the characteristics of those tested, including such elusive factors as student motivation and parental engagement. Because there are so many variables that cannot be measured, even attempts to match schools by the demographic profile of their student body do not suffice to eliminate random variation.

  Given the importance of test scores, it is not surprising that teachers and school officials have devised various ways of gaming the testing system: that is, tricks and shortcuts to achieve the desired results, without improving education. When the purpose of testing is informational and diagnostic, there is no reason for teachers and administrators to alter the results except through improved instruction. But when the purpose of testing is accountability, then teachers and administrators understand that there are real consequences if the scores in their classroom or their school change. If scores go up, they may get a handsome bonus; if they go down, their school will be stigmatized, and they may lose their jobs. The intense pressure generated by demands for accountability leads many educators and school officials to boost the scores in ways that have nothing to do with learning.

  The most reprehensible form of gaming the system is plain old-fashioned cheating. There have been many news stories about a teacher or principal who was fired for correcting students’ answers before handing in the tests or leaking the questions in advance to students. In some instances, the cheating is systematic, not idiosyncratic. The Dallas Morning News analyzed statewide scores in Texas on the state’s high-stakes TAKS test—which determines schools’ reputations and teachers’ rewards—and found evidence that tens of thousands of students cheated every year without being detected or punished. The cheating was especially pervasive on eleventh-grade tests, which students must pass to graduate. Most of the cheating uncovered by reporters was in Houston and Dallas and was more common in low-achieving schools, “where the pressure to boost scores is the highest.” Cheating was found in charter schools at almost four times the rate of traditional public schools. In response to the story, Dallas school officials beefed up their school system’s testing security, but Houston school officials slammed the newspaper’s study as an effort “to dismiss the real academic progress in Texas schools.”6

  Many ways of gaming the system are not outright illegal, yet they are usually not openly acknowledged. Most principals know that the key to getting higher test scores is to restrict the admission of low-performing students, because they depress the school’s test scores. As choice becomes more common in urban districts, principals of small schools and charter schools—both of which have limited enrollments—may exclude the students who are most difficult to educate. They may do it by requiring an interview with parents of applicants, knowing that the parents of the lowest-performing students are not as likely to show up as the parents of more successful students. They may do it by requiring that students write an essay explaining why they want to attend the school. They may ask for letters of recommendation from the students’ teachers. They may exclude students with poor attendance records, since poor attendance correlates with poor academic performance. They may limit the number of students they admit who are English-language learners or in need of special education. All such requirements tend to eliminate the lowest performers. Whenever there is competition for admission, canny principals have learned how to spot the kids who will diminish their scores and how to exclude them without appearing to do so.7

  A lottery for admission tends to eliminate unmotivated students from the pool of applicants because they are less likely to apply. Principals know there is a wide range of ability within every racial and ethnic group, as well as among low-income students. A school can carefully weed out the lowest-performing students and still be able to boast that most or all of its students are African American, Hispanic, and low-income. Education researchers call this skimming or cream-skimming.8 It is a very effective way for a school to generate high test scores regardless of the quality of its program. Schools of choice may improve their test scores by counseling disruptive students to transfer to another school or flunking low-performing students, who may then decide to leave. Not only do choice schools look better if they exclude laggards, but the traditional public schools look worse, because they must by law accept those who were not admitted to or were booted out of the choice schools.9

  Another way a school can improve its test scores is to reduce the participation of low-performing students on the state tests. Such students may be encouraged to stay home on the day of the big test or may be suspended right before testing day. Sometimes these students are inappropriately assigned to special education to remove them from a subgroup (white or African American or Hispanic or Asian) where their low score might prevent that group from making AYP. Or the principal may assign low-performing students to a special education program that is unavailable in the school, thus ensuring that those students will transfer to a different school. In California, dozens of schools reclassified students by their race or English fluency or disability status, moving them from one category to another to improve their school’s standing under NCLB (if schools have too few students in a specific group, that group’s scores are not reported).10 Presumably, schools that blatantly shift students from one category to another will be caught in the act at some point and sanctioned.

  States can cleverly game the system to meet their testing targets by making the test content less challenging or by lowering the cut score (the passing mark) on state tests. State education officials tend to ignore critics who say the test is easier than previous tests, and outsiders seldom have enough information to verify their suspicions. In fact, the test may be just as difficult as in previous years, but if the state education department lowers the cut score, then more students will pass. Typically, the state releases the test scores, the press reports the results, public officials step up to take credit for any gains, and editorials congratulate the schools on their stunning progress. When the technical data are released a few weeks later, few in the media have the technical expertise to ascertain whether the cut scores were lowered; even if testing experts discover that the scores were manipulated, no one pays attention. Also, states may test only a narrow range of the state’s standards, so the test becomes predictable from year to year. All such tactics may produce a steady, even dramatic, increase in scores without improving any student’s education.11
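
  The mechanics are simple enough to sketch. In the toy example below, every number is invented: the distribution of student scores never changes, but moving the cut score alone swings the reported proficiency rate from 40 percent to 80 percent.

```python
# Toy illustration: one unchanging score distribution, two cut scores.
scores = [38, 41, 44, 47, 50, 52, 55, 58, 61, 64]  # percent of points earned

def pass_rate(scores, cut):
    """Share of students at or above the cut score, as a percentage."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

print(pass_rate(scores, cut=55))  # original cut score: 40.0 percent "proficient"
print(pass_rate(scores, cut=44))  # lowered cut score:  80.0 percent "proficient"
```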

  Yet another way to raise the proportion of students who reach “proficiency” is to expand the pool of test takers who are eligible for accommodations, that is, extra time or a dictionary or other special assistance. School officials may increase the number of students who are classified as disabled so they will get extra accommodations. Or state officials may decide that students who were formerly classified as English-language learners should continue to receive extra accommodations even after they passed an English examination and achieved proficiency in English.12

  Districts, too, have incentives to game the system. In 2007, Cleveland celebrated improved test scores, but an investigation by the Cleveland Plain Dealer determined that the district, as well as others in Ohio, had “scrubbed” or tossed out the test scores of students who were not continuously enrolled during the school year. Not surprisingly, most of the scores that were scrubbed were from low performers. The newspaper’s analysis found that “from 14 percent to 32 percent of the scores in grades 4 to 10 were eliminated in 2007.”13
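
  The effect of scrubbing is purely mechanical: remove enough scores from the bottom of the distribution and the average rises without a single student learning more. A toy illustration with invented numbers:

```python
scores = [20, 25, 30, 45, 55, 60, 70, 75, 80, 85]  # invented test scores

mean_all = sum(scores) / len(scores)           # 54.5
scrubbed = sorted(scores)[2:]                  # drop the two lowest scorers
mean_scrubbed = sum(scrubbed) / len(scrubbed)  # 62.5

print(f"before scrubbing: {mean_all}, after: {mean_scrubbed}")
```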

  In the NCLB era, many states and districts reported outsized test score gains, but the gains were usually not real. The state education department in New York quietly changed the scoring of the state tests in mathematics and English language arts, which produced dramatic gains in the proportion who met state standards each year. Between 2006, when the state introduced a new test, and 2009, the proportion of students in grades three through eight who reached proficiency on the state math test leapt from 28.6 percent to an incredible 63.3 percent in Buffalo, from 30.1 percent to 58.2 percent in Syracuse, and from 57 percent to 81.8 percent in New York City. In the state as a whole, the proportion of students who were proficient jumped in these three years from 65.8 percent to 86.5 percent. To an unknowing public, these breathtaking increases were solid evidence that the schools were getting better and that more students were meeting high standards. But in reality, state officials made it easier to pass the tests. In 2006, a student in seventh grade was required to get 59.6 percent of the points on the test to meet state standards in mathematics; by 2009, a student in that grade needed only 44 percent to be considered proficient.14 Most people would consider a score of 44 percent to be a failing grade, not evidence of proficiency.

  A similar phenomenon affected New York’s Regents examinations, on which students must score 65 to receive a high school diploma. Many students would have failed to reach this high bar, but state officials took care of their difficulty by adjusting the cut scores. The public probably assumed that a student who received a 65 had correctly answered 65 percent of the questions. But in algebra, a student would receive a passing score of 65 if he earned only 34.5 percent of the possible points. To win a 65 on the biology Regents, the student needed to earn only 46 percent of the possible points. With this intricate conversion formula, the Regents diploma was turned into a goal that almost every student could reach.15 By making it easier to pass the Regents exams, state officials helped increase the graduation rate.
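
  The conversion at work here maps raw points onto a scaled score, and the mapping need not be proportional. The piecewise-linear formula below is a hypothetical reconstruction, not the state’s actual table: it simply anchors a raw score of 34.5 percent to the scaled passing mark of 65, which is all it takes for “scoring 65” to stop meaning “earning 65 percent of the points.”

```latex
\[
\text{scaled}(r) =
\begin{cases}
65 \cdot \dfrac{r}{34.5} & \text{if } 0 \le r \le 34.5,\\[8pt]
65 + 35 \cdot \dfrac{r - 34.5}{100 - 34.5} & \text{if } 34.5 < r \le 100,
\end{cases}
\]
```

  where r is the percentage of raw points earned, so that scaled(34.5) = 65 and scaled(100) = 100.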

  In 2009, the Civic Committee of the Commercial Club of Chicago released a study demonstrating that the city’s claims of dramatic test score gains were exaggerated. Chicago school officials had boasted that from 2004 to 2008, the proportion of eighth-grade students who met state standards in reading had increased from 55 percent to 76 percent, and in mathematics it had grown from 33 percent to 70 percent (President Obama recited these statistics when he announced the appointment of Arne Duncan, Chicago’s superintendent of schools, as U.S. secretary of education). The study concluded, however, that “these huge increases reflect changes in the tests and testing procedures—not real student improvement.” In 2006 the state hired a new testing company, which introduced a new test and lowered the cut scores (mainly in eighth-grade mathematics), thus producing the illusion of remarkable gains. At the same time that Chicago’s scores soared on the state tests, its scores on NAEP (from 2003 to 2007) were flat. Meanwhile, student performance levels in high school remained disastrously low from 2001 to 2008, suggesting that any modest improvements in the elementary grades disappeared by high school.16

  Of all the ways of gaming the system, the most common is test preparation. Most districts, especially urban districts where performance is lowest, relentlessly engage in test-prep activities. Some preparation for test-taking is valuable; reading and studying, learning new vocabulary, and solving math problems are good ways to get ready for the tests. But school districts have invested hundreds of millions of dollars in programs and training materials that teach students the specific types of questions that will appear on the state tests. For weeks or even months before the state test, children are drilled daily in test-taking skills and on questions mirroring those that are likely to appear on the state test.17

  The consequence of all this practice is that students may be able to pass the state test, yet unable to pass a test of precisely the same subject for which they did not practice. They master test-taking methods, but not the subject itself. In the new world of accountability, students’ acquisition of the skills and knowledge they need for further education and for the workplace is secondary. What matters most is for the school, the district, and the state to be able to say that more students have reached “proficiency.” This sort of fraud ignores the students’ interests while promoting the interests of adults who take credit for nonexistent improvements.

  The National Research Council’s Committee on Appropriate Test Use held that “all students are entitled to sufficient test preparation” so they are familiar with the format of the test, the subject matter to be tested, and appropriate test-taking strategies. Surely students should know what a multiple-choice question is and should not be stumped by the nature of the testing process. (By now, there must be very few children in the United States who are unfamiliar with the nature of standardized testing and test-taking strategies.) The committee cautioned, however, that the test results might be invalidated “by teaching so narrowly to the objectives of a particular test that scores are raised without actually improving the broader set of academic skills that the test is intended to measure.”18

  Daniel Koretz, a psychometrician at Harvard University, contends that coaching students for state tests produces test score inflation and the illusion of progress. He criticizes the common practice of teaching students certain test-taking tricks, such as how to eliminate obviously wrong answers on a multiple-choice question and then make a guess among the remaining choices. It is equally questionable, he says, to teach students “to write in ways that are tailored to the specific scoring rubrics used with a particular test.” When teachers focus too narrowly on the test students are about to take, he writes, whatever they learn is likely to be aligned with that test and is not likely to generalize well to other tests of the same subject or to performance in real life.19
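
  The payoff from the elimination trick is plain arithmetic. On a four-option item (the option count is illustrative), ruling out two implausible choices doubles the chance that a blind guess earns the point, with no command of the tested material required:

```latex
\[
P(\text{correct} \mid \text{blind guess}) = \tfrac{1}{4} = 0.25
\qquad
P(\text{correct} \mid \text{two options eliminated}) = \tfrac{1}{2} = 0.50
\]
```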

 
