USF Libraries
USF Digital Collections

Educational policy analysis archives


Material Information

Title:
Educational policy analysis archives
Physical Description:
Serial
Language:
English
Creator:
Arizona State University
University of South Florida
Publisher:
Arizona State University
University of South Florida.
Place of Publication:
Tempe, Ariz
Tampa, Fla
Publication Date:
March 28, 2002
Subjects

Subjects / Keywords:
Education -- Research -- Periodicals   ( lcsh )
Genre:
non-fiction   ( marcgt )
serial   ( sobekcm )

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
usfldc doi - E11-00266
usfldc handle - e11.266
System ID:
SFS0024511:00266




Full Text
Education Policy Analysis Archives

Volume 10 Number 18
March 28, 2002
ISSN 1068-2341

A peer-reviewed scholarly journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright 2002, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

High-Stakes Testing, Uncertainty, and Student Learning

Audrey L. Amrein
Arizona State University

David C. Berliner
Arizona State University

Citation: Amrein, A. L. & Berliner, D. C. (2002, March 28). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved [date] from http://epaa.asu.edu/epaa/v10n18/.

Related articles: Vol. 11 No. 24; Vol. 11 No. 25

Abstract

A brief history of high-stakes testing is followed by an analysis of eighteen states with severe consequences attached to their testing programs. These 18 states were examined to see if their high-stakes testing programs were affecting student learning, the intended outcome of high-stakes testing policies promoted throughout the nation. Scores on the individual tests that states use were not analyzed for evidence of learning.

Such scores are easily manipulated through test-preparation programs, narrow curricular focus, exclusion of certain students, and so forth. Student learning was instead measured by means of additional tests covering some of the same domain as each state's own high-stakes test. The question asked was whether transfer to these domains occurs as a function of a state's high-stakes testing program.

Four separate standardized and commonly used tests that overlap the same domain as state tests were examined: the ACT, SAT, NAEP, and AP tests. Archival time series were used to examine the effects of each state's high-stakes testing program on each of these different measures of transfer. If scores on the transfer measures went up as a function of a state's imposition of a high-stakes test, we considered that evidence of student learning in the domain and support for the belief that the state's high-stakes testing policy was promoting transfer, as intended.

The uncertainty principle is used to interpret these data. That principle states: "The more important that any quantitative social indicator becomes in social decision-making, the more likely it will be to distort and corrupt the social process it is intended to monitor." Analyses of these data reveal that if the intended goal of high-stakes testing policy is to increase student learning, then that policy is not working. While a state's high-stakes test may show increased scores, there is little support in these data that such increases are anything but the result of test preparation and/or the exclusion of students from the testing process. These distortions, we argue, are predicted by the uncertainty principle.

The success of a high-stakes testing policy rests on whether it affects student learning, not on whether it can increase student scores on a particular test. If student learning is not affected, the validity of a state's test is in question. Evidence from this study of 18 states with high-stakes tests is that in all but one analysis, student learning is indeterminate, remains at the same level it was before the policy was implemented, or actually goes down when high-stakes testing policies are instituted. Because clear evidence for increased student learning is not found, and because there are numerous reports of unintended consequences associated with high-stakes testing policies (increased dropout rates, teachers' and schools' cheating on exams, teachers' defection from the profession, all predicted by the uncertainty principle), it is concluded that there is need for debate and transformation of current high-stakes testing policies.

The authors wish to thank the Rockefeller Foundation for support of the research reported here. The views expressed are those of the authors and do not necessarily represent the opinions or policies of the Rockefeller Foundation.

This is an era of strong support for public policies that use high-stakes tests to change the behavior of teachers and students in desirable ways. But the use of high-stakes tests is not new, and their effects are not always desirable. "Stakes," or the consequences associated with test results, have long been a part of the American scene. For example, early in the 20th century, scores on the recently invented standardized tests could, for immigrants, result in entrance to or rejection from the United States of America. In the public schools test scores could uncover talent, providing entrance into programs for the

3 of 74gifted, or as easily, provide evidence of deficienc ies, leading to placement in vocational tracks or even in homes for the mentally inferior. Test scores could also mean the difference between acceptance into, or rejection fr om, the military. And throughout early twentieth century society, standardized test scores were used to confirm the superiority or inferiority of various races, ethnic groups, and social classes. Used in this way, the consequences of standardized tests insured maintena nce of the status quo along those racial, ethnic and class lines. So, for about a cen tury, significant consequences have been attached to scores on standardized tests.A Recent History of High-stakes TestingIn recent decades, test scores have come to dominat e the discourse about schools and their accomplishments. Families now make important decisions, such as where to live, based on the scores from these tests. This occurs b ecause real estate agents use school test scores to rate neighborhood quality and this a ffects property values. (Note 1) Test scores have been shown to affect housing prices, re sulting in a difference of about $9,000 between homes in grade "A" or grade "B" neig hborhoods. (Note 2) At the national and state levels, test scores are now comm only used to evaluate programs and allocate educational resources. Millions of dollars now hinge on the tested performance of students in educational and social programs.Our current state of faith in and reliance on tests has roots in the launch of Sputnik in 1957. Our (then) economic and political rival, the Soviet Union, beat the United States to space, causing our journalists and politicians t o question American education with extra vigor. At that time, state and federal politi cians became more actively engaged in the conduct of education, including advocacy for th e increased use of tests to assess school learning. (Note 3) The belief that the achievement of students in U.S. schools was falling behind other countries led politicians in the 1970s to instigate a minimum competency testing movement to reform our schools. (Note 4) States began to rely on tests of basic skills to ensure, in theory, that all students would learn at least the minimum needed to be a productive citizen.One of these states was Florida. After some hasty p olicy decisions, Florida implemented a statewide minimum competency test that students w ere required to pass prior to being graduated. Florida's early gains were used as an ex ample of how standards and accountability systems could improve education. How ever, when perceived gains hit a plateau and differential pass rates and increased d ropout rates among ethnic minorities and students from low socioeconomic backgrounds wer e discovered, Florida's testing policy was postponed. (Note 5) In the 1980s, the minimum competency test movement was almost entirely discarded. Beyond what was happening in Florida, suggestions t hat minimum competency tests promoted low standards also raised concerns. In man y schools the content of these tests became the maximum in which students, particularly in urban schools, became competent. (Note 6) It was widely perceived that minimum competency te sts were "dumbing down" the content learned in schools.In 1983, the National Commission on Education relea sed A Nation at Risk, (Note 7) the most influential report on education of the past fe w decades. A Nation at Risk called for

4 of 74an end to the minimum competency testing movement a nd the beginning of a high-stakes testing movement that would raise the n ation's standards of achievement drastically. Although history has not found the rep ort to be accurate, (Note 8) it argued persuasively that schools in the United States were performing poorly in comparison to other countries and that the United States was in j eopardy of losing its global standing. Citing losses in national and international student test scores, deterioration in school quality, a "diluted" and "diffused" curriculum, and setbacks on other indicators of U.S. superiority, the National Commission on Education t riggered a nationwide panic regarding the weakening condition of the American e ducation system. Despite its lack of scholarly credibility, A Nation at Risk produced massive effects. The National Commission on Education called for more ri gorous standards and accountability mechanisms to bring the United State s out of its purported educational recession. The Commission recommended that states i nstitute high standards to homogenize and improve curricula and rigorous asses sments be conducted to hold schools accountable for meeting those standards. Th e Commission and those it influenced intended to increase what students learn in schools. This report is an investigation of how well that explicitly intended outcome of high-stakes testing programs was achieved. We ask, below, whether incre ases in school learning are actually associated with increases in the use of hi gh-stakes tests? Although it appears to be a simple question, it is very difficult to answe r.The Effects of A Nation at Risk on Testing in AmericaAs a result of A Nation at Risk state policymakers in every state but Iowa develo ped educational standards and every state but Nebraska implemented assessment policies to check those standards. (Note 9) In many states high-stakes, or serious consequence s, were attached to tests in order to hold schools, ad ministrators, teachers, and students accountable for meeting the newly imposed high stan dards. In fixing high-stakes to assessments, policymakers borrowed principles from the business sector and attached incentives to learning and sanctions to poor performance on tests. High performing schools would be rewarded. U nder performing schools would be penalized, and to avoid further penalties, would im prove themselves. Accordingly, students would be motivated to learn, school person nel would be forced to do their jobs, and the condition of education would inevitably imp rove, without much effort and without too great a cost per state. What made sense in theory, gained widespread attention and eventually increased in popularity as a method for school reform.Arguments in Support of High-stakes Tests. At various times over the past years different argu ments have been used to promote high-stakes tests. A summary of these follows: students and teachers need high-stakes tests to kno w what is important to learn and to teach; teachers need to be held accountable through high-s takes tests to motivate them to teach better, particularly to push the laziest ones to work harder; students work harder and learn more when they have to take high-stakes tests; students will be motivated to do their best and sco re well on high-stakes tests; and

5 of 74thatscoring well on the test will lead to feelings of s uccess, while doing poorly on such tests will lead to increased effort to learn. Supporters of high-stakes testing also assume that the tests: are good measures of the curricula that is taught t o students in our schools; provide a kind of "level playing field," an equal o pportunity for all students to demonstrate their knowledge; and that are good measures of an individual's performance, l ittle affected by differences in students' motivation, emotionality, language, and s ocial status. Finally, the supporters believe that: teachers use test results to help provide better in struction for individual students; administrators use the test results to improve stud ent learning and design better professional development for teachers; and that parents understand high-stakes tests and how to int erpret their children's scores. The validity of these statements in support of high -stakes tests have been examined through both quantitative and qualitative research, and by the commentary of teachers who work in high-stakes testing environments. A rea sonable conclusion from this extensive corpus of work is that these statements a re true only some of the time, or for only a modest percent of the individuals who were s tudied. The research suggests, therefore, that all of these statements are likely to be false a good deal of the time. And in fact, some research studies show exactly the opp osite of the effects anticipated by supporters of high-stakes testing. (Note 10) The Heisenberg Uncertainty Principle Applied to the Social SciencesFor many years the research and policy community ha s accepted a social science version of Heisenberg's Uncertainty Principle. That princip le is The more important that any quantitative social indicator becomes in social dec ision-making, the more likely it will be to distort and corrupt the social process it is intended to monitor (Note 11) When applied to a high-stakes testing environment, this principle warns us that attaching serious personal and educational consequences to pe rformance on tests for schools, administrators, teachers, and students, may have di storting and corrupting effects. The distortions and corruptions that accompany high-sta kes tests make inferences about the meanings of the scores on those tests uncertain. If there is uncertainty about the meaning of a test score, the test may not be valid. Unaware of this ominous warning, supporters of high-stakes testing, particularly politicians, h ave caused high-stakes testing to proliferate. The spread of high-stakes tests throug hout the nation is described next.Current High-stakes Testing PracticesToday, twenty-two states offer schools incentives f or high or improved test scores. (Note 12) Twenty states distribute financial rewards to succ essful schools, and nineteen states distribute financial rewards to improved schools.Punishments are attached to school scores twice as often as rewards, however. Forty-five states hold schools accountable for test scores by publishing school or district report

6 of 74cards. Twenty-seven of those states hold schools ac countable through rating and ranking mechanisms; fourteen have the power to close, recon stitute, or take over low performing schools; sixteen have the authority to replace teac hers or administrators; and eleven have the authority to revoke a school's accreditation. I n low performing schools, low scores also bring about embarrassment and public ridicule. For administrators, threats of termination and cuts in pay exist, as does the potential for personal bonuses. In Oakland, California, for examp le, city school administrators can receive a 9% increase in pay for good school perfor mance with a potential for an additional 3% increase—1% per increase in reading, math and language arts. (Note 13) For teachers, low average class scores may prevent teachers from receiving salary increases, may influence tenure decisions, and in s ixteen states may be cause for dismissal. Only Texas has linked teacher evaluation s to student or school test results, but more states have plans to do so in the future.High average class scores may also bring about fina ncial bonuses or raises in pay. Eleven states disperse money directly to administra tors or teachers in the most improved schools. For example, California recently released each school's Academic Performance Index (API). This is based almost entirely on Stanf ord 9 test scores. Schools showing the biggest gains were to share $677 million in rewards while low performing schools in which personnel did not raise student achievement s cores were to face punishments. (Note 14) In addition, teachers and administrators in 1,346 California schools that demonstrated the greatest improvements over the pas t 2 years were to share $100 million in bonus rewards, called Certificated Staff Perform ance Incentive Bonuses, ranging from $5,000 to $25,000 per teacher. Although over $550 m illion had already been disbursed to California schools, the distribution of the staf f bonuses was deferred because some teachers who posted gains on the API scale, but fel t they were denied their share of the reward money, filed a lawsuit against the state. (Note 15) The court found in favor of the state.Schools and teachers were not the only targets of r ewards and punishments for test performance. Policy makers also attached serious co nsequences to performance on tests for individual students.Although test scores are often promoted as diagnost ic tools useful for identifying a student's achievement deficits and assets, they are rarely used for such purposes when they emanate from large-scale testing programs. Two major problems are the cause of this. First, test scores are often reported in the summer after students exit each grade and second, there are usually too few items on any one topic or area to be used in a diagnostic way. (Note 16) As a result of these factors, scores on large-scal e assessments are most often used simply to distribute rewards an d sanctions. This contributes to the corruptions and distortions predicted by the social science version of Heisenberg's Uncertainty Principle.The special case of scholarshipsThe distortions and corruptions predicted by the Un certainty Principle find fertile ground for developing when high scores on a test result in special diplomas or scholarships. Attaching scholarships to high performance on state tests is a relatively new concept, yet

7 of 74six states have already begun granting college scho larships and dispersing funds to students with distinguished performance on tests. (Note 17) Michigan is a perfect example of the corruptions and distortions that occ ur when stakes are high for a quantitative social indicator.The Michigan imbroglio In spring 2000, Michigan implemented its Merit Aw ard Scholarship program in which 42,700 students who pe rformed well on the Michigan Educational Assessment Program high school tests we re rewarded with scholarships of $2,500 or $1,000 to help pay for in-state or out-of -state college tuition, respectively. (Note 18) There is quite a story behind these scholarships, h owever. (Note 19) In 1996, Michigan became the 13th state to sue the nation's leading c igarette manufacturers to recover health care costs encumbered by the state to treat smoking-related diseases developed by Michigan's poor and disadvantaged citizens. The car e and treatment of these citizens placed a financial burden on the states, so they su ed the tobacco companies for financial compensation. Michigan won approximately $384 milli on to recover some of these health care costs and then decided to distribute ap proximately 75% of this money among high school seniors with high test scores as Merit Award Scholarships. The remainder of the money went to health related needs and research more or less unrelated to smoking or disease treatment. Thus, the monies that were aw arded to the state did not go to the victims at the center of the lawsuit—Michigan's poo r and indigent suffering from tobacco related diseases—but went instead to those students who scored the highest on the Michigan Educational Assessment Program high sc hool test. These were Michigan's relatively wealthier students who had the highest p robability of enrolling in college even without these scholarships. (Note 20) Approximately 80% of the test-takers in an affluent Michigan neighborhood earned scholarships while only 6% of the test-takers in De troit earned scholarships. (Note 21) One in three white, one in fourteen African America n, one in five Hispanic, and one in five Native American test takers received scholarsh ips. (Note 22) In addition, from 1982 to 1997, while education spending for needy student s increased 193%, education spending for merit based programs such as the merit scholarships increased by 457% in Michigan. (Note 23) Tests have often been defended because they can di stribute or redistribute resources based on notions of "merit." But too often the testing programs become thinly disquised methods to maintain the sta tus quo and insure that funds stay in the hands of those who need them least.Michigan is now being sued by a coalition that incl udes students, the American Civil Liberties Union of Michigan (ACLU), the Mexican Ame rican Legal Defense and Education Fund (MALDEF), and the National Associati on for the Advancement of Colored people (NAACP). They are arguing that Michi gan is denying students scholarships based on test scores that are highly r elated to race, ethnicity, and educational advantages. Michigan appears to be a st ate where high-stakes testing has had a corrupting influence.The satisfying effects of punishing the slackers Connecting high-stakes tests with rewards for high performance, such as in the exampl e above, is not nearly as prevalent as have been punishments attached to student scores th at are judged to be too low. Punishments are used three times as often as reward s. 
Policy makers appear to derive satisfaction from the creation of public policies that punish those they perceive to be

8 of 74slackers.Throughout the nation low scores are used to retain students in grade, using the slogan of ending "social promotion." Promotion or retentio n is already contingent on test performance in Louisiana, New Mexico, and North Car olina, while four more states have plans to link promotion to test scores in the next few years. (Note 24) Low scores may also prevent high school students fr om graduating from high school. Whether a student passes or fails high school gradu ation exams – exams that purportedly test a high school student's level of knowledge in core high school subjects – is increasingly being used as the only determinant of whether some students graduate or whether students are entitled to a regular high sch ool diploma or merely a certificate of attendance.In fact, high school graduation exams are the asses sments with the highest, most visible, and most controversial stakes yet. When A Nation at Risk was released, only three states (Note 25) had implemented high school graduation exams, then referred to as minimum competency tests on which students' basic skills were tested. But in A Nation at Risk the commission called for more rigorous examinations on which high school students would be required to demonstrate mastery in order to rece ive high school diplomas. (Note 26) Since then, states have implemented high school gra duation exam policies with greater frequency.Now, almost two decades later, eighteen states (Note 27) have developed and employed high school graduation exams and nine more states (Note 28) have high-school graduation exams underway. The frequency with which high school graduation exams have become key components of states' high-stakes t esting policies has escalated almost linearly over the past twenty-three years and will continue to do so for at least the next six years (see Figure 1). Figure 1. Number of states with high school graduat ion exams 1979–2008 (Note 29)

9 of 74Who Uses high-stakes Tests?Analyses of these data reveal that high school grad uation exams are: more common in states that allocate less money than the national average per pupil for schooling as compared to the nation. High school graduation exams are found in around 60% of the states in which yearly p er pupil expenditures are lower and in about 45% of the states in which yearly per pupil expenditures are higher than the national average. (Note 30) more likely to be found in states that have more ce ntralized governments, rather than those with more powerful county or city govern ments. Of the states that have more centralized governments, 62% have or have plan s to implement high school graduation exams. Of the states that have less cent ralized governments, only 37% have or have plans to implement high school graduat ion exams. (Note 31) more likely to be found in the highly populated sta tes and states with the largest population growth as compared to the nation. (Note 32) For example, 76% of the country's most highly populated states and only 32% of the country's smallest states have or have plans to implement high school graduation exams. Looking at growth, not just population we find that 76% of the states with the greatest population growth and only 32% of the states with t he lowest population growth from 1990–2000 have or have plans to implement high school graduation exams. (Note 33) most likely to be found in the Southwest and the So uth. High school graduation exams are currently in use in 50% of the southweste rn states and 66% of the southern states. Analyses also suggest that high sc hool graduation exams will become even more common in these regions in the fut ure. By the year 2008, high school graduation exams will be found in 75% of the southwestern and southern states. High school graduation exams will probably continue to be randomly dispersed throughout 50% of the states in the Northeast and l east likely to be found in 33% of the mid-western states. The western states, over the ne xt decade, will have the greatest increase in the number of states with high school g raduation exams by region. While 10% percent of the western states have already impl emented high school graduation exam policies, 50% of these states will have implem ented these exams by the year 2008. (Note 34) More important for understanding high-stakes testin g policy is that high school graduation exams are more likely found in states wi th higher percentages of African Americans and Hispanics and lower percentages of Ca ucasians as compared to the nation. Census Bureau population statistics helped to verify this. (Note 35) Seventy-five percent of the states with a higher percentage of A frican Americans than the nation have high school graduation exams. By 2008 81% of such s tates will have implemented high school graduation exams. Sixty-seven percent of the states with a higher percentage of Hispanics than the nation have high school graduati on exams. By 2008 89% of such states will have implemented high school graduation exams. Conversely, 13% of the states with a higher percentage of Caucasians than the nation have implemented high

10 of 74school graduation exams. By 2008 29% of such states will have implemented high school graduation exams. In other words, high school graduation exams affect students from racial minority backgrounds in greater proport ions than they do white students If these high-stakes tests are discovered not to have their intended effects, that is, if they do not promote the kinds of transfer of learning and e ducation the nation desires, the mistake will have greater consequences for America' s children of color. Similarly, high school graduation exams disproporti onately affect students from lower socioeconomic backgrounds. High school graduation e xams are more likely to be found in states with the greatest degrees of poverty as c ompared to the nation. Economically disadvantaged students are most often found in the South and the Southwest and least often found in the Northeast and Midwest. As noted, states in the South and the Southwest are most likely to have high-stakes testi ng policies. Further, 69% of the states with child poverty levels greater than the nation h ave or have plans to implement high school graduation exams. Seventy percent of the sta tes with the greatest 1990–1998 increases in the number of children living in pover ty have or have plans to implement such exams. (Note 36) That is, high school graduation exams are more likely to be implemented in states that have lower levels of ach ievement, and the always present correlate of low achievement, poorer students Again, if these high-stakes tests are discovered not to have their intended effects, that is, if they fail to promote transfer of learning and education in its broadest sense, as th e nation desires, the mistake will have greater consequences for America's poorest children Matters of national standards and implementation of high-stakes tests are less likely to be of concern for the reform of relatively elite sc hools, (Note 37) that are more often found in regions other than the South and Southwest Perhaps this helps to explain the more extensive presence of high-stakes tests in the South and Southwest. This seems a reasonable hypothesis especially when one purpose o f high-stakes testing is to raise student achievement levels in educational environme nts perceived to be failing. It should be noted, however, that there is consider able variability in these data. All states with high rates of children in poverty have not ado pted high-stakes testing policies while some states with lower rates of children in poverty have. In states with higher or lower levels of poverty, however, schools that exist with in poor rural and urban environments are still more frequently targeted by these policie s. Although legislators promote these policies, claiming high standards and accountabilit y for all, schools that already perform well on tests are not the targets for these policie s; poor, urban, under performing schools are. But, for different reasons, support for high-s takes testing receives support in both high and low achieving school districts. In success ful schools and districts, high-stakes testing policies are acceptable because the scores on those tests merely confirm the expectations of the community. Thus, in successful communities, the tests pose little threat and also have little incentive value. (Note 38) In poorer performing schools high-stakes testing policies often enjoy popular su pport because, it is thought, at the very least, that these tests will raise standards in a s tate's worst schools. 
(Note 39) But if high-stakes testing policies do not promote learning, that is, if they do not appear to be leading to education in the most profound sense of that term, then the tests will not turn out to have any use in successful communities and schools, nor will they improve the schools attended by poor children and ethnic minorities. If, in addition, the tests have unintended consequences such as narrowing the curriculum taught, increasing dropout rates, and contributing to higher rates of retention in grade, they would not be good for

11 of 74any community. But these unintended negative conseq uences would have a greater impact on the families and neighborhoods of poor an d minority students. Faith in testing The effects of high-stakes tests on students is w ell worth pursuing since it is unquestionably a "bull market" for testing. (Note 40) The faith state legislators have put into tests, albeit blind, has increased dramati cally over the past twenty years. (Note 41) The United States tests its children more than any other industrialized nation, has done so for well over thirty years, (Note 42) and will continue to depend on even more tests as it attempts to improve its schools. At the national level, President Bush has been unquestionably successful in passing his "No Child Left Behind" plan that calls for even more testing – annual high-stakes testing of every child in the United States in grades 3 through 8 in math and reading. Republicans and Demo crats alike have endorsed high-stakes testing policies for the nation making this President Bush's only educational proposal that has claimed bipartisan support. (Note 43) According to the President and other proponents, annual testing of every child and the attachment of penalties and rewards to their performance on those tests, will u nequivocally reform education. Despite the optimism, the jury is still out on this issue. Many researchers, teachers and social critics conte nd that high-stakes testing policies have worsened the quality of our schools and have c reated negative effects that severely outweigh the few, if any, positive benefits associa ted with high-stakes testing policies. Because testing programs and their effects change a ll the time, reinterpretations of the research that bears on this issue will be needed ev ery few years. But at this time, in contradiction to all the rhetoric, the research inf orms us that states that have implemented high-stakes testing policies have fared worse on independent measures of academic achievement than have states with no or lo w stakes testing programs. (Note 44) The research also informs us that high-stakes test ing policies have had a disproportionate negative impact on students from r acial minority and low socioeconomic backgrounds. (Note 45) In Arizona, for example, officials reported that in 1999 students in poor and high-minority school districts scored lower than mi ddle-class and wealthy students on Arizona's high-stakes high school graduation test, the AIMS (Arizona's Instrument to Measure Standards). Ninety-seven percent of African Americans, Hispanics, and Native Americans failed the math section of the AIMS, a si gnificantly greater proportion of failures than occurred in the white community, whos e students also failed the test in great numbers. (Note 46) Due to the high failure rates for different groups of students, as well as various psychometric problems, this test ha d to be postponed. In Louisiana parents requested that the office for civil rights investigate why nearly half the children in school districts with the greatest numbers of poor and minority children had failed Louisiana's test, after taking it for a second time. (Note 47) In Texas, in 1997, only one out of every two African American, Mexican American, and economically disadvantaged sophomores passed each section of Tex as' high-stakes test the TAAS – Texas' Assessment of Academic Skills. In contrast, four out of every five white sophomores passed. 
(Note 48) In Georgia, two out of every three low-income students failed the math, English, and reading sections of Georgia's competency tests. No students from well-to-do counties failed any of the tests, and more than half exceeded standards. (Note 49) The pattern of failing scores in these states is quite similar to the failure rates in other

12 of 74states with high school graduation exams and are il lustrative of the achievement gap between wealthy, mostly white school districts and poor, mostly minority school districts. (Note 50) It appears that a major cause of these gaps is tha t high-stakes standardized tests may be testing poor students on material they have not had a sufficient opportunity to learn.Education, Learning, and Training: Three Goals of S choolingIn this report we look at just one of the distortin g and corrupting possibilities suggested by Heisenberg's Uncertainty Principle applied to th e testing movement, namely, that training rather than learning or general education is taking place in communities that rely on high-stakes tests to reform their schools. As will be become clearer, if we have doubt about the meaning of a test score, we must be skeptical about the validity of the test. Our interest in these distinctions between training learning and education stems from the many anecdotes and research reports we read tha t document the narrowing of the curriculum and the inordinate amount of time spent in drill as a form of test preparation, wherever high-stakes tests are used. The former pre sident of the American Association of School Administrators, speaking also as the Supe rintendent of one of the highest achieving school districts in America, notes that: The issue of teaching to these tests has become a m ajor concern to parents and educators. A real danger exists in that the tes t will become the curriculum and that instruction will be narrow and focused on facts. ... Teachers believe they spend an inordinate amoun t of time on drills leading to the memorization of facts rather than sp ending time on problem solving and the development of critical and analyti cal thinking skills. Teachers at the grade levels at which the test is g iven are particularly vulnerable to the pressure of teaching to the test.Rather than a push for higher standards, [Virginia' s high-stakes] tests may be driving the system toward mediocrity. The classr oom adaptations of "Trivial Pursuit" and "Do You Want to be a Milliona ire?" may well result in higher scores on these standardized tests, but will students have acquired the breadth and knowledge to do well on other quality b enchmarks, such as the SAT and Advanced Placement exams? (Note 51) This is our concern as well. Any narrowing of the c urriculum, along with the confusion of training to pass a test with broader notions of learning and education are especially problematic side effects of high-stakes testing for low-income students. The poor, more than their advantaged peers, need not only the skil ls that training provides but need the more important benefits of learning and education t hat allow for full economic and social integration in our society.To understand the design of this study and to defen d the measures used for our inquiry requires a clarification of the distinctions betwee n the related concepts of education learning (particularly school learning and the concept of transfer of learning ), and training For most citizens it is education (the broadest a nd most difficult to define of the concepts) that is the goal of schooling. Learni ng is the process through which

13 of 74education is achieved. But merely demonstrating acq uisition of some factual or procedural knowledge is not the primary goal of sch ool learning. That is merely a proximal goal. The proper goal of school learning is both more dis tal and more difficult to assess. The proper goal of school learning is transfer of learn ing, that is, the application or use of what is learned in one domain or context to that of another domain or context. School learning in the service of education focuses delibe rately on the goal of broad (or far) transfer. School instruction that can be characteri zed as training is ordinarily a narrow form of learning, where transfer of learning is mea sured on tasks that are highly similar to those used in the training. Broad or far measure s of transfer, the appropriate goal of school learning, are different from the measures ty pically used to assess the outcomes of training. More concretely, training in holding a pencil, or o f doing two-column addition with regrouping, or memorizing the names of the presiden ts, is expected to yield just that. After training to do those things is completed stud ents should be able to write in pencil, add columns of numbers, and name the presidents. Th e assessments used to measure their newly acquired knowledge are simple and direc t. On the other hand, learning to write descriptive paragraphs, arguing about how num bers can be decomposed, and engaging in civic activities should result in bette r writing, mathematics and citizenship. To inquire whether that is indeed the case, much br oader and more distal measures of transfer are required and these kinds of outcomes o f education are much harder to measure. Although enormously difficult to define, almost all citizens agree that school learning is designed to produce an "educated" person. Howard Ga rdner provides one voice for these aspirations by claiming that students become educat ed by probing, in sufficient depth, a relatively small set of examples from the disciplin es. In Gardner's curriculum teachers lead students to think and act in the manner of sci entists, mathematicians, artists, or historians. Gardner advocates deep and serious stud y of a limited set of subject matter to provide students with opportunities to deal serious ly with the genuine and profound ideas of humankind. I believe that three very important concerns should animate education; these concerns have names and histories that extend far b ack into the past. There is the realm of truth —and its underside, what is false or indeterminable There is the realm of beauty –– and its absence in experiences or objects that are ugly or kitschy. And there is the realm of morality –– what we consider to be good, and what we consider to be evi l. (Note 52) Gardner's "educated" student thinks like those in t he disciplines because the students learn the forms of argument and proof that are appr opriate to a discipline. Thus tutored, students are able to analyze the fundamental ideas and problems that all humans struggle with. It is a discussion and project-oriented curri culum, with minimum concern for test preparation as a separate activity. Gardener's disc ipline-based curriculum is explicitly concerned with transfer to a wide array of human en deavors. Despite the difficulty in obtaining evidence of this kind of transfer of lear ning, there is ample support for this kind of curriculum. 
Earl Shorris recently demonstrated the effect of this kind of curriculum with desperately poor people who were given the chance to study the disciplines with excellent and caring teachers. (Note 53) The experience of studying art,

14 of 74music, moral philosophy, logic, and so forth, trans formed the lives of these impoverished young adults.Minnesota Senator Paul Wellstone also understands t hat school learning is not an end in itself. For him, our educational system should be d esigned to produce an "educated" person, someone for whom transfer of what is learne d in school is possible: Education is, among other things, a process of shap ing the moral imagination, character, skills and intellect of our children, of inviting them into the great conversation of our moral, cultural and intellectual life, and of giving them the resources to prepare to fully parti cipate in the life of the nation and of the world." (Note 54) Senator Wellstone, however, sees a problem with thi s goal: Today in education there is a threat afoot,...: the threat of high-stakes testing being grossly abused in the name of greater account ability, and almost always to the serious detriment of our children." (Note 55) The Senator, like many others, recognizes the possi ble distorting and corrupting effects of high-stakes testing. He worries about compromisi ng the education of our students, because of "a growing set of classroom practices in which test-prep activities are usurping a substantive curriculum." (Note 56) The Senator is concerned that test preparation for the assessment of narrow curricular goals will turn out to be more like training than like the kind of learning that promot es transfer. And if that were to be the case, the test instruments themselves are likely to be narrow and near measures of transfer, as befits training programs. If this scen ario were to occur, then broad and far measures of transfer, the indicators, we hope, of t he educated person that we hold as our ideal, might not become part of the ways in which w e assess what is being learned in our schools. To reiterate: education (in some broad and hard-todefine way) is our goal. School learning is the means to accomplish that goal. But, as a recent National Academy of Science/National Research Council report on school learning makes clear, schooling that too closely resembles training, as in preparation f or testing, cannot accomplish the task the nation has set for itself, namely, the developm ent of adaptive and educated citizens for this new millennium. (Note 57) Of course, school learning that promotes transfer is only a necessary, and not a sufficient condition, t o bring forth an educated person. The issue, however, is whether high-stakes tests, with their potential for distorting and corrupting classroom life, can overcome the difficu lties inherent in such systems, and thereby bring about the transformation in student a chievements sought by all concerned with public education. One of the nation's leading experts on measurement has thought about this issue: As someone who has spent his entire career doing re search, writing, and thinking about educational testing and assessment i ssues, I would like to conclude by summarizing a compelling case showing t hat the major uses of tests for student and school accountability during the past 50 years have improved education and student learning in dramatic ways. Unfortunately, I cannot. Instead, I am led to concl ude that in most cases the

instruments and technology have not been up to the demands that have been placed on them by high-stakes accountability. Assessment systems that are useful monitors lose much of their dependability and credibility for that purpose when high stakes are attached to them. The unintended negative effects of high-stakes accountability uses often outweigh the intended positive effects." (Note 58)

Transfer of learning and test validity

This report looks at one of the effects claimed for high-stakes testing: that states with high-stakes tests will show evidence that some kind of broad learning, rather than just some kind of narrow training, has taken place. It is well known that test preparation, meticulous alignment of the curriculum with the test, as well as rewards and sanctions for students and other school personnel, will almost always result in gains on whatever instrument is used by the state to assess its schools. Scores on almost all assessment instruments are quite likely to go up as school administrators and teachers train students to do well on tests such as the all-purpose, widely used SAT-9s in California, or the customized Texas Assessment of Academic Skills (TAAS), the Arizona Instrument to Measure Standards (AIMS), or the Massachusetts Comprehensive Assessment System (MCAS). We ask a more important question than "Do scores rise on the high-stakes tests?" We ask whether there is evidence of student learning, beyond the training that prepared them for the tests they take, in those states that depend on high-stakes tests to improve student achievement. We seek to know whether we are getting closer to the ideal we all hold of a broadly educated student, or whether we are instead developing students who are much more narrowly trained to be good test takers. It is important to note that this is not just a question of how well the nation is reaching its intended outcomes; it is an equally important psychometric question about the validity of the tests, as well.

The National Research Council cautions that "An assessment should provide representative coverage of the content and processes of the domain being tested, so that the score is a valid measure of the student's knowledge of the broader [domain], not just the particular sample of items on the test." (Note 59)

So the score a student obtains on a high-stakes test must be an indicator of transfer or generalizability, or that test is not valid. The problem is that:

1. tests almost always are made up of fewer items than the number actually needed to thoroughly assess the entire domain that is of interest;
2. testing time, as interminable as it may seem to the students, is rarely enough to adequately sample all that is to be learned from a domain; and
3. teachers may narrow what is taught in the domain so that the scores on the test will be higher, though by doing this, the scores are then invalid since they no longer reflect what the student knows of the entire domain.

These three factors work against having high-stakes test scores accurately reflect students' domain scores in areas such as reading, writing, science, etc. Because of this constant threat of invalidity, attaching high stakes to achievement tests of this type may be impossible to do sensibly. (Note 60)

How might this show up in practice? Unfortunately there is already research evidence that reading and writing scores in Texas may not reflect the domains that are really of interest to us.
The Heisenberg Uncertainty Principle applied to assessment seems to be

16 of 74at work distorting and corrupting the Texas system. The ensuing uncertainty about the meaning of the test scores in Texas requires skepti cism about whether that state obtained valid indicators of the domain scores that are real ly of interest. That is, we have no assurance that the performance on the test indicate s what it is supposed to, namely, transfer or generalizability of the performance ass essed to the domain that is of interest to us. For example, ... high school teachers report that although pract ice tests and classroom drills have raised the rate of passing for the read ing section of the TAAS at their school, many of their students are unable to use those same skills for actual reading. These students are passing the TAAS reading section by being able to select among answers given. But they are not able to read assignments, to make meaning of literature, to comp lete reading assignments outside of class, or to connect reading assignments to other parts of the course such as discussion and writing.Middle school teachers report that the TAAS emphasi s on reading short passages, then selecting answers to questions based on those short passages, has made it very difficult for students to handle a sustained reading assignment. After children spend several years in c lasses where "reading" assignments were increasingly TAAS practice materia ls, the middle school teachers in more than one district reported that [s tudents] were unable to read a novel even two years below grade level. (Note 61) A similar phenomenon exists in testing writing, whe re a single writing format is taught—the five paragraph persuasive essay. Each pa ragraph has exactly five sentences: a topic sentence, three supporting sentences, and a concluding sentence much like the introductory sentence. The teachers call this "TAAS writing," as opposed to "real writing." Teachers of writing who work with their students on developing ideas, on finding their voice as writers, and on organizing p apers in ways appropriate to both the ideas and the papers' intended audience find themselves in conflict with this prescriptive format. The format subordinates ideas to form, sets a single form out as "the essay," and pr oduces predictably, rote writing. Writing as it relates to thinking, to lang uage development and fluency, to understanding one's audience, to enrich ing one's vocabulary, and to developing ideas has been replaced by TAAS writi ng to this format. (Note 62) California also has well documented instances of th is. The curriculum was so narrowed to reflect the high-stakes SAT 9 exam, and the teac hers under such pressure to teach just what is on the test, that they voluntarily felt obl iged to add a half hour a day of unpaid teaching time to the school schedule. As one teache r said: This year [we] ... extended our day a half hour mor e. And this is exclusively to do science and social studies. ... We think it's very important for our students to learn other subjects besides Open Court and math ... because in upper grades, their literature, all that is based o n social studies, and science and things like that. And if they don't get that ba se from the beginning [in] 1st [and] 2nd grade, they're going to have a very h ard time understanding

17 of 74the literature in upper grades .... There is no roo m for social studies, science. So that's when we decided to extend our day a half hour .... But this is a time for us. With that half hour, we can teach what ever we want, and especially in social studies and science and stuff, and not have to worry about, "OK, this is what we have to do." It's our o wn time, and we pick what we want to do. (Interview, 2/19/01) (Note 63) In this school the stress to teach to the test is s o great that some teachers violate their contract and take an hourly cut in pay in order to teach as their professional ethics demand of them. Such action by these teachers—in th e face of serious opposition by some of their colleagues–– is a potent indicator of how great the pressure in California is to narrow the curriculum and relentlessly prepare s tudents for the high-stakes test. The paradox is, that by doing these things, the teacher s actually invalidate the very tests on which they work so hard to do well. It is not often pointed out that the harder teachers work to directly prepare students for a high-stakes test, the less likely the test will be valid for the purposes it was intended.Test preparation associated with high-stakes testin g becomes a source of invalidity if students had differential test preparation—as often happens in the case of rich and poor students who take the SAT for college entrance. But even if all the students had intensive test preparation the potential for invali dity exists because the scores on the test may then no longer represent the broader domain of knowledge for which the test score was supposed to be an indicator. Under either of th ese circumstances, where there is differential preparation for the tests by different groups of students, or intensive test preparation by all the students, there is still a way to make a disti nction between training effects and the broader more desirable learning eff ects. That distinction can be made by using transfer measures, that is, other measures of the same domain as the high-stakes test but where no intensive test preparation occurr ed. The scores of students on tests of the same or similar domains as those measured by th e high-stakes test can help to answer the question about whether learning in the b road domain of knowledge is taking place, as intended, or whether a narrow form of lea rning is all that occurs from the test preparation activities. If scores on these other te sts rise along with the scores on the state tests then genuine learning would appear to be taki ng place. The claim that transfer within the domain is occurring can then be defended and support will have been garnered for the high-stakes testing programs now s weeping the country. We will now examine data that help to answer these questions ab out whether broad-based learning or narrow forms of training are occurring.Design of the StudyThe purpose of this study is to inquire whether the high-stakes testing programs promote the transfer of learning that they are intended to foster. A second report in this series inquires if there have been negative side-effects o f high-stakes testing for economically disadvantaged and ethnic minority students (see "Th e Unintended Consequences of high-stakes Testing by A. L. Amrein & D. C. Berline r, forthcoming, at http://www.edpolicyreports.org/ ). 
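To make the before-and-after logic of the archival time-series comparisons concrete, the following is a minimal, hypothetical sketch, not the authors' actual procedure, data, or code. The function name post_policy_gain, the use of NumPy, and the score series are illustrative assumptions only; the idea is simply to fit the pre-policy trend in a transfer measure (for example, a state's mean ACT or NAEP score) and ask whether post-policy observations rise above what that trend alone would have predicted.

import numpy as np

def post_policy_gain(years, scores, policy_year):
    """Mean (observed - projected) transfer-measure score in the years at or
    after policy_year, where the projection extrapolates the pre-policy
    linear trend."""
    years = np.asarray(years, dtype=float)
    scores = np.asarray(scores, dtype=float)
    pre = years < policy_year
    # Fit a straight line to the pre-policy observations only.
    slope, intercept = np.polyfit(years[pre], scores[pre], deg=1)
    projected = slope * years[~pre] + intercept
    return float(np.mean(scores[~pre] - projected))

# Hypothetical state that attached high stakes to its test in 1996.
years  = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000]
scores = [20.1, 20.2, 20.2, 20.3, 20.4, 20.4, 20.5, 20.4, 20.3, 20.4, 20.3]
print(round(post_policy_gain(years, scores, policy_year=1996), 2))

In the article's terms, a gain near zero or negative on such an independent transfer measure, even while scores on the state's own high-stakes test rise, would suggest training rather than broad learning.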
The sample of states used to assess the intended and unintended effects of high-stakes testing comprises the eighteen states that have the most severe consequences, that is, the highest stakes associated with their K–12 testing policies: Alabama, Florida, Georgia, Indiana, Louisiana, Maryland, Minnesota, Mississippi, Nevada, New Jersey, New Mexico, New York, North Carolina, Ohio, South Carolina,


18 of 74 Tennessee, Texas, and Virginia. Table 1 describes t he stakes that exist in each of these states at this time.Table 1 Consequences/"Stakes" in K–12 Testing Policies in S tates that Have Developed Tests with the Highest Stakes (Note 64) States Total Stakes Grad. examaGrade prom. exambPublic report cardscId. low perform.d$ awards to schoolse$ awards to stafffState may close lowperform.gState may replace staffhStudents may enroll elsewherei$ awards to studentsjAlabama6X XXX XX Florida6X XXXX X Georgia5X2004 (Note 65) XXXX2004 Indiana6X XXX X X Louisiana7XX (Note 66) XX XXX Maryland6X X XX XX Minnesota2X X Mississippi3X XX2003 2003 Nevada6X XX XX X New Jersey4X XXX New Mexico 7XX (Note 67) XXX XX New York5X XX XX North Carolina 8XX (Note 68) XXXXXX (Note 69) Ohio6X2002 (Note XXXX X


19 of 74 70) South Carolina 6X2002 (Note 71) XXX XX Tennessee6X XXXXX Texas8X2003 (Note 72) XXXXXX (Note 73) X Virginia4X XX X aGraduation contingent on high school grad. exam. bGrade promotion contingent on exam. cState publishes annual school or district report ca rds. dState rates or identifies low performing schools ac cording to whether they meet state standards or improve each year. eMonetary awards given to high performing or improvi ng schools. fMonetary awards can be used for "staff" bonuses. gState has the authority to close, reconstitute, rev oke a school's accred. or takeover low performing schools. hState has the authority to replace school personnel due to low test scores. iState permits students in failing schools to enroll elsewhere. jMonetary awards or scholarships for inor out of s tate college tuition are given to high performing students.These states have not only the most severe conseque nces written into their K–12 testing policies but lead the nation in incidences of schoo l closures, school interventions, state takeovers, teacher/administrator dismissals, etc., and this has occurred, at least in part, because of low test scores. (Note 74) Further, these states have the most stringent K–8 promotion/retention policies and high school gradua tion exam policies. They are the only states in which students are being retained in grade because of failing state tests and in which high school students are being denied regu lar high school diplomas, or are simply not graduating, because they have not passed the state's high school graduation exam. These data on denial of high school diplomas are presented in Table 2.Table 2 Rates at Which Students Did Not Graduate or Receive a High School Diploma Due to Failing the State High School Gradua tion Exam (Note 75) State (Note 76) Grade in which students first take the exam Percent of students who did notgraduate or receive a regularhigh school diploma because they Year


20 of 74 did not meet the graduationrequirement (Note 77) Alabama*105.5%2001Florida*115.5%1999Georgia*1112%2001Indiana*102%2000Louisiana10 & 114%2001Maryland64%2000Minnesota82%2001Mississippi*11n/a (Note 78) n/a Nevada113%2001New Jersey116%2001New Mexico*10n/an/aNew Yorkn/a (Note 79) 10%2000 North Carolina* 9 (Note 80) 7%2000 Ohio82%2000South Carolina108%1999Tennessee92.5%2001Texas102%2001Virginia*60.5%2001 The effects of high-stakes tests on learning were measured by examining indicators of student learning, academic accomplishment and achie vement other than the tests associated with high-stakes. These other indicators of student learning serve as the transfer measures that can answer our question abou t whether high-stakes tests show merely training effects, or show transfer of learni ng effects, as well. The four different measures we used to assess transfer in each of the states with the highest stakes were: the ACT, administered by the American College Testi ng program; 1. the SAT, the Scholastic Achievement Test, administe red by the College Board; 2. the NAEP, the National Assessment of Educational Pr ogress, under the direction of the National Center for Education Statistics and the National Assessment Governing Board; and 3. the AP exams, the Advanced Placement examination sc ores, administered by the College Board. 4. In each state, for each test, participation rates i n the testing programs were also examined since these vary from state-to-state and i nfluence the interpretation of the


21 of 74scores a state might attain.Transfer measures to assess the effects of high-sta kes tests As noted above, psychometricians teach us that one facet of validit y is that the scores on a test are indicators of performance in the domain from which the test items are drawn. Thus, the score a student gets on a ten-item test of algebra, or on their driving test, ought to provide information about how that student would sc ore on any of the millions of problems we could have chosen from the domain of al gebra, or on how that student might drive in innumerable traffic situations. The score on the short classroom assessment, or on the test of driving performance, is actually an indicator of the students' ability to transfer what they have demonstrated tha t they have learned to the other items and traffic situations that are similar to those on the assessment. In a sense, then, we don't really care much about the score that was obt ained on either test. What we really want to know is whether that student can do algebra problems or drive well in traffic. So we are interested in the score on the tests the stu dent actually took only in so far as those scores represent what they know or can do in the do main in which we are interested. This study seeks to clarify the relationship betwee n the score obtained on a high-stakes test and the domain knowledge that the test score r epresents. If, as in some states, scores on the state test go up, it is proper to ask whether the scores are also going up on other measures of the same dom ain. That is precisely what a gain score on a state assessment should mean. Gain score s should be the indicators of increased competency in the domain that is assessed by the tests, and that is why transfer measures that assess the same domain are needed. (Note 81) If the high-stakes testing of students really induc es teachers to upgrade curricula and instruction or leads students to study harder or be tter, then scores should also increase on other independent assessments. (Note 82) So we used the ACT, SAT, NAEP and AP exams as the other independent assessments, as meas ures of transfer. We are not alone in using these four measures to assess transfer of learning. For example, one analyst of the Texas high-stakes program believes: "If Texas-s tyle systemic reform is working as advertised, then the robust achievement gains that TAAS reports should also be showing up on other achievement tests such as the National Assessment of Educational Progress (NAEP), Advanced Placement exams and tests for coll ege admission." (Note 83) In addition, the RAND Corporation recently used thi s same logic to investigate the validity of impressive gains on Kentucky's high-sta kes tests. The researchers compared the students' performance on Kentucky's state test with their performance on comparable tests such as the NAEP and the ACT. Gains on the st ate test did not match gains on the NAEP or ACT tests. They concluded the Kentucky stat e test scores were seemingly inflated and were not a meaningful indicator of inc reased student learning in Kentucky. (Note 84) In assessing the effects of testing in Texas, other RAND researchers noted "Evidence regarding the validity of score gains on the TAAS c an be obtained by investigating the degree to which these gains are also present on oth er measures of these same skills." 
(Note 85) Because some test data from the states with high-stakes tests do not show evidence of learning on some of the transfer measures, journalist Peter Schrag noted that "...the unimpressive scores on other tests raise unavoidable questions about what the numbers


22 of 74really mean [on the high-stakes tests] and about th e cost of their achievement." (Note 86) The National Research Council also supports transfe r measures of the type we use by relying on such data in their own analysis. They no te, with dismay, that "There is some evidence to indicate that improved scores on one te st may not actually carry over when a new test of the same knowledge and skills is introd uced." (Note 87) Sampling concerns In each state the ACT and SAT tests are designed to measure the achievements of various percentages of the 60–70 pe rcent of the total high school students in a state who intend to go to college. Wi thin each state these tests probably attract a broad sample of students intending to go to college, while the AP tests are probably given to a more restricted and higher achi eving sample of students. But in all three cases the samples are not representative of the state's high school graduate s. However, these are all high-stakes tests for the st udents, with each test influencing their future. Thus, their motivation to do well on the st ate's high-stakes test and these other indicators of achievement is likely to be similar. This leads to a conservative test of transfer of learning, because it ought to be easier to find indicators of transfer, if it occurs, among these generally higher ability, more motivated students, rather than in a sample that included all the students in a state. Motivation to achieve well may be diminished in the case of the NAEP because no stakes are attached to those tests. But the NAEP st ate data is obtained from a random sample of the states' schools, and thus may provide the most representative sample among the four measures of transfer of learning we use. Nevertheless, even with NAEP there is a problem. At each randomly selected schoo l it is the local school personnel who decide if individual students will participate in N AEP testing. As will become clear later, sometimes the participation rates in NAEP te sting seem suspect, leading to concerns about the appropriateness of the NAEP samp le, as well. In each high-stakes state, from the year in which t he first graduating class was required to pass a high school graduation examination, we as ked: What happened to achievement in the domains assessed by the American College Tes t (ACT), in the domains assessed by the Scholastic Achievement Test (SAT), in the do mains assessed by the National Assessment of Educational Progress (NAEP), (Note 88) and in the domains assessed by the Advanced Placement (AP) tests. We asked also ho w participation rates in these testing programs changed and might have affected in terpretations of any effects found. An archival time-series research design was chosen to examine the state-by-state and year-to year data on each transfer measure. Time-se ries studies are particularly suited for determining the degree to which large-scale social or governmental policies make an impact. (Note 89) In archival time-series designs strings of observa tions of the variables of interest are made before, and after, some policy is introduced. The effects of the policy, if any, are apparent in the rise and fall o f scores on the variable of interest. We may consider the implementation of the state pol icy to engage in high-stakes testing as the independent variable, or treatment, and the scores from year to year on the ACT, SAT, NAEP and AP tests, before and after the implem entation of high-stakes testing, as four dependent variables of interest. 
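Concretely, the design amounts to a small set of intervention years (the independent variable) paired, for each state, with four yearly score series (the dependent variables). What follows is a minimal sketch of that bookkeeping in Python; the intervention years are drawn from Table 3, while the score values are purely illustrative placeholders rather than the archived data.

# Independent variable: for each state, the graduating classes first required to
# pass each version of the high school graduation exam (from Table 3).
intervention_years = {
    "Alabama": [1985, 1993],        # classes first affected by the 1st and 2nd exams
    "Florida": [1979, 1990, 1996],  # classes first affected by the 1st, 2nd, and 3rd exams
}

# Dependent variables: for each state, one yearly series per transfer measure.
# The ACT values here are illustrative placeholders, not the archived data.
score_series = {
    "Alabama": {
        "ACT":  {1984: 18.3, 1985: 18.4, 1992: 19.0, 1993: 19.2},
        "SAT":  {},   # filled from the College Board archives
        "NAEP": {},   # filled from the state NAEP releases
        "AP":   {},   # filled from the College Board AP archives
    },
}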
Relationships between the treatments and effects (between independent and dependent variables) are demonstrated by studying the pattern in the trend lines before and after the intervention(s), that is, before and after it was


23 of 74 mandatory to pass state tests. (Note 90) Table 3 presents the dates at which high school graduation requirements of this type were first int roduced in the eighteen states under study.Table 3 Years in Which High School Graduation Exams Affected Each Graduating Class (Note 91) Graduating classes required to pass different graduation exams to receive a regular highschool diploma. StateYear in which the state's 1st graduationexam policy was introduced 1st ExamClass of... 2nd ExamClass of... 3rd ExamClass of... 4th ExamClass of... 5th ExamClass of... Alabama1983198519932001, 2002, 2003 Florida19761979199019962003 Georgia1981198419951997, 1998Future (Note 92) Indiana19962000 Louisiana198919912003, 2004 Maryland198119872007 Minnesota19962000 Mississippi198819892003, 2004,2005,20061 Nevada197919811985199219992003New Jersey19811984198719952003, 2004,2006 New Mexico 19881990


24 of 74 New York1960s (Note 93) 198519952000, 2001, 2002, 2003, 2004, 2005 North Carolina 197719801998 (Note 94) 2005 Ohio1991199119942007 South Carolina 198619902005, 2006,2007 Future Tennessee1982198619982005 Texas19801983 (Note 95) 198719922005 Virginia198319862004 Two strategies were used to help evaluate the stren gth of the effects of the high-stakes testing policy, and our confidence in those effects First, data points before the introduction of the tests provided baseline informa tion. (Note 96) Whether changes in the transfer measure occurred was determined by com paring the post intervention data with the baseline or pre-intervention data. If ther e was a change in the trend line for the data, just after intervention occurred, it was conc luded that the treatment had an effect. Secondly, national trend lines were positioned alon gside state trend lines to help control for normal fluctuations and extraneous influences o n the data. (Note 97) The national group was used as a nonequivalent comparison group to help estimate how the dependent variable would have oscillated if there h ad been no treatment. (Note 98) The national trend lines controlled for whether effects at the state level were genuine or just reflections of national trends. Figure 2, using act ual data from the state of Alabama, and presented again in Appendix B, illustrates how the archival time series and our analyses of effects worked.
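Stated as arithmetic, each entry in the analyses that follow is the state's change over an interval minus the nation's change over the same interval. Below is a minimal sketch of that calculation in Python, assuming yearly state and national averages have already been tabulated; the score values are illustrative, not the archived data. As the Alabama example in Figure 2 illustrates, the short-term window runs from the year before the first affected graduating class to that class's year, and the long-term window runs to the last class before the next exam took effect, or to the most recent year available.

def gain_on_nation(state, nation, start, end):
    """Change in the state's average minus the change in the national average over
    the same years; a positive value means the state gained on the nation."""
    return (state[end] - state[start]) - (nation[end] - nation[start])

# Illustrative yearly ACT composite averages (not the archived data).
state_act  = {1984: 18.2, 1985: 18.4, 1992: 18.9, 1993: 19.1, 2001: 19.2}
nation_act = {1984: 18.8, 1985: 18.9, 1992: 19.3, 1993: 19.4, 2001: 19.8}

short_term_1st_exam = gain_on_nation(state_act, nation_act, 1984, 1985)  # 1984-1985
long_term_1st_exam  = gain_on_nation(state_act, nation_act, 1984, 1992)  # 1984-1992
short_term_2nd_exam = gain_on_nation(state_act, nation_act, 1992, 1993)  # 1992-1993
long_term_2nd_exam  = gain_on_nation(state_act, nation_act, 1992, 2001)  # 1992-2001

print(round(short_term_1st_exam, 1), round(long_term_1st_exam, 1),
      round(short_term_2nd_exam, 1), round(long_term_2nd_exam, 1))

Figure 2, which follows, reports differences of exactly this kind for Alabama's ACT scores.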


25 of 74 Figure 2. (From Appendix B) Analysis of the America n College Test (ACT), Alabama (Note 99) Alabama implemented its 1st high school graduation exam in 1983. It was a prerequisite for graduation that first affected the class of 198 5. Alabama's 2nd exam first affected the class of 1993. The enlarged diamond shape signifies the year before the 1st graduating class was required to pass the exam. The policy int ervention occurs in the year following the large diamond. From these data we conclude that : From 1984–1985 Alabama gained .1 point on the natio n. From 1984–1992 Alabama gained .3 points on the nati on. From 1992–1993 Alabama gained .1 point on the natio n. From 1992–2001 Alabama lost .1 point to the nation. To interpret these data, one inspects the state tre nd line and notes from the bold diamond shapes that there were two different points at whic h Alabama instituted high-stakes tests. After the first test was implemented, there was a s core gain on the ACT in Alabama. (Note 100) After the second test there was an equally modest rise in Alabama's ACT scores. But in each case the national trend line sh owed similar effects, which moderates those conclusions. We can conclude from plotting th e ACT scores each year that: 1) there were, indeed, small short term gains on the A CT in the year after new high-stakes tests were instituted; and 2) that the long term ef fects that may have occurred were substantial after the first test, but resulted in a small negative effect after the second high-stakes test was instituted. As can be seen, th e national trend lines are quite important for interpreting the effects of a high-st akes testing policy on a measure of transfer. A combined national trend line was used because the creation of a comparison group from the 32 states with no or low stakes attached t o their tests was not feasible. Designation of which category a state was in change d from year to year so there were never clear cases of "states with high-stakes tests and a comparison group made up of


26 of 74"states without high-stakes tests" across the years Using the combined national trend line was the best comparison group available, even though this trend line included each of the states that were under analysis and the othe r 17 states that we designated as high-stakes states and were also the object of stud y. Because of these factors there are some difficulties in the comparison of the state an d national trend lines, perhaps introducing some bias toward or against finding lea rning effects when comparing state trend lines with the national trend lines. If such bias exists, we believe its effects would be minimal.Sources of DataIn an archival time series analysis, effects of the independent variable were measured using historical records and data collected from ag ency and governmental archives (Note 101) and extensive telephone calls and emails to and fr om agency personnel and directors. The following state-level data archives were collected: American College Test (ACT) ACT composite scores – 1980–2001 ACT participation rates – 1994–2001 SAT SAT composite scores – 1977–2001 SAT participation rates – 1991–2001 National Assessment of Educational Progress (NAEP) NAEP Grade 4 Mathematics composite scores – 1992, 1 996, 2000 NAEP Grade 8 Mathematics composite scores – 1990, 1 992, 1996, 2000 NAEP Grade 4 Reading composite scores – 1992, 1994, 1998 NAEP Grade 8 Reading composite scores – 1998 Advance Placement (AP) Percentage of 11th / 12th graders who took AP exams 1991–2000 Percentage of 11th / 12th graders receiving a 3 or above 1995–2000 State summaries for each of the 18 states with the highest stakes written into their K–12 testing policies were constructed to facilitate the time series analysis. These are presented in Appendix A. The summaries include cont extual and historical information about each state's testing policies. Each summary s hould help readers gain more insight about each state's testing policies and the values each state attributes to high-stakes tests, beyond the information offered in Table 1. Most imp ortantly, each summary includes background information regarding the key interventi on points, or years in which graduating seniors were first required to pass diff erent versions of high school graduation exams as summarized in Table 3. These in tervention points were illustrated in each archival time series graph, and each interp retation of state data relied on what happened after these key points in time. The archiv al time series graphs for each of the transfer measures we used are included in the diffe rent Appendices. The data associated


27 of 74with each of the transfer measures will now be desc ribed. The American College Test (ACT) and the Scholastic Achievement Test (SAT)The American College Test (ACT) (Note 102) and Scholastic Achievement Test (SAT) (Note 103) are the two predominant entrance exams taken by st udents prior to enrolling in higher education. College-bound students take th e ACT or SAT to meet in-state or out-of-state university enrollment requirements. Sc ores on these tests are used by college admissions officers as indicators of ability and ac ademic achievement and are used in decisions about whether an applicant has the minimu m level of knowledge to enter into, and prosper, at the college to which they applied. Although many studies have been conducted questioning the usefulness of these tests in predicting a student's actual success after enrolling in college, they continue t o be widely used by universities when accepting students into their institutions.. (Note 104) Despite questions about their predictive validity, both ACT and SAT scores can be considered as sensible indicators of academic achie vement in the domains that constitute the general high school curriculum of the United St ates. Averaged at the state level, both tests can be thought of as external and alternative indicators of achievement by the students of a particular state. Both of these tests can serve as measures of transfer of learning.At this time, we know that the set of states withou t high-stakes tests perform better on the ACT and SAT. We do not know, however, how perfo rmance on the ACT and SAT tests changed after high school graduation exams we re implemented in the 18 states that have introduced high-stakes testing policies. The o bjective of the first section of this inquiry is to answer this question.There are, however, limitations to using these meas ures. For example, students who take the ACT and SAT are college-bound students and do n ot represent all students in a given state. But in 2001 38% and 45% of all graduat ing seniors took the ACT and the SAT tests, respectively. Although the sample of stu dents is not representative, we can still use these scores to assess how high-stakes te sts affected an average of approximately 2 out of every 5 students across the nation. Additionally, because participation rates vary by state we can use state participation rates to assess how in some states high-stakes tests affected the academic performance of more than 75% of graduating seniors. It should be noted, as well, that some states are A CT states or states in which the majority of high school seniors take the ACT. Other states are SAT states or states in which the majority of high school seniors take the SAT. In Mississippi, for example, only 4% of high school seniors took the SAT in 2001 but in that same year 89% of high school seniors took the ACT. This would make Missis sippi an ACT state. Whether states with high-stakes tests are ACT or SAT states should be taken into consideration to help us understand the sample of students who are t aking the tests. If within Mississippi only 4% of high school seniors took the SAT it can be assumed that those students were probably among the brightest or most ambitious high school seniors in Mississippi. These students probably take the SAT because they w ere seeking out-of-state universities. Conversely, if 89% of high school sen iors took the ACT, it can be assumed


28 of 74 that those students were probably a bit less talent ed or ambitious seniors, predominantly students trying to meet the requirements of the uni versities within the state of Mississippi. It is likely, however, that this sampl e also includes those seeking entrance to out of state universities that accept ACT scores. T he participation rates for each test helps to decipher whether different samples of coll ege bound students performed differently.It should also be noted that the ACT and SAT tests are high-stakes tests. A student's score does influence to which colleges a student ma y apply and in which colleges a student may enroll. It seems likely, therefore, tha t students who take these tests are trying to achieve the highest scores possible. This would deflate arguments that students try harder on high school graduation exams than col lege entrance exams. If anything, the opposite might be true.The purpose in the next two analyses is to assess h ow student learning changed in the domains represented by the ACT and SAT. Student sco res and participation rates on these tests will be examined in each state after hi gh-stakes high school graduation tests were implemented. Effects will be analyzed from the year in which the first graduating class was required to pass a high school graduation exam. It is also the purpose of the next two analyses to assess how high school seniors who are likely to be bound for out-of-state colleges, and seniors likely to be bou nd for in-state colleges, performed after high school graduation high-stakes exams were imple mented.American College Test (ACT)The ACT data for each of the 18 states with high-st akes testing is included in Appendix B. Short-term, long-term, (Note 105) and overall achievement trends on the ACT were analyzed in the years following a states implementa tion of a high-stakes high school graduation exam. These analyses are summarized in A ppendix B as well. The data and analysis for the state of Alabama, which we include d as Figure 2, illustrated the way we examined each state's ACT data. A summary of those trends across the 18 states with the highest stakes is provided in Table 4.Table 4 Results from the Analysis of ACT Scores (Note 106) StateEffect after 1st HSGE Effect after 2nd HSGE Effect after 3rd HSGE Effect after 4th HSGE OverallEffectsShortTerm LongTerm ShortTerm LongTerm ShortTerm LongTerm ShortTerm LongTerm Alabama1984–'85 +0.1 1984–'92+0.3 1992–'93+0.1 1992–'01–0.1 Positive Floridan/a1980–'89 –0.4 1989–'90–0.1 1989–'95–0.2 1995–'96–0.3(+2%) 1995–'01–0.6(+5%) Negative Georgia1983–'84 +0.2 1983–'94–0.5 1994–'95–0.1 (0%) 1994–'01–0.6(0%) Negative


29 of 74 Indiana1999–'00 +0.2(–1%) 1999–'01+0.2(–1%) Positive Louisiana1990–'91 0 1990–'01–0.2 Negative Maryland1986–'87 +0.1 1986–'01–0.6 Negative Minnesota1999–'00 –0.1(0%) 1999–'010 (0%) Negative Mississippi1988–'89 0 1988–'01–0.4 Negative Nevada1980–'81 –0.1 1980–'84+0.1 1984–'85–0.3 1984–'91+0.1 1991–'92+0.2 1991–'98+0.1 1998–'99+0.1(–1%) 1998–'01–0.1(–5%) Positive New Jersey1983–'84 +0.3 1983–'86–1.4 1986–'87–0.2 1986–'94–0.1 1994–'95–0.5(–1%) 1994–'01–0.5(–1%) Negative NewMexico 1989–'90+0.1 1989–'01–0.5 Negative New York1984–'85 –0.2 1984–'94–0.5 1994–'95+0.1(–1%) 1994–'01+0.4(–6%) Negative NorthCarolina 1979–'80n/a 1980–'97–1.1 1997–'98+0.1 (0%) 1997–'01+0.4(0%) Negative Ohio1993–'94 +0.1 1993–'01+0.1 Positive SouthCarolina 1989–'90+0.1 1989–'01–0.5 Negative Tennessee1985–'86 +0.3 1985–'97–0.3 1997–'98+0.1(–7%) 1997–'01+0.3(–6%) Positive Texas1986–'87 +0.2 1986–'91+0.7 1991–1'920 1991–'010 Positive Virginia1985–'86 –0.1 1985–'01–1.3 Negative From Table 4, looking at all the states simultaneou sly, and in comparison to the nation, we can evaluate short-term, long-term, and the over all effects of high stakes testing policies.Short-term effects In the short term, ACT gains were posted 1.6 time s more often than losses after high school graduation exams were impl emented. Short-term gains were evident sixteen times, losses were evident ten time s, and no apparent effects were evident three times. But the gains and losses that occurred were partly artificial, because the states' short-term changes in scores were corre lated (–0.51 < r < 0.13) (Note 107) to


30 of 74the states short-term changes in participation rate s. This modest negative correlation informs us that if the participation rate in ACT te sting went down then the scores on the ACT went up, and vice versa. Under these circumstan ces it is hard to defend the thesis that there are reliable short-term gains from highstakes tests. Long-term effects In the long term, and also in comparison to the n ation, ACT losses were posted 1.9 times more often than gains after h igh school graduation exams were implemented. Long-term gains were evident ten times losses were evident nineteen times, and no apparent effects were evident two tim es. These gains and losses were "real" given that the states' long-term changes in score were unrelated (r = –0.18) (Note 108) to the states' long-term changes in participation rates. Overall effects In comparison to the rest of the nation, negative ACT effects were displayed 2 times more often than positive effects after high-stakes high school graduation exams were implemented. Six states displ ayed overall positive effects, while twelve states displayed overall negative effects. I n this data set overall losses or gains were unrelated to whether the percentage of student s participating in the ACT increased or decreased.Assuming that the ACT can serve as an alternative m easure of the same or a similar domain as a state's high-stakes achievement tests, there is scant evidence of learning. Although states may demonstrate increases in scores on their own high-stakes tests, it appears that transfer of learning is not a typical outcome of their high-stakes testing policy. Sixty-seven percent of the states that use high school graduation exams posted decreases in ACT performance after high school graduation ex ams were implemented. These decreases were unrelated to whether participa tion rates increased or decreased at the same time. On average, the college-bound studen ts in states with high school graduation exams decreased in levels of academic ac hievement as measured by the ACT. One additional point about the ACT data needs to be made. In ACT states (states in which more than 50% of high school seniors took the ACT) students who are thought to be headed for in-state colleges were just slightly (1.3 times) more likely to post negative effects on the ACT. In SAT states (states in which less than 50% of high school seniors took the ACT) the students who are more likely boun d for out-of-state colleges were 2.7 times more likely to post negative effects on the A CT. If anything, high school graduation exams hindered the performance of the br ightest and most ambitious of the students bound for out-of-state colleges. Seventy-t hree percent of the states in which less than 50% of students take the ACT posted overall lo sses on the ACT. Analysis of ACT Participation Rates. (Note 109) Just as ACT scores were used as indicators of academic achievement, ACT participati on rates were used as indicators of the rates by which students in each state were plan ning to go to college. Arguably, if high school graduation exams increased academic ach ievement in some broad and general sense, an increase in the number of student s pursuing a college degree would be noticed. An indicator of that trend would be increa sed ACT participation rates over time. So we examined changes in the rates by which studen ts participated in ACT testing after the year in which the first graduating class was re quired to pass a high school graduation exam and for which data were available. 
These results are presented in Table 5.


Table 5
Results from the Analysis of ACT Participation Rates

State | Year in which students had to pass 1st HSGE to graduate | Change in % of students taking the ACT 1994–2001 as compared to the nation* | Overall Effects
Alabama | 1985 | +9% | Positive
Florida | 1979 | +4% | Positive
Georgia | 1984 | 0% | Neutral
Indiana | 2000 | –1% | Negative
Louisiana | 1991 | +5% | Positive
Maryland | 1987 | –1% | Negative
Minnesota | 2000 | 0% | Neutral
Mississippi | 1989 | +14% | Positive
Nevada | 1981 | –6% | Negative
New Jersey | 1985 | –1% | Negative
New Mexico | 1990 | 0% | Neutral
New York | 1985 | –6% | Negative
North Carolina | 1980 | +2% | Positive
Ohio | 1994 | +2% | Positive


South Carolina | 1990 | +15% | Positive
Tennessee | 1986 | +10% | Positive
Texas | 1987 | –2% | Negative
Virginia | 1986 | +4% | Positive

PAGE 33

33 of 74From this analysis we learn that from 1994–2001 ACT participation rates, as compared to the nation, increased in 50% of the states with high school graduation exams. When compared to the nation, participation rates increas ed in nine states, decreased in six states, and stayed the same in three states. Thus t here is scant support for the belief that high-stakes testing policies within a state have an impact on the rate of college attendance.The Scholastic Achievement Test (SAT)The SAT data for each of the 18 states with high-st akes testing is included in Appendix C. Short-term, long-term, and overall achievement t rends were analyzed following the states' implementation of their high-stakes high sc hool graduation exam and these analyses are summarized in Appendix C, as well. The state of Florida was randomly chosen from this data set to illustrate what a time series for the SAT looks like. These data are provided in Figure 3. A summary of those t rends across the 18 high-stakes testing states is provided in Table 6. Figure 3. Florida: SAT scores Florida implemented its 1st high school graduation exam in 1976. It was a prerequisite for graduation that first affected the class of 197 9. Florida's 2nd exam first affected the class of 1990 and its 3rd exam the class of 1996 – see points of intervention (diamonds) enlarged to signify the year before the 1st graduating class was required to pass each exam: From 1978–1979 Florida gained 6 points on the natio n. From 1978–1989 Florida lost 4 points to the nation. From 1989–1990 Florida gained 2 points on the natio n. From 1989–1995 Florida lost 2 points to the nation. From 1995–1996 Florida lost 2 points to the nation. From 1995–2001 Florida lost 6 points to the nation.

PAGE 34

34 of 74 Table 6 Results from the Analysis of SAT Scores Across the States (Note 110) StateEffect after 1st HSGE Effect after 2nd HSGE Effect after 3rd HSGE Effect after 4th HSGE Overall EffectsShortTerm LongTerm ShortTerm LongTerm ShortTerm LongTerm ShortTerm LongTerm Alabama1984–85 +13 1984–92+23 1992–93+7 (0%) 1992–01+4 (–2%) Positive Florida1978–79 +6 1978–89–4 1989–90+2 1989–95–2 1995–96–2 (0%) 1995–01–6(+2%) Negative Georgia1983–84 0 1983–94+21 1994–95–2(+1%) 1994–2001+10 (–5%) Positive Indiana1999–00 +2 1999–01+2 Positive Louisiana1990–91 +3 1990–01+19 Positive Maryland1986–87 +3 1986–01–6 Negative Minnesota1999–00 –12 1999–01–19 Negative Mississippi1988–89 –13 1988–01+7 Negative Nevada1980–81 +3 1980–84–6 1984–85–16 1984–91–10 1991–92+1(+2%) 1991–98–15 1998–99+7 1998–01–2 Negative New Jersey1983–84 –2 1983–86+2 1986–87+4 1986–94+8 1994–95–2 (0%) 1994–01+1(+7%) Positive New Mexico 1989–90–3 1989–01–29 Negative New York1984–85 –3 1984–94–11 1994–95–3(–1%) 1994–2001–6 (–2%) Negative North Carolina 1980–81+7 1980–97+32 1997–98+3 1997–01+10 Positive Ohio1993–94 +7 1993–01–1 Positive South Carolina 1989–90+2 1989–01+15 Positive Tennessee1985–86 –3 1985–97+8 1997–98+1 1997–01–9 (–3%) Negative Texas1986–87 –2 1986–91+7 1991–92–1 (0%) 1991–01–8 (+6%) Negative

PAGE 35

35 of 74Short–term effects Looking across all the states simultaneously, and in comparison to the nation, we see that in the short term, SAT gain s were posted 1.3 times more often than losses after high school graduation exams were implemented. Short-term gains were posted seventeen times, losses were posted thi rteen times, and no apparent effects were posted once. But the gains and losses that occ urred were partly artificial because the states' short-term changes in scores were relat ed (–0.60 < r < 0.38) to the states short-term changes in participation rates. The nega tive correlations inform us that if the participation rate in SAT testing went down the sco res on the SAT went up, and vice versa. The modest positive correlations inform us t hat in a few cases if the participation rate in SAT testing went down the scores on the SAT went down, and vice versa. Under these circumstances it is hard to defend the thesis that there are any reliable short-term gains on measures of general learning associated wi th high-stakes tests. Long-term effects In the long term, and also in comparison to the n ation, SAT losses were posted 1.1 times more often than gains after h igh school graduation exams were implemented. Long-term gains were evident fifteen t imes, and losses were evident sixteen times. These gains and losses were partly a rtificial, however, given that the states' long-term changes in score were negatively correlated (r = –0.41) to the changes in participation rates for taking the SAT. The fewe r students taking the test, the higher the SAT scores, and vice versa.Overall effects. In comparison to the rest of the nation, negative SAT effects were posted 1.3 times more often than positive effects after hi gh school graduation exams were implemented. Eight states displayed overall positiv e effects, while ten states displayed overall negative effects. But the gains or losses i n score were related to increases and decreases in the percentage of students participati ng in the SAT. Thus it is hard to attribute any effects on the SAT to the implementat ion of high-stakes testing. If we assume that the SAT is an alternative measure of the same or a similar domain as a state's own high-stakes achievement tests, then the re is scant evidence of learning. Although states may demonstrate increases in scores on their own high-stakes tests, it appears that transfer of learning is not a typical outcome of their high-stakes testing policy. Fifty-six percent of the states that use hi gh school graduation exams posted decreases in SAT performance after high school grad uation exams were implemented. However, these decreases were slightly related to w hether SAT participation rates increased or decreased at the same time. Thus, ther e is no reliable evidence that high-stakes high school graduation exams improve th e performance of students who take the SAT. Gains and losses in SAT scores are more re lated to who participates in the SAT than the implementation of high school graduati on exams. One additional point about the SAT data needs to be made. In SAT states (states in which more than 50% of high school seniors took the SAT) students who are thought to be headed for in-state colleges were equally likely to post negative and positive effects on the SAT. In ACT states (states in which less tha n 50% of high school seniors took the SAT) the students who are more likely bound for out -of-state colleges were 1.7 times more likely to post negative effects on the SAT. 
If anything, high school graduation exams hindered the performance of the brightest and most ambitious of the students bound for out-of-state colleges. Sixty-three percen t of the states in which less than 50% of students take the SAT posted overall losses on t he SAT. Analysis of SAT Participation Rates. Just as SAT scores were used as indicators of

PAGE 36

36 of 74 academic achievement, SAT participation rates were used as indicators of the rates by which students in each state were planning to go to college. Arguably, if high school graduation exams increased academic achievement in some broad and general sense, an increase in the number of students pursuing a colle ge degree would be noticed. An indicator of that trend would be increased SAT part icipation rates. So we examined changes in the rates by which students participated in SAT testing after the year in which the first graduating class was required to pass a h igh school graduation exam and for which data were available. These results are presen ted in Table 7.Table 7 Results from the Analysis of SAT Participation Rate sStateYear students must pass 1st HSGE to graduate Change in % of students taking the SAT 1991–2001 as compared to the nation*Overall Effects Alabama1985–2%NegativeFlorida1979+3%PositiveGeorgia1984–2%NegativeIndiana2000–1%NegativeLouisiana1991–5%NegativeMaryland1987–2%NegativeMinnesota2000–1%NegativeMississippi1989–3%NegativeNevada1981+5%PositiveNew Jersey1985+4%PositiveNew Mexico1990–2%Negative

PAGE 37

37 of 74 New York1985–1%NegativeNorth Carolina 1980+5%Positive Ohio1994+2%PositiveSouth Carolina 1990–4%Negative Tennessee1986–2%NegativeTexas1987+6%PositiveVirginia1986+5%Positive 1993–2001 data were used for Ohio and 2000–2001 da ta were used for Indiana and Minnesota. Participation rates were not available for 1998 and 1999.From this analysis we learn that from 1991–2001 (19 93–2001 in Ohio, and 2000–2001 in Indiana and Minnesota) SAT participation rates, as compared to the nation, fell in 61% of the states with high school graduation exams Participation rates in the SAT increased in seven states and decreased in eleven s tates. There is scant support for the belief that high-stakes testing policies will incre ase the rate of college attendance. Students did not participate in the SAT testing pro gram at greater rates after high-stakes high school graduation exams were implemented.National Assessment of Educational Progress (NAEP)Some may argue that using ACT and SAT scores to ass ess the effects of high school graduation exams is illogical because high school g raduation exams are specifically intended to raise the achievement levels of those s tudents who are the most likely to fail – the poor, in general, and poor racial minorities, in particular. These students do not take the ACT or SAT in great numbers. But the effec ts of high-stakes policies on these particular populations can be assessed with data fr om the National Assessment of Educational Progress. (Note 111) The National Assessment of Educational Progress (NA EP), commonly known as ‘the nation's report card," is the test administered by the federal government to monitor the condition of education in the nation's schools. NAE P began in 1969 as a national assessment of three different age or grade levels, for which students were randomly sampled and tested to provide information about the outcomes of the nation's various educational systems. In 1990 NAEP was expanded to p rovide information at the state level, allowing for the first time state-to-state c omparisons. States that volunteered to participate in NAEP coul d gauge how they performed in math

PAGE 38

38 of 74and reading in comparison to each other and to the nation, overall. This way states could assess the effects of the particular educational po licies they had implemented. Under President Bush's national education policy, however states are required to take the NAEP because it is believed to be the most robust a nd stable instrument the nation has to gauge learning and educational progress across a ll states. (Note 112) The federal government believes, as we do, that NAEP exams can be used to asses transfer, that NAEP is an alternate measure of the domains that ar e assessed by each of the states. Weaknesses of the NAEP It is proper to acknowledge that the NAEP has a n umber of weaknesses influencing interpretations of the data that we offer below. First, state level NAEP data pertain only to 4th and 8th grade achieve ment. The national student data set includes 12th grade data as well, and some addition al subjects are tested, but at the state levels, only 4th and 8th grade achievement is measu red. Given these circumstances it is not logical to attempt an assessment of the effects of implementing a high school graduation exam, or any other exam that is usually administered at the high school level, by analyzing NAEP tests given at the 4th or 8th gra de. On the other hand, it is not illogical to make the assumption that other state r eform policies went into effect at or around the same time as high-stakes high school gra duation exams were put into place, including the use of other high-stakes tests at low er grade levels. (Note 113) The usefulness of the NAEP analyses that follow rest on the assumption that states' other K–12 high-stakes testing policies were implemented at or around the same time as each state's high school graduation exam. Table 1 descri bes these policies, and these policies are elaborated on in Appendix A. Other researchers who have used NAEP data to draw conclusions about the effects of high-stakes tests have used this logic and methodology as well. (Note 114) Secondly, the NAEP does not have stakes attached to it. Students who are randomly selected to participate do not have to perform thei r best. However, because each student only takes small sections of the test, students app ear to be motivated to do well and the scores appear to be trustworthy. (Note 115) Third, states like North Carolina have aligned thei r state-administered exams with the NAEP, making for state-mandated tests that are very similar to the NAEP. (Note 116) In such cases gains in score on the NAEP may be relate d to similarities in test content rather than actual increases in school learning. St ates that align their tests with the NAEP have an unfair advantage over other states that ali gned their tests with their state standards, but such imitative forms of testing occu r. State tests that look much like the NAEP will probably become more common now that Pres ident Bush is attempting to attach stakes to the NAEP, and this will, of course make the NAEP much less useful as a yardstick to assess if genuine learning of the do mains of interest is taking place. Finally, when analyzing NAEP data it is important t o pay attention to who is actually tested. The NAEP sampling plan uses a multi-stage r andom sampling technique. In each participating state, school districts are randomly sampled. Then, schools within districts are randomly sampled. And then, students within sch ools are randomly sampled. 
Once the final list of participants is drawn, school per sonnel sift through the list and remove students who they have classified as Limited Englis h Proficient (LEP) or who have Individualized Education Plans (IEPs) as part of th eir special education programs. Local personnel are required to follow "carefully defined criteria" in making determinations as to whether potential participants are "capable of p articipating." (Note 117) In short, although the NAEP uses random sampling techniques, not all students sampled are

PAGE 39

39 of 74actually tested. The exclusion of these students bi ases NAEP results. Illusion from exclusion. Walter Haney found that exclusion rates explained gains in NAEP scores and vice versa. Texas, for example, was one state in which large gains in NAEP scores were heralded as proof that high-stakes tests do, indeed, improve student achievement. But Haney found that the percentages o f students excluded from participating in the NAEP increased at the same tim e that large gains in scores were noted. Exclusion rates increased at both grade leve ls escalating from 8% to 11% at grade 4 and from 7% to 8% at grade 8 from 1992–1996. Mean while, in contrast, exclusion rates declined at both grade levels at the national level during this same time period, decreasing from 8% to 6% at grade 4 and from 7% to 5% at grade 8. Haney, therefore, termed the score gains in Texas an "illusion arisin g from exclusion." (Note 118) Unfortunately, however, such illusions from exclusi ons hold true across the other states that use high-stakes tests. For example, North Caro lina was the other state in which large gains in NAEP scores were heralded as proof that hi gh-stakes testing programs improve student achievement. On the 4th grade NAEP math tes t North Carolina recorded an average composite score of 212 in 1992 and an avera ge composite score of 232 in 2000. The nation's composite score increased from 218 to 226 over the same time period. North Carolina gained 20 points while the nation ga ined 8, making for what would seem to be a remarkable 12-point gain over the nation, t he largest gain made by any state. But North Carolina excluded 4% of its LEP and IEP stude nts in 1992 and 13% of its LEP and IEP students in 2000. Meanwhile, the nation's e xclusion rate decreased from 8% to 7% over the same time period. North Carolina exclud ed 9% more of its LEP and IEP students while the nation excluded 1% less making f or a 10% divergence between North Carolina's and the nation's exclusion rates from 19 92–2000. North Carolina's grade 4 math 1992–2000 exclusion rates increased 325% while the nation's exclusion rate decreased. In addition, North Carolina's grade 8 ma th 1992–2000 exclusion rates increased 467% while the nation's exclusion rate st ayed the same. There is little doubt that the relative gains poste d by North Carolina were partly, if not entirely, artificial given the enormous relative in crease in the rates by which North Carolina excluded students from participating in th e NAEP. The Heisenberg Uncertainty Principle appears to be at work in both Texas and N orth Carolina, leading to distortions and corruptions of the data, giving rise to uncerta inty about the meaning of the scores on the NAEP tests.North Carolina and Texas, however, are not the only states in which exclusionary trends were observed. In states with high-stakes tests, be tween 0%–49% of the gains in NAEP scores can be explained by increases in rates of ex clusion. Similarly, 0%–49% of the losses in score can be explained by decreases in ra tes of exclusion over the same years. (Note 119) The more recent the data, the more the variance in NAEP scores can be explained by changes in exclusion rates. In short, states that are posting gains are increasingly excluding students from the assessment This is happening with greater frequency as time passes from one NAEP test to the next. 
That is, as the stakes attached to the NAEP become higher, the Heisenberg Uncertain ty Principle in assessment apparently is having its effects, with distortions and corruptions of the assessment system becoming more evident.The state scores on the NAEP math and reading tests at grades 4 and 8, will be used in our analysis to test the effects on learning from u sing high-stakes tests in states that have

PAGE 40

40 of 74implemented high-stakes high school graduation exam s. Given that exclusion rates affect gains and losses in score, however, state ex clusion rates will be presented along side the relative gains or losses posted by each st ate. In this way readers can make their own judgments about whether year-to-year gains in s core are likely to be "true" or "artificial." The gains and losses in scores and ex clusion rates have all been calculated in comparison to the pooled national data. Analysis of NAEP Grade 4 Math ScoresFor each state, after high-stakes tests were implem ented, an analysis of NAEP mathematics achievement scores was conducted. The s tate of Georgia was randomly chosen to serve as an example of the analysis we di d on the grade 4 NAEP math tests (see Figure 4). The logic of this analysis rests on two assumptions. First, that high-stakes tests and other reforms were implemented in all gra des at or around the same time, or soon after high-stakes high school graduation exams were implemented. Second, that such high-stakes test programs and the reform effor ts that accompany them should affect learning in the different mathematics domains that make up the K–4 curriculum. NAEP is a test derived from the K–4 mathematics domains. Figure 4. NAEP Math, Grade 4: Georgia Trend lines and analytic comments for all the other states are included in Appendix D. A summary of these data across all 18 states is prese nted as Table 8. Georgia implemented its 1st high school graduation exam in 1984. Assuming that other stakes attached to Georgia's K–8 tests (see Table 1 ) were attached at or around the same time or some time thereafter: From 1992–1996 Georgia lost 4 points to the nation. From 1996–2000 Georgia gained 4 points, as did the nation. From 1992–2000 Georgia lost 4 points to the nation.

PAGE 41

41 of 74 Table 8 Results from the Analysis of NAEP Math Grade 4 Scor esStateYear in which students had to pass 1st HSGE tograduate 1992–1996Change inscore 1992–1996Change in%excluded 1996–2000Change inscore 1996–2000Change in%excluded 1992–2000Change inscore 1992–2000Change in%excluded Overall Effects Alabama19850+3%+2–1%+2+2%PositiveFlorida1979–2n/an/an/an/an/aNegativeGeorgia1984–4+4%0–1%–4+3%NegativeIndiana2000n/an/an/an/an/an/an/aLouisiana1991+1+6%+5–1%+6+5%PositiveMaryland1987–1+6%–20%–3+6%NegativeMinnesota2000n/an/an/an/an/an/an/aMississippi1989+2+3%–1–3%+10%PositiveNevada1981n/an/a–10%n/an/aNegativeNew Jersey1984–4n/an/an/an/an/aNegativeNew Mexico 1990–3+7%–4–1%–7+6%Negative New York19850+5%0+3%0+8%NeutralNorth Carolina 1980+8+5%+4+5%+12+10%Positive Ohio1994+2n/a+2n/a+4+5%PositiveSouth Carolina 1990–3+3%+30%0+3%Neutral Tennessee1986+4+4%–3–3%+1+1%PositiveTexas1987+7+4%0+4%+7+8%Positive

PAGE 42

42 of 74 The time period 1992–1996 From Table 8, in comparison to the nation as a wh ole, we see that the states that implemented high-stakes te sts 1 or more years before 1996 posted losses 1.2 times more often than gains on the 1992– 1996 grade 4 NAEP math tests. Six states posted gains, seven states posted losses, an d two states posted no changes, as compared to the nation. Thus, only 40% of the state s with high-stakes tests posted gains from 1992–1996. These gains and losses may be consi dered "real" given that the states' 1992–1996 changes in score were unrelated (r = 0) t o the states' 1992–1996 exclusion rates.The time period 1996–2000 Table 8 also reveals that on the 1996–2000 grade 4 NAEP math tests the states that implemented high-stakes tests 1 or more years before 2000 posted gains 1.2 times more often than losses, as c ompared to the nation. Six states posted gains, five states posted losses, and three states posted no changes as compared to the nation. Thus, only 43% of the states with highstakes tests posted gains from 1996–2000. These gains and losses, however, were pa rtly artificial since the states' 1996–2000 changes in score were positively correlat ed (r = 0.45) with the states' 1996–2000 exclusion rates.The time period 1992–2000 Table 8 also reveals that states that implemented high-stakes tests 1 or more years before 2000 poste d gains 2.7 times more often than losses. Another way to look at these data is to not e that these states were 1.6 times more likely to show gains rather than losses or no chang es on the grade 4 NAEP math tests over the time period from 1992–2000. Eight states p osted gains, three states posted losses, and two states posted no changes as compare d to the nation. Thus, gains were posted by 62% of the states with high-stakes tests from 1992–2000. But these gains and losses were partly artificial given that the states 1992–2000 changes in score were positively correlated (r = 0.39) to the states' 199 2–2000 exclusion rates. The higher the percent of students excluded, the higher the NAEP s cores obtained by a state. Because of the correlation we found between exclusion rates an d scores on the NAEP, there is uncertainty about the meaning of those improved sco res. The overall data set. In the years for which data were available, across all time periods, the implementation of high-stakes tests resulted in positive effects 1.3 times more often than negative effects on the grade 4 NAEP tests in mathematics. Eight states displayed positive effects, six states displayed negative eff ects, and two states displayed neutral effects. Thus, in comparison to national trends, 50 % of the states with high-stakes tests posted positive effects but these gains and losses were partly artificial, given that the overall positive or negative changes in score were related to changes in the overall state exclusion rates.In short, when compared to the nation as a whole, h igh-stakes testing policies did not usually lead to improvement in the performance of s tudents on the grade 4 NAEP math tests between 1992 and 2000. Gains and losses were more likely to be related to who was excluded from the NAEP than to the effects of h igh-stakes testing programs in a state. In the 1992–1996 time period, when participa tion rates were unrelated to gains and losses, the academic achievement of students may ha ve even been thwarted in those states where high-stakes testing was implemented. 
H igh-stakes tests within states probably had a differential impact on students from racial minority and economically disadvantaged backgrounds.

PAGE 43

Analysis of NAEP Grade 8 Math Scores

For each state, after high-stakes tests had been implemented, an analysis of NAEP mathematics achievement scores was conducted. The state of Mississippi was randomly chosen to serve as an example of the analysis we did on the grade 8 NAEP math tests (see Figure 5). The logic of this analysis rests on two assumptions. First, that high-stakes tests and other reforms were implemented in all grades at or around the same time, or soon after high-stakes high school graduation exams were implemented. Second, that such high-stakes test programs should affect learning in the different mathematics domains that make up the K–8 curriculum. NAEP is a test derived from the domains that make up the K–8 curriculum.

Figure 5. Mississippi – NAEP math grade 8

Mississippi implemented its 1st high school graduation exam in 1988. We assume that the stakes attached to Mississippi's K–8 tests (see Table 1) were attached at or around the same time or some time thereafter. From 1990–1992 Mississippi data were not available. From 1992–1996 Mississippi gained 4 points, as did the nation. From 1996–2000 Mississippi gained 4 points, as did the nation. From 1990–2000 Mississippi NAEP data were not available.

All other states' trend lines and analytic comments are included in Appendix E. A summary of these data across all 18 states is presented as Table 9.

Table 9
Results from the Analysis of NAEP Math Grade 8 Scores

State | Year in which students had to pass 1st HSGE to graduate | 1990–92 change in score | 1990–92 change in % excluded | 1992–96 change in score | 1992–96 change in % excluded | 1996–00 change in score | 1996–00 change in % excluded | 1990–00 change in score | 1990–00 change in % excluded | Overall effects
Alabama | 1985 | –6 | –2% | 0 | +4% | +2 | –4% | –4 | –2% | Negative
Florida | 1979 | –1 | n/a | 0 | n/a | n/a | n/a | n/a | n/a | Negative
Georgia | 1984 | –5 | 0% | –1 | +4% | 0 | –2% | –6 | +2% | Negative
Indiana | 2000 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Louisiana | 1991 | –1 | –2% | –2 | +4% | +3 | –2% | 0 | 0 | Neutral
Maryland | 1987 | –1 | –2% | +1 | +4% | +2 | +2% | +2 | +4% | Positive
Minnesota | 2000 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Mississippi | 1989 | 0 | n/a | 0 | +2% | n/a | n/a | n/a | n/a | Neutral
Nevada | 1981 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
New Jersey | 1984 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
New Mexico | 1990 | –2 | –3% | –2 | +5% | –6 | +2% | –10 | +4% | Negative
New York | 1985 | 0 | 0% | 0 | +2% | +2 | +3% | +2 | +5% | Positive
North Carolina | 1980 | +3 | –2% | +6 | +3% | +8 | +8% | +17 | +9% | Positive
Ohio | 1994 | –1 | –1% | +3.5 | n/a | +3.5 | n/a | +6 | +2% | Positive
South Carolina | 1990 | n/a | n/a | –4 | +2% | +2 | –1% | n/a | n/a | Negative
Tennessee | 1986 | n/a | n/a | +1 | +1% | –4 | –1% | n/a | n/a | Negative
Texas | 1987 | +2 | –1% | +1 | +4% | +1 | –1% | +4 | +2% | Positive
Virginia | 1986 | –2 | –2% | –2 | +4% | +3 | +1% | –1 | +3% | Negative

The time period 1990–1992. Table 9 reveals that, in comparison to the nation as a whole, states that implemented high-stakes tests one or more years before 1992 posted losses 4 times more often than gains on the 1990–1992 grade 8 NAEP math tests. Compared to the nation, two states posted gains, eight states posted losses, and two states posted no change. Over this time period gains on the NAEP tests were posted by 17% of the states with high-stakes tests. These gains and losses were "real" given that the states' 1990–1992 changes in score were unrelated (r = 0) to the states' 1990–1992 exclusion rates.

The time period 1992–1996. Table 9 also reveals that states that implemented high-stakes tests 1 or more years before 1996 were as likely to post gains as losses on the 1992–1996 grade 8 NAEP math tests. Five states posted gains, five states posted losses, and four states posted no changes as compared to the nation. Thus, from 1992–1996 only 36% of the states with high-stakes tests posted gains. These gains and losses were "real" given that the states' 1992–1996 changes in score were unrelated (r = 0) to the states' 1992–1996 exclusion rates.

The time period 1996–2000. Looking at the grade 8 NAEP math tests over the 1996–2000 time period we see that states that implemented high-stakes tests 1 or more years before 2000 posted gains 4.5 times more often than losses. Nine states posted gains, two states posted losses, and one state posted no changes, as compared to the nation. Thus, in the time period from 1996–2000 gains were posted by 75% of the states with high-stakes tests, but those NAEP scores were related to whether exclusion rates increased or decreased over the same time period, raising some uncertainty about the authenticity of these gains. Gains and losses during this time period must be considered partly artificial given that the states' 1996–2000 changes in score were positively related (r = 0.35) to the states' 1996–2000 exclusion rates.

The time period 1990–2000. Looking over the long term, states that implemented high-stakes tests one or more years before 2000 posted gains 1.3 times more often than losses on the 1990–2000 grade 8 NAEP math tests. Five states posted gains, four states posted losses, and one state posted no changes as compared to the nation. These gains and losses were partly artificial, however, given that the states' 1990–2000 changes in score were substantially related (r = 0.53) to the states' 1990–2000 exclusion rates.

Overall, across the years for which data were available, the states that had implemented high-stakes tests displayed negative effects 1.4 times more often than positive effects. Five states displayed positive effects, seven states displayed negative effects, and two states displayed neutral effects. Another way of interpreting these data is that 36% of the states with high-stakes tests posted positive effects from 1990–2000 on the grade 8 NAEP math examinations, while losses were posted by 50% of the states with high-stakes tests over this same time period. These gains and losses were partly artificial, however, given that the overall positive or negative changes in score were related to overall exclusion rates.

In short, there is no compelling evidence that high-stakes testing policies have improved the performance of students on the grade 8 NAEP math tests. Gains were more related to who was excluded from the NAEP than to whether there were high-stakes tests being used or not. If anything, the weight of the evidence suggests that high-stakes tests thwarted the academic achievement of students in these states.


Analysis of the Grade 4 NAEP Reading Scores

For each state, after high-stakes tests had been implemented, an analysis of NAEP reading achievement scores was conducted. The state of Virginia was randomly chosen to serve as an example of the analysis we did on the grade 4 NAEP reading tests (see Figure 6). The logic of this analysis rests on two assumptions. First, that high-stakes tests and other reforms were implemented in all grades at or around the same time, or soon after high-stakes high school graduation exams were implemented. Second, that such high-stakes test programs should affect learning in the different domains of reading that make up the K–4 curricula. NAEP is a test derived from the various domains that constitute the K–4 reading curriculum.

Figure 6. Virginia – NAEP reading grade 4

Virginia implemented its 1st high school graduation exam around 1981. Assuming that the stakes attached to Virginia's K–8 tests (see Table 1) were attached at or around the same time or some time thereafter: 1) from 1992–1994 Virginia lost 5 points to the nation; 2) from 1994–1998 Virginia gained 2 points on the nation; 3) from 1992–1998 Virginia lost 3 points to the nation. Trend lines and analytic comments for all other states are included in Appendix F. A summary of these data across all 18 states is presented as Table 10.

Table 10
Results from the Analysis of NAEP Reading Grade 4 Scores

State | Year in which students had to pass 1st HSGE to graduate | 1992–94 change in score | 1992–94 change in % excluded | 1994–98 change in score | 1994–98 change in % excluded | 1992–98 change in score | 1992–98 change in % excluded | Overall effects
Alabama | 1985 | +4 | –1% | 0 | +4% | +4 | +3% | Positive
Florida | 1979 | 0 | +1% | –1 | –1% | –1 | 0% | Negative
Georgia | 1984 | –2 | 0% | 0 | +2% | –2 | +2% | Negative
Indiana | 2000 | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Louisiana | 1991 | –4 | +2% | +4 | +7% | 0 | +9% | Neutral
Maryland | 1987 | +2 | 0% | +2 | +3% | +4 | +3% | Positive
Minnesota | 2000 | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Mississippi | 1989 | +6 | +1% | –1 | –2% | +5 | –1% | Positive
Nevada | 1981 | n/a | n/a | n/a | n/a | n/a | n/a | n/a
New Jersey | 1984 | n/a | n/a | n/a | n/a | n/a | n/a | n/a
New Mexico | 1990 | –3 | 0% | –2 | +3% | –5 | +3% | Negative
New York | 1985 | 0 | +2% | +1 | 0% | +1 | +2% | Positive
North Carolina | 1980 | +5 | +1% | 0 | +6% | +5 | +7% | Positive
Ohio | 1994 | n/a | n/a | n/a | n/a | n/a | n/a | n/a
South Carolina | 1990 | –4 | +1% | +4 | +5% | 0 | +6% | Neutral
Tennessee | 1986 | +4 | +1% | –4 | –1% | 0 | 0% | Neutral
Texas | 1987 | +2 | +3% | +2 | +3% | +4 | +6% | Positive
Virginia | 1986 | –5 | +1% | +2 | +2% | –3 | +3% | Negative

The time period 1992–1994. We note in Table 10 that on the grade 4 reading test, during the time period 1992–1994, states that implemented high-stakes tests 1 or more years before 1994 posted gains 1.2 times more often than losses, in comparison to the nation. Compared to national trends, six states posted gains, five states posted losses, and two states posted no changes at all. Thus, only 46% of the states with high-stakes tests posted gains from 1992–1994. These gains and losses were "real" given that the states' changes in score for the time period 1992–1994 were virtually unrelated (r = –0.10) to the states' exclusion rates.

The time period 1994–1998. Table 10 also reveals that those states implementing high-stakes tests 1 or more years before 1998 posted gains 1.5 times more often than losses when compared to national trends. Six states posted gains, four states posted losses, and three states posted no changes when compared to the national trends. Thus, only 46% of the states with high-stakes tests posted gains from 1994–1998. These gains and losses were partly artificial, however, given that the states' 1994–1998 changes in score were strongly correlated (r = 0.63) to the states' 1994–1998 exclusion rates.

The time period 1992–1998. Table 10 also informs us that states implementing high-stakes tests 1 or more years before 1998 posted gains 1.5 times more often than losses in comparison to the national trends during the time period 1992–1998. Six states posted gains, four states posted losses, and three states posted no changes in comparison to national trends. Thus, only 46% of the states with high-stakes tests posted positive effects from 1992–1998 on the NAEP grade 4 reading test. The gains and losses may be considered "real" given that the states' 1992–1998 changes in score were virtually unrelated (r = 0.11) to the states' changes in 1992–1998 exclusion rates.

In short, in comparison to the national trends, high-stakes tests did not improve the learning of students as judged by their performance on the NAEP grade 4 reading test. This was clearest in the time periods from 1992–1994 and from 1992–1998. The learning effects over these years were unrelated to the rates by which students were excluded from the NAEP. We note, however, that in 1998, 75% of the states with high-stakes tests had exclusion rates that were higher than the nation's. Given the typical positive (and substantial) correlation between increased exclusion rates and increased NAEP scores, states' gains and losses in score need to be carefully evaluated. If anything, in comparison to national trends, the academic achievement of students in states with high-stakes testing policies seemed to be lower, particularly for students from minority backgrounds.

NAEP Cohort Analyses

Another way of investigating growth in achievement on measures other than states' high-stakes tests is to look at each state's cohort trends on the NAEP. (Note 122) The NAEP analyses preceding this section gauged the achievement trends of different samples of students over time, for example, 4th graders in one year compared to a different group of 4th graders a few years later. There is a slight weakness with this approach because we must compare students in one year with a different set of students a few years later. We are unable to control for differences between the different groups or cohorts of students. (Note 123) To compensate for this we did a cohort analysis, an analysis of the growth in achievement made by "similar" groups of students over time. This is possible because NAEP uses random samples of students. Thus the 4th graders in 1996 should be representative of the same population as the 8th graders tested four years later. Random sampling techniques made the groups of students similar enough that the achievement effects made by the "same" (statistically the same) students can be tracked over time. (Note 124) Analyzing cohort trends in the 18 states with high-stakes tests helped assess the degree to which students increased in achievement as they progressed through school systems that were exerting more pressures for school improvement, including the use of high-stakes tests. We examined the growth of these students by tracking the relative changes in math achievement of 4th graders in 1996 to 8th graders in 2000, and by looking at the reading achievement of 4th graders in 1994 compared to that of 8th graders in 1998. The changes we record for each state are all relative to the national trends on the respective NAEP tests.
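The arithmetic behind these cohort comparisons is simple subtraction. The sketch below, with invented scale scores, illustrates how one state's grade 4 (1996) to grade 8 (2000) cohort change relative to the nation could be computed; it is a hypothetical illustration, not the authors' code.

```python
# Illustrative sketch (hypothetical numbers): a state's cohort growth relative to the
# nation, following the grade 4 (1996) to grade 8 (2000) NAEP math cohort design.
state_g4_1996, state_g8_2000 = 222.0, 272.0    # hypothetical state NAEP scale scores
nation_g4_1996, nation_g8_2000 = 224.0, 275.0  # hypothetical national scale scores

state_growth = state_g8_2000 - state_g4_1996     # cohort growth within the state
nation_growth = nation_g8_2000 - nation_g4_1996  # cohort growth for the nation
relative_change = state_growth - nation_growth   # the figure summarized in Table 11

print(f"Cohort change relative to the nation: {relative_change:+.0f} points")
```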


Cohort Analysis of NAEP Mathematics Scores: Grade 4 (1996) to Grade 8 (2000)

The state of New York was randomly chosen to serve as an example of the analysis we did for the NAEP mathematics cohort over the years 1996 to 2000 (see Figure 7). The logic of this analysis rests on the same two assumptions as previous NAEP analyses. First, that high-stakes tests and other reforms were implemented in all grades at or around the same time, or soon after high-stakes high school graduation exams were implemented. Second, that such high-stakes test programs should affect learning in the different domains of mathematics from which NAEP is derived.

Figure 7. New York by cohort: NAEP math grade 4 1996 to grade 8 2000

New York implemented its 1st high school graduation exam around 1981. Assuming that the stakes attached to New York's K–8 tests (see Table 1) were attached at or around the same time or some time thereafter, from 4th grade in 1996 to 8th grade in 2000 New York gained 1 point on the nation. Trend lines and analytic comments for all other states are included in Appendix G. A summary of these data across all 18 states is presented as Table 11.

Table 11
Results from the Analysis of NAEP Math Cohort Trends

State | Year in which students had to pass 1st HSGE to graduate | Change in score from grade 4 1996 to grade 8 2000 | Change in % excluded from grade 4 1996 to grade 8 2000 | Overall effects 1996–00
Alabama | 1985 | –2 | –2% | Negative
Florida | 1979 | n/a | n/a | n/a
Georgia | 1984 | –2 | –1% | Negative
Indiana | 2000 | n/a | n/a | n/a
Louisiana | 1991 | –2 | –3% | Negative
Maryland | 1987 | +4 | +2% | Positive
Minnesota | 2000 | n/a | n/a | n/a
Mississippi | 1989 | –6 | 0% | Negative
Nevada | 1981 | –1 | 0% | Negative
New Jersey | 1984 | n/a | n/a | n/a
New Mexico | 1990 | –6 | –1% | Negative
New York | 1985 | +1 | +4% | Positive
North Carolina | 1980 | +4 | +6% | Positive
Ohio | 1994 | n/a | n/a | n/a
South Carolina | 1990 | +1 | 0% | Positive
Tennessee | 1986 | –8 | –2% | Negative
Texas | 1987 | –6 | –1% | Negative
Virginia | 1986 | +3 | +2% | Positive

The 1996–2000 cohort. From 1996 to 2000, cohorts of students moving from 4th to 8th grade in states that had implemented high-stakes tests in the years before 2000 posted losses 1.6 times more often than gains. In comparison to the national trends, five states posted gains, and eight states posted losses. Said differently, in comparison to the nation, 62% of the states with high-stakes tests posted losses as their students moved from the 4th grade 1996 NAEP to the 8th grade 2000 NAEP. These gains and losses, however, were partly artificial because gains and losses in score for the cohorts in the various states were strongly correlated (r = 0.70) with overall exclusion rates. This cohort analysis finds no evidence of gains in general learning as a result of high-stakes testing policies.

Cohort Analysis of NAEP Reading Scores: Grade 4 (1994) to Grade 8 (1998)


The state of Tennessee was randomly chosen to serve as an example of the analysis we did for the NAEP reading cohort over the years 1994 to 1998 (see Figure 8). The logic of this analysis rests on the same two assumptions made in the previous NAEP analyses. First, that high-stakes tests and other reforms were implemented in all grades at or around the same time, or soon after high-stakes high school graduation exams were implemented. Second, that such high-stakes test programs should affect learning in the different domains of reading from which NAEP is derived.

Figure 8. Tennessee by cohort: NAEP reading grade 4 1994 to grade 8 1998

Tennessee implemented its 1st high school graduation exam in 1982. Assuming that the stakes attached to Tennessee's K–8 tests (see Table 1) were attached at or around the same time or some time thereafter, from 4th grade in 1994 to 8th grade in 1998 Tennessee lost 3 points to the nation. Trend lines and analytic comments for all other states are included in Appendix H. A summary of these data across all 18 states is presented as Table 12.

Table 12
Results from the Analysis of NAEP Reading Cohort Trends

State | Year in which students had to pass 1st HSGE to graduate | Change in score from grade 4 1994 to grade 8 1998 | Change in % excluded from grade 4 1994 to grade 8 1998 | Overall effects 1994–98
Alabama | 1985 | –2 | +5% | Negative
Florida | 1979 | –1 | –2% | Negative
Georgia | 1984 | +1 | +4% | Positive
Indiana | 2000 | n/a | n/a | n/a
Louisiana | 1991 | +6 | +6% | Positive
Maryland | 1987 | +3 | +3% | Positive
Minnesota | 2000 | n/a | n/a | n/a
Mississippi | 1989 | –20 | +4% | Negative
Nevada | 1981 | n/a | n/a | n/a
New Jersey | 1984 | n/a | n/a | n/a
New Mexico | 1990 | +4 | +2% | Positive
New York | 1985 | +5 | +4% | Positive
North Carolina | 1980 | +1 | +7% | Positive
Ohio | 1994 | n/a | n/a | n/a
South Carolina | 1990 | +3 | +2% | Positive
Tennessee | 1986 | –3 | +1% | Negative
Texas | 1987 | +1 | –1% | Positive
Virginia | 1986 | +4 | +3% | Positive

The 1994–1998 cohort. In comparison to national trends, cohorts of students in states that implemented high-stakes tests in the years before 1998 posted gains 2.3 times more often than losses from the 4th to the 8th grade on the 1994 and the 1998 NAEP reading exams. Nine states posted gains, and four states posted losses. These gains and losses were "real" given that gains and losses in score were unrelated (r = 0) to overall exclusion rates.

Thus far in these analyses this is the only example we found of gains in achievement on a transfer measure that meets criteria of acceptability. As their students moved from the 4th grade in 1994 to the 8th grade in 1998, 69% of the states with high-stakes tests posted gains on the NAEP reading tests. Since these gains and losses were unrelated to increases and decreases in exclusion rates, they appear to be "real" effects. To put these gains in context we note that in the states that showed increases in scores from 1994 to 1998, the average gain was 52 points. By any metric a 52-point gain is sizeable. But when these gains are compared to the national trends over the same time period, as shown in Table 12, we see that the gains in the states with high-stakes testing policies were, on average, only 3 points over the national trend. On the other hand, although fewer in number, the states that posted losses in comparison to the nation fell an average of 6.5 points. This figure is skewed, however, by the fact that Mississippi lost 20 points more than the nation did on the 4th to 8th grade reading NAEP from 1994–1998. In sum, these gains in the reading scores in states with high-stakes testing policies seem real but modest, given the losses shown by other states with high-stakes testing policies.

Advanced Placement (AP) Data Analysis

The Advanced Placement (AP) program offers high school students opportunities to take college courses in a variety of subjects and receive credits before actually entering college. We used the AP data (Note 125) as another indicator of the effects of high-stakes high school graduation exams on the general learning and motivation of high school students. Using the AP exams as transfer measures and the AP participation rates as indicators of increased student preparation and motivation for college, we could inquire whether, in fact, high school graduation exams increased learning in the knowledge domains that are the intended targets of high-stakes testing programs. The participation rates and rates by which students passed AP exams that are used in the following analyses were calculated by the College Board, (Note 126) administrators of the AP program. Gains or losses were assessed after the most recent year in which a new high school graduation exam was implemented or after 1995 – the first year for which these AP data were available.

Table 13 presents for each state the percentages of students who passed AP examinations with a grade of 3 or better after high school graduation exams were implemented. As we worked, however, it became apparent that fluctuations in participation rates were related (r = –0.30) to the percent of students passing AP exams with a grade of 3 or better. If participation rates in a state decreased, the percent of students who passed AP exams usually increased, and vice versa. To judge the effect of this interaction, and in comparison to the nation, the percent change in students who passed the AP examination is presented along with the percent change in students who participated in AP exams during the time period 1995–2000. If an increase in one corresponded to a decrease in the other, caution in making judgments about the effects is required.

North Carolina was randomly chosen from the states we examined to be the example for the AP analysis. Those data are presented in Figure 9. Trend lines and analytic comments for all other states are included in Appendix I and summarized in Table 13.
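As a hypothetical illustration of the rates described above (see also Note 126), the sketch below computes an AP participation rate and a passing rate from invented counts. It is not the College Board's or the authors' code; the figures and variable names are assumptions made only for the example.

```python
# Illustrative sketch (hypothetical counts), following the definitions in Note 126.
exams_taken = 18_500         # AP exams taken by a state's 11th and 12th graders (hypothetical)
enrollment_11_12 = 210_000   # the state's total 11th and 12th grade enrollment (hypothetical)
participants = 12_400        # 11th and 12th grade AP participants (hypothetical)
scored_3_or_better = 7_100   # participants earning the minimum grade (3) for college credit (hypothetical)

participation_rate = exams_taken / enrollment_11_12
passing_rate = scored_3_or_better / participants

print(f"Participation: {participation_rate:.1%}, passing (3 or better): {passing_rate:.1%}")
```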


Figure 9. North Carolina: Percent passing AP examinations

North Carolina's 1st high school graduation exam first affected the class of 1980. North Carolina's second exam first affected the class of 1998. From 1995–2000 North Carolina lost 7.7 percentage points to the nation. From 1997–1998 North Carolina gained .5 percentage points on the nation. From 1997–2000 North Carolina lost 1.3 percentage points to the nation.

Table 13
Results from the Analysis of AP Scores and Participation Rates

State | Year in which students had to pass 1st HSGE to graduate | Change in % of students passing AP exams 1995–2000 as compared to the nation* | Change in % of students taking AP exams 1995–2000 as compared to the nation* | Overall effects
Alabama | 1985 | +9.6% | –6.5% | Positive
Florida | 1979 | +3.9% | –0.5% | Positive
Georgia | 1984 | +6.8% | –1.4% | Positive
Indiana | 2000 | +1.9% | –0.4% | Positive
Louisiana | 1991 | +2.6% | –4.4% | Positive
Maryland | 1987 | +0.5% | +2.3% | Positive
Minnesota | 2000 | +0.6% | –1.6% | Positive
Mississippi | 1989 | –2.4% | –4.6% | Negative
Nevada | 1981 | +3.2% | –2.7% | Positive
New Jersey | 1985 | +1.7% | +2.0% | Positive
New Mexico | 1990 | –4.1% | –1.6% | Negative
New York | 1985 | +7.7% | +3.9% | Positive
North Carolina | 1980 | –7.7% | +0.9% | Negative
Ohio | 1994 | –3.3% | –2.6% | Negative
South Carolina | 1990 | –9.8% | –3.7% | Negative
Tennessee | 1986 | +5.8% | –1.8% | Positive
Texas | 1987 | –10.5% | +5.1% | Negative
Virginia | 1986 | –1.6% | +3.9% | Negative

*(Indiana and Minnesota, 1999–2000)

The time period 1995–2000. In comparison to national trends from 1995–2000, students in states with high school graduation exams posted gains 1.6 times more often than losses in the percentage of students passing AP exams with a score of 3 or better. Eleven states posted gains, and seven states posted losses. These gains and losses were partly artificial, however, given that gains and losses in the percentage of students passing AP exams were negatively correlated (r = –0.30) with the rate at which students participated in the AP program. The greater the percentage of students who participated in the AP program, the lower the percentage of students passing AP exams, and vice versa.

Compared to the national average, participation rates fell in 67% of the states with high school graduation exams since 1995 (and since 1999 for Indiana and Minnesota). In comparison to the nation, participation rates increased in six states and decreased in twelve states in the time period from 1995–2000.

Overall, 61% of the states with high-stakes tests posted gains in the rate by which students passed AP exams with a grade of 3 or better from 1995–2000 (1999–2000 in Indiana and Minnesota). But those increases and decreases in the percent passing AP exams were negatively correlated (r = –0.30) with whether participation rates increased or decreased at the same time. If we look at only those states where the participation rates did not seem to influence the percent passing AP exams, (Note 127) as the overall correlation suggests it typically does, only Maryland (+), Mississippi (–), New Jersey (+), New Mexico (–), New York (+), Ohio (–), and South Carolina (–) posted "true" effects, 57% of which were negative.


The special case of Texas. Texas, as has been mentioned, received attention as one of two states in which high-stakes tests purportedly improve achievement. Dramatic gains in the rates of students enrolled in AP courses were among several state indicators of achievement provided by the state in support of their academic gains. But another educational policy was put into effect around the same time as the high-stakes testing program was implemented in that state. (Note 128) The Texas state legislature substantially reduced the cost of taking AP courses and the accompanying exams. (Note 129) This highly targeted policy may have helped increase enrollments in AP courses in Texas much more than their high-stakes testing program. So the substantial drop in the percent passing the test is difficult to assess since many more students took the AP tests. As we have seen, as a greater percentage of students in a state take the test, the scores go down, and as a smaller percentage of students take the test, scores go up. Inferences about the meaning of test scores become more uncertain when participation rates are not steady from one testing year to another.

In conclusion, when we use the national data on AP exams as a comparison for state AP data, and we use the percent of students passing the various AP exams as an indicator of learning in the domains of interest, we find no evidence of improvement associated with high-stakes high school graduation exams. When controlling for participation rates there even appeared to be a slight decrease in the percent of students who passed AP examinations. Further, in the states under study, high-stakes high school graduation exams did not result in an increase in the numbers of students preparing to go to college, as indicated by the percent of students who participated in AP programs from 1995–2000.

Conclusion

If we assume that the ACT, SAT, NAEP and AP tests are reasonable measures of the domains that a state's high-stakes testing program is intended to affect, then we have little evidence at the present time that such programs work. Although states may demonstrate increases in scores on their own high-stakes tests, transfer of learning is not a typical outcome of their high-stakes testing policy.

The ACT data. Sixty-seven percent of the states that use high school graduation exams posted decreases in ACT performance after high school graduation exams were implemented. These decreases were unrelated to whether participation rates increased or decreased at the same time. On average, as measured by the ACT, college-bound students in states with high school graduation exams decreased in levels of academic achievement. Moreover, participation rates in ACT testing, as compared to the nation, increased in nine states, decreased in six states, and stayed the same in three states. If participation rates in the ACT program serve as an indicator of motivation to attend college, then there is scant support for the belief that high-stakes testing policies within a state have such an impact.

The SAT data. Fifty-six percent of the states that use high-stakes high school graduation exams posted decreases in SAT performance after those exams were implemented. However, these decreases were slightly related to whether SAT participation rates increased or decreased over the same time period. Thus, there is no reliable evidence of high-stakes high school graduation exams improving the performance of students who take the SAT. Gains and losses in SAT scores are more strongly correlated to who participates in the SAT than to the implementation of high school graduation exams. Moreover, SAT participation rates, as compared to the nation, fell in 61% of the states with high school graduation exams. If these participation rates serve as an indicator for testing the belief that high-stakes testing policies will prepare more students or motivate more students to attend college, then there is scant support for such beliefs. Students did not participate in the SAT testing program at greater rates after high-stakes high school graduation exams were implemented.

The NAEP mathematics data. High-stakes testing policies did not usually improve the performance of students on the grade 4 NAEP math tests. Gains and losses were more related to who was excluded from the NAEP than to the effects of high-stakes testing programs in a state. However, during the 1992–1996 time period, when exclusion rates were unrelated to gains and losses in scores, mathematics achievement decreased for students in states where high-stakes testing had been implemented. High-stakes testing policies did not consistently improve the performance of students on the grade 8 NAEP math tests. Gains were more strongly correlated to who was excluded from the NAEP than to whether or not high-stakes tests were used. If anything, the weight of the evidence suggests that students from states with high-stakes tests did not achieve as well on the grade 8 NAEP mathematics tests as students in other states.

The NAEP reading data. High-stakes testing policies did not consistently improve the general learning and competencies of students in reading as judged by their performance on the NAEP grade 4 reading test. This was clearest in the time periods from 1992–1994 and over the time span from 1992–1998. The learning effects over these years were unrelated to the rates by which students were excluded from the NAEP. By 1998, however, 75% of the states with high-stakes tests had exclusion rates higher than the national average. These exclusionary policies were probably the reason for the apparent increases in achievement in several states. As the NAEP tests become more important in our national debates about school success and failure, the effects of the Heisenberg Uncertainty Principle, as applied to the social sciences, seem to be evident. When these exclusion rates are taken into account, in comparison to national trends, the reading achievement of students in states with high-stakes testing policies appeared lower, particularly for students from minority backgrounds.

The NAEP cohort data. Sixty-two percent of the states with high-stakes tests posted losses on the NAEP mathematics exams as a cohort of their students moved from the 4th grade in 1996 to the 8th grade in the year 2000. These gains and losses, however, must be considered artificial to some extent because of the very strong relationship of overall exclusion rates to the gains and losses that were recorded.
This cohort analysis finds no evidence of gains in general mathematics knowledge and skills as a result of high-stakes testing policies.

For the cohort of students moving from the 4th to the 8th grade and taking the 1994 and the 1998 NAEP reading exams, gains in scores were posted 2.3 times more often than losses in the states with high-stakes testing policies. Nine states (69%) posted gains, and four states (31%) posted losses. These gains and losses were "real" given that gains and losses in score were unrelated to overall NAEP exclusion rates. While not reflecting unequivocal support for high-stakes testing policies, this is the one case of gains in achievement on a transfer measure among the many analyses we did for this report. It is also true that over this time period many reading curriculum initiatives were being implemented throughout the country, as reading debates became heated and sparked controversy. Because of that it is not easy to attribute the gains made by the NAEP reading cohort to high-stakes testing policies. Our guess is that the reading initiatives and the high-stakes testing policies are entangled in ways that make it impossible to learn about their independent effects.

The AP data. High-stakes high school graduation exams do not improve achievement as indicated by the percent of students passing the various AP exams. When participation rates were controlled there was a decrease in the percent of students who passed AP examinations. Further, in the states with high-stakes high school graduation exams there was no increase in the numbers of students preparing to go to college, as indicated by the percent of students who chose to participate in AP programs from 1995–2000.

Final thoughts. What shall we make of all this? At the present time, there is no compelling evidence from a set of states with high-stakes testing policies that those policies result in transfer to the broader domains of knowledge and skill for which high-stakes test scores must be indicators. Because of this, the high-stakes tests being used today do not, as a general rule, appear valid as indicators of genuine learning, of the types of learning that approach the American ideal of what an educated person knows and can do. Moreover, as predicted by the Heisenberg Uncertainty Principle, data from high-stakes testing programs too often appear distorted and corrupted.

Both the uncertainty associated with high-stakes testing data and the questionable validity of high-stakes tests as indicators of the domains they are intended to reflect suggest that this is a failed policy initiative. High-stakes testing policies are not now, and may never be, policies that will accomplish what they intend. Could the hundreds of millions of dollars and the billions of person hours spent in these programs be used more wisely? Furthermore, if failure in attaining the goals for which the policy was created results in disproportionate negative effects on the life chances of America's poor and minority students, as it appears to do, then a high-stakes testing policy is more than a benign error in political judgment. It is an error in policy that results in structural and institutional mechanisms that discriminate against all of America's poor and many of America's minority students. It is now time to debate high-stakes testing policies more thoroughly and seek to change them if they do not do what was intended and have some unintended negative consequences, as well.

Notes

1. Haladyna, Nolen, & Haas, 1991.
2. Figlio & Lucas, 2000.
3. Kreitzer, Madaus, & Haney, 1989.
4. Bracey, 1995; Heubert & Hauser, 1999; and Kreitzer, Madaus, & Haney, 1989.


5. Linn, 2000 and Serow, 1984.
6. U.S. Department of Education, 1983 and Bracey, 1995.
7. U.S. Department of Education, 1983.
8. Berliner & Biddle, 1995.
9. Quality Counts, 2001.
10. McNeil, 2000; Orfield & Kornhaber, 2001; Paris, 2000; Sacks, 1999; and Sheldon & Biddle, 1998.
11. Madaus & Clarke, 2001 and Campbell, 1975.
12. All of the following statistics come from extensive interviews conducted with knowledgeable testing personnel throughout the United States, and Quality Counts, 2001.
13. Administrative bonuses, 2001.
14. Neufeld, 2000.
15. Folmar, 2001.
16. Commission on Instructionally Supportive Testing, 2001.
17. California, Delaware, Michigan, Missouri, Nevada, and Ohio give scholarships to students for high performance on state mandated exams. See Quality Counts, 2001.
18. Durbin, 2001 and Ross, 2001.
19. Thanks to Professor J. Ryan, Arizona State University, for suggesting we investigate this story.
20. National Governor's Association, 2000; "Civil rights coalition," 2000; and "Using tobacco settlement revenues," 1999.
21. Heller, 1999. See also Durbin, 2001; Ross, 2001; and Swope & Miner, 2000.
22. "Civil rights coalition," 2000.
23. Heller, 1999.
24. Delaware, Ohio, South Carolina, and Texas have plans to promote students using test scores by the year 2003. Interview data and Quality Counts, 2001.
25. Florida implemented its first minimum competency test for the class of 1979, North Carolina implemented its first minimum competency test for the class of 1980, and Nevada implemented its first minimum competency test for the class of 1981.
26. U.S. Department of Education, 1983.


27. States that currently use high school graduation exams to grant or withhold diplomas are Alabama, Florida, Georgia, Indiana, Louisiana, Maryland, Minnesota, Mississippi, Nevada, New Jersey, New Mexico, New York, North Carolina, Ohio, South Carolina, Tennessee, Texas, and Virginia. Hawaii used a test until 1999 and has plans to implement a different exam in 2007.
28. States that are developing high school exit exams are Alaska, Arizona, California, Delaware, Hawaii, Massachusetts, Utah, Washington, and Wisconsin.
29. Data illustrated in this chart were collected through telephone interviews and cross-checked with information provided in Quality Counts, 2001. States are counted the year the first graduating class was (or will be) affected by the state's first high school high-stakes graduation exam. For example, since the class of 1987 was the first class that had to pass the TEAMS in Texas, Texas was defined as a state with a high school exit exam in 1987.
30. Percentages were calculated using 1997 National Center for Education Statistics finance data available: http://nces.ed.gov/. Data were adjusted for cost of living.
31. See Elazar's classification of states' governmental traditions of centralism and localism in Elazar, 1984. Hawaii and Alaska were not included in his analyses so were not included in these calculations.
32. These numbers were calculated using 2000 Census Bureau data available: http://www.census.gov/.
33. Ibid.
34. In the West, Nevada has a high school graduation exam and Alaska, California, Utah, and Washington have exams in progress (5/10 western states).
35. These numbers were calculated using 1999 Census Bureau data available: http://www.census.gov/.
36. New Mexico, Louisiana, California, Mississippi, New York, Alabama, Texas, Arizona, Georgia, South Carolina and Florida are among the 16 states with the highest degrees of poverty that have or have plans to implement high school graduation exams. For child poverty levels see 2001 Kids Count Data Online available: http://www.aecf.org/kidscount/kc2001/.
37. Ohanian, 1999.
38. Goodson & Foote, 2001.
39. McNeil, 2000.
40. Clarke, Haney, & Madaus, 2000.
41. The most influential research we found that substantiated the effectiveness of high-stakes testing policies came from Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S., 2000. Using NAEP data, researchers in this study recommended duplicating the high-stakes testing programs in North Carolina and Texas, although concrete evidence that high-stakes testing programs caused the achievement gains noted in those states was lacking. Only a few other studies have substantiated the positive effects of high-stakes testing. See Carnoy, Loeb, & Smith, 2000; Muller & Schiller, 2000; Scheurich, Skrla, & Johnson, 2000; and Schiller & Muller, 2000.
42. Sacks, 1999 and Kohn, 2000b.
43. The attachment of accountability measures to high academic standards has enjoyed a full measure of bipartisan support for the last decade or more. Eilperin, 2001 and Valencia, Valenzuela, Sloan & Foley, 2001.
44. Haney, 2001; Haney, 2000; Neill & Gayler, 1999; and Sacks, 1999.
45. Firestone, Camilli, Yurecko, Monfils & Mayrowetz, 2000; Goodson & Foote, 2001; Haney, 2000; Heubert & Hauser, 1999; Klein, Hamilton, McCaffrey, & Stecher, 2000; Kohn, 2000a; Kossan & González, 2000; Kreitzer, Madaus, & Haney, 1989; McNeil, 2000; McNeil & Valenzuela, 2001; Reardon, 1996; Sacks, 1999; Thomas & Bainbridge, 2001; and Urdan & Paris, 1994.
46. Chiu, 2000.
47. Robelen, 2000.
48. Sacks, 1999.
49. Salzer, 2000.
50. Kossan, 2000.
51. Domenech, 2000.
52. Gardner, 1999, p. 16.
53. Shorris, 2000; and Shorris, 1997.
54. "High-stakes tests," 2000.
55. Ibid.
56. Ibid.
57. Heubert & Hauser, 1999.
58. Linn, 2000, pg. 14.
59. Heubert & Hauser, 1999, pg. 75.
60. Heubert & Hauser, 1999.
61. McNeil & Valenzuela, 2001, pg. 133.
62. McNeil & Valenzuela, 2001, pg. 134.


63. Wright (Forthcoming).
64. This listing of "stakes" is not exhaustive. For example, local school districts and local schools may attach additional stakes to the consequences written into state test policies.
65. In 2004, grade promotion decisions in grades 3, 5, and 8 will be contingent upon student performance on Georgia's new Criterion-Referenced Competency Tests. Eventually, all Georgia students will have to pass promotion tests at each grade level.
66. Beginning in the fall of 2000, grade promotion became contingent on grades 4 and 8 performance on the new Louisiana Educational Assessment Program (LEAP 21) tests. Louisiana became the first state to retain students in grade using test scores.
67. Grade promotion in grade 8 is contingent on a combination of CTBS/5 test scores, student classroom performance, and classroom assessments.
68. A grade promotion "gateway" exists at grade 5. Beginning in 2002, promotion gateways will exist at grades 3 and 8.
69. Teachers in schools that perform poorly and are identified as low-performing by the state face the possibility of having to take a teacher competency test. Jones, Jones, Hardin, Chapman, Yarbrough, & Davis, 1999.
70. In 2002, promotion to the 5th grade will depend on a student's 4th grade reading score on the Ohio Proficiency Reading Test. Plans to make more grades promotion gateways are in progress.
71. In 2002, promotion to the 5th grade will depend on a student's 4th grade test scores. Plans to make more grades promotion gateways are in progress.
72. In 2003 students must pass the grade 3 reading test to be promoted to the 4th grade. In 2005 students must pass the grade 5 reading and math tests to be promoted to the 6th grade. In 2008 students must pass the grade 8 reading and math tests to be promoted to the 9th grade.
73. The state also uses student or school test results to evaluate teachers. Georgia has similar teacher accountability plans underway.
74. Quality Counts, 2001.
75. Information included has been pooled from the state department web sites and multiple telephone interviews with state testing personnel.
76. States with an asterisk (*) do not collect the percent of students who do not graduate or receive a regular high school diploma because they did not meet the graduation requirement at the state level. Almost 50% of the states do not collect this information. For these states, a rough percent was calculated by taking the number of 12th graders who on their last attempt before the intended graduation date did not meet the graduation requirement divided by the total 12th grade enrollment that year.
77. This percentage does not account for students who dropped out, who enrolled in alternative or GED programs, or who transferred out of state.
78. Mississippi does not collect the percent of students who do not graduate or receive a regular high school diploma because they did not meet the graduation requirement and does not collect any data on the test after the test is first administered. The number of 12th graders who took, failed, or passed the test was therefore unavailable, so rough estimates were impossible to calculate.
79. New York's graduation exam requirement consists of a series of end-of-course exams. Students take the end-of-course tests as early as the 7th grade or when students complete each Regents course.
80. Students in North Carolina take the competency tests only if they did not pass a series of similar, end-of-grade tests taken at the end of the 8th grade.
81. Mehrens, 1998.
82. Neill & Gayler, 2001, pg. 108.
83. Fisher, 2000, pg. 2.
84. As cited in Haney, 2000.
85. Klein, Hamilton, McCaffrey & Stecher, 2000.
86. Schrag, 2000.
87. Heubert & Hauser, 1999, pg. 132.
88. NAEP data are not collected at the high school level by state. As such, NAEP scores will not be used as direct indicators of how high school students have been affected by high school graduation exams. However, if a state has a high school graduation exam in place it has appropriately been defined as one of the states with high stakes written into K–12 testing policies. Accordingly, gains over time as compared to the nation will indicate how more general high-stakes testing policies have improved each state's system of education.
89. Judd, Smith, & Kidder, 1991 and Smith & Glass, 1987.
90. Fraenkel & Wallen, 2000; Glass, 1988; and Smith & Glass, 1987.
91. Information included was pooled from the state department web sites, multiple telephone interviews with state testing personnel, and Quality Counts, 2001. State testing personnel in all states but Florida and Virginia verified the information before it was included in this chart.
92. In 39% (7/18) of the high-stakes states – Georgia, Maryland, Mississippi, New York, South Carolina, Tennessee, and Virginia – students will take end-of-course exams instead of high school graduation, criterion-referenced tests once they complete courses such as Algebra I, English 1, Physical Science, etc. End-of-course exams seem to be the new fad, replacing high school graduation exams.


93. Since the 1960s student performance on New York's Regents Exams determined the type of diploma students receive at graduation – a Local Diploma or a Regents Diploma.
94. The competency tests are only given to 9th graders who did not pass the end-of-grade tests at the end of the 8th grade.
95. In 1983 students did not have to pass the Texas Assessment of Basic Skills to receive a high school diploma.
96. Glass, 1988 and Smith & Glass, 1987.
97. Glass, 1988, pg. 445–446.
98. Campbell & Stanley, 1963; Glass, 1988; and Smith & Glass, 1987.
99. From 1959 to 1989 the original version of the ACT was used. In 1989 an enhanced ACT was implemented but only scores back to 1986 were equated to keep scores consistent across time. This explains the slight jumps from 1985–1986 that will be apparent across all states. Although scores from 1980–1985 have not been equated, the correlation between scores from the original and enhanced ACT assessments is high: r = .96.
100. See footnote #99 to explain the large increase illustrated from 1985–1986.
101. Smith & Glass, 1987.
102. ACT composite scores (1980–2000) were available on-line at http://www.act.org or were obtained through personal communications with Jim Maxey, Assistant Vice President for Applied Research at ACT. We are indebted to him for providing us with these data.
103. SAT composite scores (1977–2000) were available on-line at http://www.collegeboard.com or were provided by personnel at the College Board. We thank those at the College Board who helped us in our pursuit of these data.
104. Kohn, 2000a.
105. Trends were defined in the short term, as defined by the difference in score one year after the point of implementation, and in the long term, as defined by the difference in score the number of years from one point of intervention to the next or 2001, as compared to the nation.
106. Changes in participation rates as compared to the nation (1994–2001) are listed in parentheses.
107. Correlation coefficients represent the relationship between changes in score and changes in participation or exclusion rates for participating states with high-stakes tests. Only states with high-stakes tests were included in the calculations of correlation coefficients hereafter. Coefficients were calculated separately from one year to the next for the years for which data and participation rates were available.
108. These correlation coefficients were calculated using changes in score and changes in participation rates for the years in which data and participation rates were available.
109. Within states, colleges may change their policies regarding which tests are required of enrolling students. This may affect participation and exclusion rates hereafter.
110. Changes in participation rates as compared to the nation (1991–1997 and 2000–2001) are listed in parentheses.
111. State NAEP composite scores (1990–2000) are available on-line at http://nces.ed.gov/nationsreportcard.
112. For more information on the NAEP, for example its design and methods of sampling, see Johnson, 1992.
113. For further discussion see Neill & Gayler, 2001.
114. See Grissmer, Flanagan, Kawata, & Williamson, 2000 and Klein, Hamilton, McCaffrey & Stecher, 2000.
115. Johnson, 1992.
116. Neill & Gayler, 2001.
117. See the NAEP website at http://nces.ed.gov/nationsreportcard.
118. Haney, 2000.
119. This figure represents the r-square of each correlation coefficient, which was calculated by squaring the correlations between change in score and change in exclusion rates year to year.
120. Changes in exclusion rates are listed next to changes in score hereafter. Scores and exclusion rates were calculated as compared to the nation.
121. The exclusion rate for the nation in 1990 was not available. The exclusion rate was imputed by calculating the average exclusion rate for all states that participated in the 1990 8th grade math NAEP.
122. For a similar study see Camilli, 2000. Camilli tested claims made by Grissmer et al., 2000, that large NAEP gains made by students from 1992 to 1996 in Texas were due to high-stakes tests. Camilli found, however, that the cohort of Texas students who took the NAEP math as 4th graders in 1992 and then again as 8th graders in 1996 were just average in gains. Camilli analyzed cohort gains in Texas on the NAEP 1992 and 1996 math assessment only, however. This section of the study will expand on Camilli's work to include all states with high-stakes tests. Further, randomly sampled cohorts of students who took the NAEP math as 4th graders in 1996 and as 8th graders in 2000 and cohorts of students who took the NAEP reading as 4th graders in 1994 and as 8th graders in 1998 will be examined.
123. Toenjes, Dworkin, Lorence & Hill, 2000.
124. Klein, Hamilton, McCaffrey & Stecher, 2000.


66 of 74 125. AP data (1995–2000) were available in the AP Nation al Summary Reports available on-line at http://www.collegeboard.org/ap. 126. Participation rates were calculated by dividing the number of AP exams that were taken by students in the 11th and 12th grade by eac h state's total 11th and 12th grade population. Grades received on the exams were calcu lated by dividing the number of students who received a grade of 3 or above, a grad e of 3 being the minimum grade required to receive college credit, by the total nu mber of 11th and 12th grade participants. 127. "Controlling" for participation rates was possible only in this analysis. Years for which we had participation rates matched the years for which we had the percentages of students who passed AP exams. 128. "Fisher," 2000. 129. "Advanced placement," 2000.ReferencesAdministrative bonuses: Tied to student performance in Oakland. (2001, February 8). The National Education Goals Panel Weekly [On-line]. Available: http://www.negp.gov/weekly.htm Advanced placement: Growth in Texas. (2000, August 31). The National Education Goals Panel Weekly [On-line]. Available: http://www.negp.gov/weekly.htm Berliner, D. C. & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's public schools Reading, MA: Addison-Wesley Publishing Company, Inc.Bracey, G. W. (1995). Variance happens: Get over it Technos, 4 (3), 22–29. Camilli, G. (2000). Texas gains on NAEP: Points of light? Education Policy Analysis Archives, 8 (42) [On-line]. Available: http://epaa.asu.edu/epaa/v8n42.html Campbell, D. T. (1975). On the conflicts between bi ological and social evolution and between psychology and moral tradition. American Psychologist, 30 (12), 1103-1126. Campbell, D. T. & Stanley, J. C. (1963). Experiment al and quasi-experimental designs for research on teaching. In N.L. Gage (Ed.) Handbook of research on teaching Chicago, IL: Rand McNally & Company.Carnoy, M., Loeb, S. & Smith, T.L. (2000, April). Do higher test scores in Texas make for better high school outcomes ? Paper presented at the American Educational Resea rch Association Annual Meeting, New Orleans, LA.Chiu, L. (2000, October 3). Education issues fuel C apitol rally. The Arizona Republic [On-line]. Available: http://www.azcentral.com/news/education/1003AIMS03. html Civil rights coalition sues over race discriminatio n in Michigan Merit Scholarship Program. American Civil Liberties Union Freedom Network, (2 000, June 27).

PAGE 67

67 of 74[On-line]. Available: http://www.aclu.org/news/2000/n062700a.html Clarke, M., Haney, W., & Madaus, G. (2000). High stakes testing and high school completion The National Board on Educational Testing and Pub lic Policy [On-line]. Available: http://www.nbetpp.bc.edu/reports.html Commission on Instructionally Supportive Testing. ( 2001). Building tests to support instruction and accountability: A guide for policym akers [On-line]. Available: http://www.aasa.org/issues_and_insights/assessment/ Domench, D. A. (2000, December). School administrator web edition [On-line]. Available: http://www.aasa.org/publications/sa/2000_12/domenec h.htm Durbin, D. (2001, March 21). Merit awards fail mino rities. The Detroit News [On-line]. Available: http://www.detnews.com/2001/schools/0103/21/c07d-20 2027.htm Eilperin. J. (2001. May 23). House backs annual rea ding, math tests. Washington Post [On-line]. Available: http://washingtonpost.com/wp-dyn/education/A63017-2 001May22.html Elazar, D. J. (1984). American federalism: A view from the states (3rd ed.). New York: Harper & Row, Publishers.Figlio, D. N. & Lucas, M. E. (2000). What's in a grade? School report cards and house prices National Bureau of Economic Research [On-line]. A vailable: http://papers.nber.org/papers/w8019 Firestone, W. A., Camilli, G., Yurecko, M., Monfils L., & Mayrowetz, D. (2000). State standards, socio-fiscal context and opportunity to learn in New Jersey. Education Policy Analysis Archives, 8 (35) [On-line]. Available: http://olam.ed.asu.edu/epaa/v8n35/ Fisher, F. (2000). Tall tales? Texas testing moves from the Pecos to Wobegon. Unpublished manuscript.Folmar, K. (2001, August 9). Lawsuit delays payment of state bonuses to teachers. San Jose Mercury News [On-line]. Available: http://www0.mercurycenter.com/premium/local/docs/aw ard09.htm Fraenkel, J. R. & Wallen, N. E. (2000). How to design and evaluate research in education (4th ed.). Boston, MA: McGraw Hill, Inc. Gardner, H. (1999). The disciplined mind: What all students should unde rstand New York: Simon & Schuster.Glass, G. V (1988). Quasi-experiments: The case of interrupted time series. In R. M. Jaeger (Ed.) Complementary methods for research in education Washington, DC: American Educational Research Association.Goodson, I. & Foote, M. (2001). Testing times: A sc hool case study. Education Policy Analysis Archives, 9 (2) [On-line]. Available: http://epaa.asu.edu/epaa/v9n2.html Grissmer, D., Flanagan, A., Kawata, J., & Williamso n, S. (2000). Improving student

PAGE 68

Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What NAEP test scores tell us. Santa Monica, CA: RAND Corporation [On-line]. Available: http://www.rand.org/publications/MR/MR924/
Haladyna, T., Nolen, S. B. & Haas, N. S. (1991). Raising standardized test scores and the origins of test score pollution. Educational Researcher, 20(5), 2–7.
Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8(41) [On-line]. Available: http://epaa.asu.edu/epaa/v8n41/
Haney, W. (2001). Revisiting the myth of the Texas miracle in education: Lessons about dropout research and dropout prevention. Paper prepared for the "Dropout Research: Accurate Counts and Positive Interventions" Conference sponsored by Achieve and the Harvard Civil Rights Project, Cambridge, MA [On-line]. Available: http://www.law.harvard.edu/groups/civilrights/publications/dropout/haney.pdf
Heller, D. E. (1999, September 30). Misdirected money: Merit scholarships take resources from have-nots. Detroit Free Press [On-line]. Available: http://www.freep.com/voices/columnists/qehell30.htm
Heubert, J. P. & Hauser, R. M. (Eds.) (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, DC: National Academy Press [On-line]. Available: http://www.nap.edu/html/highstakes/
High stakes tests: A harsh agenda for America's children. (2000, March 31). Remarks prepared for U.S. Senator Paul D. Wellstone, Teachers College, Columbia University [On-line]. Available: http://www.senate.gov/~wellstone/columbia.htm
Johnson, E. G. (1992). The design of the National Assessment of Educational Progress. Journal of Educational Measurement, 29(2), 95–110.
Jones, M. G., Jones, B. D., Hardin, B., Chapman, L., Yarbrough, T. & Davis, M. (1999). The impact of high-stakes testing on teachers and students in North Carolina. Phi Delta Kappan, 81(3), 199–203.
Judd, C. M., Smith, E. R. & Kidder, L. H. (1991). Research methods in social relations (6th ed.). Fort Worth, TX: Harcourt Brace Jovanovich College Publishers.
Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Education Policy Analysis Archives, 8(49) [On-line]. Available: http://epaa.asu.edu/epaa/v8n49/
Kohn, A. (2000a). The case against standardized testing: Raising the scores, ruining the schools. Portsmouth, NH: Heinemann.
Kohn, A. (2000b, September 27). Standardized testing and its victims. Education Week [On-line]. Available: http://www.edweek.org/ew/ewstory.cfm?slug=04kohn.h20
Kossan, P. & González, D. (2000, November 2). Minorities fail AIMS at high rate: Some backers talking change. The Arizona Republic [On-line]. Available: http://www.arizonarepublic.com/news/articles/1102aimsrace02.html
Kossan, P. (2000, November 27). By trying too much too quick, AIMS missed mark. The Arizona Republic [On-line]. Available: http://www.arizonarepublic.com/news/articles/1127aims27.html
Kreitzer, A. E., Madaus, G. F., & Haney, W. (1989). Competency testing and dropouts. In L. Weis, E. Farrar, & H. G. Petrie (Eds.), Dropouts from school: Issues, dilemmas, and solutions. Albany, NY: State University of New York Press.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–15 [On-line]. Available: http://www.aera.net/pubs/er/arts/29-02/linn01.htm
Madaus, G. & Clarke, M. (2001). The adverse impact of high stakes testing on minority students: Evidence from one hundred years of test data. In Orfield, G., & Kornhaber, M. L. (Eds.), Raising standards or raising barriers? Inequality and high-stakes testing in public education. New York: The Century Foundation Press.
McNeil, L. M. & Valenzuela, A. (2001). The harmful impact of the TAAS system of testing in Texas: Beneath the accountability rhetoric. In Orfield, G., & Kornhaber, M. L. (Eds.), Raising standards or raising barriers? Inequality and high-stakes testing in public education. New York: The Century Foundation Press.
McNeil, L. M. (2000). Contradictions of school reform. New York, NY: Routledge.
Mehrens, W. A. (1998). Consequences of assessment: What is the evidence? Education Policy Analysis Archives, 6(13) [On-line]. Available: http://olam.ed.asu.edu/epaa/v6n13.html
Muller, C. & Schiller, K. S. (2000). Leveling the playing field? Students' educational attainment and states' performance testing. Sociology of Education, 73(3), 196–218.
National Governor's Association. (2000, January 25). 1999 state initiatives on spending tobacco settlement revenues [On-line]. Available: http://www.nga.org/Pubs/IssueBriefs/2000/000125Tobacco.asp#Summary
Neill, M., & Gayler, K. (2001). Do high-stakes graduation tests improve learning outcomes? Using state-level NAEP data to evaluate the effects of mandatory graduation tests. In Orfield, G., & Kornhaber, M. L. (Eds.), Raising standards or raising barriers? Inequality and high-stakes testing in public education. New York: The Century Foundation Press.
Neufeld, S. (2000, October 2). Backlash fermenting against school tests: Groups organize to complain about STAR. San Jose Mercury News [On-line]. Available: http://www.mercurycenter.com/premium/local/docs/backlash02.htm
Ohanian, S. (1999). One size fits few: The folly of educational standards. Portsmouth, NH: Heinemann.
Orfield, G., & Kornhaber, M. L. (Eds.) (2001). Raising standards or raising barriers? Inequality and high-stakes testing in public education. New York: The Century Foundation Press.
Pallas, A. M., Natriello, G., & McDill, E. L. (1989). The changing nature of the disadvantaged population: Current dimensions and future trends. Center for Research on Elementary and Middle Schools. (ERIC Document Reproduction Service No. ED 320 655).
Paris, S. G. (2000). Trojan horse in the schoolyard: The hidden threats in high-stakes testing. Issues in Education, 6(1, 2), 1–16.
Quality Counts 2001. (2001). Education Week [On-line]. Available: http://www.edweek.org/sreports/qc01/
Reardon, S. F. (1996). Eighth grade minimum competency testing and early high school dropout patterns. Paper presented at the Annual Meeting of the American Educational Research Association (ERIC Document Reproduction Service No. ED 400 273).
Robelen, E. W. (2000, October 11). Parents seek civil rights probe of high-stakes tests in La. Education Week [On-line]. Available: http://www.edweek.org/ew/ewstory.cfm?slug=06ocr.h20
Ross, J. (2001, April 2). High-testing kids grab MEAP cash: More students savor $2,500 reward to use for college education, expenses. Detroit Free Press [On-line]. Available: http://www.freep.com/news/education/maward2_20010402.htm
Sacks, P. (1999). Standardized minds: The high price of America's testing culture and what we can do to change it. Cambridge, MA: Perseus Books.
Salzer, J. (2001, April 29). Georgia dropout rate highest in nation: Years of reform achieve little, and some fear testing will only worsen problem. Atlanta Journal-Constitution [On-line]. Available: http://www.accessatlanta.com/partners/ajc/epaper/editions/sunday/news_a3be6ad4f17dd10f007c.hml
Scheurich, J. J., Skrla, L. & Johnson, J. F. (2000). Thinking carefully about equity and accountability. Phi Delta Kappan, 82(4), 293–299.
Schrag, P. (2000, January 3). Too good to be true. The American Prospect [On-line]. Available: http://www.prospect.org/archives/V11-4/schrag-p.html
Serow, R. C. (1984). Effects of minimum competency testing for minority students: A review of expectations and outcomes. The Urban Review, 16(2), 67–75.
Sheldon, K. M. & Biddle, B. J. (1998). Standards, accountability, and school reform: Perils and pitfalls. Teachers College Record, 100(1), 164–180.
Shorris, E. (1997). New American Blues. New York: W. W. Norton.
Shorris, E. (2000). Riches for the poor. New York: W. W. Norton.
Smith, M. L. & Glass, G. V (1987). Research and evaluation in education and the social sciences. Needham Heights, MA: Allyn and Bacon.
Swope, K. & Miner, B. (Eds.). (2000). Failing our kids: Why the testing craze won't fix our schools. Milwaukee, WI: Rethinking Schools, Ltd.
Thomas, M. D. & Bainbridge, W. L. (2001). "All children can learn": Facts and fallacies. Phi Delta Kappan, 82(9), 660–662.
Toenjes, L. A., Dworkin, A. G., Lorence, J., & Hill, A. J. (2000). The lone star gamble: High stakes testing, accountability, and student achievement in Texas and Houston [On-line]. Available: http://www.brook.edu/gs/brown/bc_report/2000/Houston.PDF
U.S. Department of Education. (1983). A nation at risk: The imperative for educational reform [On-line]. Available: http://www.ed.gov/pubs/NatAtRisk/index.html
Urdan, T. C. & Paris, S. G. (1994). Teachers' perceptions of standardized achievement tests. Educational Policy, 8(2), 137–157.
Using tobacco settlement revenues for children's services: State opportunities and actions. (1999, October). National Conference of State Legislatures and The Finance Project [On-line]. Available: http://www.financeproject.org/tobaccoattach.htm
Valencia, R. R., Valenzuela, A., Sloan, K., & Foley, D. E. (2001). Let's treat the cause, not the symptoms: Equity and accountability in Texas revisited. Phi Delta Kappan, 83(4), 318–321, 326.
Wright, W. E. (Forthcoming). The effects of high stakes testing on an inner-city elementary school: The curriculum, the teachers, and the English Language Learners. Current Issues in Education [On-line]. Available: http://cie.ed.asu.edu/

About the Authors

Audrey L. Amrein
College of Education
Arizona State University
Division of Educational Leadership and Policy Studies
PO Box 872411
Tempe, AZ 85287-2411
Email: audrey.beardsley@cox.net

Audrey L. Amrein is an Assistant Research Professional in the College of Education at Arizona State University in Tempe, Arizona. Her research interests include the study of large-scale educational policies and their effects on students from racial minority, language minority, and economically disadvantaged backgrounds. Specifically, she is interested in investigating the effects of high-stakes tests, bilingual education, and charter school policies as they pertain to issues of equity.

David C. Berliner
Regents' Professor of Education
College of Education
Arizona State University
Tempe, AZ 85287-2411
Email: berliner@asu.edu
David C. Berliner is Regents' Professor of Education at the College of Education of Arizona State University, in Tempe, AZ. He received his Ph.D. in 1968 from Stanford University in educational psychology, and has also worked at the University of Massachusetts, WestEd, and the University of Arizona. He has served as president of the American Educational Research Association (AERA) and president of the Division of Educational Psychology of the American Psychological Association, and has been a fellow of the Center for Advanced Study in the Behavioral Sciences and a member of the National Academy of Education. Berliner's publications include The Manufactured Crisis (Addison-Wesley, 1995, with B. J. Biddle) and The Handbook of Educational Psychology (Macmillan, 1996, edited with R. C. Calfee). Special awards include the Research into Practice Award of AERA, the National Association of Secondary School Principals Distinguished Service Award, and the Medal of Honor from the University of Helsinki. His scholarly interests include research on teaching and education policy analysis.

Appendices

Appendices are available in either HTML format or Rich Text Format, the latter being a word-processor file. The RTF file is 3.5 megabytes in size.

Appendices in HTML
Appendices in Rich Text Format

Copyright 2002 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, glass@asu.edu, or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411. The Commentary Editor is Casey D. Cobb: casey.cobb@unh.edu.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, hmwkhelp@scott.net
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Ohio University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Dewayne Matthews, Education Commission of the States
William McInerney, Purdue University
Mary McKeown-Moak, MGT of America (Austin, TX)
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, apembert@pen.k12.va.us
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, New York University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, California State University–Stanislaus
Jay D. Scribner, University of Texas at Austin
Michael Scriven, scriven@aol.com
Robert E. Stake, University of Illinois–UC
Robert Stonehill, U.S. Department of Education
David D. Williams, Brigham Young University

EPAA Spanish Language Editorial Board

Associate Editor for Spanish Language
Roberto Rodríguez Gómez, Universidad Nacional Autónoma de México, roberto@servidor.unam.mx

Adrián Acosta (México), Universidad de Guadalajara, adrianacosta@compuserve.com
J. Félix Angulo Rasco (Spain), Universidad de Cádiz, felix.angulo@uca.es
Teresa Bracho (México), Centro de Investigación y Docencia Económica-CIDE, bracho@dis1.cide.mx
Alejandro Canales (México), Universidad Nacional Autónoma de México, canalesa@servidor.unam.mx
Ursula Casanova (U.S.A.), Arizona State University, casanova@asu.edu
José Contreras Domingo, Universitat de Barcelona, Jose.Contreras@doe.d5.ub.es
Erwin Epstein (U.S.A.), Loyola University of Chicago, Eepstein@luc.edu
Josué González (U.S.A.), Arizona State University, josue@asu.edu
Rollin Kent (México), Departamento de Investigación Educativa-DIE/CINVESTAV, rkent@gemtel.com.mx, kentr@data.net.mx
María Beatriz Luce (Brazil), Universidad Federal de Rio Grande do Sul-UFRGS, lucemb@orion.ufrgs.br
Javier Mendoza Rojas (México), Universidad Nacional Autónoma de México, javiermr@servidor.unam.mx
Marcela Mollis (Argentina), Universidad de Buenos Aires, mmollis@filo.uba.ar
Humberto Muñoz García (México), Universidad Nacional Autónoma de México, humberto@servidor.unam.mx
Angel Ignacio Pérez Gómez (Spain), Universidad de Málaga, aiperez@uma.es
Daniel Schugurensky (Argentina-Canadá), OISE/UT, Canada, dschugurensky@oise.utoronto.ca
Simon Schwartzman (Brazil), Fundação Instituto Brasileiro de Geografia e Estatística, simon@openlink.com.br
Jurjo Torres Santomé (Spain), Universidad de A Coruña, jurjo@udc.es
Carlos Alberto Torres (U.S.A.), University of California, Los Angeles, torres@gseisucla.edu