Identifier: E11-00185
Educational policy analysis archives. Vol. 8, no. 41 (August 19, 2000).
Tempe, Ariz.: Arizona State University; Tampa, Fla.: University of South Florida. August 19, 2000.
Myth of the Texas miracle in education / Walt Haney.
Arizona State University. University of South Florida.
Education Policy Analysis Archives (EPAA)
Volume 8 Number 41
August 19, 2000
ISSN 1068-2341

A peer-reviewed scholarly electronic journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright 2000, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

The Myth of the Texas Miracle in Education
Walt Haney
Boston College

Related articles: Klein et al.: Vol. 8 No. 49; Camilli: Vol. 8 No. 42; Toenjes & Dworkin: Vol. 10 No. 17

Abstract: I summarize the recent history of education reform and statewide testing in Texas, which led to introduction of the Texas Assessment of Academic Skills (TAAS) in 1990-91. A variety of evidence in the late 1990s led a number of observers to conclude that the state of Texas had made near miraculous progress in reducing dropouts and increasing achievement. The passing scores on TAAS tests were arbitrary and discriminatory. Analyses comparing TAAS reading, writing and math scores with one another and with relevant high school grades raise doubts about the reliability and validity of TAAS scores. I discuss problems of missing students and other mirages in Texas enrollment statistics that profoundly affect both reported dropout statistics and test scores. Only 50% of minority students in Texas have been progressing from grade 9 to high school graduation since the initiation of the TAAS testing program. Since about 1982, the rates at which Black and Hispanic students are required to repeat grade 9 have climbed steadily, such that by the late 1990s, nearly 30% of Black and Hispanic students were "failing" grade 9. Cumulative rates of grade retention in Texas are almost twice as high for Black and Hispanic students as for White students. Some portion of the gains in grade 10 TAAS pass rates is illusory. The numbers of students taking the grade 10 tests who were classified as "in special education" and hence not counted in schools' accountability ratings nearly doubled between 1994 and 1998. A substantial portion of the apparent increases in TAAS pass rates in the 1990s is due to such exclusions. In the opinion of educators in Texas, schools are devoting a huge amount of time and energy preparing students specifically for TAAS, and emphasis on TAAS is hurting more than helping teaching and learning in Texas schools, particularly with at-risk students, and TAAS contributes to retention in grade and dropping out. Five different sources of evidence about rates of high school completion in Texas are compared and contrasted. The review of GED statistics indicated that there was a sharp upturn in numbers of young people taking the GED tests in Texas in the mid-1990s to avoid TAAS. A convergence of evidence indicates that during the 1990s, slightly less than 70% of students in Texas actually graduated from high school. Between 1994 and 1997, TAAS results showed a 20% increase in the percentage of students passing all three exit level TAAS tests (reading, writing and math), but TASP (a college readiness test) results showed a sharp decrease (from 65.2% to 43.3%) in the percentage of students passing all three parts (reading, math, and writing). As measured by performance on the SAT, the academic learning of secondary school students in Texas has not improved since the early 1990s, compared with SAT takers nationally. SAT-Math scores have deteriorated relative to students nationally.
The gains on NAEP for Texas fail to confirm the dramatic gains apparent on TAAS. The gains on TAAS and the unbelievable decreases in dropouts during the 1990s are more illusory than real. The Texas "miracle" is more hat than cattle.

Part 1: Introduction
Part 2: Recent History of Testing in Texas
Part 3: Evidence and Boosters of the Myth
Part 4: Problems with TAAS
Part 5: Missing Students and Other Mirages
Part 6: Educators' Views of TAAS
Part 7: Other Evidence on Education in Texas
Part 8: Summary and Lessons from the Myth Deflated
Notes and References
Appendices

About the Author

Walt Haney
Center for the Study of Testing, Evaluation and Educational Policy
Campion Hall 323
Lynch School of Education
Boston College
Chestnut Hill, MA 02467
617-552-4199
617-552-8419 (Fax)
Email: email@example.com

Walt Haney, Ed.D., Professor of Education at Boston College and Senior Research Associate in the Center for the Study of Testing, Evaluation and Educational Policy (CSTEEP), specializes in educational evaluation and assessment and educational technology. He has published widely on testing and assessment issues in scholarly journals such as the Harvard Educational Review, Review of Educational Research, and Review of Research in Education and in wide-audience periodicals such as Educational Leadership, Phi Delta Kappan, the Chronicle of Higher Education and the Washington Post. He has served on the editorial boards of Educational Measurement: Issues and Practice and the American Journal of Education and on the National Advisory Committee of the ERIC Clearinghouse on Assessment and Evaluation.

ACKNOWLEDGEMENTS

As I worked on this article over a period of more than two years, literally dozens of people helped me in numerous ways. At Boston College, graduate students Cathy Horn, Kelly Shasby, Miguel Ramos and Damtew Teferra helped on specific portions of the work reported here. Damtew Teferra has been especially helpful over a period of nearly two years in tracking down references and other source material and in checking accuracy of data input. Ed Rincon and Terry Hitchock both helped me more than once as I sought to build the data set on enrollments in Texas public schools over the last quarter century. Among many scholars who have answered repeated questions and kindly provided me with references and reviews of various portions of this article in many different versions, I thank James Hoffman, John Tyler, Jeff Rodamar, Angela Valenzuela, Bob Hauser, Duncan Chaplin, Richard Murnane, Linda McNeil, Dennis Shirley, Anne Wheelock, and Janet Baldwin.
Diane Joyce, Lauren McGrath, Anna Geraty, Courtney Danley, and Genia Young helped me, as they do everyone else in our research center, in ways too many to mention here. Also, I thank Jane Hodosi of Market Data Retrieval who, under an extraordinarily short deadline, helped me get the mailing labels that allowed us to carry out two of the surveys recounted here. Thanks too to Chris Patterson for providing me with a great deal of useful information. Three electronic mail correspondents, Audrey Amrein, Craig Bolon and Alein Jehlen, also provided me with helpful suggestions and encouragement.

I also wish to express my appreciation to Judge Edward Prado. Though I think he may have erred in his ruling in the GI Forum case (as may be apparent from what is to follow), during the four days I was on the stand in his courtroom, he treated me with attention, respect and good humor. He even had the good sense to tell me simply to "answer the question," when the professor in me launched into discussions of literature on topics on which I was questioned.

My wife, Kris, and daughter, Elizabeth, also deserve great appreciation for their tolerance in putting up with work that I told them many times would be done long before now.

Thanks also to Gene V Glass who encouraged me to submit this work to Education Policy Analysis Archives. As a former editor, I know how hard it sometimes can be to pry manuscripts away from authors who know that there are always other nooks and crannies to explore. Thanks too to nine anonymous reviewers from the EPAA Editorial Board who commented generously on a previous version of this article.

More than anyone else, though, I wish to express my appreciation and respect for Al Kauffman. Over the past two years, on several occasions I have cursed him under my breath (and once or twice aloud), for getting me involved with TAAS and education reform in Texas.
But after spending more than twice as long on this topic as I ever thought I would, I have developed enduring respect for Al, his integrity and good humor, and his quest for truth and justice. I regret that I was not able to complete all of the analyses reported here before the TAAS trial. But it will be a long time before I let Al talk me into working on another case, even if next time he tries to tell me I am not his second choice as an expert witness.

Any errors of fact or interpretation in this report are, of course, despite the enormous help of many good and generous people, entirely my responsibility.

No corporations, foundations or anonymous donors have supported the research reported here. But I do owe an enormous debt of gratitude to Boston College for awarding me a sabbatical leave during the 1999-2000 academic year. Without the leave, there is no way I would have been able to complete this research. I did not do what I said I would when I applied for sabbatical leave, but I hope that the work reported here will win me, if not forgiveness, at least tolerance for being distracted from well-intentioned plans.

And on the topic of forgiveness, I am almost certain there are people I should have thanked here but could not remember. Forgive me, please, but I simply had to finish this work before returning to normal academic duties in September.

Copyright 2000 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, firstname.lastname@example.org or reach him at College of Education, Arizona State University, Tempe, AZ 85287-0211 (602-965-9644). The Commentary Editor is Casey D. Cobb: email@example.com.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, firstname.lastname@example.org
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Ohio University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary McKeown-Moak, MGT of America (Austin, TX)
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, email@example.com
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, New York University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, Ann Leavenworth Center for Accelerated Learning
Jay D. Scribner, University of Texas at Austin
Michael Scriven, firstname.lastname@example.org
Robert E. Stake, University of Illinois-UC
Robert Stonehill, U.S. Department of Education
David D. Williams, Brigham Young University

EPAA Spanish Language Editorial Board

Associate Editor for Spanish Language: Roberto Rodríguez Gómez, Universidad Nacional Autónoma de México, email@example.com

Adrián Acosta (México), Universidad de Guadalajara, adrianacosta@compuserve.com
J. Félix Angulo Rasco (Spain), Universidad de Cádiz, felix.firstname.lastname@example.org
Teresa Bracho (México), Centro de Investigación y Docencia Económica-CIDE, bracho dis1.cide.mx
Alejandro Canales (México), Universidad Nacional Autónoma de México, canalesa@servidor.unam.mx
Ursula Casanova (U.S.A.), Arizona State University, casanova@asu.edu
José Contreras Domingo, Universitat de Barcelona, Jose.Contreras@doe.d5.ub.es
Erwin Epstein (U.S.A.), Loyola University of Chicago, Eepstein@luc.edu
Josué González (U.S.A.), Arizona State University, josue@asu.edu
Rollin Kent (México), Departamento de Investigación Educativa-DIE/CINVESTAV, rkent@gemtel.com.mx, email@example.com
María Beatriz Luce (Brazil), Universidad Federal de Rio Grande do Sul-UFRGS, lucemb@orion.ufrgs.br
Javier Mendoza Rojas (México), Universidad Nacional Autónoma de México, javiermr@servidor.unam.mx
Marcela Mollis (Argentina), Universidad de Buenos Aires, mmollis@filo.uba.ar
Humberto Muñoz García (México), Universidad Nacional Autónoma de México, humberto@servidor.unam.mx
Angel Ignacio Pérez Gómez (Spain), Universidad de Málaga, aiperez@uma.es
Daniel Schugurensky (Argentina-Canadá), OISE/UT, Canada, dschugurensky@oise.utoronto.ca
Simon Schwartzman (Brazil), Fundação Instituto Brasileiro e Geografia e Estatística, firstname.lastname@example.org
Jurjo Torres Santomé (Spain), Universidad de A Coruña, jurjo@udc.es
Carlos Alberto Torres (U.S.A.), University of California, Los Angeles, torres@gseisucla.edu
1. Introduction

Accountability Narrows Racial Gap in Texas; Expand It
(Editorial headline, USA Today, March 21, 2000, p. 14A)

For several years the state of Texas has been widely cited as a model of standards-based education reform. Some have even called recent educational progress in Texas a miracle. Indeed Texas has been cited from west coast to east as a model worthy of emulation by other states. As in the USA Today editorial cited above, the Texas system of educational accountability has even been touted as a model to be followed in federal education legislation. In this article, I review evidence to show that the "miracle" of education reform in Texas is really a myth and illusion. What should be learned from this is not just to be suspicious of the "tall tales" of Texans (as Jeff Rodamar, 2000, put the matter), but that more broadly, we should be cautious in drawing sweeping conclusions about large and complex educational endeavors, based on only one form of evidence, such as test scores. This may seem strange advice coming from one who would call a purported miracle a myth. But as I will explain, even if the Texas approach to education reform is not worthy of emulation elsewhere, there is still something to be learned from Texas about how not to judge the health of education and the progress of education reform elsewhere.

The story of the Texas miracle is reported here in eight parts. Following this introduction, Part 2 provides a summary of recent education history in Texas, with particular focus on how statewide testing has evolved in the Lone Star state over the last two decades into the Texas Assessment of Academic Skills (TAAS) which is now the linchpin of educational accountability in Texas.
Part 3 summarizes evidence upon which the Texas tale of success in the 1990s is based, and recounts some of the praise that has been lavished recently on the Texas miracle story. Part 4 summarizes some of the problems with the TAAS tests that make them suspect as sources of evidence about the progress of education in Texas. Part 5 describes the problem of missing students in Texas, and other mirages, reminding us that when trying to interpret summary test results, it is always helpful to pay attention to who is and is not present for the testing. Part 6 summarizes views of educators in Texas about TAAS and teaching and learning in the state. Part 7 reviews other evidence on the status of education in Texas. Finally, the conclusion suggests some broader lessons from this story of the myth of the education "miracle" in Texas: about both the limits of test-based accountability and the need to remember the broad aims of education in a democratic society.

Before reviewing the story of the Texas "miracle," I offer two caveats: one very large, and the other inevitable in any work of limited scope. The big caveat is that approximately two years ago Al Kauffman, Regional Counsel for the Mexican American Legal Defense and Education Fund (MALDEF) persuaded me to serve as an expert witness in a MALDEF lawsuit, GI Forum v. Texas Education Agency, brought against the state of Texas. As a result, I served as one of several expert witnesses for MALDEF in its effort to prove that the high school graduation test in Texas, the TAAS "exit level" test, has illegal discriminatory impact on Black and Hispanic students. After a trial in the fall of 1999 (which in the press came to be called the "TAAS trial"), the federal judge who heard the case, Edward C. Prado, ruled on January 7, 2000, against MALDEF and for the state of Texas. (Note 1) In essence, Judge Prado ruled that while TAAS does have discriminatory impact on Black and Hispanic students, the use of TAAS to withhold diplomas is not illegal because it is educationally necessary. I am not a legal expert and, hence, in the body of this article will comment only on matters of evidence and facts in the TAAS case. Nonetheless, in appendices to this article, I provide the full text of Judge Prado's ruling, documentation on summary arguments made by the two sides in the case, and my own summary comments on the judge's ruling. (Note 2)

The second caveat is one that is inevitable in any presentation in any medium. One can never tell the whole story. Texas is well known for its size. Hence the territory I try to cover in this article is rather large. To provide some indication of its scope, the TAAS trial lasted for five weeks, and in addition to direct testimony, was based on hundreds of documents submitted by plaintiffs and defendants. Indeed, my personal files on TAAS and the TAAS case occupy six feet of shelf space and several megabytes of computer storage. So, in trying to recount the Texas miracle story and why I think it is a myth, I will have to be somewhat selective. This may seem dangerous since I was on one side of a hard fought legal battle. I make no apologies for that, but want to make it clear simply as fair warning to readers. I leave it to others to judge how fair-minded I have been in recounting this version of the Texas miracle. And one final caution. During preparation for the TAAS trial, Mr. Kauffman, the lead attorney for MALDEF in the TAAS case, several times referred to me as his "Yankee testing expert." While I do now reside in New England, I am actually a native of Texas. So beware the tall tales of Texans.
2. Recent History of Testing in Texas

Texas has seen several waves of education reform over the last several decades. As with reform efforts in many other states, testing has featured prominently in these efforts. In 1971, in the case of Rodriguez v. San Antonio Independent School District, a federal court ruled the system of financing public schools in Texas to be unconstitutional in that it discriminated against students living in poor school districts. Although the U.S. Supreme Court reversed the decision in the Rodriguez case in 1973, the case helped spur the Texas legislature into trying to remedy inequities in school finance (Funkhouser, 1990, p. 6). In 1979, the Texas legislature passed the Equal Educational Opportunity Act, which established the first state mandated testing program (Office of Technology Assessment, 1987, p. 271). This was the Texas Assessment of Basic Skills (TABS), a survey-type assessment, without sanctions for test takers, administered from 1980 to 1985.

Following recommendations of a Select Committee on Education (chaired by H. Ross Perot), in 1984 the Texas legislature passed a comprehensive education reform law mandating the most sweeping changes in education in Texas in 30 years (Funkhouser, 1990, p. 3). Among other things, the law established a statewide curriculum (called the Essential Elements); required students to achieve a score of 70 to pass their high school courses; mandated the "no pass, no play" rule (whereby students could not participate in varsity sports if they did not pass high school courses); required teachers to pass a proficiency test; and mandated changes in the statewide testing program (Funkhouser, 1990). Commenting on the state of education in Texas in the mid-1980s, Harold Hodgkinson observed that "The current Texas school reform is as 'top down' as can be found in the U.S.
The costs of operating the system now enacted into law will be severe and the retention rate to high school graduation will likely decrease" (Hodgkinson, 1986).

The 1984 law mandated basic skills testing of students in each odd numbered grade (Funkhouser, 1990, p. 199). The new testing program, called the Texas Educational Assessment of Minimum Skills or TEAMS, was implemented in 1985 and tested students in grades 1, 3, 5, 7, 9 and 11. Under the 1984 law, high school students were required to pass the "exit level" version of TEAMS in order to receive a high school diploma, based on a passing score set by the State Board of Education (Office of Technology Assessment, 1987, pp. 272-75). The TEAMS exit-level tests were given for the first time in October 1985 to approximately 190,000 eleventh graders. Eighty-eight percent of students passed the math portion of TEAMS; 91 percent passed the English language arts portion; and 85 percent passed both. Students who failed either portion of TEAMS had an opportunity to retake the tests in May 1986. The majority of students who had failed in the fall passed the spring retest (Funkhouser, 1990, pp. 199-201).

In Fall 1990, changes in state law required the implementation of a new "criterion-referenced" testing program, the Texas Assessment of Academic Skills (TAAS) and also established end-of-course tests for selected high school course subjects. As compared with TEAMS, TAAS was intended to shift the focus of assessment from "minimum skills to academic skills" and to test "higher-order thinking skills and problem solving ability" (TEA, 1997, p. 1). The TAAS is developed for Texas by National Computer Systems, which subcontracts for portions of work to Harcourt Brace Educational Measurement (for item development) and Measurement Incorporated (for scoring of the open-ended portions of the TAAS). TAAS was administered to students in grades 3, 5, 7, and 11 in Fall of 1990 and 1991.

Results of the fall 1990 tryout of TAAS showed that the new tests were much more difficult than the TEAMS tests had been. Table 2.1 shows results from the Fall 1990 grade 11 field test of TAAS. These results made clear that if the passing score on TEAMS (70% correct) was maintained for TAAS, passing rates would fall from the 80-90% range seen on TEAMS to the 40-60% range on TAAS (with pass rates for Black and Hispanic students on the math portion of TAAS falling to the 27-33% range).

Table 2.1
Possible Passing Scores Based on Texas Assessment of Academic Skills (TAAS) Field Test Results, Exit (11) (1990)

Mathematics (Total possible score is 60 items correct)
                                      Projected Percent Passing
Number of items   Percent of items    Black   Hispanic   White   Total
36                60%                 43%     50%        68%     59%
42                70%                 27%     33%        50%     42%

Reading (Total possible score is 48 items correct)
                                      Projected Percent Passing
Number of items   Percent of items    Black   Hispanic   White   Total
29                60%                 68%     68%        84%     77%
34                71%                 45%     46%        71%     60%

Writing (Total possible score is 40 items correct)
                                      Projected Percent Passing
Number of items   Percent of items    Black   Hispanic   White   Total
24                60%                 50%     70%        77%     69%
28                70%                 38%     55%        64%     56%

(Data presented to the Texas Board of Education, July 1990. Reproduced from TEA, 1997, appendix 9 of Texas Student Assessment Program Technical Digest for the Academic Year 1996-1997, p. 347.)

The 1992-93 school year was a time of transition for statewide testing in Texas with some grades being tested in the fall and some in the spring.
In the Spring of 1994, the TAAS reading and mathematics assessments were a dministered to students in grades 3, 4, 5, 6, 7, 8, and 10; and the TAAS writing test s were administered at Grades 4, 8, and 10. If students do not pass the grade 10 or exit le vel TAAS, they may continue taking
3 of 4portions they have not yet passed during grades 11 and 12. Since 1994, the TAAS Reading, Mathematics and Writing tests have consist ently been administered to students in grades 4, 8 and 10 in the spring of each year. In addition to being used to help ensure student learning, TAAS results are also used to hold schools and school systems "accountabl e" for student learning. By state law, the State Board of Education is mandated to rate th e performance of schools and school districts according to a set of "academic excellenc e indicators," including TAAS results, dropout rates and student attendance rates (TEA, 19 97, p. 159). State law also prescribes that student performance data be disaggregated by e thnicity and socioeconomic status. The performance rating system holds that school per formance is not acceptable if the performance of all subgroups is not acceptable. Bas ed primarily on percentage of students passing each of the TAAS tests, the more than 6,000 schools in Texas have been rated since 1994 as "exemplary," "recognized," "acceptabl e" or "unacceptable." TAAS passing standards [for schools' performance ra tings] . are based on the passing rates for all students and the disaggre gated rates for four student groups: African American, Hispanic, White, and Econ omically Disadvantaged. Of the four categories, only the exe mplary rating has had a consistent passing standard, requiring at least 90 percent of all students and each student group to pass each subject area. The r ecognized rating has increased from at least 65 percent of students pass ing in 1994 to a current 70 percent, the acceptable rating has gone from at lea st 25 percent passing to 30 percent, and the low-performing rating from less th an 25 percent to less than 30 percent. (Gordon & Reese, 1997, p. 
347-480) Schools are eligible for cash awards for high ratings; and if they are rated as low performing twice in a row, they are subject to sanc tions from the Texas Education Agency, including possible closure. In short, over the past decade TAAS has b ecome an extremely high stakes test for students, educators and schools in the state of Tex as. If students do not pass all three portions of the exit level version of TAAS (reading math and writing), they cannot graduate from high school, regardless of grades in their high school courses. And schools' reputations, funding and their continued existence depend on students' performance on TAAS. (Note 3) Before summarizing TAAS results in the 19 90s, it is useful to describe the tests themselves. The focus of test-based accountability in Texas is on the TAAS tests of reading, mathematics and writing (there are also TA AS tests of social studies and science and end-of course tests in some high school subject s). The TAAS tests are mostly multiple-choice in format. The numbers of questions on the TAAS tests varies somewhat across grade level versions, but the grade 10 (or e xit level) versions contain 48 reading questions, 60 math questions and 40 writing questio ns. The TAAS writing test also includes an open-ended question to which students m ust write their answers. The written composition portion of the TAAS writing test is sco red on a 4-point scale (released versions of the TAAS tests are available atwww.tea.state.tx.us/student.assessment/release.htm) Finally, I should mention that though sev eral observers have described the TAAS tests as criterion-referenced, traditional norm-ref erenced test construction techniques (such as screening of candidate items in terms of i tem discrimination) have been used in their construction. Also it is clear that the TAAS tests have so few items that they cannot be used to yield reliable scores below the aggregat e reading, math and writing
4 of 4levelsÂ—and as we will see, there is ample cause to doubt their reliability and validity even at these aggregate levels. Moreover, as will b e explained, the passing scores on the TAAS test were set without any reference to perform ance criteria external to TAAS, but only after review of group performance on TAASÂ—in e ffect a norm-referenced rather than criterion-referenced comparison. As mentioned, by law the Texas State Boar d of Education was required to set passing scores on the TAAS tests (or as legislative language put it, "determine the level of performance considered to be satisfactory," TEA, 1997, p. 157). Here is how the Texas Student Assessment Program Technical Digest describes the evolution of the TAAS "passing standard": In 1990 the State Board of Education set minimum ex pectations as equivalent to 70% of the multiple-choice items corr ect on the fall 1990 test and a score of at least 2 on the written compositio n. The 70%-equivalent standard was in effect beginning with the 1991-1992 school year. The 1990-1991 school year served as a transition from t he previous assessment program, The Texas Assessment of Minimum Skills (TE AMS). The SBOE set the interim minimum expectations standard at 65 % of the multiple-choice items correct for Grades 3, 3-Spanish, and 5, and 6 0% of the items correct for grades 7, 9 and exit level. A student also had to score at least 2 on the written composition to meet minimum expectations on the writing test. (TEA, 1997, p. 28) So, since 1992 the passing scores on the TAAS exit level tests (reading, writing and math) have been set at a level equivalent to th e 70% of items correct on Fall 1990 form of the tests. As new forms of the tests were u sed in subsequent years, analysts used test-equating methods to try to make passing scores on the new forms equivalent to 70% correct on the 1990 forms. (Note 4)
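
The arithmetic behind a "70% of items correct" standard can be illustrated with a short Python sketch. It is only an illustration, not a description of how TEA or its contractors computed passing scores: the item totals (60 mathematics, 48 reading, 40 writing) come from the text above, while the rounding rule (take the smallest whole number of items that reaches the target percentage) and the function names `min_items_correct` and `percent_of_items` are assumptions made here for the example.

```python
# Item totals for the exit-level TAAS tests, as stated in the article.
ITEMS = {"Mathematics": 60, "Reading": 48, "Writing": 40}


def min_items_correct(total_items: int, percent: int) -> int:
    """Smallest whole number of items that is at least `percent` of the total.

    Assumed rounding rule (not taken from TEA documentation); integer
    ceiling division avoids floating-point surprises.
    """
    return -(-total_items * percent // 100)


def percent_of_items(items_correct: int, total_items: int) -> int:
    """Share of items correct, rounded to the nearest whole percent."""
    return round(100 * items_correct / total_items)


if __name__ == "__main__":
    for subject, total in ITEMS.items():
        for target in (60, 70):
            cut = min_items_correct(total, target)
            share = percent_of_items(cut, total)
            print(f"{subject}: {target}% target -> {cut} of {total} items ({share}%)")
```

Under the assumed rounding rule, the sketch gives 36 and 42 items for mathematics, 29 and 34 for reading, and 24 and 28 for writing, which agrees with the item counts listed in Table 2.1. It also shows why the reading standard appears there as 71% rather than 70%: 70% of 48 items is 33.6, so a whole-item cut of 34 is 34/48, or about 71%.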
3. Evidence and Boosters of the Myth

Given the consequences attached to performance on TAAS, it is not surprising that this test has had major impact on education in Texas. At first glance, this impact appears to have been largely positive; and it is evidence of the apparent positive impact of TAAS, and the Texas system of school accountability, that has helped give rise to the "miracle" story of education reform in Texas over the last decade.

Four kinds of evidence seem to have been most widely cited as indicative of major improvements in education in Texas, namely: 1) sharp increases in the overall pass rates on TAAS during the 1990s; 2) apparent decreases in the achievement gap between White and minority students in Texas (again based on TAAS scores); 3) seemingly decreasing rates of students dropping out of school before high school graduation; and, 4) apparent confirmation of TAAS gains by results on the National Assessment of Educational Progress (NAEP).

3.1 Improved results on TAAS

The main evidence contributing to the perception of dramatic educational gains in Texas during the 1990s (what the March 21, 2000 USA Today editorial called "widespread improvement in student achievement") seems to have been sharp increases in passing rates on the TAAS. TAAS was introduced in Texas in 1990-91, and, as recounted previously, was administered at somewhat varied grades (and seasons) during the early 1990s. In several publications, the TEA has presented TAAS pass rates aggregated across different grades. Inasmuch as this sort of aggregation may obscure as much as it reveals, here I present results mainly for grade 10 TAAS testing.
Table 3.1 (and corresponding Figure 3.1) shows the results on the grade 10 TAAS test from 1994 to 1998.

Table 3.1
TAAS Grade 10 Percent Passing 1994-1998
All Students Not in Special Education
(Does Not Include Year-Round Education Results)

                  1994   1995   1996   1997   1998
TAAS Reading       76%    76%    81%    86%    88%
TAAS Math          57%    59%    65%    72%    78%
TAAS Writing       81%    86%    85%    88%    89%
TAAS All Tests     52%    54%    60%    67%    72%
Source: Selected State AEIS Data: A Multi-Year History (www.tea.state.tx.us/student.assessment/swresult/gd10sp98.htm)

As can be seen from these data, grade 10 TAAS results show a pattern of steady improvement from 1994 through 1998, with the percentage of students passing the TAAS reading test rising from 76% to 88%; the percentage passing the TAAS math test rising from 57% to 78%; and the corresponding increase for the TAAS writing test going from 81% to 89%. The percentage of grade 10 students passing all three tests increased from 52% in 1994 to 72% in 1998.

3.2 Decrease in Race Gap in Test Scores

Even as test scores were improving overall, the gaps in achievement between White and nonwhite students (specifically Black and Hispanic students) appeared to have been narrowing. The USA Today editorial (3/21/2000) reported that "Texas is one of the few states that has narrowed its racial learning gap." Figure 3.2 and Table 3.2 show how the "racial learning gap" appears to have narrowed on the grade 10 TAAS tests (for economy of presentation here, I do not show results separately for the reading, writing, and math tests, but only the percentages of grade 10 students passing all three tests).

Table 3.2
TAAS Grade 10 Percent Passing All Tests by Race 1994-1998
All Students Not in Special Education
(Does Not Include Year-Round Education Results)

            1994   1995   1996   1997   1998
Black        29%    32%    38%    48%    55%
Hispanic     35%    37%    44%    52%    59%
White        67%    70%    74%    81%    85%

Source: Selected State AEIS Data: A Multi-Year History: www.tea.state.tx.us/student.assessment/swresult/gd10sp98.htm

As can be seen, in 1994 there was a huge disparity in the grade 10 pass rates for Black and Hispanic students as compared with White students. The 1994 White pass rate of 67% was 38 points higher than the Black pass rate of 29%; and 32 points more than the Hispanic rate of 35%. In other words, in 1994, White students were passing the grade 10 TAAS tests at about double the rate of Black and Hispanic students. This gap was just about what might have been predicted based on the 1990 field test results (see Table 2.1). By 1998, the White grade 10 pass rate had climbed 18 points to 85%. But the Black and Hispanic pass rates had climbed even more, 26 and 24 points respectively. So in a period of just five years, the race gaps had been reduced from 38 to 30 percentage points for Whites and Blacks and from 32 to 26 for Whites compared with Hispanic tenth grade students. Or in other words, minorities had increased their rate of passing grade 10 TAAS tests from less than 50% of the White pass rate to two-thirds of the White pass rate in just four years.

3.3 Decreases in Dropout Rates

If the dramatic gains in grade 10 pass rates overall and substantial decreases in the "racial learning gap" were not sufficiently remarkable, official TEA statistics indicated that over the same interval high school dropout rates were also declining.

Table 3.3
Texas Annual Dropout Rate, Grades 7-12 1994-1998

                1994   1995   1996   1997   1998
All Students    2.8%   2.6%   1.8%   1.8%   1.6%
Black           3.6%   3.2%   2.3%   2.3%   2.0%
Hispanic        4.2%   3.9%   2.7%   2.5%   2.3%
White           1.7%   1.5%   1.2%   1.1%   1.0%

Source: Selected State AEIS Data: Five Year History www.tea.state.tx.us/perfreport/aeis/hist/state.html

As shown in Table 3.3, TEA data indicated that between 1994 and 1998, even as pass rates on the TAAS were increasing among grade 10 students, dropout rates were decreasing not just among secondary students overall, but also for each of the three race groups for which data were disaggregated. In short, what appeared to be happening in Texas schools in the 1990s truly did seem to be a miracle.

As Peter Schrag has recently written: "Some of Texas's claims are so striking they border on the incredible. The state's official numbers show that even as TAAS scores were going up, dropout rates were cut from an annual 6.1 percent in 1989-90 to 1.6 percent last year. If ever there was a case of something being too good to be true, this is it" (Schrag, 2000). But before reviewing the doubts of Schrag and others, let me recap one additional source of evidence that seemed to confirm the miracle story.

3.4 NAEP Results for Texas

Anyone even remotely familiar with recent education history of the United States must view with some skepticism the meaningfulness of the almost inevitable increases in performance that follow introduction of a new testing program. When a new testing program is introduced, students and teachers have little familiarity with the specifics of the new tests. But after a few years, they become familiar with the style and format of the tests and students can be coached specifically for the test in question. Hence, performance, or at least average test scores, almost inevitably increases.

That students can be successfully coached for particular tests has been well known among education researchers for decades.
As far back as 1927, Gilmore, for example, reported that students could be coached on Otis group intelligence tests "to the point of increasing their standing and score in intelligence tests even in the case of the material used in coaching being only similar and not identical with that of the basic test" (Gilmore, 1927, p. 321). Indeed what happens when students are coached for a specific test has come to be called the "saw tooth" phenomenon because of the regular pattern in which scores steadily rise following introduction of a new testing program, only to fall dramatically when a different test is introduced (Linn, 2000, p. 7).

The phenomenon of falsely inflated test scores was brought to wide public attention in the late 1980s and early 1990s because of publicity for what came to be known as the "Lake Wobegon" phenomenon in test results. Lake Wobegon is the mythical town in Minnesota popularized by Garrison Keillor in his National Public Radio program "A Prairie Home Companion." It is the town where "all the women are strong, all the men are good looking, and all the children are above average." In the late 1980s it was discovered that Lake Wobegon seemed to have invaded the nation's schools. For according to a 1987 report by John Cannell, the vast majority of school districts and all states were scoring above average on nationally normed standardized tests (Cannell, 1987). Since it is logically impossible for all of any population to be above average on a
single measure, it was clear that something was amiss, that something about nationally normed standardized tests or their use had been leading to false inferences about the status of learning in the nation's schools.

Cannell was a physician by training and not a specialist in education or education research. His original (1987) report was published by "Friends for Education," the foundation he established to promote accountability in education. A revised version of Cannell's report was published in the Summer 1988 issue of Educational Measurement: Issues and Practice (Cannell, 1988) together with responses and commentary from representatives of major test publishers and officials of the U.S. Department of Education (Phillips and Finn, 1988; Drahozal and Frisbie, 1988; Lenke and Keene, 1988; Williams, 1988; Qualls-Payne, 1988; Stonehill, 1988). Cannell's charges regarding misleading test results were hotly debated in this and other forums. Some people doubted whether the Lake Wobegon phenomenon was real (that is, whether large majorities of states, schools and districts were in fact scoring above average on the national norms of the tests), while most observers accepted the reality of the phenomenon but disputed what caused it. Among the causes suggested and debated were problems in the original norming of the tests, outdated norms, lack of test security, manipulation of populations of students tested, artificial statistical manipulation of test results, and teachers and schools teaching to the tests, either purposely or inadvertently.

The publicity surrounding the Lake Wobegon phenomenon was sufficiently widespread that the U.S. Department of Education funded researchers at the Center for Research on Evaluation, Standards and Student Testing (CRESST) to investigate.
On the basis of a survey of state directors of testing, Shepard (1989) concluded that the conditions for inflated test results (such as high stakes being pinned on test results, efforts to align curricula to the tests, and direct teaching to the tests) existed in virtually all of the states. And on the basis of an analysis of up to three years of test results from 35 states from which they were available, Linn, Graue and Sanders (1989) essentially confirmed Cannell's basic finding that test results across the nation were implausibly inflated: Lake Wobegon had invaded the nation's schools. For instance, they found that "for grades 1 through 6, the percentage of students scoring above the national median in mathematics ranges from a low of 58% in grade 4 for the 1985 school year to a high of 71% in grade 2 for the 1987-88 school year ..." (p. 8). Linn, Graue and Sanders concluded that the use of old norms was one cause of the abundance of "above average scores" (p. 23), but also pointed out that in situations in which the same form of a test is used year after year, "increased familiarity with a particular form of a test" (p. 24) likely contributed to inflated scores.

The practice of using a single form of a test year after year poses a logical threat to making inferences about the larger domain of achievement. Scores may be raised by focusing narrowly on the test objectives without improving achievement across the broader domain that the test objectives are intended to represent. Worse still, practice on nearly identical or even the actual items that appear on a test may be given. But as Dyer aptly noted some years ago, "if you use the test exercises as an instrument of teaching you destroy the usefulness of the test as an instrument for measuring the effects of teaching (Dyer, 1973, p. 89)." (Linn, Graue and Sanders, 1989, p. 25).
The problem was illustrated even more clearly in a subsequent study reported by Koretz, Linn, Dunbar & Shepard (1991), which compared test results on one "high-stakes" test, used for several years in a large urban school district, with those on a
comparable test that had not been used in that district for several years. They found that performance on the regularly used high-stakes test did not generalize to other tests for which students had not been specifically coached, and again commented that "students in this district are prepared for high-stakes testing in ways that boost scores ... substantially more than actual achievement in domains that the tests are intended to measure" (p. 2). To put the matter bluntly, teaching to a particular test undermines the validity of test results as measures of more general learning.

While education researchers were essentially confirming Cannell's initial charges, the intrepid physician was continuing his own investigations. In late summer 1989, Cannell released a new report entitled The "Lake Wobegon" Report: How Public Educators Cheat on Standardized Achievement Tests. This time Cannell presented new instances of the Lake Wobegon phenomenon and a variety of evidence of outright fraud in school testing programs, including a sampling of testimony from teachers concerned about cheating on tests. After presenting results of his own survey of test security in the 50 states (concluding that security is generally so lax as to invite cheating), Cannell outlined methods to help people detect whether cheating is going on in their school districts, and "inexpensive steps" to help prevent it.

More recently Koretz and Barron (1998; RAND, 1999) of the RAND Corporation investigated the validity of dramatic gains on Kentucky's high stakes statewide tests. Like Texas, Kentucky had adopted policies to hold schools and teachers accountable for student performance on statewide tests. During the first four years of the program, Kentucky students showed dramatic improvements on the state tests.
What Koretz and Barron sought to assess was the validity of the Kentucky test gains by comparing them with Kentucky student performance on comparable tests, specifically the National Assessment of Educational Progress (NAEP) and the American College Testing Program (ACT) college admissions tests. What they found was that the dramatic gains on the Kentucky test between 1992 and 1996 were simply not reflected in NAEP and ACT scores. They concluded that the Kentucky test scores "have been inflated and are therefore not a meaningful indicator of increased learning" (RAND, 1999).

Even before the release of the report showing inflated test scores in Kentucky, anyone familiar with the Lake Wobegon phenomenon, widely publicized in the late 1980s and early 1990s, had to view the dramatic gains reported on TAAS in Texas in the 1990s with considerable skepticism. Were the gains on TAAS indicative of real gains in student learning, or just another instance of artificially inflated test scores?

In 1997, results from the 1996 National Assessment of Educational Progress (NAEP) in mathematics were released. The 1996 NAEP results showed that among the states participating in the state-level portion of the math assessment, Texas showed the greatest gains in percentages of fourth graders scoring at the proficient or advanced levels. Between 1992 and 1996, the percentage of Texas fourth graders scoring at these levels had increased from 15% to 25%. The same NAEP results also showed North Carolina to have posted unusually large gains at the grade 8 level, with the percentages of eighth graders in North Carolina scoring at the proficient or advanced levels improving from 9% in 1990 to 20% in 1996.
(Reese et al., 1997)

Putting aside for the moment that the 1996 NAEP results also showed that math achievement in these two states was no better (and in some cases worse) than the national average, these findings led to considerable publicity for the apparent success of education reform in these two states. The apparent gains in math, for example, led the National Education Goals Panel in 1997 to identify Texas and North Carolina as having made unusual progress in achieving the National Education Goals.
3.5 Plaudits for the Texas Miracle

In Spring 1998, Tyce Palmaffy published an article titled "The Gold Star State: How Texas jumped to the head of the class in elementary school achievement." Citing both 1996 NAEP results and TAAS score increases, Palmaffy praised Texas for being in the vanguard of "an accountability movement sweeping the states" (not surprisingly he also mentioned North Carolina and Kentucky). Regarding TAAS, Palmaffy reported "In 1994, barely half of Texas students passed the TAAS math exam. By last year, the proportion had climbed to 80 percent. What's more, the share of black and Hispanic children who passed the test doubled during that time to 64 percent and 72 percent respectively." Palmaffy's article, published in a Heritage Foundation journal, also included testimonials for the Texas success story from divergent vantage points. Kati Haycock, "director of the Education Trust, a Washington D.C.-based organization devoted to improving educational opportunities for low-income children" was quoted as touting Texas as "a real model for other states to follow." The article also referred to "researcher Heidi Glidden of the American Federation of Teachers union" as praising the sort of education accountability system used in Texas.

Meanwhile, the National Education Goals Panel had "commissioned Dr. David Grissmer, an education researcher with the RAND Corporation, to conduct an analysis of education reforms in both states [Texas and North Carolina] to determine that the improvements were indeed significant and to seek to identify the factors that could and could not account for their progress" (Grissmer & Flanagan, 1998, p. i). The National Education Goals Panel released the Grissmer/Flanagan report in November 1998. Without trying to recap or critique the Grissmer/Flanagan report here, let me simply summarize how it was conveyed to the outside world.
The report was released November 5, 1998 with a press release titled "North Carolina and Texas Recognized as Models for Boosting Student Achievement." The first paragraph of the press release read:

(WASHINGTON, D.C.) A new study that both belies conventional wisdom about problems in K-12 education and illuminates some approaches for solving them points to the extraordinarily successful policies of two states, North Carolina and Texas, as models for reform throughout the nation. (NEGP, 11/5/98)

After quotes from North Carolina Governor Jim Hunt and Texas Governor George W. Bush, the press release went on to summarize the Grissmer/Flanagan findings. The researchers found that "several factors commonly associated with student achievement, such as real per pupil spending, teacher pupil ratios, teachers with advanced degrees, and experience level of teachers, are not adequate for explaining the test score gains." (National Education Goals Panel, November 5, 1998, p. 1). The press release explained that, instead, Grissmer and Flanagan attributed the achievement gains in Texas and North Carolina to three broad factors common to the two states (business leadership, political leadership, consistent reform agendas) and seven educational policies (adopting statewide standards by grade for clear teaching, holding all students to the same standards, linking statewide assessments to academic standards, creating accountability systems with benefits and consequences for results, increasing local control and flexibility for administrators and teachers, providing test scores and feedback via computer for continuous improvement, and shifting resources to schools with more disadvantaged students).
Grissmer and Flanagan (1998) did not explain how they had determined that these were the factors behind the apparent achievement gains in Texas and North Carolina; but whatever the case, this 1998 report from the National Education Goals Panel, coupled with the sort of diverse support for the Texas model education accountability system cited by Palmaffy, seemed to certify the apparent miracle of education reform in Texas. The success of education reform in Texas was being heralded by observers as diverse as Palmaffy (of the Heritage Foundation), Haycock (head of an organization dedicated to improving the educational opportunities of low-income children), and Glidden (a researcher with one of the nation's largest teachers unions). The Grissmer/Flanagan report seemed to be the clincher. Here was a report from a bipartisan national group (the National Education Goals Panel), prepared by a Ph.D. researcher from a prestigious research organization, the RAND Corporation, that straight out said, "The analysis confirms that gains in academic achievement in both states are significant and sustained. North Carolina and Texas posted the largest average gains in student scores on tests of the National Assessment of Educational Progress (NAEP) administered between 1990 and 1997. These results are mirrored in state assessments during the same period, and there is evidence of the scores of disadvantaged students improving more rapidly than those of advantaged students" (Grissmer & Flanagan, 1998, p. i).

Few people seemed to notice that the Grissmer & Flanagan report was not actually published by RAND. Nonetheless, the report from the National Education Goals Panel seemed to certify the seeming miracle of education reform in Texas. Subsequently, the story of the Texas miracle has been circulated far and wide. Without trying to document all of the stories on the Texas miracle I have seen, let me mention here just two examples.
On June 10, 1999, the Boston Globe ran a front-page story headlined "Embarrassed into success: Texas school experience may hold lessons for Massachusetts" (Daley, 1999). And on March 21, 2000, in the editorial cited at the start of this article, the editors of USA Today, in urging the U.S. Senate to adopt a Texas-style school accountability system for the $8 billion Title I program providing federal aid to poor schools, cited "Texas-size school success" in the Lone Star state. In an apparent reference to 1996 NAEP results, the editorial cited the Education Trust as the source of evidence about gains in Texas on 1996 math tests administered nationally.
4. Problems with TAAS

Two years ago when I agreed to help MALDEF on the TAAS case, I had no way of foreseeing the extent to which education reform in Texas would come to be touted as a model to be emulated elsewhere. Nonetheless, as I studied what had been happening with TAAS in Texas, I quickly came to think otherwise. Before summarizing what I think is wrong with TAAS and how it is being misused in Texas, I should mention that some of what I recount in the remainder of this article is based on two unpublished reports that I prepared in connection with the TAAS case: a preliminary report in December 1998, and a supplementary report in July 1999 (Haney, 1998; 1999). However, it also draws on additional evidence acquired and analyses undertaken since completion of the supplementary report in summer 1999.

The problems with TAAS and the way it is being used in Texas may be summarized under five sub-headings: 1) the TAAS is having a continuing adverse impact on Black and Hispanic students; 2) the use of the TAAS test in isolation to control award of high school diplomas is contrary to professional standards concerning test use; 3) the passing score on TAAS is arbitrary and discriminatory; 4) a variety of evidence casts doubt on the validity of TAAS scores; and 5) more appropriate use of test results would have more validity and less adverse impact.

4.1 Adverse impact

In previous research and law, three standards have been recognized for determining whether observed differences constitute discriminatory disparate impact: 1) the 80 percent (or four-fifths) rule; 2) tests of the statistical significance of observed differences; and 3) evaluation of the practical significance of differences.

The "80 percent" or four-fifths rule refers to a provision of the 1978 Uniform Guidelines on Employee Selection Procedures (43 F.R. No.
166, 38290-38296, 1978) which reads:

Sec. 6D. Adverse impact and the "four-fifths rule." A selection rate for any race, sex or ethnic group which is less than four-fifths (or eighty percent) of the rate for the group with the highest rate will be generally regarded by Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact. (As quoted in Fienberg, 1989, p. 91).

As a result of its standing in federal regulations, the 80 percent rule as a test of adverse or disparate impact has been widely recognized. Nonetheless, simple differences in percentage rates have some undesirable properties. The simple difference, for example "is inevitably small when the two percentages are close to zero" (David H. Kaye and David A. Freedman, Reference guide on statistics, Federal Judicial Center, 1994). Hence, most observers and considerable case law now hold that in assessing disparate impact, it
is important to apply not just the 80% or four-fifths rule but also to consider the practical and statistical significance of differences in selection or pass rates (Fienberg, 1989; Kaye & Freedman, 1994; see also, Office of Civil Rights, 1999). In previous reports regarding the TAAS case (Haney, 1998; 1999), I applied these three tests of adverse impact to a variety of TAAS results. However, for economy of presentation here, I provide only illustrative results.

Eighty Percent or Four-Fifths Rule. To apply this test of adverse impact, we simply multiply the pass rates on TAAS for White students by 80% and check to see whether the pass rates for Blacks and Hispanics fall below these levels. Table 4.1 presents the application of the 80% rule to the TAAS results previously presented in Table 3.2 above. As can be seen, even though grade 10 pass rates for all three TAAS tests for Blacks and Hispanics have improved between 1994 and 1998, these pass rates still lag below 80% of the White pass rates. According to this standard of adverse impact, the TAAS grade 10 tests continue to show adverse impact on Black and Hispanic students. (Note 5)

Table 4.1
Eighty Percent Rule and TAAS Grade 10 Pass Rates: Percent Passing All Tests by Race 1994-1998
All Students Not in Special Education
(Does Not Include Year-Round Education Results)

              1994    1995    1996    1997    1998
White          67%     70%     74%     81%     85%
White*80%    53.6%   56.0%   59.2%   64.8%   68.0%
Black          29%     32%     38%     48%     55%
Hispanic       35%     37%     44%     52%     59%

Source: Selected State AEIS Data: A Multi-Year History

Statistical Significance of Differences in Pass Rates. As mentioned, comparisons of simple percentages passing have some weaknesses from a statistical point of view. For example, differences in pass rates, particularly if small numbers of examinees are involved, may result from random variation in the particular sample of candidates who take an examination in a particular year.
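As a brief aside, the four-fifths check applied in Table 4.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the pass rates are copied from Table 4.1, while the function names (`four_fifths_threshold`, `flags_adverse_impact`) are hypothetical and not part of any official procedure.

```python
# Percent passing all tests, grade 10, copied from Table 4.1.
YEARS = [1994, 1995, 1996, 1997, 1998]
WHITE = [67, 70, 74, 81, 85]
BLACK = [29, 32, 38, 48, 55]
HISPANIC = [35, 37, 44, 52, 59]


def four_fifths_threshold(reference_rate, fraction=0.80):
    """Pass rate below which the 80% rule treats a group as adversely affected."""
    return reference_rate * fraction


def flags_adverse_impact(group_rate, reference_rate):
    """True when the group's rate is under four-fifths of the reference rate."""
    return group_rate < four_fifths_threshold(reference_rate)


# One row per year: threshold, then whether each group falls below it.
for year, white, black, hispanic in zip(YEARS, WHITE, BLACK, HISPANIC):
    print(
        year,
        round(four_fifths_threshold(white), 1),
        flags_adverse_impact(black, white),
        flags_adverse_impact(hispanic, white),
    )
```

With the Table 4.1 figures, both groups fall below the threshold in every year shown, which matches the conclusion stated in the text.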
To check against this possibility, a second kind of standard for evaluating discriminatory disparate impact is generally employed; namely, a test of the statistical significance of observed differences. A test of statistical significance is used to assess the probability that a particular outcome (such as differences in proportions passing a test) might have occurred simply by chance or random sampling.

The obvious statistical significance test to apply in a case such as that of proportions of candidates passing the TAAS is the test of the difference in proportions of two populations. As explained in most statistics textbooks, such as Paul Hoel's Introduction to mathematical statistics (1971, pp. 134-137), if p1 and p2 refer to the proportions of successes in two samples, q1 and q2 refer to the proportions of failures in the two samples, and n1 and n2 refer to the sizes of the samples, the standard error of the difference in proportions is calculated as follows:
$$SE_{diff} = \left( \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \right)^{1/2}$$

Using this formula we may calculate the standard error of the difference in proportions for each comparison we wish to make and then divide the standard error of the difference into the observed difference to calculate the number of standard errors equivalent to the observed difference. Table 4.2 shows the results of such calculations for the Spring 1998 TAAS results.

Table 4.2
Statistical Significance of Differences in 1998 Grade 10 Pass Rates

             TAAS Reading           TAAS Math              TAAS Writing
             No. Tested   % Pass    No. Tested   % Pass    No. Tested   % Pass
Black             26790      81%         27434      61%         26717      84%
Hispanic          70666      79%         71747      67%         70481      82%
White            108887      95%        109595      88%        108935      96%

Source: TAAS Summary Report - Test Performance All Students Not In Special Ed. Grade 10 - Exit Level Report Date April 98 Date of Testing: March 1998 (www.tea.state.tx.us/student.assessment/results/summary/sum98/gxen98.htm)

                              Reading      Math   Writing
White-Black Differences
  SE of difference             0.0025    0.0031    0.0023
  Obs'd Difference                14%       27%       12%
  Obs'd Diff/SE                56.312    86.982    51.721
White-Hispanic Differences
  SE of difference             0.0017     0.002    0.0016
  Obs'd Difference                16%       21%       14%
  Obs'd Diff/SE                95.894    104.41    89.503

As can be seen from Table 4.2, the differences in pass rates for both White-Black and White-Hispanic comparisons are easily statistically significant, with observed differences equivalent to some fifty to over 100 standard errors. (Other statistical tests on TAAS results also yield results of this magnitude; see Haney, 1998; 1999).

Practical significance of observed differences. What of the practical significance of the observed differences in the 1998 grade 10 TAAS pass rates? Later in this report, I discuss the apparent consequences of the TAAS for grade retention and dropping out of school, but for the moment let us simply examine the numbers of students involved in the differential pass rates. On the TAAS writing test in 1998, 96% of White students passed, 84% of Black students and 82% of Hispanic students.
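For readers who wish to retrace Table 4.2, the following Python sketch applies the standard-error formula given above to the Spring 1998 figures. It is a minimal illustration under stated assumptions: the counts and pass rates are taken from Table 4.2 (with pass rates rounded to whole percentages, as published), and the function and variable names are hypothetical.

```python
import math


def se_diff(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


def standard_errors_apart(p1, n1, p2, n2):
    """Observed difference in proportions expressed in standard-error units."""
    return (p1 - p2) / se_diff(p1, n1, p2, n2)


# (pass rate, number tested) per test, copied from Table 4.2.
WHITE = {"reading": (0.95, 108887), "math": (0.88, 109595), "writing": (0.96, 108935)}
BLACK = {"reading": (0.81, 26790), "math": (0.61, 27434), "writing": (0.84, 26717)}
HISPANIC = {"reading": (0.79, 70666), "math": (0.67, 71747), "writing": (0.82, 70481)}

for label, group in (("White-Black", BLACK), ("White-Hispanic", HISPANIC)):
    for test in ("reading", "math", "writing"):
        p_w, n_w = WHITE[test]
        p_g, n_g = group[test]
        print(
            label,
            test,
            round(se_diff(p_w, n_w, p_g, n_g), 4),
            round(standard_errors_apart(p_w, n_w, p_g, n_g), 1),
        )
```

Because the inputs are the rounded percentages printed in the table, the ratios this produces should agree with Table 4.2 only to within rounding.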
While these pass rates do not fall below the 80% threshold (96%*0.80 = 76.8%), let us consider the numbers of students involved. Specifically
we may consider the numbers of Black and Hispanic students who would have passed the 1998 grade 10 writing test had the passing rates for Black and Hispanic students been the same as that for White students. These numbers are approximately 3,200 Black students and 9,900 Hispanic students, for a total of about 13,000 (comparable calculations show that on the TAAS math for 1998, about 22,000 more Black and Hispanic students would have passed had their pass rates been the same as for White students).

Do the differential results on the 1998 grade 10 TAAS writing test, on which approximately 13,000 more Black and Hispanic students failed than would have been the case had the Black and Hispanic pass rates been the same as that of White students, constitute practical adverse impact? Do the differential results on all of the 1998 grade 10 TAAS tests, on which close to 34,000 more Black and Hispanic students failed (10,700 Black and 23,200 Hispanic students) than would have been the case had the Black and Hispanic pass rates been the same as that for White students, constitute practical adverse impact? The answer, especially when results are also suspect under both the 80% rule and tests of statistical significance, seems clear, at least to me. A test that fails tens of thousands more minority students than would have failed had their passing rates equaled those of non-minority students surely has practical adverse impact. Hence, the validity and educational necessity of such a test deserve close scrutiny.

Before turning to those issues, however, I should mention that in his opinion in the TAAS case on January 7, 2000, Judge Prado ruled that "Plaintiffs have made a prima facie showing of significant adverse impact" (p.
23, though it should be added that the opinion has a discussion of disparate impact in two places, pp. 15-17 and 20-23).

4.2 TAAS Use in Isolation Violates Professional Standards

The use of TAAS scores in isolation to control award of high school diplomas (or for that matter use of any test results alone to make high stakes decisions about individuals or institutions) is contrary both to professional standards regarding testing and to sound professional practice. The standards to which I refer are the Standards for Educational and Psychological Testing published by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME). These standards have been in existence for nearly 50 years (in current and previous editions; AERA, APA & NCME, 1985; 1999), and have been relied upon in numerous legal proceedings concerning testing in state and federal courts. (Note 6) One specific provision of these standards reads as follows:

Standard 13.7 In educational settings, a decision or characterization that will have a major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision. ... It is important that in addition to test scores, other relevant information (e.g., school record, classroom observation, parent report) is taken into account by the professionals responsible for making the decision. (AERA, APA & NCME, 1999, pp. 146-47) (Note 7)

It seems clear that the practice in Texas of controlling award of high school diplomas on the basis of TAAS test scores in isolation without weighing other relevant information such as students' grades in high school (HSGPA) is contrary to this provision of the 1999 Standards for Educational and Psychological Testing (and the corresponding
provision of the 1985 Standards). Witnesses for the state of Texas during the TAAS trial (Susan Phillips and William Mehrens) disputed my interpretation of this standard. Here is how Judge Prado summarized and resolved the dispute in his decision:

    There was little dispute at trial over whether this standard exists and applies to the TAAS exit-level examination. What was disputed was whether the TAAS test is actually the sole criterion for graduation. As the TEA points out, in addition to passing the TAAS test, Texas students must also pass each required course by 70 percent. See Texas Admin. Code § 74.26(c). Graduation in Texas, in fact, hinges on three separate and independent criteria: the two objective criteria of attendance and success on the TAAS examination, and the arguably objective/subjective criterion of course success. However, as the Plaintiffs note, these factors are not weighed with and against each other; rather, failure to meet any single criterion results in failure to graduate. Thus, the failure to pass the exit-level exam does serve as a bar to graduation, and the exam is properly called a "high-stakes" test.

    On the other hand, students are given at least eight opportunities to pass the examination prior to their scheduled graduation date. In this regard, a single TAAS score does not serve as the sole criterion for graduation. The TEA presented persuasive evidence that the number of testing opportunities severely limits the possibility of "false negative" results and actually increases the possibility of "false positives," a fact that arguably advantages all students whose scores hover near the borderline between passing and failing. (Prado 2000, pp. 14-15)

Nonetheless, I believe that my interpretation of this standard is more in keeping with the preponderance of professional opinion than are the narrow interpretations offered by the witnesses for the state of Texas.
This may be illustrated by reference to the 1999 report from the Board on Testing and Assessment of the Commission on Behavioral and Social Sciences of the National Research Council.

As a result of increasing controversy over high stakes testing, the U.S. Congress passed legislation in 1997 requesting that the National Academy of Sciences undertake a study and make recommendations regarding the appropriate use of tests for student grade promotion, tracking and graduation (Heubert & Hauser, 1999, p. 1). The resulting report, High Stakes: Testing for Tracking, Promotion and Graduation, specifically cites Standard 8.12 of the 1985 joint standards and clearly points out that a compensatory or sliding scale approach to using test scores in combination with grades would be "more compatible with current professional standards" than using an absolute cut-off score on a test to control high school graduation (Heubert & Hauser, 1999, pp. 165-66). More generally, this National Research Council report recommends:

    High stakes decisions such as tracking, promotion, and graduation should not automatically be made on the basis of a single test score but should be buttressed by other relevant information about students' knowledge and skills such as grades, teacher recommendations and extenuating circumstances. (Heubert & Hauser, 1999, p. 279) (Note 8)

Ironically enough, reliance on TAAS scores in isolation to control award of high school diplomas in Texas is even contrary to the following passage from the TEA's own Texas Student Assessment Program Technical Digest:
    All test result uses regarding individual students or groups should incorporate as much data as possible. . . . Student test scores should also be used in conjunction with other performance indicators to assist in making placement decisions, such as whether a student should take a reading improvement course, be placed in a gifted and talented program or exit a bilingual program. (pp. 2-3)

In sum, the state of Texas's use of TAAS scores in isolation, without regard to students' high school grades, to control award of high school diplomas, is contrary not only to both professional standards regarding test use and the advice of the recent NRC report, but also to the TEA's own advice on the need to use test results in conjunction with other performance indicators.

4.3 Passing Scores on TAAS Arbitrary and Discriminatory

The problem of using TAAS scores in isolation to control award of high school diplomas is exacerbated by the fact that the passing scores set for TAAS are arbitrary and discriminatory. This is important because when a pass or cut score is set on a test, the validity of the test depends not just on test content, administration and scoring, but also on the manner in which the passing score is set. The 1999 Standards for Educational and Psychological Testing state:

    Standard 4.19 When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be clearly documented. (AERA, APA & NCME, 1999, p. 59)

Also, Standard 2.14 says that "Where cut scores are specified for selection or classification, the standard errors of measurement should be reported in the vicinity of each cut score" (AERA, APA & NCME, 1999, p. 35). (Note 9)

Considerable technical and professional literature has been published on alternative methods for setting passing scores on tests.
Glass (1978) wrote an early critique of methods of setting passing scores that questioned the very advisability of even attempting to make this use of tests. In 1986, Ronald Berk published "A consumer's guide to setting performance standards on criterion-referenced tests" (Review of Educational Research, 56:1, 137-172), in which he reviewed 38 different methods for setting standards (or pass or cut-scores) on standardized tests. (Note 10)

I sought to learn exactly how the passing scores were set on the TAAS in 1990 and to obtain copies of any data that were used in the process of setting passing scores on the TAAS exit test. The most complete account of the process by which the passing scores were set is provided in Appendix 9 of the Texas Student Assessment Program Technical Digest for the Academic Year 1996-1997 (TEA, 1997, pp. 337-354). Specifically contained in this appendix are 1) a memo dated July 14, 1990, from Texas Education Commissioner Kirby to members of the state Board of Education (including a summary of results from a field test of the TAAS) and 2) minutes of the State Board of Education meeting in July 1990 at which the passing scores on the grade 10 TAAS were established.

In his memo, Commissioner Kirby recommended a passing score of 70% correct for the exit level of TAAS, but also recommended that this standard be phased in over a period of three years, with the passing score of 60% proposed for Fall 1990. After considerable discussion, the State Board voted unanimously to adopt the
recommendations of the commissioner regarding the Texas Assessment of Academic Skills, specifically that: "For the Academic Skills Level, a minimum standard of 70% of the test items must be answered correctly."

Following a statement by a Dr. Crawford about the importance of giving "notice regarding the standard required for graduation from high school . . . to those students who will be taking the exit level test" (p. 6/353), the Board also voted 11 to 3 in favor of an amendment to the original proposal to "give notice that the 1991-92 standard will be 70" (p. 7/354).

What struck me about this record of how the passing score on the TAAS exit test was set is the following:

1. The process was not based on any of the professionally recognized methods for setting passing standards on tests;
2. It appears to have failed completely to take the standard error of measurement into account; and,
3. As I explain below, the process yielded a passing score that effectively maximized the adverse impact of the TAAS exit test on Black and Hispanic students.

Before I elaborate on the latter point, let me emphasize that from the available record I have done my utmost to understand the rationale that motivated the Board to set the passing score where it did, namely at 70% correct. As best I can tell from the record, the main reasons for setting the passing score at 70% correct appear to have been that this is where the passing score had been set on TEAMS and this level was suggested by the Texas Education Code. The minutes of the Board meeting report that "the Commissioner cited the portion of the Texas Education code that requires 70 percent as passing (Attachment A), explaining that there is a rationale for aiming at 70 percent of test items as the mastery standard" (p. 1/348).

In my view this is simply not a reasonable or professionally sound basis for setting a passing standard on an important test such as the TAAS exit test.
Indeed, from the available record it is not even clear that the Texas code cited by the Commissioner was actually referring to anything more than the passing standard for course grades. Moreover, the minutes to the July 12, 1990, meeting also report the following remarks by Dr. Crawford: "Testing is driving a curricular program, which means that the curriculum is not at the place where you want it to be when you start out." She commented that "70 only has whatever value that is given to it, and in testing 70 is not the automatic passing standard on every test" (p. 4/351).

In sum, the process used in setting the passing scores on the TAAS exit test in 1990 did not adhere to prevailing professional standards regarding the setting of passing scores on standardized tests. For example, from the record available, it is clear that the process used to set the passing score on the TAAS exit test in 1990 failed to meet all six criteria of "technical adequacy" described in Berk's (1986) review of criteria for setting performance standards on criterion-referenced tests, a review published in a prominent education research journal, and of which TEA officials surely should have been aware in 1990.

TAAS cut score study

To understand more fully the process by which the TAAS passing scores were set in 1990, I requested a copy of the TAAS field test data that were presented to the Board of Education in the meeting at which it set the passing score on the TAAS-X. Using these data, I undertook a study (with the assistance of Boston College doctoral student Cathy Horn) which came to be called our "TAAS cut score study." In this study, we asked individuals, reviewing the data available to the Texas
Board of Education in July 1990, to select the passing scores (or cut scores) students would need to attain in order to pass the TAAS reading and math tests. For both the reading and math tests, each research subject was presented with a graph showing the percentage of students, separately for White, Hispanic and Black ethnic groups, passing at each number (or percent) of correct answers on the field or pilot test of the TAAS exit test in 1990. These graphs are shown in Figures 4.1 and 4.2 below.

Each person in the cut score study was then presented with the following instructions:

    The following graph presents the percentage of students passing the reading / math section of the Texas Assessment of Academic Skills (TAAS) at each number of questions answered correctly. Choose the number of questions
correct that most clearly differentiates White students (represented by a black line) from Black and Hispanic students.

Respondents could then ask clarifying questions before selecting a response. After a pilot test of the cut score study in 1998, Ms. Horn (a native of Texas and secondary school teacher there before she came to Boston College for graduate studies) extended the cut score study to nine individuals residing in the state of Texas, administering the exercise by phone or in person. (Those individuals who were interviewed by phone had paper copies of the Figure 4.1 and 4.2 graphs and the prompt for the exercise in front of them when they selected cut points.) The professions of the nine respondents are listed below.

Respondents (all currently living in Texas):
2 teachers
3 engineers
2 college students
1 financial analyst
1 director of communications

The cut or passing scores selected by these nine individuals as most clearly differentiating between White students and Black/Hispanic students are shown in Table 4.3 below.

Table 4.3
Results of Cut Score Study with Nine Texans

            Reading   Math
Person 1      34       34
Person 2      35       37
Person 3      35       38
Person 4      34       37
Person 5      36       40
Person 6      33       40
Person 7      34       37
Person 8      36       43
Person 9      44       44
Summary
Minimum       33       34
Maximum       44       44
Mean          35.7     38.9
Median        35       38
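The summary rows of Table 4.3 can be checked directly against the nine individual responses. The following is a minimal Python sketch using only the standard library; the variable and function names are illustrative and are not part of the original study.

```python
from statistics import mean, median

# Cut scores chosen by the nine respondents (Table 4.3), Person 1 through 9.
reading = [34, 35, 35, 34, 36, 33, 34, 36, 44]
math_scores = [34, 37, 38, 37, 40, 40, 37, 43, 44]


def summarize(scores):
    """Return the summary statistics reported at the foot of Table 4.3."""
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 1),  # table reports one decimal place
        "median": median(scores),
    }


reading_summary = summarize(reading)
math_summary = summarize(math_scores)
print("Reading:", reading_summary)
print("Math:   ", math_summary)
```

Run as written, the computed minimum, maximum, mean and median agree with the values printed in the table for both tests.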
As shown, respondents selected passing scores ranging from 33 to 44 on the reading test and from 34 to 44 for the math test. The median value across all nine respondents was 35 for the reading test and 38 for the math test.

The passing scores of 70% correct for the TAAS exit test recommended by Commissioner Kirby and accepted by the Board of Education in July 1990 were 34 for the reading test and 42 for the math test. The results of our cut score study show that if the intent in setting passing scores based on the TAAS field test results in July 1990 had been discriminatory, i.e., to set the passing scores so that they would most clearly differentiate between White students and Black/Hispanic students, then the passing scores would have been set just about where the Board of Education did in fact set them.

At the same time, there is no evidence of which I know, in the record of the process of setting passing scores on the TAAS in 1990, that the explicit intent of either Commissioner Kirby or the Board was discriminatory. However, the available record shows no indication that Commissioner Kirby, the TEA or the Board relied on any professionally recognized method for setting passing scores on the test, and the passing scores set were indeed consistent with those that would have been set, based on the TAAS field test results, if the intent had been discriminatory.

Use of measurement error in setting passing scores

The reason the setting of passing scores on a high stakes test such as the TAAS is so important is that the passing score divides a continuum of scores into just two categories, pass and fail. Doing so is hazardous because all standardized test scores contain some degree of measurement error.
Hence, the 1985 Standards for Educational and Psychological Testing and other professional literature clearly indicate the importance of considering measurement error and consequent classification errors in the process of setting passing scores on tests.

Before discussing this topic further, two introductory explanations may be helpful. First, from the available record of the July 1990 meeting of the Board of Education, there is no indication that consideration of measurement error entered into the Board's deliberations. Second, the issue of measurement and classification errors regarding TAAS was addressed, as far as I know, at least in the 1993-94 and 1996-97 editions of the Texas Student Assessment Program Technical Digest. Unfortunately, there are two serious errors in the manner in which these issues are addressed. Before explaining the nature of these errors, let me first summarize what the 1996-97 edition of the Texas Student Assessment Program Technical Digest says about test reliability, standard error of measurement and classification errors.

Chapter 8 of the 1996-97 Technical Digest, entitled "reliability," provides a brief discussion of internal consistency estimates and formulas for calculating internal consistency reliability estimates (p. 41). This is followed (p. 42) by a discussion of (and formulas for) calculating standard errors of measurement from reliability estimates. These discussions provide references to Appendix 7, which shows data to indicate that for the Spring 1997 administration of TAAS at grade 10 (administered to 214,000 students) the internal consistency estimates for the TAAS math, reading and writing sub-tests were 0.934, 0.878 and 0.838, respectively; and the corresponding standard errors of measurement were 2.876, 2.352 and 2.195.

This represents the first serious error in the technical report's handling of measurement and classification error.
Specifically, while the technical report bases the calculation of standard error of measurement on internal consistency reliability estimates, it clearly should have been based on test-retest or alternate-forms reliability estimates. Test-retest reliability refers to the consistency of scores on two administrations of a test. Alternate-forms reliability refers to the consistency of scores on two different forms or
versions of the same test. Since the purpose of TAAS testing is not simply to estimate students' performance on one version of the TAAS test, but to estimate their competence in reading, math and writing, in general, as might be measured by any version of the relevant TAAS tests, alternate-forms reliability is more appropriate for assessing reliability than is internal consistency reliability. As Thorndike and Hagen (1977, p. 79) point out in their textbook on measurement and evaluation, "evidence based on equivalent test forms should usually be given the most weight in evaluating the reliability of a test."

In general, alternate-forms test reliability tends to be lower than internal consistency reliability. Hence, it seems clear to me that the figures reported in the 1996-97 Technical Digest overestimate the relevant reliability of grade 10 TAAS test scores and underestimate the standard error of measurement associated with TAAS scores.

I have attempted to estimate the alternate-forms reliability of TAAS test scores using two independent sources of data. First, I employed the cross-tabulations reported by Linton & Debeic (1992) of test-retest data on students in several large Texas districts who took the TAAS exit level test in October 1990 and again in April 1991. Using the Linton & Debeic cross-tabular results, I calculated the following test-retest correlations: TAAS-Reading 0.536; TAAS-Math 0.643; and TAAS-Writing 0.555. Second, as part of the background work for the TAAS case, Mark Fassold developed a remarkable longitudinal database of all 1995 sophomore students in Texas and their TAAS scores on up to ten different administrations of TAAS:

1. March 1995
2. May 1995
3. July 1995
4. October 1995
5. March 1996
6. May 1996
7. July 1996
8. October 1996
9. February 1996
10. April 1996

At my request Mr. Fassold ran an analysis of all test-retest correlations on this cohort of students (total N of about 230,000).
Correlations were calculated separately by ethnic group and for TAAS Reading and Math tests. Given 16 different test-retest possibilities this yielded 214 different coefficients (2 x 16 x 6 ethnic groups). Results varied widely (in part because in some comparisons sample sizes were very small). Overall, however, the observed test-retest correlations tended to cluster in the 0.30 to 0.50 range.

These test-retest correlations based on both the Linton-Debeic and Fassold data are, however, attenuated in that in both data sets only students who failed a TAAS test took it again. There are methods for correcting observed test-retest correlations for such attenuation (see Haney, Fowler and Wheelock, 1999, for an example), but as a more conservative approach here, let me simply discuss what previously published literature suggests about the relationships between test-retest and internal consistency reliability.

As mentioned above, the 1996-97 Technical Digest cites internal consistency reliability estimates for the three grade 10 TAAS sub-tests of 0.934, 0.878 and 0.838, and standard errors of measurement of 2.876, 2.352 and 2.195. It is common for tests which
show internal consistency reliability of about 0.90 to show alternate forms reliability of 0.85 or 0.80 (see for example, Thorndike & Hagen, 1977, p. 92). On page 42 of the 1996-97 Technical Digest, an example is shown in which a test with an internal consistency reliability of 0.90 (and a standard deviation of 6.3) is estimated to have a standard error of measurement of 2.0. However, if instead of an internal consistency reliability of 0.90, we were to use in these calculations an alternate forms reliability of 0.85 or 0.80, the resulting standard errors of measurement would be 2.44 and 2.82. This suggests that the appropriate standard errors of measurement for the TAAS tests may be on the order of 20 to 40% greater than the estimates reported in the TAAS 1996-97 Technical Digest.

The second serious error in the technical report's handling of measurement and classification error occurs on pages 30 and 31 in a section labeled "Exit level testing standards and the standard error of measurement." Here the authors of the 1996-97 Technical Digest point out that a student with a "true achievement level at the passing standard would be likely to pass on the first attempt only 50% of the time" (p. 31). This passage then goes on to assert that "if such a student has attempted that test eight times, the student's passing is almost assured (probability of passing is 99.6%)" (p. 31). In other words, the chance of a minimally qualified student failing the TAAS eight times and being misclassified as not having the requisite skills is only 0.4% (0.50 to the 8th power is 0.0039).

This calculation strikes me as erroneous, or at least potentially badly misleading, because the authors have presented absolutely no evidence to show the probability that a student who fails the TAAS will continue to take the test seven more times.
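The arithmetic in the passage above can be reproduced in a few lines. The following is a minimal sketch, assuming the classical formula SEM = SD x sqrt(1 - reliability) (the Technical Digest's worked example of SD 6.3, reliability 0.90 and SEM 2.0 is consistent with it) and assuming, as the Digest's eight-attempt figure implicitly does, that attempts are independent with the same 50% pass probability each time. The function names are illustrative only.

```python
import math


def sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def prob_never_passing(p_pass, attempts):
    """Chance of failing every attempt, assuming independent attempts with
    the same pass probability each time (a strong simplifying assumption)."""
    return (1.0 - p_pass) ** attempts


SD = 6.3  # standard deviation used in the Technical Digest example
baseline = sem(SD, 0.90)
for r in (0.90, 0.85, 0.80):
    s = sem(SD, r)
    print(f"reliability {r:.2f}: SEM = {s:.2f} ({s / baseline - 1:+.0%} vs. 0.90)")

# Misclassification risk for a student exactly at the passing standard,
# for fewer attempts than eight as well as for the full eight.
for n in (2, 3, 8):
    print(f"{n} attempts: P(never passing) = {prob_never_passing(0.5, n):.4f}")
```

Under these assumptions the standard error rises by roughly a fifth at a reliability of 0.85 and by roughly two fifths at 0.80, and the risk of never passing depends heavily on how many attempts a student actually makes.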
As I explain later, available evidence suggests that students who fail the TAAS grade 10 test more than once or twice are likely to be held back in grade and to drop out of school long before they reach grade 12, by which time they would have had a chance to take the TAAS exit test eight times. Since 0.50 to the second power is 0.25, and to the third power is 0.125, this indicates that a student with a "true achievement level at the passing standard" who takes the TAAS twice or three times, before becoming discouraged and not taking the test again, has a 25% or 12.5% chance of being misclassified as failing.

Before proceeding to present evidence bearing on this point, let me discuss how the standard error of measurement might usefully have been taken into account in adjusting passing scores. Because of the error of measurement in test scores, when scores are used to make pass-fail decisions about students, two kinds of classification errors can occur. A truly unqualified student can pass the test (a false pass) or a truly qualified student can fail the test (a false failure). How one thinks about the balance of these two misclassification errors depends on the risks (or benefits and costs) associated with each type of misclassification. If one were confident that a student failing TAAS would receive special attention and support educationally, one might be inclined to weigh false passes as more serious than false failures. If, on the other hand, one thought that students failing TAAS were unlikely to receive effective instruction and instead were merely to be retained in grade 10 and stigmatized as failures, then one would probably feel that false failures would be more harmful than false passes.

Here is how Berk (1986) discussed this point:

    Assessing the relative seriousness of these consequences is a judgmental process.
It is possible to assign plusses (benefits) and minuses (costs or losses) to the consequences so that the cutoff scores can be set in favor of a specific error reduction rate. A loss ratio (benefits: losses) can be specified for each decision application with the cutoff score adjusted accordingly.
(Berk, 1986, p. 139)

To study the relative risks associated with the two kinds of classification errors associated with a high school graduation test, with the assistance of Kelly Shasby (a doctoral student in the Educational Research, Measurement and Evaluation program at Boston College), I undertook what came to be known as our "risk analysis" study. The survey form used in the risk analysis study was entitled "Survey of risk associated with classification decisions" and opened with the following introduction:

    When classifying large numbers of individuals using standardized exams, two different kinds of mistakes are made. Some people will be falsely classified as "qualified" or "passing" while others will be falsely classified as "unqualified" or "failing." There is a degree of risk associated with mistakes of this kind, both for the individual who is incorrectly classified and for the society in which that individual lives. We would like your help in assessing the severity of the risk, or possible harm, caused to individuals and to society when mistakes are made on a number of different types of standardized tests.

    The purpose of this survey is to assess the public's perception of misclassifications of individuals. These misclassifications can have an impact on the individual and on the society in which that individual lives. This impact has the potential to be harmful, and we are interested in determining how harmful the public thinks different misclassifications can be.

    On a scale from 1 to 10, 1 being "minimum harm" and 10 being "maximum harm," rate each scenario with respect to the degree of harm it would cause that individual and then the degree of harm it would cause society. Then circle the number, which corresponds, to the rating you chose.
After this introduction, respondents were asked to rate the risk on a 1 to 10 scale of harm associated with 16 different misclassifications that might result from classifying people pass-fail based on standardized test results. Respondents were asked to rate separately the harm to individuals and to society (and to give credit where it is due, this distinction, a clear improvement over the initial version of our survey, was suggested by Ms. Shasby). Specifically, survey respondents were asked to rate the degree of harm, separately for individuals and society, associated with the following kinds of misclassification:

1. A kindergartner who is ready to enter school is denied entrance.
2. A kindergartner who is not ready to enter school is granted entrance.
3. An airline pilot who is not qualified is given a license to fly.
4. An airline pilot who is qualified is denied a license to fly.
5. A qualified high school student is denied a diploma.
6. An unqualified high school student is granted a diploma.
7. A qualified accountant is denied certification.
8. An unqualified accountant is granted certification.
9. A qualified student is denied promotion from grade eight to grade nine.
10. An unqualified student is granted promotion from grade eight to grade nine.
11. A qualified doctor is denied a license to practice.
12. An unqualified doctor is granted a license to practice.
13. A qualified candidate is denied admission into college.
14. An unqualified candidate is granted admission into college.
15. A qualified teacher is denied certification.
16. An unqualified teacher is granted certification.

The risk survey form was sent to a random sample of 500 secondary teachers in Texas (specifically only math and English/Language Arts teachers) on May 23, 1999. As of June 30, 1999, we had received 66 responses (representing a response rate of 13.2%). (Note 11) Table 4.4 below summarizes the results of the risk analysis survey.

Table 4.4
Results of Risk Analysis Survey with Secondary Teachers in Texas

Individual       Society
Mean   SD        Mean   SD      Misclassification
6.45   2.67      3.94   2.64    1. A kindergartner who is ready to enter school is denied entrance.
7.20   2.23      5.06   2.71    2. A kindergartner who is not ready to enter school is granted entrance.
8.36   2.32      9.55   1.00    3. An airline pilot who is not qualified is given a license to fly.
7.74   2.37      4.39   2.99    4. An airline pilot who is qualified is denied a license to fly.
9.11   1.69      6.39   2.58    5. A qualified high school student is denied a diploma.
6.85   2.72      7.74   2.26    6. An unqualified high school student is granted a diploma.
8.65   1.50      5.32   2.62    7. A qualified accountant is denied certification.
8.65   1.50      5.32   2.62    8. An unqualified accountant is granted certification.
8.89   1.52      6.15   2.39    9. A qualified student is denied promotion from grade eight to grade nine.
8.15   2.01      7.80   2.12    10. An unqualified student is granted promotion from grade eight to grade nine.
8.80   1.68      7.32   2.64    11. A qualified doctor is denied a license to practice.
7.15   2.87      9.37   1.72    12. An unqualified doctor is granted a license to practice.
8.83   1.73      6.30   2.43    13. A qualified candidate is denied admission into college.
6.08   2.66      6.08   2.66    14. An unqualified candidate is granted admission into college.
8.64   1.76      8.38   2.13    15. A qualified teacher is denied certification.
6.62   2.84      9.15   1.60    16. An unqualified teacher is granted certification.

As this table shows, the risk associated with denying a high school diploma to a qualified student is for individuals the most severe risk associated with any of the
misclassification scenarios we asked respondents to rate. The only scenarios showing higher average risks are the risks for society associated with licensing an unqualified pilot (mean = 9.55), licensing an unqualified doctor (9.37) and licensing an unqualified teacher (9.15).

Particularly germane to our discussion of the setting of passing scores on the TAAS graduation test are the relative risks associated with denying a diploma to a qualified high school student (mean = 9.11) and granting a diploma to an unqualified student (6.85). These results indicate that the risk of denying a diploma to a qualified student is much more severe than that of granting a diploma to an unqualified student (the difference, by the way, is statistically significant). They also imply that, had a rational passing score been established on the TAAS exit test, the passing or cutoff score should have been adjusted downward in order to minimize overall risk. A common practice in setting passing scores on important tests is to reduce an empirically established passing score by one or two standard errors of measurement.

While I want to stress that the passing scores of 70% correct on the TAAS are arbitrary, unjustified and discriminatory, we can see from Figures 4.1 and 4.2 what the consequences would be for Black and Hispanic pass rates (on the TAAS field test) if the passing scores of 70% had been corrected for error of measurement. Recall that the passing scores set by the Board on the field test administration of the TAAS were 34 items correct on the reading test and 42 on the math test. Recall also that the standard errors for the reading and math tests reported in the Technical Digest were in the range of 2.5 to 3.0 raw score points. Suppose that to take error of measurement into account, the initially selected passing scores of 34 and 42 were lowered 5 points, to 29 and 37 on the reading and math tests, respectively.
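To make the size of this adjustment concrete, the sketch below expresses a 5-point reduction in units of the standard error of measurement. It uses the Spring 1997 standard errors quoted earlier from the Technical Digest (2.352 for reading, 2.876 for math) as stand-ins; applying them to the 1990 field test is an assumption made purely for illustration, and the helper name is hypothetical.

```python
# Board-adopted cut scores (items correct) and SEMs quoted from the
# 1996-97 Technical Digest for the Spring 1997 grade 10 administration.
# Using the 1997 SEMs for the 1990 field test is an illustrative assumption.
CUTS = {"reading": 34, "math": 42}
SEMS = {"reading": 2.352, "math": 2.876}


def lowered_cut(cut, sem, k):
    """Cut score lowered by k standard errors of measurement."""
    return cut - k * sem


for test, cut in CUTS.items():
    one = lowered_cut(cut, SEMS[test], 1)
    two = lowered_cut(cut, SEMS[test], 2)
    in_sems = 5 / SEMS[test]  # the 5-point reduction expressed in SEM units
    print(f"{test}: cut {cut}, -1 SEM = {one:.1f}, -2 SEM = {two:.1f}, "
          f"5 points = {in_sems:.2f} SEM")
```

Under these assumptions a 5-point reduction amounts to roughly two standard errors on either test, which is at the upper end of the "one or two" range mentioned above.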
What can be easily seen from Figures 4.1 and 4.2 is that these adjustments would have increased the passing rates for Black and Hispanic students by about 12% on the math test and 20% on the reading test.

The foregoing results were presented in a written report before the TAAS trial (Haney, 1999) and also discussed during testimony at trial. Judge Prado (2000) apparently did not find these points persuasive, for he commented merely that in setting the passing score on the TAAS tests, "the State Board of Education looked at the passing standard for the TEAMS test, which was also 70 percent, and also considered input from educator committees" (p. 11). Regarding the disparate impact of the passing score, he commented simply, "The TEA understood the consequences of setting the cut score at 70 percent" (p. 11).

4.4 Doubtful Validity of TAAS Scores

The Technical Digest on TAAS (TEA, 1997) contains an extremely short section (pp. 45-47) discussing test validity. Though this three-page passage mentions content, construct and criterion-related validity, it maintains that "the primary evidence for the validity of the TAAS and end-of-course tests lies in the content validity of the test" (TEA, 1997, p. 47). This discussion, it seems to me, is woefully inadequate because test validation should never rest primarily on test content. Test validation refers to the interpretation and meaning of test scores, and these depend not just on test content, but also on a host of other factors, such as the conditions under which tests are administered, and how results are scored and interpreted (e.g., in terms of a passing score, as discussed in the previous section).

Nonetheless, the TEA has previously undertaken a number of studies examining the relationship between TAAS scores and course grades. In one study, for example, it was reported that in one large urban district, 50% of the students who had received a
grade of B in their math courses failed the TAAS math test (TEA, 1996 Comprehensive Report on Texas Public Schools, pp. 14-15). Another summary finding was that when "TEA correlated exit level students' TAAS mathematics scores with the same students' course grades for several different mathematics courses in the 1992-93 school year ... the correlation between TAAS scale scores and students' end-of-year grades was only moderately positive (0.32) ..." (TEA, 1997, Technical Digest, p. 47).

Inasmuch as this correlation is remarkably low in light of previous research that has generally shown test scores to correlate with high school grades in the range of 0.45 to 0.60 (see Haney, 1993, p. 58), as part of work on the TAAS case I sought to acquire the actual data set on which this TEA finding was based.

The data set in question contains records for 3,281 students in three districts that TEA documentation describes as "large urban district," "mid-sized suburban district," and "small rural district." The TEA has previously reported analyses of these data in "Section V: A study of correlation of course grades with Exit Level TAAS Reading and Writing Tests," pp. 189-197 in Student Performance Results 1994-95, Texas Student Assessment Program, TAAS and End-of-Course Examinations and Other Studies (Texas Education Agency, Austin, Texas, ND, but presumably 1995).

After opening the file and verifying its structure, I sought to confirm that the results reported by the TEA could be replicated. This was impossible to do precisely because TEA did not report results with great precision. Nonetheless, initial results corresponded reasonably well with what TEA reported. Also, it should be noted that while the data file included records on a number of grade 11 students, I restricted most analyses to grade 10 students pooled across the three districts, though the bulk of this sample (> 2,400 cases out of 3,300) comes from the one large urban district.
Then we calculated basic descriptive statistics on variables of interest, in particular scores for the TAAS reading and writing tests administered in March 1995 and grades for the English II courses completed in May 1995 (these data were provided by the districts to the Student Assessment Division of TEA). Next we calculated relationships between variables. Table 4.5 shows the intercorrelations between the three TAAS test scores (writing, reading and math) and English II course grades. Given the size of this sample (>3,000), all of these correlation coefficients are statistically significant at the 0.01 level.

Table 4.5
Correlations between TAAS Scores (Standard Scores) and English II Grades

             Write SS   Read SS   Math SS   Grade
Write SS       1.00
Read SS        0.50      1.00
Math SS        0.51      0.69      1.00
Grade          0.32      0.34      0.37     1.00

Note the magnitudes of the correlations between English II course grades and TAAS scores. They are all in the range of 0.32 to 0.37. As indicated above, previous studies have generally shown test scores to correlate with high school grades in the range of 0.45 to 0.60. Contrary to expectations, English II grades correlate more highly with TAAS math scores (0.37) than with writing (0.32) or reading (0.34) scores. Note also the odd intercorrelations among TAAS scores. The TAAS math scores correlate at the level
of 0.69 with the TAAS reading scores, while the TAAS reading scores correlate at the level of 0.50 with the TAAS writing scores. This is contrary to the expectation that scores of two verbal measures (of reading and writing) should correlate more highly with one another than with a measure of quantitative skills. These results cast doubt on the validity and the reliability of TAAS scores.

People unfamiliar with social science research doubtless find it hard to make sense of correlation coefficients in the range of 0.32 to 0.37. Hence, to provide a visual representation, Figure 4.3 shows a scatterplot of the relationship between TAAS reading scores and English II grades. As can be seen from this figure, the relationship between these two variables is quite weak. Students with grades in the 70 to 100 range have TAAS reading scores from well below 40 to well over 80. Conversely, students with TAAS reading scores in the 80 to 100 range have English II grades from well below 40 to well over 80.

Figure 4.3 Scatterplot of TAAS Reading Scores and English II Grades

I next examined whether there were differences in the relationships between TAAS scores and English II grades across ethnic groups. Table 4.6 provides an example of the relationship between passing and failing TAAS and passing or failing in terms of English II course grades for Hispanics, Blacks and Whites. As can be seen from this table, of those students who passed their English II courses in the spring of 1995, 27-29% of Black and Hispanic students failed the TAAS reading test taken the same semester as their English courses, compared with 10% of White students. In other words, of grade 10 students in these three districts who are passing their English II courses, the rate of failure on the TAAS reading test for Black and Hispanic students is close to triple that of White students. A similar, but slightly smaller, disparity is apparent on the TAAS writing sub-test.

Table 4.6
Rates of Passing and Failing TAAS and English II Course

TAAS Exit Test Results: Reading

English II          Black students      Hispanic students     White students
Course              Failed   Passed     Failed   Passed       Failed   Passed
Failed    N           39       23        242      189           17       34
          (%)       10.1%     5.9%      11.0%     8.6%         3.1%     6.3%
Passed    N          111      214        596     1181           55      436
          (%)       28.7%    55.3%      27.0%    53.5%        10.1%    80.4%

TAAS Exit Test Results: Writing

English II          Black students      Hispanic students     White students
Course              Failed   Passed     Failed   Passed       Failed   Passed
Failed    N           33       29        173      258           20       31
          (%)        8.5%     7.5%       7.8%    11.7%         3.7%     5.7%
Passed    N           69      256        366     1411           50      441
          (%)       17.8%    66.1%      16.6%    63.9%         9.2%    81.4%

Such a disparity can result from several causes. First, if the TAAS reading test is in fact a valid and unbiased test of reading skills, the fact that close to 30% of Black and Hispanic students who are passing their sophomore English courses failed the TAAS reading test, as compared with only 10% of White students, must indicate that minority students in these three districts are simply not receiving the same quality of education as their White counterparts, especially when one realizes, as I will show in Part 5 of this article, that by 1995 Black and Hispanic students in Texas statewide were being retained in grade 9 at much higher rates than White students. The only other explanation for the sharp disparity is that the TAAS tests and the manner in which they are being used (with a passing score of 70% correct) are simply less valid and fair measures of what Black and Hispanic students have had an opportunity to learn, as compared with White students.

These analyses were reported in the July 1999 report (Haney, 1999) and discussed in direct testimony and cross-examination during the TAAS trial in September 1999. Here is how Judge Prado interpreted these findings in his January 7 ruling:

The Plaintiffs provided evidence that, in many cases, success or failure in relevant subject-matter classes does not predict success or failure in that same area on the TAAS test. See Supplemental Report of Dr. Walter Haney Plaintiff's expert, at 29-32.
In other words, a student may perform reasonably well in a ninth-grade English class, for example, and still fail the English portion of the exit-level TAAS exam. The evidence suggests that the disparities are sharper for ethnic minorities. Id. at 33. However, the TEA has argued that a student's classroom grade cannot be equated to TAAS performance, as grades can measure a variety of factors, ranging from effort and improvement to objective mastery. The TAAS test is a solely objective
measurement of mastery. The Court finds that, based on the evidence presented at trial, the test accomplishes what it sets out to accomplish, which is to provide an objective assessment of whether students have mastered a discrete set of skills and knowledge. (Prado, 2000, p. 24)

With due respect to Judge Prado, I believe there are two flaws in this reasoning. First, Judge Prado interprets the finding that, among students who pass their English II courses, minorities fail the "English portion" of TAAS far more frequently than White students as evidence of the need for "objective assessment" of student skills. Though he did not explicitly say so, his reasoning seems to be that an objective test is necessary because the grades of minority students are inflated. This interpretation, however, takes one specific finding out of the context in which I presented it, both in the Supplementary report (Haney, 1999, pp. 29-33) and in testimony at trial. In both cases, and as described above, it was shown that even if one ignores the question of possibly inflated grades, the intercorrelations among TAAS scores themselves (i.e., that reading and math scores correlate more highly than reading and writing scores) raise serious doubts about their validity.

Second, even if we assume the validity of TAAS tests and accept Judge Prado's reasoning that the lack of correspondence between English grades and TAAS reading and writing scores demonstrates the need for objective assessment of student mastery, the fact that "the disparities are sharper for ethnic minorities" represents prima facie evidence of inequality in opportunity to learn.
Even if Black and Hispanic students' teachers are covering the same academic content as White students' teachers, that 27-29% of Black and Hispanic students who passed their English II course failed the TAAS reading test (as compared with 10% of White students) obviously must indicate that their teachers are not holding them to the same academic standards as the teachers of White students.

4.5 More appropriate use possible

This discussion leads naturally to a simple solution for avoiding reliance on test scores in isolation to make high stakes decisions about students. As previously mentioned, the recent High Stakes report of the National Research Council (Heubert & Hauser, 1999) states clearly that using a sliding scale or compensatory model combining test scores and grades would be "more compatible with current professional testing standards" than relying on a single arbitrary passing score on a test (Heubert & Hauser, 1999, pp. 165-66). Moreover, this is exactly how test scores are typically used in informing college admissions decisions, such that students with higher high school grade point averages (GPA) need lower test scores to be eligible for admission, and conversely students with lower GPA need higher test scores. Ironically enough, this is indeed exactly how institutions of higher education in Texas use admissions test scores in combination with GPA. For example, in 1998, the University of Houston required that in order to be eligible for admissions, high school students who had a grade point average of 3.15 or better needed to have SAT I total scores of at least 820, but if their high school GPA was only 2.50, they needed to have SAT I total scores of 1080 (University of Houston, 1998).
Literally decades of research on the validity of college admissions test scores show that such an approach, using test scores and grades in sliding scale combination, produces more valid results than relying on either GPA or admissions test scores alone (Linn, 1982; Willingham, Lewis, Morgan & Ramist, 1990). Moreover, such a sliding scale approach generally has been shown to have less disparate impact on ethnic minorities (and women) than relying on test scores alone (Haney, 1993).
The tendency for a sliding scale approach to have smaller adverse impact on minorities can be illustrated with the data on TAAS scores and English II grades discussed in the last section. Texas now effectively uses a double-cut or conjunctive model of decision-making, whereby students currently must have a grade of 70 in their academic courses (such as English II) and a score of 70 on TAAS to graduate from high school. These requirements are illustrated in Figure 4.4 (which is the same as Figure 4.3 except that a vertical line has been added to represent the 70-grade requirement and a horizontal line has been added to represent the TAAS 70-score requirement).

Figure 4.4 Scatterplot of TAAS Reading Scores and English II Grades with 70 Minima Shown

Note also that the data shown in Figure 4.4 are the same as those summarized in the top portion of Table 4.6. As indicated there, 80.4% of White students in this sample passed both the English II course and the TAAS reading test, while only 10.1% of White students passed English II and failed the TAAS reading test. In contrast, 53-55% of Black and Hispanic students passed both the course and the test, but 27-29% of Black and Hispanic students passed English II and failed the TAAS test.

Suppose now that instead of applying a double-cut rule, so that students have to have scores of 70 in both the course and the test to pass, they need to have a minimum of 140 combined. This circumstance is illustrated in Figure 4.5, below.
Figure 4.5 Scatterplot of TAAS Reading Scores and English II Grades with Sliding Scale Shown

As can be seen, under such a sliding scale approach, higher grades can compensate for lower test scores and vice versa (that is why the sliding scale approach is sometimes called a compensatory model). Under this approach, the number of Black and Hispanic students passing would increase from 1,395 to 1,765, a 27% increase. Under a sliding scale approach, the number of White students passing would also increase slightly (from 436 to 487), but since the latter increase is smaller proportionately, the disparate impact on Black and Hispanic students would be reduced.

The sliding scale decision rule illustrated here (TAAS-R + Eng II grade > 140) was chosen merely for illustrative purposes. As with college admissions tests, in practice such a sliding scale approach ought to be based on empirical validation studies. But the example illustrates the way in which an approach more in accord with professional standards would significantly reduce adverse impact. The literature on college admissions testing strongly suggests it would yield more valid decisions too.
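The difference between the two decision rules can be made concrete with a minimal Python sketch. It is not the analysis code behind the figures: the function names and the five sample records are hypothetical, while the cutoffs (70 on each measure, 140 combined) and the pass counts given to `percent_increase` come from the text. Because the rule is stated both as "a minimum of 140 combined" and as "> 140," the sketch assumes the inclusive reading.

```python
# Illustrative sketch only; function names and sample records are hypothetical.

def passes_conjunctive(grade, taas, cut=70):
    """Double-cut rule: the course grade and the TAAS score must each reach the cut."""
    return grade >= cut and taas >= cut

def passes_compensatory(grade, taas, combined_min=140):
    """Sliding-scale rule: a higher grade can offset a lower test score, and vice versa."""
    # Assumption: "a minimum of 140 combined" is read as inclusive (>=).
    return grade + taas >= combined_min

def percent_increase(before, after):
    """Relative increase, in percent, from `before` to `after`."""
    return 100.0 * (after - before) / before

# Hypothetical (English II grade, TAAS reading score) pairs.
students = [(85, 62), (72, 69), (65, 90), (90, 88), (60, 55)]
conjunctive = [passes_conjunctive(g, t) for g, t in students]
compensatory = [passes_compensatory(g, t) for g, t in students]

# Pass counts reported in the text under the double-cut and sliding-scale rules.
minority_gain = percent_increase(1395, 1765)  # Black and Hispanic students
white_gain = percent_increase(436, 487)       # White students
print(round(minority_gain, 1), round(white_gain, 1))
```

Run on the reported counts, the sketch gives increases of about 26.5% for Black and Hispanic students and about 11.7% for White students, which matches the 27% figure cited above and shows why the proportionate gain is smaller for White students.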
5. Missing Students and Other Mirages

As previously mentioned, dropout rate is one of the indicators used in the TEA accountability system for rating Texas districts and campuses. Also, as summarized in Section 3.3 above, the TEA has reported that dropout rates have been decreasing in Texas during the 1990s. However, in 1998 when I began studying what had been happening in Texas schools, I quickly became suspicious of the validity of the TEA-reported dropout data. At least one independent organization in Texas had previously challenged TEA's "dropout calculation methodology" (TRA, 1998, p. 2). Moreover, two independent sources were reporting substantially higher rates of dropouts (or attrition) or, conversely, lower rates of high school completion than would be implied by TEA dropout data (Fassold, 1996; IDRA, 1996).

Hence, to examine independent evidence on recent patterns of high school completion in Texas and possible effects of the TAAS on grade enrollment patterns and high school completion, I assembled data on the numbers of White, Hispanic and Black students enrolled in every grade (kindergarten to grade 12) in Texas over the last two decades. (Note 12)

Before describing analyses of these data, three additional points should be made. First, in assembling this data set, we have taken care to double-check the accuracy of all data input (in this context, "we" refers to myself and Damtew Teferra, a Boston College doctoral student who helped me assemble the Texas enrollment data set). Second, to my original set of data on grade enrollment by ethnic group for each year between 1975-76 and 1998-99, I added data on the numbers of high school graduates each year (provided to me, again, thanks to the kind assistance of Dr. Rincon and Terry Hitchcock).
Third, I should mention that data on enrollments and graduates for 1998-99 were not available until recently and hence were not considered in my previous reports or in the TAAS trial in the Fall of 1999. Finally, in case others might wish to verify results shown below, or conduct other analyses of Texas enrollments over the last quarter century, I make available via this publication the set of data I have assembled (see Appendix 7).

5.1 Progress from Grade 9 to High School Graduation

In this analysis, I simply took the numbers of White, Black and Hispanic Texas high school graduates by year and divided each of these numbers respectively by the number of White, Black and Hispanic students enrolled in grade nine three years earlier. The resulting ratios show the proportion of grade nine students for each ethnic group who progress on time to high school graduation three-and-a-half years later. The results of this analysis are shown in Figure 5.1.
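For readers who wish to repeat this calculation with the enrollment data, the ratio can be sketched in Python as follows. The function name, the dictionary layout, and all counts are hypothetical placeholders (they are not Texas data); school years are labeled by their spring year purely as a convention of this sketch.

```python
# Illustrative sketch only; the counts below are invented placeholders, not Texas data.

def progress_ratio(graduates, grade9, grad_year, lag=3):
    """Ratio of graduates in `grad_year` to grade 9 enrollment `lag` years earlier."""
    base_year = grad_year - lag
    return graduates[grad_year] / grade9[base_year]

# Hypothetical counts for one ethnic group, keyed by the spring year of each school year.
grade9_enrollment = {1987: 100_000, 1988: 104_000}
hs_graduates = {1990: 70_000, 1991: 66_560}

ratios = {year: progress_ratio(hs_graduates, grade9_enrollment, year)
          for year in hs_graduates}
print(ratios)
```

Applying the same function separately to the White, Black and Hispanic series would yield one line per group of the kind plotted in Figure 5.1.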
Figure 5.1 shows that between 1978 and 1985-86, the ratio of HS graduates to grade nine students three years earlier ranged between 0.72 and 0.78 for White students and between 0.57 and 0.64 for Black and Hispanic students. Between 1985-86 and 1989-90 these ratios declined slightly for all ethnic groups, from 0.72 to 0.70 for Whites, from 0.59 to 0.57 for Blacks and from 0.57 to 0.56 for Hispanics. However, in 1990-91, the first year the TAAS high school graduation test was used, the ratios for all three groups show the most precipitous drops in the whole 20-year time series: for Whites from 0.699 to 0.640 (a drop of 0.059), for Blacks from 0.567 to 0.474 (a drop of 0.093) and for Hispanics from 0.564 to 0.476 (a drop of 0.088). In other words, the steep drop in this indicator of progress from grade 9 to high school graduation was about 50% greater for Black and Hispanic students than for White students.

In 1991, the ratios for all three ethnic groups showed a slight rebound, from 0.640 to 0.700 for Whites, from 0.474 to 0.518 for Blacks and from 0.476 to 0.513 for Hispanics. In 1992-93, the first year in which the TAAS graduation requirement was fully implemented, Whites showed a minor decline, from 0.700 to 0.689, but for Blacks and Hispanics declines were larger: from 0.518 to 0.479 for Blacks and from 0.513 to 0.491 for Hispanics. From full implementation of the TAAS as a requirement for high school graduation in Texas in 1992-93 (with the passing score set at 70%) until 1998-99, the ratio of HS graduates to grade nine students three years earlier has been just at or below 0.500 for Black and Hispanic students, while it has been just about 0.700 for White students.

Figure 5.2 presents another view of these data.
This figure shows the ratio of the number of Texas high school graduates divided by the number of grade nine students three years earlier for White and Nonwhite students. What this figure shows even more clearly than the previous figure is that since the three-year period 1990-92 in which the TAAS exit test requirement was phased in, the gap in this ratio for White and Nonwhite students has widened substantially. Specifically, during the period 1978 through 1989, the average gap in the ratios graphed in Figure 5.2 was 0.146. However, the average gap
in the ratios for Whites and Nonwhites since the TAAS exit test requirement was fully implemented in 1992-93 has been 0.215. This indicates that the TAAS exit test has been associated with a 50% increase in the gap in progression from grade 9 to high school graduation for Nonwhite students as compared with White students.

5.2 Grade-to-Grade Progression Ratios

What happened between the late 1970s and the mid-1990s? (Note 13) Where did the decline in progression between grade nine and high school graduation occur for Black and Hispanic students? Was it at grade 10 when they first took the TAAS exit test, or in grade 12 after they had had a chance to take the TAAS-X as many as eight times?

To shed light on this question, I calculated the grade-to-grade progression ratios of the number of students enrolled in one grade divided by the number of students enrolled in the previous grade in the previous year, separately for the Black, Hispanic and White ethnic groups. Altogether, 858 such calculations were computed: 13 grade transitions (from kindergarten to grade 1, etc., to grade 12 to high school graduation) for 22 years and three ethnic groups. Across the last twenty years, the 13 grade transitions, and the three ethnic groups, transitions from one grade to another have been highly consistent, with 99 or 100% of each ethnic group, on average, progressing from one grade in one year to the next grade in the following year.

What the detailed results show, however, is that there are two sets of grade progression ratios that were highly unusual (greater than 1.24, or more than 2 standard deviations from the mean across all transition ratios; see Haney 1999, Table 5). First, in the decade 1976 to 1986, there were 25 grade progression ratios that exceeded 1.24.
These were all for the grade 1/kindergarten ratios, and mostly for Black and Hispanic students, though there were a few years for which the comparable ratios for White students exceeded 1.24. It is likely that these high ratios resulted partly from a time when kindergarten attendance in Texas was not universal and many students entered school in grade 1 without previously having attended kindergarten.

Since 1990, there were more than a dozen grade progression ratios that exceeded 1.24. For each and every year from 1992-93 to 1998-99, the grade 9/grade 8 progression
ratio for Black and Hispanic students has exceeded 1.24, while the comparable ratio for White students has remained in the range of 1.08 to 1.11. As shown in Figure 5.3, since 1990 the grade 9/grade 8 progression ratio for Black and Hispanic students has risen dramatically, while the comparable rate for White students increased only slightly.

The data also reveal that before the mid-1980s, the grade 9/grade 8 progression ratios for Black and Hispanic students were only slightly higher than those for Whites. These results clearly indicate that since 1992 progress from grade 9 to high school graduation has been stymied for Black and Hispanic students not after grade 10 when they first take the TAAS exit test, but in grade nine before they take the test. These results clearly support the hypothesis advanced in my December 1998 report, namely that after 1990 schools in Texas have increasingly been retaining students, disproportionately Black and Hispanic students, in grade nine in order to make their grade 10 TAAS scores look better (Haney, 1998, pp. 17-18).

This hypothesis was discussed during the TAAS trial. In his ruling, Judge Prado held that "Expert Walter Haney's" hypothesis that schools are retaining students in ninth grade in order to inflate tenth-grade TAAS results was not supported with legally sufficient evidence demonstrating the link between retention and TAAS (Prado, 2000, p. 27). In Section 5.6 below, I present documentation that was not allowed into the TAAS trial as evidence to support the hypothesis. For now, however, suffice it to note that the pattern apparent in Figure 5.3 provides a clear explanation for one aspect of the Texas "miracle," namely, the apparent decrease in the racial gap in test scores (reviewed in Section 3.3 above).
One clear cause for the decrease in the racial gap in grade 10 TAAS scores in the 1990s (see Table 3.2 and Figure 3.2) is that Black and Hispanic students are being increasingly retained in grade 9 before they take the grade 10 TAAS tests. Between 1989-90 (the year before TAAS was implemented) and the late 1990s, the
grade 9/grade 8 progression ratios for Black and Hispanic students grew from about 1.20 to 1.30, while the comparable ratio for White students remained at about 1.1.

It is apparent from Figure 5.3 that the higher rates of grade 9 retention of Black and Hispanic students, as compared with White students, did not begin with TAAS. The results shown in Figure 5.3 indicate that the grade 9/grade 8 progression ratios for minorities began to diverge from those of White students in Texas in the 1980s, before TAAS and even before TEAMS. In an historical sense, then, TAAS and TEAMS could not have directly caused the steady increase since the early 1980s in the proportions of Black and Hispanic students retained in grade 9. But the first statewide testing program in Texas, the TABS, did begin in 1980, just about the time the ratio of minority ninth graders to eighth graders began its upward climb, compared with the relative stability of this ratio for White students. Whatever the historical cause, the fact that by the end of the 1990s 25-30% of Black and Hispanic students, as compared with only 10% of White students, were being retained to repeat grade 9, instead of being promoted to grade 10, makes it clear that the apparent diminution in the grade 10 racial gap in TAAS pass rates is in some measure an illusion.

Data for the last two academic years, i.e., 1997-98 and 1998-99, provide a picture of how grade progression ratios compare across the grade levels. Specifically, Figure 5.4 shows the grade progression ratios for grades 1 through 12 and for graduates. For grades 1 through 12 these are simply the number of students enrolled in a particular grade in 1998-99 divided by the number enrolled in the previous grade in 1997-98. The only exception to this pattern is for graduates, for which the ratio shown is the number of graduates in 1999 divided by the number enrolled in grade 12 in the fall of 1998-99.
As can be seen, for most grade levels the progression ratios are highly similar for Black, Hispanic and White students. Indeed, for grades 2 through 8 all of the transition ratios are close to 1.00. Note, however, how sharply the transition ratios diverge for grades 9 and 10. In 1998-99, there were about 30% more Black and Hispanic students enrolled in grade 9 than had been enrolled in grade 8 in 1997-98 (as compared with about 10% more
Whites). Also, in 1998-99 there were 25-30% fewer Black and Hispanic students enrolled in grade 10 than had been enrolled in grade 9 in 1997-98. These data indicate that at the end of the 1990s, even for students who had been going to school for virtually their entire careers under TAAS testing (a student in grade 9 in 1998-99 would have been in grade 1 in 1990-91, if not retained in grade), there remains a huge gap in progress in the early high school years for Black and Hispanic students as compared with Whites. As will be shown subsequently, after being retained to repeat grade 9 and/or 10, tens of thousands of students in Texas drop out of school.

5.3 Progress from Grade 6 to High School Graduation

The apparent increase in grade 9 retention rates suggests a need to revisit the question of rates of progress toward high school graduation. In Section 5.1 above, we saw that the rate of progress of Black and Hispanic students from grade 9 to high school graduation fell to about 50% after full implementation of the TAAS as a requirement for high school graduation in 1992-93. But now, having seen in Section 5.2 that the rate of retention in grade 9 appears to have increased markedly for Black and Hispanic students in Texas during the 1990s, it is useful to revisit the question of rates of progress toward high school graduation using base years other than grade 9 as a starting point from which to chart progress. This is because the grade 9 to high school graduation progress ratio may be lowered because of the increasing numbers of students "bunching up" in grade 9.

A number of analyses have been conducted, examining the rates of progress from grades 6, 7, and 8 to high school graduation, six, five and four years later, respectively.
For the sake of economy of presentation in an already overlong treatment, I present here only the results of the grade 6 to high school graduation six years later (this also allows us later to compare these results with data reported by TEA on grade 7-12 dropout rates). These are presented for cohorts labeled by their expected year of high school graduation. The cohort class of 1999, for example, would have been in grade 6 in 1992-93.

Figures 5.5 and 5.6 show the progress of White and minority (Black and Hispanic) grade 6 cohorts of students to grades 8, 10, 11, 12 and high school graduation. As can be seen, over the last 20 years, for both White and minority cohorts, close to 100% of grade 6 students appear to be progressing to grade 8 two years later. For White students in grade 6 cohorts of the classes of 1982-85, about 90% proceeded to grades 11 and 12 on time and about 80% graduated six years after they were in grade 6. For minority grade 6 cohorts the rates of progress were lower: for grade 6 cohorts of the classes of 1982-85, about 80% of Black and Hispanic students progressed on time to grades 11 and 12 and about 65% graduated.

For classes of 1986 to 1990, there were slow but steady declines in all rates of progress for White students, from grade 6 to 8, from grade 6 to 10, etc. For minority cohorts of the classes of 1986 to 1990, there were initially sharper declines in rates of progress to grades 10, 11, and 12, but the cohorts of the 1989 and 1990 classes showed some rebounds in rates of progress to grades 10, 11 and 12 (and for the 1990 cohort to graduation). These patterns are associated with implementation of the first Texas high school graduation test, the TEAMS, from 1985 to 1990.

In 1991, the initial year of TAAS testing, the grade 6 to high school graduation ratios fell precipitously; from 1990 to 1991, the ratio fell from 0.75 to 0.68 for Whites and from 0.65 to 0.55 for minorities.
From 1992 to 1996, this ratio held relatively steady, for Whites at about 0.75 and for minorities at about 0.60. Since 1996, there have been slight increases in the high school graduation to grade six ratios, for Whites to 0.78 in 1999 and for minorities to almost 0.65.
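The cohort ratios discussed above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used to produce Figures 5.5 and 5.6: the function name, the `(grade, year)` dictionary layout, and every count are hypothetical, and school years are labeled by their spring year, so that the class of 1999 is in grade 6 in 1993 (that is, 1992-93).

```python
# Illustrative sketch only; enrollment figures are invented placeholders, not Texas data.

def cohort_progress(enrollment, graduates, class_year, milestones=(8, 10, 11, 12)):
    """On-time progress ratios for the grade 6 cohort expected to graduate in `class_year`.

    `enrollment[(grade, year)]` holds enrollment counts, with school years labeled
    by their spring year; `graduates[year]` holds graduate counts.
    """
    base = enrollment[(6, class_year - 6)]  # grade 6 enrollment six years before graduation
    ratios = {}
    for grade in milestones:
        on_time_year = class_year - (12 - grade)  # year the cohort reaches `grade` on time
        ratios[f"grade {grade}"] = enrollment[(grade, on_time_year)] / base
    ratios["graduation"] = graduates[class_year] / base
    return ratios

# Hypothetical counts for one group, class of 1999.
enrollment = {(6, 1993): 1000, (8, 1995): 990, (10, 1997): 850,
              (11, 1998): 760, (12, 1999): 690}
graduates = {1999: 650}

progress = cohort_progress(enrollment, graduates, 1999)
print(progress)
```

Comparing the "grade 12" and "graduation" entries for successive classes is one way to see whether those two ratios converge over time.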
Stepping back from specific numbers represented in Figures 5.5 and 5.6, three broad findings are apparent. First, the plight of Black and Hispanic students in Texas is not quite as bleak as it appeared when looking at grade 9 to high school graduation ratios, which showed only 50% since 1992 progressing from grade 9 to high school graduation. The bottom line in Figure 5.6 indicates that for most classes of the 1990s, 60-65% of Black and Hispanic students progressed from grade 6 to graduate on time six years later (the grade 9 to graduation ratios are lower because of the increasing rates of retention in grade 9).
Second, one of the major features of both Figures 5.5 and 5.6 is that in each, the bottom two lines (representing the grade 12 to grade 6, and graduation to grade 6 ratios) tend to converge over the last 20 years. This means that over this period, given that students make it to grade 12, they are increasingly likely to graduate. For White students, for example, in the class of 1999, almost 80% progressed from grade 6 to grade 12, and 78% to graduation. In contrast, in the classes of the early 1980s, around 90% were making it from grade 6 to grade 12, but only about 80% were graduating. For minority classes of the early 1980s, about 80% were progressing on time to grade 12, but only about 65% graduating. For minority classes of 1998 and 1999, 68-69% progressed to grade 12 and 64-65% to graduation on time. In other words, a major pattern revealed in these two figures is that since high school graduation testing was introduced in Texas in the mid-1980s, larger proportions of students who reach grade 12 do graduate.

The flip side of this pattern is that over this interval, smaller proportions of students, both White and minority, are progressing as far as grade 12. For White classes of the early 1980s, about 90% of students in grade 6 progressed to grade 12 six years later, but by the 1990s the corresponding ratios had dropped to slightly below 80%. For minority classes of the early 1980s, around 80% progressed from grade 6 to grade 12 six years later, but by the 1990s only 70% were progressing on time to grade 12.

The most obvious reasons for these substantial declines in progress from grade 6 to grade 12 six years later are increased rates of retention in grades before 12 and increased rates of dropping out before grade 12. In the next section, we review data on rates of retention in grade in Texas, and in Section 5.5 explain an alternative strategy to estimate numbers of dropouts.
5.4 Cumulative Retention Rates

In 1998, the TEA published the 1998 Comprehensive Biennial Report containing statewide rates of retention in grade, reported by ethnicity. These data are of interest for several reasons. First, these data provide confirmation of what was apparent in the data shown in Figure 5.3, namely that the rate at which Black and Hispanic students are retained in grade 9 is 2.5 to 3.0 times the rate at which White students have to repeat grade 9.

Table 5.1
Texas Statewide Rates of Retention in Grade 1996-97 by Ethnicity

| Grade | White % retained | Afric.-Amer. % retained | Hispanic % retained | Total % retained |
|---|---|---|---|---|
| K | 2.30% | 1.40% | 1.60% | 1.80% |
| 1 | 4.40% | 7.00% | 6.60% | 5.60% |
| 2 | 1.60% | 3.20% | 3.40% | 2.50% |
| 3 | 0.90% | 2.10% | 2.10% | 1.50% |
| 4 | 0.70% | 1.30% | 1.40% | 1.10% |
| 5 | 0.60% | 0.90% | 1.00% | 0.80% |
| 6 | 1.00% | 2.10% | 2.30% | 1.60% |
| 7 | 1.60% | 3.70% | 3.80% | 2.70% |
| 8 | 1.30% | 2.10% | 2.90% | 2.00% |
| 9 | 9.60% | 24.20% | 25.90% | 17.80% |
| 10 | 4.80% | 11.60% | 11.40% | 7.90% |
| 11 | 3.20% | 8.30% | 7.90% | 5.40% |
| 12 | 2.50% | 6.30% | 7.20% | 4.40% |
| Total | 2.70% | 5.70% | 5.80% | 4.20% |

Source: TEA, 1998 Comprehensive Biennial Report, Table 4.2, p. 53.

These data also allow us to see that despite much rhetoric lately about so-called "social promotion," retention in grade may be more common for Black and Hispanic students in Texas than is social promotion. Using an approach suggested by Robert Hauser, I analyzed data on patterns of retention in grade in Texas statewide as reported by the Texas Education Agency (and summarized in the table above). The approach suggested by Hauser is simply to subtract annual grade retention rates from 1.00 to yield rates of non-retention. The non-retention rates can then be multiplied across the grades to yield "compound" non-retention rates. The results for 1996-97 are shown in Table 5.2.

Table 5.2
Cumulative Rates of Grade Promotion, 1996-97

| Grades | White | Black | Hispanic |
|---|---|---|---|
| Grades 1-3 | 93.22% | 88.13% | 88.33% |
| Grades 4-6 | 97.72% | 95.76% | 95.37% |
| Grades 7-8 | 97.12% | 94.28% | 93.41% |
| Grades 9-12 | 81.22% | 57.57% | 56.11% |
| All twelve grades | 71.86% | 45.81% | 44.15% |

Source: Based on TEA, 1998 Comprehensive Biennial Report, Table 4.2, p. 53.

White students have a probability of progressing through 12 grades without being retained in grade of about 72%. However, for Black and Hispanic students the comparable rates are 46% and 44%. In short, even before the end of so-called social promotion, Black and Hispanic students in Texas appear more likely than not to be retained in grade over the course of a 12-year school career. Note also that the compound retention rate for Hispanics (56%) is about double that for White students
(28%), even before taking into account that Hispanics are much more likely than White students to drop out of school before grade 12. Note also that even before the secondary level of education, Black and Hispanic students in Texas are more likely not to be promoted (that is, to be retained in grade) than White students. The data in Table 5.2 indicate that at both the early elementary (grades 1-3) and upper elementary (grades 4-6) levels, Black and Hispanic students are 70-75% more likely than White students to be "flunked," and retained to repeat a grade in school.

5.5 Dropouts and the Illusion of Progress

The retention rates shown in Table 5.1 may be used together with statewide enrollment data for 1995-96 and 1996-97 to calculate the grade levels at which students are dropping out of school in Texas. The logic of these calculations is as follows. If we assume no net migration of students into Texas, the number of students enrolled in, say, grade 6 in 1996-97 ought to be equal to the sum of the number of students enrolled in grade 5 in 1995-96 times the rate of non-retention in grade 5, plus the number enrolled in grade 6 in 1995-96 times the grade 6 retention rate. Using this approach we may calculate the predicted grade enrollments in 1996-97 and compare them with the actual 1996-97 enrollments. Table 5.3 and Figure 5.7 show the results of such calculations for the Black, Hispanic, White and Total groups of students enrolled in Texas schools. As can be seen, across all groups for grades 2 through 9 the enrollments for 1996-97 predicted on the basis of 1995-96 enrollments and reported rates of retention are quite close to the actual enrollments for 1996-97. For these grade levels the actual enrollments vary from those predicted by less than about 2%. For grade 1, actual enrollments in 1996-97 exceed those predicted by roughly 5% to 7%.
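The two calculations described in Sections 5.4 and 5.5 can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used for the tables: the function names are hypothetical, the retention rates are transcribed from Table 5.1, and the enrollment figures passed to the prediction function are invented round numbers, since the 1995-96 enrollments are not reproduced in the text.

```python
from math import prod

# Annual retention rates (percent) for grades 1-12, transcribed from Table 5.1.
RETENTION_PCT = {
    "White":    [4.4, 1.6, 0.9, 0.7, 0.6, 1.0, 1.6, 1.3, 9.6, 4.8, 3.2, 2.5],
    "Black":    [7.0, 3.2, 2.1, 1.3, 0.9, 2.1, 3.7, 2.1, 24.2, 11.6, 8.3, 6.3],
    "Hispanic": [6.6, 3.4, 2.1, 1.4, 1.0, 2.3, 3.8, 2.9, 25.9, 11.4, 7.9, 7.2],
}


def cumulative_promotion(rates_pct, first_grade, last_grade):
    """Compound non-retention rate over grades first_grade..last_grade (inclusive)."""
    span = rates_pct[first_grade - 1:last_grade]
    return prod(1 - rate / 100 for rate in span)


def predicted_enrollment(prior_lower, prior_same, ret_lower_pct, ret_same_pct):
    """Students promoted from the grade below plus students repeating the grade.

    Assumes no net migration, as in the text.
    """
    promoted = prior_lower * (1 - ret_lower_pct / 100)
    repeating = prior_same * ret_same_pct / 100
    return promoted + repeating


# Compound promotion rates across all twelve grades (compare Table 5.2).
for group, rates in RETENTION_PCT.items():
    print(group, f"{100 * cumulative_promotion(rates, 1, 12):.2f}%")

# Hypothetical prior-year enrollments (NOT TEA data): 1,000 students each in
# grades 5 and 6, with the Table 5.1 "Total" retention rates (0.8% and 1.6%).
print(predicted_enrollment(1000, 1000, 0.8, 1.6))
```

Multiplying across all twelve grades gives values that agree with the last row of Table 5.2 to two decimal places. Subtracting a predicted enrollment from the actual enrollment for the same grade gives the "actual minus predicted" differences of the kind reported in Table 5.3.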
The differences between actual and predicted grade 1 enrollments are fairly consistent across ethnic groups and presumably derive from the fact that across all groups kindergarten attendance was not universal in 1995-96 (hence the grade 1 enrollments in 1996-97 are larger than predicted from 1995-96 kindergarten enrollments).

Table 5.3
Grade Enrollments in Texas, 1996-97: Predicted and Actual Minus Predicted

(Pred. = predicted enrollment; Diff. = actual minus predicted enrollment.)

| Grade | Black Pred. | Black Diff. | Black % Diff. | Hispanic Pred. | Hispanic Diff. | Hispanic % Diff. | White Pred. | White Diff. | White % Diff. | Total Pred. | Total Diff. | Total % Diff. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1st | 42,925 | 2,870 | 6.3% | 117,564 | 6,390 | 5.2% | 126,306 | 9,202 | 6.8% | 286,858 | 18,399 | 6.4% |
| 2nd | 42,998 | 917 | 2.1% | 112,510 | 2,330 | 2.0% | 131,440 | 1,560 | 1.2% | 287,020 | 4,735 | 1.7% |
| 3rd | 42,112 | 584 | 1.4% | 109,434 | 2,081 | 1.9% | 132,480 | 1,097 | 0.8% | 284,020 | 3,768 | 1.3% |
| 4th | 42,016 | 499 | 1.2% | 107,748 | 2,355 | 2.1% | 133,683 | 814 | 0.6% | 283,697 | 3,418 | 1.2% |
| 5th | 41,052 | 388 | 0.9% | 105,989 | 2,015 | 1.9% | 137,246 | 568 | 0.4% | 284,170 | 3,088 | 1.1% |
| 6th | 41,390 | 369 | 0.9% | 106,360 | 1,575 | 1.5% | 141,235 | -281 | -0.2% | 288,843 | 1,805 | 0.6% |
| 7th | 41,220 | 513 | 1.2% | 105,656 | 2,237 | 2.1% | 137,625 | 1,779 | 1.3% | 284,566 | 4,464 | 1.6% |
| 8th | 40,208 | 19 | 0.1% | 104,465 | 46 | 0.0% | 138,044 | 189 | 0.1% | 282,768 | 203 | 0.1% |
| 9th | 50,696 | 392 | 0.8% | 131,492 | 1,225 | 0.9% | 149,454 | 2,175 | 1.4% | 330,422 | 5,012 | 1.5% |
| 10th | 42,418 | -5,791 | -15.8% | 103,814 | -14,969 | -16.9% | 141,855 | -10,705 | -8.2% | 288,978 | -32,356 | -11.2% |
| 11th | 34,138 | -3,504 | -11.4% | 79,894 | -8,984 | -12.7% | 125,045 | -7,767 | -6.6% | 239,395 | -20,573 | -8.6% |
| 12th | 27,732 | -1,679 | -6.4% | 64,911 | -5,375 | -9.0% | 111,361 | -8,038 | -7.8% | 203,876 | -14,964 | -7.3% |

Note however that for grades 10, 11 and 12 much larger disparities are apparent and vary considerably by ethnic group. Overall, enrollments in grades 10, 11 and 12 in 1996-97 were more than 65,000 lower than predicted based on the previous year's enrollments. The missing students were predominantly Black and Hispanic. Grade 10 enrollments in 1996-97 were about 16% lower than expected for Black and Hispanic students, but only about 8% lower than expected for White students.

What happened to these missing students? It seems extremely likely that they dropped out of school. This is not terribly surprising since previous research shows clearly that retention in grade is a common precursor to dropping out of school. The grade 9 retention rates in Texas are far in excess of national trends.
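The figure of "more than 65,000" missing students can be checked directly against the grade 10 to 12 rows of Table 5.3. The short Python tally below is an illustrative sketch that uses the "actual minus predicted" values as transcribed from that table; the variable names are hypothetical.

```python
# Actual minus predicted 1996-97 enrollments for grades 10, 11 and 12,
# transcribed from Table 5.3.
SHORTFALL = {
    "Black":    [-5791, -3504, -1679],
    "Hispanic": [-14969, -8984, -5375],
    "White":    [-10705, -7767, -8038],
    "Total":    [-32356, -20573, -14964],
}

# Number of students "missing" in grades 10-12, by group (positive counts).
missing = {group: -sum(diffs) for group, diffs in SHORTFALL.items()}

# Share of the total shortfall made up of Black and Hispanic students.
minority_share = (missing["Black"] + missing["Hispanic"]) / missing["Total"]

for group, count in missing.items():
    print(f"{group}: {count:,} fewer students than predicted")
print(f"Black and Hispanic share of shortfall: {100 * minority_share:.1f}%")
```

On these figures the total shortfall across the three grades is 67,893 students, of whom a little under 60% are Black or Hispanic. The three ethnic groups do not sum exactly to the Total column, presumably because the total includes students in other groups.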
A recent national study, for example, showed that among young adults aged 16-24, only 2.4 percent had been retained in grades 9-12 (NCES Dropout rates in the United States 1995, Report No. dp95/974735). The recent report of the National Research Council (NRC) also shows Texas to have among the highest grade 9 retention rates for 1992 to 1996 among the states for which such data are available (Heubert & Hauser, 1999, Table 6-1). (Note 14)

A casual observer might well wonder what is wrong with retaining students in grade 9 if they are academically weak. The answer is explained in the recent report on high stakes testing from the National Research Council:

    The negative consequences, as grade retention is currently practiced, are that retained students persist in low achievement levels and are more likely to drop out of school. Low performing students who have been retained in kindergarten or primary grades lose ground both academically and socially relative to similar students who have been promoted (Holmes, 1989; Shepard and Smith, 1989). In secondary school, grade retention leads to reduced achievement and much higher rates of school dropout (Heubert & Hauser, 1999, p. 285).

Even the TEA has acknowledged that "research has consistently shown that being overage for grade is one of the primary predictors of dropping out of school in later years. ... Being overage for grade is a better predictor of dropping out than underachievement" (TEA, 1996 Comprehensive Biennial Report on Texas Public Schools, pp. 35, 36).

Hence, it is fair to say that the soaring grade 10 TAAS pass rates are not just an illusion, but something of a fraud from an educational point of view. Table 5.4 presents data to support this view.

Table 5.4
Texas Grade 10 Enrollments 1996-97 and Taking TAAS February 1997, by Ethnicity

| Group | Predicted enrollment 1996-97 | Actual enrollment 1996-97 | No. taking TAAS, February 1997 | % passing all 3 tests | Alternative pass rate based on actual F96 enrollment | Alternative pass rate based on predicted F96 enrollment |
|---|---|---|---|---|---|---|
| Black | 42,418 | 36,627 | 27,451 | 48.0% | 36.0% | 31.1% |
| Hispanic | 103,814 | 88,845 | 69,421 | 52.0% | 40.6% | 34.8% |
| White | 141,855 | 131,150 | 108,215 | 81.0% | 66.8% | 61.8% |

Source: Enrollments and no. taking and passing TAAS: TEA, PEIMS Data 1996-1997 and www.tea.state.tx.us/student.assessment/results/summary/sum97/gxen97.htm (downloaded March 22, 2000). Predicted enrollments based on 1995-96 enrollments and rates of grade promotion and retention as explained in text.

What these data show is that the dramatically improved pass rates on the 1997 grade 10 TAAS tests are in part a result of students who dropped out (or are otherwise missing) between grade 9 in 1996 and the TAAS testing in February 1997. The overall pass rates reported by TEA on the 1997 grade 10 TAAS tests, of 48%, 52% and 81% for Black, Hispanic and White students, respectively, drop to 36%, 40.6% and 66.8% if we base the pass rates on the Fall 1996 actual enrollments.
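The alternative pass rates in Table 5.4 follow from holding the number of students passing fixed while changing the denominator. The Python sketch below is an illustration of that arithmetic only. It assumes the number passing can be approximated as the number of test takers times the reported pass rate, which is an inference from the table rather than a count reported in the text, and the function name is hypothetical.

```python
# Grade 10 figures transcribed from Table 5.4: predicted enrollment,
# actual enrollment, number taking TAAS in February 1997, and the
# reported percent passing all three tests.
TABLE_5_4 = {
    "Black":    (42418, 36627, 27451, 48.0),
    "Hispanic": (103814, 88845, 69421, 52.0),
    "White":    (141855, 131150, 108215, 81.0),
}


def alternative_pass_rates(predicted, actual, takers, pct_passing):
    """Return pass rates (percent) based on actual and on predicted enrollment."""
    # Assumption: number passing is approximated as takers x reported pass rate.
    passers = takers * pct_passing / 100
    return 100 * passers / actual, 100 * passers / predicted


for group, row in TABLE_5_4.items():
    on_actual, on_predicted = alternative_pass_rates(*row)
    print(f"{group}: {on_actual:.1f}% of actual enrollment, "
          f"{on_predicted:.1f}% of predicted enrollment")
```

Under this assumption the computed rates agree with both sets of alternative pass rates in Table 5.4 to within rounding.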
And they drop even further, to 31.1%, 34.8% and 61.8% if we base the pass rates on the number of students predicted to have been in grade 10 in 1996-97 (based, as explained above, on the 1995-96 grade 9 enrollments and the TEA reported rates of retention in 1996-97). This is, of course, also a reminder of an elementary fact of arithmetic. One can increase a proportion (such as percent passing) not just by increasing the numerator--but also by decreasing the denominator. In the next two sections, I estimate the proportions of the apparent gains in pass rates on the grade 10 TAAS tests between 1994 and 1998 that are attributable to decreases in the denominator (because of exclusion of students because either they dropped out of school or were classified as special education) and increases in the numerator (that is, actual increases in numbers of students passing TAAS). Later, in Part 7, I return to the topic of dropouts in Texas, specifically to review and try to
reconcile sources of evidence about high school completion in Texas.

5.6 Increase in Special Education Exclusions

Before trying to distinguish the proportions of apparent TAAS gains that are real from those that are illusory, it is necessary to explain another manner in which students may be excluded from the grade 10 TAAS results used to rate secondary schools in Texas. It may be recalled that the soaring pass rates on the grade 10 TAAS summarized in Part 3 above were based on grade 10 students "not in special education." As far as I know, the TEA has not reported directly numbers of grade 10 students over time who were "in special education." However, TEA has reported the grade 10 pass rates separately for all students and for all students not in special education (at www.tea.state.tx.us/student.assessment/results/summary/). This allows us simply to subtract the two sets of data to derive the numbers and percentages of students who took the grade 10 TAAS who were classified as "in special education." Summary results are shown in Table 5.5. (Note 15)

Table 5.5
Number and % of Grade 10 TAAS Takers in Special Education, 1994-1998

Numbers of Grade 10 TAAS Takers in Special Education

| Year | All groups | Afric.-Amer. | Hispanic | White |
|---|---|---|---|---|
| 1994 | 7602 | 833 | 1991 | 4685 |
| 1995 | 9049 | 1032 | 2351 | 5581 |
| 1996 | 11467 | 1500 | 3017 | 6810 |
| 1997 | 13005 | 1518 | 3707 | 7617 |
| 1998 | 14558 | 1818 | 4271 | 8284 |

Percentages of Grade 10 TAAS Takers in Special Education

| Year | All groups | Afric.-Amer. | Hispanic | White |
|---|---|---|---|---|
| 1994 | 3.9% | 3.3% | 3.3% | 4.5% |
| 1995 | 4.5% | 4.0% | 3.7% | 5.2% |
| 1996 | 5.3% | 5.4% | 4.5% | 6.1% |
| 1997 | 5.8% | 5.3% | 5.1% | 6.6% |
| 1998 | 6.3% | 6.3% | 5.7% | 7.1% |

As can be seen, the numbers and percentages of students taking the grade 10 TAAS, but classified as "in special education," have increased steadily between 1994 and 1998. This means that increasing numbers of students who have made it to grade 10 and taken the grade 10 TAAS have been excluded from school accountability ratings.
Indeed, between 1994 and 1998, the numbers of Black and Hispanic students taking the grade 10 TAAS counted as "in special education" more than doubled, though the percentage of White students counted as "in special education" remained higher (7.1% vs. 6.3% and 5.7% for Black and Hispanic tenth graders, respectively). This means that a portion of the increase in pass rates on the grade 10 TAAS is attributable simply to the increases in the rates at which students were counted as in special education and hence excluded from school accountability ratings and from summary statistics showing pass rates for students not in special education.

5.7 How Much Illusion from Exclusion?

In Part 3 above, I reviewed evidence of the dramatic gains made in the passing rates on grade 10 TAAS between 1994 and 1998. As shown in Table 3.1, the percentage of students in Texas not in special education who passed all three grade 10 TAAS tests increased from 52% in 1994 to 72% in 1998, a 20-point increase. In the preceding two sections (5.5 and 5.6), we have seen that portions of this gain are purely an illusion due to increases in the numbers of students dropping out of school before taking the grade 10 TAAS, or else taking the grade 10 TAAS but excluded from accountability results because they are counted as "in special education." Hence, it is useful now to try to estimate what portion of the increased pass rate on TAAS is purely an illusion produced by these two kinds of exclusion.

In the previous section we saw that the percentage of students taking the grade 10 TAAS who were classified as "in special education" increased from 3.9% in 1994 to 6.3% in 1998. This suggests that around 2 points of the 20-point gain in TAAS pass rates over this interval may be attributable simply to the increase in special education classifications. Note also that the increase in special education classifications has been larger for Black than for White students, so this may also account for a portion of the closing of the "race gap" in TAAS scores over this period.
In contrast, the increase in Hispanic students classified as special education has been slightly less than the comparable increase for White students, so this factor could not account for the apparent shrinkage in the race gap in TAAS scores between Hispanic and White students.

What about the possible effects of increases in dropout rates in inflating the apparent grade 10 pass rates? To answer this question we would need to have estimates of the dropout rates between the early 1990s and 1998. In Section 5.5 above, I presented estimates of dropouts for only one year, namely 1996-97. Nonetheless, the grade 8 to 9 progression ratios discussed in Section 5.2 clearly suggest that dropout rates increased between the early and late 1990s. Specifically, between the early and late 1990s, the grade 8 to 9 progression ratios for Black and Hispanic students increased from around 1.20 to nearly 1.30; that is, the excess of grade 9 enrollment over the previous year's grade 8 enrollment rose from about 20% to nearly 30%. This suggests that the rate at which Black and Hispanic students are being retained in grade, and having to repeat grade nine, increased over this interval by around 50%. Since grade retention is a common precursor to dropping out, this certainly suggests an increased dropout rate. At the same time, the analyses of progress for grade 6 cohorts presented in Section 5.3 revealed that grade 6 to grade 11 progression ratios for Whites and minorities varied by not more than 5% during the 1990s (for Whites, the ratio was consistently between 85% and 89%; and for minorities between 75% and 80%). The reason for focusing here on progress to grade 11 is that the data on enrollments are from the fall whereas TAAS is taken in the spring. But if students progress to grade 11, they presumably have taken the exit level version of TAAS in spring of the tenth grade.

What this suggests is that the majority of the apparent 20-point gain in grade 10 TAAS pass rates cannot be attributed to exclusion of the types just reviewed.
Specifically, if rates of progress from grade 6 to grade 11 have varied by no more than 5% for cohorts of the classes of the 1990s, this suggests that even if we take this as an upper bound on the extent of increased retention and dropping out before fall of grade 11, and add 2 points for the increased rate of grade 10 special education classification, we still come up with less than half of the apparent 20-point gain in grade 10 TAAS pass rates between 1994 and 1998. So at this point in our analysis, it appears that while some of the gains may be due to these three forms of exclusion, a majority portion of the apparent gain is not. Hence it will be useful to turn in Part 7 to see whether the apparent gains on TAAS show up in any other evidence on the status and progress of education in Texas. Before turning to that topic, in Part 6 we review evidence from survey research about the effects of TAAS on education in Texas.
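The bounding argument of Section 5.7 reduces to a few lines of arithmetic, sketched here in Python. The special education percentages are transcribed from Table 5.5. Treating the 5% variation in progression ratios as a 5-point upper bound, and rounding the special education effect to 2 points, follows the reasoning in the text; the variable names are hypothetical.

```python
# Percentage-point rise in the grade 10 pass rate (students not in special
# education), 1994 to 1998, as quoted from Table 3.1.
APPARENT_GAIN = 72 - 52

# Percent of grade 10 TAAS takers in special education, 1994 and 1998 (Table 5.5).
SPECIAL_ED_PCT = {
    "All groups": (3.9, 6.3),
    "Black": (3.3, 6.3),
    "Hispanic": (3.3, 5.7),
    "White": (4.5, 7.1),
}

# Rise in special education classification, in percentage points, by group.
special_ed_rise = {
    group: round(end - start, 1) for group, (start, end) in SPECIAL_ED_PCT.items()
}

# Upper bounds used in the text: 5 points for retention and dropping out
# (variation in grade 6 to 11 progression) plus about 2 points for special education.
PROGRESSION_BOUND = 5
SPECIAL_ED_EFFECT = 2
exclusion_bound = PROGRESSION_BOUND + SPECIAL_ED_EFFECT

print(special_ed_rise)
print(f"Upper bound on gain due to exclusion: {exclusion_bound} of "
      f"{APPARENT_GAIN} points")
```

On these figures exclusion accounts for at most about 7 of the 20 points, which is less than half of the apparent gain. The rise in special education classification is largest for Black students and smallest for Hispanic students, consistent with the discussion of the "race gap" above.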
6. Educators' Views of TAAS

When it was learned in early May 1999 that the trial in the GI Forum case was to be postponed from June until September 1999, I realized that this delay would allow sufficient time to undertake surveys of Texas teachers about TAAS. We had a variety of indirect evidence that raised doubts about the validity and reliability of TAAS scores and the relationship of TAAS tests to secondary school teaching: TAAS results statewide from 1990 to 1998, the historical record concerning the setting of the passing score of 70% correct on the TAAS, patterns of grade enrollments in Texas over the last two decades, and data on the relationship between high school course grades and TAAS scores. However, we did not have any systematic evidence from those most directly affected by the TAAS graduation test, namely, secondary teachers and students, as to the educational value and effects of the TAAS testing. Consequently, a survey was in order.

Though we lacked the time and resources to survey the opinions of Texas students, with the help of Boston College graduate students, I undertook two different surveys of statewide samples of secondary teachers in Texas. One survey, described above, was the survey for the Risk Analysis study. The second and larger survey was a "Survey of Testing and Teaching in Grades 7-12 in Texas," or what in shorthand was called the Testing and Teaching (TT) survey.

To undertake these surveys, I purchased mailing labels for a random sample of 4,000 secondary teachers in Texas (specifically math and English/Language Arts) from Market Data Retrieval of Shelton, Connecticut. The number of 4,000 mailing labels was selected simply to meet the minimum purchasing requirements of this firm.
From this list of 4,000, I then randomly selected 1,000 names to be used for the Testing and Teaching survey and 500 names to be used for the Risk Analysis survey. The survey forms were mailed on May 23, 1999, with self-addressed, stamped return envelopes. We tabulated all responses that were returned by the end of June 1999, specifically 148 responses for the TT survey and 66 for the Risk Analysis survey. For both surveys we double-checked the accuracy of data entry before tabulating results. Since the Risk Analysis survey has been described in Part 4 above, I do not discuss it further here.

After undertaking this survey, I learned of two other surveys of Texas educators regarding TAAS: one by James Hoffman of the University of Texas at Austin and colleagues and the other by Gordon and Reese (1997). I describe these surveys in the order in which they were undertaken and reported, rather than the order in which I learned of them.

6.1 Survey/Interviews with Public School Teachers in Texas

Gordon and Reese surveyed 100 Texas teachers and followed up with interviews with 20 of the initial respondents. The authors do not explain how the survey
respondents were sampled, but they do mention they were "graduate students in educational administration" (Gordon & Reese 1997, p. 349). Given the authors' affiliation as professors of educational administration at Southwest Texas State University, one suspects that respondents may well have been an opportunity sample of graduate students in the authors' program and perhaps other similar graduate programs. Nonetheless the authors do report that respondents' schools represented a cross section of Texas public schools relative to education level (elementary, middle school, high school), size, location (urban, suburban, rural), socioeconomic status (high and low SES) and TAAS category (exemplary, recognized, acceptable and low performing) (Gordon & Reese 1997, p. 349).

In both the written survey and the follow-up interviews respondents were asked to address four broad questions:

1. How are students at your school prepared for TAAS?
2. What are the effects of TAAS on your students?
3. What are the effects of TAAS on you as a teacher?
4. What are the effects of TAAS on your school?

In the initial written survey, respondents were given a full blank page to respond to each question. In the follow-up interviews, a stratified random sample of respondents was chosen for in-depth interviews. Interviews were audio-taped, transcribed and coded to identify patterns among responses.

Regarding preparation for TAAS, respondents indicated that a huge amount of school time was devoted to coaching students for TAAS, with TAAS preparation becoming "all-consuming" during a period of four to eight weeks before the testing (p. 355). In most schools TAAS practice quizzes were administered on a regular basis with emphasis on teaching to the TAAS format, such as having students practice "bubbling" in answers on machine scorable answer sheets.
Respondents' answers regarding effects of TAAS on students were categorized as dealing with emotional, academic and social effects. For one group of students, teachers reported "no emotional effects at all because these students fail to recognize the importance of TAAS and are totally indifferent about it" (p. 356). A second group experiences moderate stress "which tends to motivate them to work harder to prepare for the test" (p. 356). A third group of students experience high stress as a result of TAAS. Among some in this group, according to respondents, the stress leads to anxiety and even panic. Among others it leads to anger and resentment. And another "subgroup eventually responds to the stress by 'shutting down'; they cope by telling themselves they have no chance of doing well on TAAS and giving up" (p. 356). One respondent reported that the stress of TAAS "contributes to the dropout rate" (p. 357).

Regarding effects of TAAS on teachers, the vast majority of interview respondents (17 of 20) reported that TAAS leads to an emphasis on teaching TAAS-related content and "de-emphasis on teaching content not related to TAAS" (p. 359), including less emphasis on higher-level skills. All 20 interviewees also reported "concern, frustration and disappointment, caused by observing the negative effects of TAAS failure on at-risk students" (p. 360). Interviewees also reported that TAAS scores are "not accurate measures of the academic progress that their at-risk students have made" (p. 360).

Regarding effects on teachers, "Nineteen of the interviewees agreed that TAAS makes them accountable in terms of teaching TAAS-related content, but that it does not make them accountable in terms of being effective teachers" (p. 360). While acknowledging the need for teacher accountability, respondents felt that TAAS was not a
good vehicle for achieving accountability because TAAS is not a true measure of student learning and ... it is unfair to use a single instrument like TAAS to compare the performance of teachers who are working with students of widely varying socioeconomic backgrounds, academic abilities and motivational levels" (pp. 360-61).

Regarding effects on schools, interviewees reported that "considerable human and material resources are expended on TAAS preparation" (p. 361) and that aspects of the curriculum that did not relate to TAAS were de-emphasized. Respondents were split as to whether or not their schools were "receiving pressure from parents and the community to do well on TAAS" (p. 362).

In their discussion, Gordon and Reese write that teacher respondents "reported not just 'teaching to the test' but also teaching to the test format and doing so at the expense of large portions of the curriculum" (p. 363, emphasis in original). They also report that via focused "TAAS prep" teachers can "teach students how to respond correctly to test items even though the students have never learned the concepts on which they are being tested" (p. 364). The authors conclude that "drill and kill" coaching and preparation for TAAS are taking a "toll on teachers and students alike" and comment:

    The most devastating effects of high-stakes testing seem to be occurring to the students who these tests are supposed to help the most--lower achieving students. Presumably, by setting clear standards and measuring results, state mandated tests make schools accountable for the basic education to which all children are entitled. According to participants in our study, however, their at-risk students' academic progress is being hindered by the negative effects of failing a test that many teachers insist does not measure what their students need to learn at their current stages of development, does not measure the progress their students have made, and is culturally biased.
(Gordon & Reese, 1997, pp. 364-65).

The authors concluded with a number of recommendations for public dialog about the merits of high stakes testing, staff development, monitoring of the effects of high stakes testing, and establishment of a broader system of student assessment.

6.2 Testing and Teaching Survey of Secondary Math/English Teachers

I did not learn of the Gordon & Reese (1997) survey until recently. However, as previously explained, when the TAAS trial was postponed from June until September 1999, my colleagues and I decided to undertake a survey of a representative sample of teachers in Texas statewide.

The purpose of our Testing and Teaching survey was to obtain the opinions of a representative sample of secondary math and English/Language Arts teachers in Texas statewide about the relationships between mandated testing and teaching and the effects of mandated testing. The survey form we used is a minor revision of a survey instrument that was administered to teachers nationwide in the early 1990s as part of a study funded by the National Science Foundation (Madaus et al., 1992). Specifically, from their survey instrument, one set of questions related to elementary education was deleted, one question was added, and space was provided at the end of the survey form for respondents to comment and provide their name and address, if they wished to receive a summary of survey results. Note that our survey form did not specifically ask about TAAS. A copy of our Testing and Teaching survey form is provided in Appendix 1.
By the end of June 1999, we had received 148 responses to our Testing and Teaching survey (representing a 14.8% response rate). (Note 16) After survey forms were received, data were entered and checked for accuracy, and a code book documenting data coding was developed. Before summarizing overall results of the Testing and Teaching survey, I should mention that on two of the forms returned, respondents had not completed answers to most questions, so they were excluded, leaving the main analysis sample with 146 respondents.

Respondents showed a good distribution of grade levels from 7 to 12, with several indicating teaching at more than one grade level. The vast majority were certified teachers (143) and roughly half (72) indicated that they had more than 12 years of teaching experience. The vast majority (123) also reported that they were "very comfortable" teaching their subject area.

As the survey form we used was addressed to the topic of mandated testing, it did not ask respondents directly about TAAS. However, in response to one question (C1), 118 respondents indicated that students in their class were required by their state or district to take standardized tests in the subject during the current calendar year. Space was provided for respondents to write the names of mandated tests to which they were referring and 112 respondents explicitly mentioned TAAS.

In response to a question about how mandated test results are used, respondents indicated that the most common uses were:

to publish test scores (81%);
to evaluate teachers (66%);
to place students in programs (57%);
to promote/graduate students (53%).

In contrast, only a minority of respondents (46%) indicated that mandated test results were used to alter the school curriculum.
In response to two sets of questions about teachers' own use and administrators' use of mandated test results, teachers indicated that results were "minimally" to "somewhat" important for a variety of purposes; but the uses rated most important across both sets of questions were two uses by administrators, namely school evaluation and district evaluation (both rated on average between "very" and "extremely" important).

A section of questions asked about test preparation. Results for these questions suggested a huge amount of test preparation, with the majority of respondents indicating that they do many different kinds of test preparation and 50% of respondents indicating that they spend more than 30 hours per year on test preparation. Also, 75% of respondents said that they begin test preparation more than one month before the mandated test.

In a set of questions addressed to the relationships between testing, curriculum and evaluation, respondents indicated that mandated testing influences teaching in a variety of ways, including influencing the increase or decrease of emphasis on certain topics and the content and format of tests that teachers use. In response to a question about the similarity of content of mandated testing and their own instruction, only 52% of 129 respondents answered "quite" or "very" similar.

Another series of questions asked about more general influences of mandated testing. The percentages of teachers agreeing (that is, agreeing or agreeing strongly) with each of these statements are summarized in Table 6.1.

Table 6.1
Summary Results of Secondary Teachers' Answers about General Influences of Mandated Testing in Texas

| Statement | Percent of teachers answering "Agree" or "Strongly Agree" (n=139 to 142) |
|---|---|
| 6. Mandated testing influences teachers to spend more instructional time in whole group instruction. | 65% |
| 7. Mandated testing influences teachers to spend more instructional time in developing critical thinking skills. | 45% |
| 8. Mandated testing influences teachers to spend more instructional time on individual seat work. | 57% |
| 9. Mandated testing influences teachers to spend more instructional time in developing basic skills. | 73% |
| 10. Mandated testing influences teachers to spend more instructional time with small groups of students working together (cooperative learning). | 24% |
| 11. Mandated testing influences teachers to spend more instructional time solving problems that are likely to appear on tests. | 88% |
| 12. Mandated testing influences teachers to spend more instructional time in the use of manipulatives and/or experiments for concept development. | 22% |
| 13. Teachers in my district are gearing their instruction to mandated tests. | 82% |
| 14. Mandated testing helps students achieve the objectives of the curriculum. | 32% |
| 15. Teachers in my district have a pretty good idea of what students can do without using mandated tests. | 82% |
| 16. The evaluation of teachers' competence is influenced (directly and/or indirectly) by their students' mandated test scores. | 68% |
| 17. Mandated testing contributes to the realization of the goals of the current educational reform movement. | 29% |
| 18. My state or district testing program sometimes leads teachers to teach in ways that go against their own ideals of good educational practice. | 64% |
| 19. My district is putting pressure on teachers to improve their students' mandated test scores. | 86% |
| 20. Students' mandated test scores are below the expectations of my school or district. | 38% |
| 21. Mandated testing influences some teachers in my district to engage in non-standard testing practices (such as changing responses or increasing testing time limits). | 12% |
| 22. Mandated testing influences some administrators in my district to engage in non-standard testing practices (such as changing responses or increasing testing time limits). | 12% |

While far more could be said about these results, key findings are as follows:

Teachers in Texas are clearly feeling pressure to raise TAAS scores (86% of respondents agreed with the statement "My district is putting pressure on teachers to improve their students' mandated test scores.")

Teachers have a pretty good idea of what students can do without mandated tests (82% agreed with the statement "Teachers in my district have a pretty good idea of what students can do without using mandated tests.")

More teachers disagreed (45%) than agreed (32%) with the statement that "Mandated testing helps students achieve the objectives of the curriculum."

More teachers disagreed (39%) than agreed (29%) with the statement that "Mandated testing contributes to the realization of the goals of the current educational reform movement."

On the brighter side, results of the Testing and Teaching survey suggest that teachers and administrators are not widely engaging in non-standard testing practices (only 12% of respondents agreed with the last two statements (#21 and #22) in part F of the survey form). Indeed, one respondent commented "Perhaps I misunderstood questions 21 & 22. Are you asking if my district condones cheating? Absolutely not, the repercussions for that are very severe in this state."

As indicated, the last portion of the Testing and Teaching survey form provided space for respondents to offer comments after these instructions: "If you would like to offer any comments about the relationship between mandated testing and teaching in Texas secondary schools, please write them here." A total of 51 respondents offered comments.
On balance, these spontaneous comments on the relationship between mandated testing and teaching in Texas secondary schools were far more negative than positive about the role of mandated testing, with comments such as the following:
    TAAS results haven't had the desired effect. It is used more as a "HAMMER" rather than a tool to improve. (Case 17)
    I am not against mandated testing; but every time we work out a procedure for balancing the teaching, the state moves the test to a different grade level. We have it working well now, and now they're talking about moving it to 9th & 11th instead of 10th. (Case 39)
    Mandated state TAAS Testing is driving out the best teachers who refuse to resort to teaching to a low-level test! (Case 67)
In citing these few comments here, I note that the full set of all respondents' comments appears as Appendix 2.
6.3 Survey of Texas Reading Specialists
The third survey of educators in Texas about TAAS was by Hoffman, Pennington & Assaf of the University of Texas at Austin and Paris of the University of Michigan. I did not learn of this survey until just before the TAAS trial in the Fall of 1999 and results of this survey were not allowed to be entered as evidence in the TAAS case. Nonetheless, Hoffman and colleagues have been very generous in sharing with me not just a manuscript reporting on their survey results but also an entire set of their original data.
The Hoffman et al. (1999) survey was of members of the Texas State Reading Association (TSRA), an affiliate of the International Reading Association, whose membership includes classroom teachers, reading specialists, curriculum supervisors, and others in leadership positions. The purpose of the survey "was to examine the ways in which TAAS affects teachers, teaching and students from the perspective of the professional educators who are closest to classrooms and schools" (Hoffman et al., 1999, p. 3). The survey form contained 113 items, many duplicated or slightly adapted from Urdan & Paris's (1994) survey of teachers in the state of Michigan and the Haladyna, Nolen, and Haas (1991) survey of teachers in Arizona. The survey items were mostly Likert-scale items (with a five-point scale answer format: 1=strongly disagree, 2=disagree, 3=agree, 4=strongly agree, and 5=don't know) asking about attitudes, test preparation and administration practices, uses of scores, effects on students, and overall impressions of trends. In addition, five items were included containing invitations for extended written responses.
The authors surveyed a random sample of members of the TSRA. After a reminder letter and a second random sampling, they received a total of 201 usable responses representing an overall return rate of 27% of surveys sent (representing 5% of the total membership of TSRA). The authors report that no biases were detected in the response rates "based on geographical areas of the state" of Texas (p. 4).
The authors reported results in three different ways: percentages responding to particular questions in particular ways, scaled responses representing answers summed across items relating to similar topics, and verbatim quotations of written responses. Overall, respondents to the Hoffman et al. survey were older (61% between the ages of 40-60), and more experienced (63% with over 10 years experience and 45% with over 20 years experience) than classroom teachers in general in the state of Texas (p. 5). Scaled score responses indicated that on a composite measure of general attitudes toward TAAS "reading specialists strongly disagree with some of the underlying assumptions and intentions for TAAS" (p. 5). Other scaled score responses revealed that "reading specialists challenge the basic validity of the test and in particular for minority and ESL speakers who are the majority in Texas public schools" (p. 6). Another composite variable representing general attitudes towards TAAS reflected "a strong negative attitude toward TAAS" (p. 7).
Respondents' answers regarding effects of TAAS on students revealed that a majority said that TAAS often or always caused student irritability, upset stomachs and headaches. Responses to three questions regarding overall impressions of TAAS were particularly striking. One question asked:
    The results from TAAS testing over the past several years seem to indicate that scores are on the rise. Do you think this rise in test scores reflects increased learning and higher quality teaching?
To this question, 50% answered no, and 27% answered yes. Another question read as follows:
    It has been suggested that the areas not tested directly on TAAS (e.g., fine arts) and other areas not tested at certain grade levels (e.g., science at the 4th grade level) receive less and less attention in the curriculum. What do you feel about this assertion?
In response to this question, 85% answered "very true" or "somewhat true." A third question read as follows:
    It has also been suggested that the emphasis on TAAS is forcing some of the best teachers to leave teaching because of the restraints the tests place on decision making and the pressures placed on them and their students.
A total of 85% of respondents agreed with this statement.
Written comments "revealed the depth of feeling and passion on the part of teachers with respect to trends in TAAS testing:"
    I am very sad that education has stooped to the low level of measuring performance with standardized testing and Texas has taken it even lower with their TAAS. We know what works in education. We just seem to ignore the research and keep on banging our heads against the "TAAS wall" and "retention walls."
    Please support teachers more than ever. Our children are hurting more than ever. If there was ever a time to change it is now. Give teachers back their classrooms. Let them teach and spend quality time with their students. They need us!
    I think TAAS is the biggest joke in Texas. I have never seen such an injustice.
    I believe that TAAS interferes with the very nature of our job. The pressure from administrators to increase campus scores leaves teachers little time for real instruction....
    My heart breaks to see so many teachers "just surviving." I believe that our solution is just to support each other because the public has no real concept of the situation.
    TAAS is ruining education in Texas! Help!
6.4 Similarities and Differences in Survey Results
The surveys summarized above were undertaken independently and polled somewhat different samples of Texas educators.
Gordon and Reese surveyed Texas teachers who were "graduate students in educational administration" (Gordon & Reese 1997, p. 349). Though the authors do not explain exactly when this survey was conducted, it was presumably around 1996. The survey by Hoffman et al. and the one undertaken by me were both performed during 1998-99 though of somewhat different populations. Hoffman et al. surveyed reading specialists statewide, while I surveyed secondary math and English/language arts teachers. Despite these differences, results of the three independent surveys of Texas educators have four broad findings in common.
Texas schools are devoting a huge amount of time and energy preparing students specifically for TAAS. As mentioned, in the Gordon & Reese survey, respondents reported a huge amount of school time was devoted to coaching students for TAAS, with TAAS preparation becoming "all-consuming" during a period of four to eight weeks before the testing (p. 355). In the Testing and Teaching survey, 75% of respondents said that they begin test preparation more than one month before the mandated test (TAAS). And in the Hoffman et al. survey, when asked whether the rise in TAAS scores reflected "increased learning and higher quality teaching," nearly twice as many respondents answered "no" (50%) as answered "yes" (27%). In their written comments to this question many teachers explained that they felt test preparation was what accounted for the rising scores:
    I feel that it reflects that we are doing a better job teaching for the test. We are being forced to teach the test. (Case 11).
    Students are being trained earlier on how to take the TAAS test. In 5-10 more years a different format will be provided & low scores will be the reason to teach to that test too. (Case 17).
    I think students know how to take the test because we practice ad nauseum. (Case 20).
    TAAS is a poor measure of actual student performance. Increases are due to becoming accustomed to the test. (Case 38).
    Teachers are teaching to TAAS period. Curriculum is directed by TAAS even in K. TAAS doesn't address all areas if it did colleges would have better results than ever before instead of remedial classes! (Case 46).
    The scores reflect an increase in time spent on one test instead of teaching students the regular curr.[iculum]. (Case 49).
    Teachers are spending the school day teaching to the test. (Case 84).
    Higher quality teaching is not exhibited on a daily basis. However TAAS test taking skills occur everyday (Case 95).
    We've been teaching to the TAAS so long, our students are used to it. (Case 102).
    The rising scores may be a result of better test-taking skills rather than knowledge. (Case 131).
    No, School districts have figured out how to teach to the TAAS and to exclude students from being accounted. (Case 136).
    I believe the scores reflect that students are learning test-taking strategies. (Case 147).
    Kids are just being programmed on how to take and pass the TAAS test, not truly mastering skills. (Case 151).
    Teachers are learning how to teach the TAAS (Case 169).
    I believe that students are simply being taught to take the test, not learn and apply the knowledge. (Case 199).
One comment from the Hoffman et al. survey described the emphasis on test preparation this way:
    Our campus has 2 practice TAAS (annually) (Nov. & Feb) plus the "real" taas. Our wkly lesson plans contain TAAS warm-ups, TAAS lesson objectives, and 20 min of reading. I personally am sick of TAAS by April & May. My Teacher evaluation last yr was down because my student scores were down by 7 pts. I personally have 6 friends who quit teaching altogether because of TAAS. (Case No. 94)
Even some of the teachers who answered "Yes" in the Hoffman survey, that the rise in TAAS scores did reflect "increased learning and higher quality teaching," qualified their answers considerably in their written comments:
    Students are learning more of the basic skills TAAS tests because teachers are figuring out better ways to teach them. Students are NOT receiving a well-rounded education because Social Studies & Science are being cut to teach TAAS skills. (Case 106).
    Yes, there is increased learning but at a partial price. I have seen more students who can pass the TAAS but cannot apply those skills to anything if it's not in TAAS format. I have students who can do the test but can't look up words in a dictionary and understand the different meanings. They can write a story but have trouble following directions for other types of learning. As for higher quality teaching, I'm not sure that I would call it that. Because of the pressure for passing scores, more and more time is spent practicing the test and putting everything in TAAS format. (Case 184).
A handful of respondents suggested that the rise in TAAS scores was due not to test preparation or increased learning, but to the TAAS tests getting easier over time, to schools excluding low scoring students, or to administrators' cheating:
    TAAS scores have seemed to rise in election years.
    The tests seemed easier in those years. (Case 121).
    It seems as though the questions are actually easier. (Case 127).
    No, School districts have figured out how to teach to the TAAS and to exclude students from being accounted. (Case 136)
    I think the tests are easier to make the legislators look better. (Case 155).
    The test seems to have gotten easier. (Case 159)
    There are a lot of teachers and administrators who know how to "cheat" and get higher scores by kids. They don't want their school to score bad, so they cheat. (Case 160).
    I also think there are admin. who are cheating ex. Austin schools. (Case 193).
Emphasis on TAAS is hurting more than helping teaching and learning in Texas schools. As mentioned, the results of the Hoffman et al. survey showed that a clear plurality of respondents (50%) reported that TAAS score gains were not due to "increased learning and higher quality teaching." No directly analogous questions were asked in the Testing and Teaching or Gordon & Reese surveys, but some of the findings from these surveys confirm Texas teachers' generally negative views about the educational impact of TAAS. Recall that in the Testing and Teaching survey, it was found that more teachers disagreed (45%) than agreed (32%) with the statement that "Mandated testing helps students achieve the objectives of the curriculum." Also, more teachers disagreed (39%) than agreed (29%) with the statement that "Mandated testing contributes to the realization of the goals of the current educational reform movement." Recall also that Gordon & Reese concluded that "drill and kill" coaching and preparation for TAAS were taking a "toll on teachers and students alike," especially "lower achieving students" whose "academic progress is being hindered by the negative effects of failing a test that many teachers insist does not measure what their students need to learn at their current stages of development, does not measure the progress their students have made, and is culturally biased" (pp. 364-65).
As in the Hoffman et al. survey, written responses to our Testing and Teaching survey help to convey something of teachers' depth of feeling and passion about TAAS:
    Texas has the "Texas Assessment of Academic Skills" test. Most schools have established a class strictly for the TAAS test. Our curriculum is based on previous TAAS test questions. We "teach the TAAS" in our classes. Our administrators have even gone as far as incorporating TAAS objectives and materials into daily instruction in ALL subject areas.
    We are not covering skills for higher level thinking at times because of state mandated tests. (Case 13).
    Testing is now more important than teaching. Students learn much about testing, little about subject. (Case 20).
    We are testing our students to death! My students have been taken out of class four times this year for standardized testing. Too MUCH! (and for what?) (Case 29).
    There are too many loopholes. Students who were never on an IEP or in CM are being forced into it so that they will be exempt from standardized tests. (Case 42).
    Mandated state TAAS Testing is driving out the best teachers who refuse to resort to teaching to a low level test! (Case 67).
    TAAS has become the Be All and End All. It is ridiculous to put so much on one test, where even good students have been known to guess and not even read the question. I have seen them. Our school can be at risk because one student chooses to mess up. One year we were on probation for 1 student over the limit. (Case 87, emphasis in original).
    Mandated testing has severely damaged the mathematics curriculum! (Case 93).
    Teaching to the TAAS results in a level of education which is substandard. I strongly feel TAAS should be abolished. (Case 104).
    Mandatory tests are hard on both teachers and students. Our state set the End of Course test one week before semester finals. The stress level for all of us is high. The end of school in itself is difficult. Why do we compound the situation by adding another useless test. Our state is also taking the EOCourse test out of the schedule. They are replacing the EOC with another TAAS test. At least the EOC covered current material. Now extra work is added because the TAAS covers different areas than Algebra essential elements. (Case 123).
    We are so concerned about the TAAS & End of Course exam that we are teaching the test, but the kids are not learning the material. I can teach the test, and have a very high percentage pass, yet have kids that know no Algebra. Going to three years TAAS testing in the future will reduce education to completely teaching the test, and we will graduate an illiterate generation. (Case 130).
    I really feel that we are definitely getting away from teaching the basic concepts to teaching the test and this is very sad because the farther the student goes in mathematics the less he or she knows of the why's. (Case 133).
    It stiffles professional growth and academic growth as well. Too much emphasis on testing. (Case 147).
Emphasis on TAAS is particularly harmful to at-risk students. A third finding common across the three surveys is that the focus on TAAS in Texas is especially harmful to particular kinds of students. This finding is interesting because none of the surveys asked directly about this issue. Nonetheless, the matter arose in all three inquiries.
Recall Gordon & Reese's concluding comment that in the common opinion of their interviewees "their at-risk students' academic progress is being hindered by the negative effects of failing a test that many teachers insist does not measure what their students need to learn at their current stages of development, does not measure the progress their students have made, and is culturally biased" (Gordon & Reese, 1997, pp. 364-65). Also, spontaneous comments in the Testing and Teaching and Hoffman et al. surveys raised similar concerns. From the former:
    I personally wonder about the fairness of these tests. Children from lower SES tend not to do as well. Therefore, it tends to be discriminatory I think. I think some children do not have the cultural experiences that help them answer the questions accurately. (Case 84).
    The TAAS test is driving the curriculum and not teaching students how to think. It also punishes ESL students: they can complete four years of high school with adequate grades but not be allowed to walk at graduation because they do not have enough command of English to pass the TAAS Exit. (Case 66).
In teachers' written comments in the Hoffman et al. survey, several teachers mentioned the problems created for special education students by emphasis on TAAS. Here is one extended example:
    Special education assessments ... and diagnostic evaluations are NOT aligned with TAAS objectives. Therefore children are sometimes not qualified for spe. ed. services who have low IQs and yet are expected to pass TAAS to graduate. I.E.P. goals for reading and math (other than "mainstream" IEP's) are not compatible with TAAS in our district. Reading goals are not detailed enough in comprehension, math is not grade-level appropriate. IEP's tend to emphasize discrete skills, such as computation while TAAS emphasizes application and problem solving. Texas criteria for diagnosis of L.D. do not take into consideration TAAS standards. Teachers, under pressure to have good scores, over-refer students for spe. ed. testing, sometimes 1/3-1/2 of their classes! Most administrators (NOT mine) pressure ARD committees to exempt all students in spe. ed. from taking TAAS. Appropriate alternative assessments are not available. TAAS does not take into account LEP students, or students in special education, who are being "included" in higher numbers. (Case No. 89) (Note 17)
Emphasis on TAAS contributes to retention in grade and dropping out of school. Finally, all three surveys provide support for the proposition that emphasis on TAAS contributes to both retention in grade and students dropping out of school. One question in the Hoffman et al. survey asked respondents:
    Are there efforts to exclude/exempt students from testing who might not do well on the test and thereby negatively affect a school's rating?
Overall, 67% of respondents answered "often" or "sometimes" in response to this question. Obviously, there are ways of excluding students other than by retention in grade and encouraging drop outs (such as special education classification).
But recall that one out of 20 interviewees in the Gordon and Reese survey said directly that "the stress of TAAS contributes to the dropout rate." A majority of respondents in the Testing and Teaching survey rated "to promote/graduate students" as a common use of mandated tests in Texas. Additionally, many written comments in the Hoffman et al. survey expressed dissatisfaction with the practice of retaining students in grade based on TAAS scores, irrespective of other evidence about student learning.
In concluding this discussion of the results of three surveys of Texas educators regarding TAAS, it is only fair to add one major caveat. Despite the preponderance of negative comments about the effects of TAAS on education in Texas, there were some comments suggesting that the role of TAAS is at least somewhat beneficial:
    It seems to work out fairly well for most of us with TAAS, however, the end of course tests are not that useful. (Case 5).
    I believe there is a purpose for these tests. If nothing else, it gives teachers goals for their students. But I do not believe my teaching competence should be based on those scores solely (Case 34, emphasis in original).
In light of this contrast, with most teachers reporting the effects of TAAS to be harmful, but with a minority reporting positive effects, it is useful to draw back, to try to gain a broader picture of the status of education in Texas. It is to such a perspective that we turn in Part 7. (Note 18)
7. Other Evidence on Education in Texas
Beyond the views of teachers, what other evidence is available that might provide a picture of the status and progress of education in Texas? In Part 7, we review four kinds of evidence. First, we compare sources of evidence on high school completion in Texas with the data previously presented in Part 5 above. Next we compare data on retention in grade for states which have reported such data. In Section 7.3 we review evidence available from SAT college admissions testing over the last 30 years. Then, in Section 7.4 we return to take a closer look at NAEP data, some of which, as we saw in Part 2 above, has previously been cited as evidence of the Texas "miracle" in education. Finally, we comment briefly on several other sources of evidence about education in Texas.
7.1 Dropout Data on Texas Revisited
As mentioned previously, when I first started studying education in Texas approximately two years ago, a major discrepancy quickly contributed to my suspicions about the validity of the TEA reported data on dropout rates in Texas (some of which was reproduced in Table 3.3 above). The TEA data showing declining dropout rates in Texas were contradicted by two independent sources of evidence: a series of attrition studies reported by the Intercultural Development Research Association (IDRA), and reports on dropouts in the United States from the National Center for Education Statistics (NCES). The IDRA and NCES sources did not, however, contain estimates of dropout rates for Texas as far back as I needed to examine the apparent effects of high school graduation testing on grade enrollments and high school graduation. Consequently, I sought to analyze data on Texas high school graduates and enrollments by grade going back to the mid-1970s.
Nonetheless, having done so, it is now helpful to recount the IDRA and NCES reports' findings and to compare them with results previously presented. Before reviewing and comparing these sources, let me review TEA-reported dropout data in more detail than was done in Part 3 above.
TEA Dropout Data. In the Fall of 1999, the Texas Education Agency (TEA) released a report titled 1997-98 Report on Texas Public School Dropouts. (The report was originally issued in September 1999, and in a revised edition in December.) The highlights of the report were as follows:
How many students drop out?
- In 1997-98, a total of 27,550 students in Grades 7-12 dropped out of Texas public schools.
- Statewide, the annual dropout rate was 1.6 percent, unchanged from 1996-97.
- The 1997-98 actual longitudinal dropout rate, calculated for a cohort of students tracked from 7th to 12th grade, was 14.7 percent.
Who drops out and why?
- About 77 percent of dropouts were overage for grade, down from over 80 percent in 1996-97.
- On average, males continued to drop out at a slightly higher rate than females.
- Hispanic students had the highest average dropout rate, at 2.3 percent, followed by African American students (2.1%).
- Reasons cited for dropping out of school included poor attendance, entering non-state-approved General Educational Development (GED) programs, and pursuing a job.
Are they leaving certain districts?
- School districts with the largest enrollments (50,000 or more students) had the highest average dropout rate, at 2.1 percent.
- Generally, districts with lower student passing rates on the Texas Assessment of Academic Skills (TAAS) had higher dropout rates.
How do we compare nationally?
- Based on the Current Population Survey, an estimated 4.6 percent of students in Grades 10-12 dropped out of school across the nation.
- Texas had one of the lower dropout rates out of 32 states that met required Common Core of Data collection standards for school year 1996-97.
(TEA, 1999, 1997-98 Report on Texas Public School Dropouts, p. iii)
Table 8 of the TEA report presented data on "historical dropout rates by ethnicity." Figure 7.1 presents a graph of these data.
Source: 1997-98 Report on Texas Public School Dropouts, Texas Education Agency. Austin, Texas, September 1999 (Revised December 1999), p. 15 (p. 22 of pdf version)
These data obviously indicate that the annual dropout rate in Texas (that is, the numbers of dropouts reported in grades 7-12 divided by the grade 7-12 enrollment) has fallen dramatically in the last decade. I refrain from commenting further on these results until after summarizing other evidence on dropouts in Texas.
IDRA Attrition Studies. In the mid-1980s, under a contract with the Texas Department of Community Affairs, the Intercultural Development Research Association (IDRA) undertook a series of studies, one aim of which was to estimate "the magnitude of the dropout problem in the State of Texas" (IDRA 1986, p. i). After describing the paucity of previous reliable research on dropouts in Texas, the IDRA researchers developed an index of attrition to estimate dropout rates not just statewide, but also at the level of school districts in Texas.
    The index developed and used by IDRA consists of taking grade level enrollments for a base year and comparing them to enrollments in subsequent years. Since school and district enrollments are not constant, with changes in size due to increasing or declining enrollments, it is necessary to take the growth trend into account in computing attrition rate. The size change ratio was calculated by dividing the total district enrollment for the longitudinal study end year, by the total district enrollment for the base study year. Multiplying the base year enrollment by the district change ratio produces an estimate of the number of students expected to be enrolled at the end year. (IDRA, 1986, p. 9).
In short, the IDRA attrition index method for estimating dropouts is very similar to the way in which I calculated progress from grade 9 to high school graduation (as reported in Section 5.1 above).
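The attrition index quoted above can be illustrated with a minimal Python sketch. The function names and all enrollment figures below are hypothetical and chosen only for illustration; the final step, expressing attrition as the shortfall of actual end-year enrollment relative to the expected figure, is an assumption about how the index is turned into a percentage, since the quoted passage stops at the expected enrollment.

```python
def size_change_ratio(total_base_enrollment, total_end_enrollment):
    """District enrollment in the end year divided by district enrollment in the base year."""
    return total_end_enrollment / total_base_enrollment


def expected_end_enrollment(base_grade_enrollment, ratio):
    """Base-year grade enrollment scaled by the district size change ratio."""
    return base_grade_enrollment * ratio


def attrition_rate(base_grade_enrollment, actual_end_enrollment,
                   total_base_enrollment, total_end_enrollment):
    """Assumed final step: share of expected students not enrolled at the end year."""
    ratio = size_change_ratio(total_base_enrollment, total_end_enrollment)
    expected = expected_end_enrollment(base_grade_enrollment, ratio)
    return (expected - actual_end_enrollment) / expected


# Hypothetical district: 1,000 ninth graders in the base year, 800 twelfth
# graders three years later, and total enrollment growing from 4,000 to 5,000.
example_ratio = size_change_ratio(4000, 5000)
example_expected = expected_end_enrollment(1000, example_ratio)
example_attrition = attrition_rate(1000, 800, 4000, 5000)
print(example_ratio, example_expected, round(example_attrition, 2))
```

In this made-up case the district grew, so the number of students expected in grade 12 is larger than the original grade 9 count, and the estimated attrition is correspondingly higher than a simple comparison of 800 against 1,000 would suggest.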
The IDRA method differs, however, in two respects from the one used in calculating results presented in Section 5.1. First, instead of simply assuming that the numbers of students in grade nine in a particular year (say 1990-91) in a particular school system represent a reasonable estimate of the numbers expected to graduate three years later (in 1993-94), the IDRA approach adjusts this estimate to take into account the overall growth or decline in enrollments in the system over the time period studied (thus, for example, if overall grade 9-12 enrollment increased 25% between 1990-91 and 1993-94, the IDRA approach assumes that the number enrolled in grade 12 in 1993-94 would be 25% greater than the 1990-91 grade 9 enrollments). Second, the IDRA approach focuses on grade enrollments and has not been applied, at least insofar as I am aware, to the question of how many students actually graduate from Texas high schools at the end of grade 12.
The IDRA has regularly updated its attrition calculations since its original study in 1986. Table 7.1 presents the organization's most recent results, showing percent attrition from grades 9 to 12, from 1985-86 to 1998-99 (note that data for 1990-91 are missing).
Table 7.1
IDRA Reported Attrition Rates, Grades 9-12 (% Attrition)
Race/Ethnic Group  '85-86  '86-87  '87-88  '88-89  '89-90  '91-92  '92-93  '94-95  '95-96  '96-97  '97-98  '98-99
Black                 34%     38%     39%     37%     38%     39%     43%     50%     51%     51%     49%     48%
White                 27%     26%     24%     20%     19%     22%     25%     30%     31%     32%     31%     31%
Hispanic              45%     46%     49%     48%     48%     48%     49%     51%     53%     54%     53%     53%
Total                 33%     34%     33%     31%     31%     34%     36%     40%     42%     43%     42%     42%
Source: IDRA website, www.idra.org/, accessed 5/8/00 (data for 1990-91 missing)
Comparison of the TEA and IDRA data reveals two broad findings. First, for the academic year 1988-89, their estimates of dropouts are somewhat comparable. For that year, the IDRA reported attrition rates of 37%, 20% and 48% for Black, White and Hispanic students respectively. And if we multiply the TEA-reported annual dropout rates for grades 7-12 by six to approximate a longitudinal dropout rate across this grade span, we get 45.1%, 27.3% and 48.6% for Black, White and Hispanic students respectively. These estimates are not terribly close, but at least they are in the same ballpark. And the differences are in the directions one would expect. The TEA reported data yield slightly higher percentages since they cover grades 7-12, while the IDRA attrition percentages cover just grades 9-12.
Second, after 1988-89, the IDRA and TEA results diverge dramatically.
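The six-fold approximation used for 1988-89 can be sketched in a few lines of Python. The dictionary and function names are illustrative only. The annual TEA rates are not restated in this passage, so the sketch backs them out by dividing the quoted six-year figures by six; those implied annual values are a back-calculation, not numbers taken from a TEA publication.

```python
GRADES_SPANNED = 6  # grades 7 through 12

# Figures quoted in the text for 1988-89 (percent).
tea_six_year_pct = {"Black": 45.1, "White": 27.3, "Hispanic": 48.6}
idra_attrition_pct = {"Black": 37.0, "White": 20.0, "Hispanic": 48.0}


def approx_longitudinal(annual_rate_pct, years=GRADES_SPANNED):
    """Approximate a multi-year dropout rate as annual rate times number of grades."""
    return annual_rate_pct * years


def implied_annual(longitudinal_pct, years=GRADES_SPANNED):
    """Back out the annual rate implied by a six-year approximation (assumption)."""
    return longitudinal_pct / years


# Percentage-point gap between the TEA-based approximation and the IDRA index.
gaps = {
    group: round(tea_six_year_pct[group] - idra_attrition_pct[group], 1)
    for group in tea_six_year_pct
}

for group, gap in gaps.items():
    annual = implied_annual(tea_six_year_pct[group])
    print(f"{group}: implied annual rate {annual:.2f}%, gap vs. IDRA {gap} points")
```

The gaps are all positive, which matches the observation that the TEA-based figures run higher because they span grades 7-12 rather than grades 9-12.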
The IDRA data show attrition increasing between 1988-89 and 1998-99, while the TEA data show dropouts to be decreasing sharply over the same period. The divergence is so dramatic as to make one wonder whether the two organizations are referring to the same state--or even living on the same planet. The IDRA results show increases in attrition such that by 1997-98, 49% of Black, 31% of White and 53% of Hispanic students dropped out between grades 9 and 12. In contrast, the TEA reported data suggested longitudinal dropout rates for grades 7-12 of 12.6%, 5.4% and 13.8% for Black, White and Hispanic students respectively. In other words, the IDRA results indicate that the dropout problem in Texas in the late 1990s was four to six times worse than the TEA was reporting.

Whose estimates are to be trusted: those of the IDRA or those of the TEA? Before giving my answer to this question, let me summarize results of one more organization, this one from outside Texas.

NCES Dropout Studies

Over the last decade the National Center for Education Statistics (NCES) has issued a series of reports on dropouts in the United States. The eleventh report in the series presents data on high school dropout and completion rates in
1998, and includes time series data on high school dropout and completion rates for the period 1972 through 1998. The high school completion rates are based on results of the Census Bureau's Current Population Surveys (CPS) of random U.S. households conducted in October of each year. The CPS surveys have not been designed with the specific intent of deriving state level high school completion rates, and so in order to help derive reliable estimates, the NCES analysts who prepared the dropout reports have calculated averages across three years of CPS surveys. Also, it should be explained that the CPS data are based on self-reports of high school completion, whether it be via normal high school completion or via alternative high school completion such as the GED testing. (Note 19)

Table 7.2 reproduces a table from the latest NCES dropout report, showing high school completion rates of 18 through 24 year olds, not currently enrolled in high school or below, by state: October 1990-92, 1993-95 and 1996-98. As can be seen, for all three time periods these data show Texas to have among the lowest rates of high school completion among the 50 states. In each time period the median high school completion rate across the states was about 88%, while the completion rate for Texas was about 80%. This pattern indicates that the median noncompletion rate across the states is about 12% while that of Texas is about 20% (about 66% worse than the median of the other states).

Table 7.2
High School Completion Rates of 18 Through 24 Year-Olds, Not Currently Enrolled in High School or Below, by State: October 1990-92, 1993-95 and 1996-98

State            1990-92  1993-95  1996-98
Total National    85.5%    85.8%    85.6%
Alabama           83.9     83.6     84.2
Alaska            86.9     90.5     88.3
Arizona           81.7     83.8     77.1
Arkansas          87.5     88.3     84.5
California        77.3     78.7     81.2
Colorado          88.1     88.4     85.5
Connecticut       89.9     94.7     91.6
D.C.              86.2     93.0     88.5
Delaware          84.0     87.7     84.9
Florida           84.1     80.6     83.6
Georgia           85.1     80.3     84.8
Hawaii            93.5     92.0     92.3
Idaho             84.7     86.1     85.8
Illinois          96.0     86.5     86.6
Indiana           87.8     88.5     89.3
Iowa              94.6     93.2     88.0
Kansas            93.2     90.9     91.2
Kentucky          81.1     82.4     85.2
Louisiana         83.9     80.1     81.6
Maine             91.9     92.9     91.6
Maryland          88.6     93.6     94.5
Massachusetts     89.8     92.5     90.6
Michigan          87.2     88.6     91.0
Minnesota         92.5     93.1     90.0
Mississippi       85.4     93.9     82.0
Missouri          88.1     90.4     90.4
Montana           91.6     89.6     91.1
Nebraska          92.5     94.1     91.2
Nevada            82.1     81.9     78.2
New Hampshire     87.9     86.9     89.2
New Jersey        90.8     91.6     91.8
New Mexico        84.1     82.3     78.6
New York          88.0     87.0     84.7
No. Carolina      83.0     85.5     85.2
North Dakota      96.3     96.4     94.7
Ohio              90.0     88.3     89.4
Oklahoma          84.3     86.7     86.0
Oregon            89.6     82.6     75.4
Pennsylvania      90.2     89.4     87.6
Rhode Island      87.9     89.4     86.1
So. Carolina      85.0     87.8     87.6
South Dakota      89.1     91.3     89.8
Tennessee         75.7     84.5     86.9
Texas             80.0     79.5     80.2
Utah              93.9     93.4     90.7
Vermont           87.0     88.1     93.6
Virginia          88.6     87.5     85.9
Washington        90.7     85.7     87.7
West Virginia     83.3     86.8     89.1
Wisconsin         92.4     93.5     90.8
Wyoming           92.0     90.8     87.6

Min               75.7     78.7     75.4
Max               96.3     96.4     94.7
Mean              87.6     88.1     87.1
Median            87.9     88.3     87.6

Source: Kaufman, P., Kwon, J., Klein, S. and Chapman, C. (1999). Dropout rates in the United States: 1998. (NCES 2000-022). Wash., D.C.: National Center for Education Statistics, p. 20.

Comparing evidence on dropouts in Texas

We have now described and summarized five different sources of evidence on dropout rates in Texas: 1) dropout data reported by the TEA; 2) IDRA attrition analysis results; 3) the most recent NCES report on high school completion, based on CPS surveys; 4) cohort progression analyses from grade 9 to high school graduation and from grade 6 to high school graduation discussed in Part 5 above; and 5) estimated dropouts for 1996-97 based on 1995-96 grade enrollments and 1996-97 retention rates (reported in Section 5.5 above). How can we make sense of these vastly different estimates of the extent of the dropout problem in Texas, with dropout rate estimates for the late 1990s ranging from a low of 14.7% reported by the TEA as the "1997-98 actual longitudinal dropout rate" for grades 7 through 12, to a high of the 42% attrition rate reported by IDRA, also for 1997-98, but only for grades 9 through 12?

First, it seems clear that the TEA-reported dropout rates can be largely discounted, as inaccurate and misleading. A November 1999 report from the Texas House Research Organization, The Dropout Data Debate, recounts that "In 1996, the State Auditor's Office estimated that the 1994 dropout numbers reported by the Texas Education Agency (TEA) likely covered only half of the actual number of dropouts" (p. 1).
The report goes on to recount numerous problems in TEA's approach to calculating dropout rates, including changing rules over time in how to define dropouts; relying on district reports of dropouts while at the same time, beginning in 1992-93, using dropout rate as a key factor in TEA's accountability ratings of districts; and apparent fraud in district reporting. The TEA has developed a system for classifying school leavers in dozens of different ways, and many types of "leavers" are not counted as dropouts. Indeed, in 1994, the TEA started classifying students who "met all graduation requirements but failed to pass TAAS" as non-dropout "leavers."

Second, based on a comparison of the cohort progression analyses from grade 9 to high school graduation with those from grade 6 to high school graduation, it seems clear that the IDRA attrition analyses may represent somewhat inflated estimates of the extent of dropouts because of the increased rate of retention of students in grade 9 (see Figure 5.3). Still, the IDRA approach does have one virtue as compared with cohort progression analyses; namely, it attempts to adjust for net immigration of students into Texas schools. I return to this point later. But first let us compare the other three sources of evidence.

The estimates of dropouts for 1996-97 based on 1995-96 grade enrollments and 1996-97 retention rates indicated that about 68,000 high school students dropped out of school between 1995-96 and 1996-97. Adding the missing students across the three grades to estimate longitudinal dropout rates suggests overall dropout rates of 27%
across grades 10-12 (22.5% for White, 33.7% for Black and 38.5% for Hispanic students). These estimates correspond relatively well with the grade 6 to high school graduation cohort analyses (results of which were graphed in Figures 5.5 and 5.6). These results showed that of grade 6 students in the cohort class of 1997, 75.8% of White students and 61.1% of Black and Hispanic students graduated in 1997, implying that 24.2% of White and 38.9% of minority students did not graduate and may have dropped out. Overall for the cohort class of 1997, 31% of the students in grade 6 in 1990-91 did not graduate in 1997. (The 27% figure just cited is slightly lower, presumably because it does not take into account students who drop out between fall of grade 12 and high school graduation the following spring.)

Can these results be reconciled with the most recent NCES report on high school completion, based on CPS surveys? Recall that this report indicated that for 18 through 24 year-olds in Texas (not currently enrolled in high school or below) surveyed in October 1996-98, 80.2% reported completing high school. This implies a non-completion or dropout rate of 19.8%. (The CPS survey samples on which this estimate is based are not large enough to derive separate estimates by ethnic group.) It should be noted first that the CPS surveys of 18-24 year-olds in 1996-1998 do not correspond very precisely with the cohort of students in the Texas class of 1997. Nonetheless, two other factors may explain why the CPS derived non-completion (or dropout) estimate of 19.8% is so much lower than the 31% estimate derived above for the class of 1997.

One possibility suggested by a previous National Research Council report is that the CPS household surveys tend to under-represent minority youth generally and to underestimate high school dropout rates specifically. In discussing evidence on educational attainment of Black youth, Jaynes and Williams (1989, p.
338) comment that "after age 16, there are very serious, and perhaps growing, problems of surveying the black population, especially black men," and go on to discount a dropout estimate from CPS data from the 1980s for Blacks as simply not credible. If the CPS surveys do in fact under-represent minority youth, this would deflate the overall dropout estimates for Texas derived from this source, since all indications (even those from the TEA) are that dropout rates in Texas are higher for Black and Hispanic youth than for White youth. (Note 20)

The other possibility, alluded to previously, is that the CPS surveys are based on self-reports of high school completion, whether it be via normal high school completion or via alternative high school completion such as the GED testing. To explore this possibility, I consulted annual Statistical Reports from the GED Testing Service (1990-1998). Before presenting results from this source, it may be useful to explain the GED Testing program briefly.

The Tests of General Educational Development were developed during World War II to provide adults who did not complete high school with an opportunity to earn a high school equivalency diploma. There are five GED tests: Writing Skills, Social Studies, Science, Interpreting Literature and the Arts, and Mathematics. States and other jurisdictions that contract to use the GED tests establish their own minimum scores for award of the high school equivalency diploma, with the condition that state minimum requirements cannot be below a floor approved by the Commission on Educational Credit and Credentials (an agency of the American Council on Education). For most of the past 10 years, the approved minimum was that examinees had to attain standard scores of at least 40 on each of the five GED tests or an average standard score of at least 45.

"In the United States, this minimum standard of 'Minimum 40 or Mean 45' was met by an estimated 75% of the 1987 high school norm group." (GED Testing Service,
1995, GED 1994 Statistical Report, p. 31).

In the early 1990s, four states were using this Commission-approved minimum passing standard on the GED tests for award of the high school equivalency degree: Louisiana, Mississippi, Nebraska, and Texas. An additional 27 states were using a similarly low "Minimum 35 and Mean 45" standard. The GED has been widely used in Texas; and in 1996, "Texas became the first state in the nation to issue more than 1,000,000 GED credentials since 1971, when the GED started tracking this statistic" (GED Testing Service, 1997, GED 1996 Statistical Report, p. 27).

About this time, in keeping with the national movement to raise educational standards, the GED Testing Service decided to raise the minimum passing score on the GED:

    In concert with the secondary schools movement to raise standards, in January 1997 the GED Testing Service raised the minimum score required for passing the tests. The new standard is one that only 67 percent of graduating seniors can meet. (GED Testing Service, 1998, GED 1997 Statistical Report, p. ii). (Note 21)

(Source: GED Testing Service, 1990-1999, Statistical Reports, 1989, 1990, 1991, 1992, 1994, 1996, 1997, 1998. Washington, D.C.: American Council on Education.)

Given this background, let us now examine the evidence on GED taking in Texas. Figure 7.2 shows the numbers of people taking and passing the GED (complete battery) from 1989 to 1998. As can be seen, the numbers taking the GED in Texas increased steadily between 1989 and 1995, from about 47,000 to 74,000, an increase of 57% (during the same interval the increase in GED taking nationally was about 26%). GED statistics also make it clear that during this interval the Texas GED-taking population was younger than the national GED-taking population. Over this interval from 25% to 30% of GED takers in Texas were reported to be age 18 or less. (Note 22)

The sharp upturn in GED taking in Texas between 1995 and 1996 (from 74,000 to
87,000, a 17.5% increase) seems readily explained by anticipation of the increase in the GED passing score as of January 1, 1997 (nationally there was a 5% increase in GED test taking between 1995 and 1996). As the GED Testing Service GED 1997 Statistical Report explains, "The five percent increase in 1996 is most likely attributed to adults attempting to complete the battery before implementation of the 1997 standard" (GED Testing Service, 1998, p. iii).

As a result of the new GED Testing Service minimum passing standard for 1997, 36 jurisdictions were required to raise their passing standard in 1997. Texas was one of them. Surely not coincidentally, the number of people taking the GED in Texas in 1997 dropped from 87,000 to 61,000--an almost 30% decrease. Nationally there was a 5% decrease in GED-taking between 1996 and 1997.

Among the 36 jurisdictions required to increase their passing scores on the GED between 1996 and 1997, "the passing rate decreased by 3.8 percent from 1996 (71.8 percent) to 1997 (68 percent)" (GED Testing Service 1999, p. 6). In Texas, the GED passing rate fell from 75.2% to 64.2%. This decrease of 11 percentage points in the passing rate was almost triple the average decrease among the 36 jurisdictions that were required to increase the GED passing scores in 1997. (Note 23)

These developments regarding the GED in Texas suggest a clear explanation for why the cohort classes of 1997, 1998 and 1999 began to show slight increases in the percentages of students progressing from grade 6 to high school graduation (for minorities from 60% to 65% and for Whites from 75% to 77%, see Figures 5.6 and 5.7).
After the requirements for passing the GED in Texas were stiffened in 1997, and the GED pass rate fell sharply, it appears likely that more students in Texas decided to persist in school to graduation instead of seeking the alternative certification via the more difficult GED standard required by the GEDTS as of January 1, 1997. (Note 24)

Now we can return to the question that prompted my study of GED data. Can GED credentialing in Texas explain why the CPS derived non-completion (or dropout) estimate of 19.8% is so much lower than the 31% non-graduation rate derived from analyses of progress of the cohort class of 1997 from grade 6 to high school graduation? Before addressing this question, let me note that neither GED Testing Service data nor CPS-reported high school completion data are available at the state level disaggregated by ethnicity, so we will have to address this issue across the three major ethnic groups in Texas, namely, White, Black and Hispanic. In 1990-91, according to TEA statistics, there were a total of 256,000 White, Black and Hispanic students enrolled in grade 6 in Texas. Eleven percent (i.e., the difference between the 20% non-completion rate indicated by CPS results and the 31% non-graduation rate derived from the cohort analyses) equals about 28,000. This number--28,000--appears strikingly smaller than the numbers of people who were taking and passing the GED in Texas in 1996 and 1997 (see Figure 7.2). But it must be recalled that though the Texas population of GED takers is younger than the national population of GED takers, only about 35% of GED test takers in 1997 were age 18 or less. If we assume that 35% of the 40,000 GED test-takers in Texas who passed in 1997 might have been members of the cohort class of 1997 (surely a liberal estimate), we get 14,000.
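The arithmetic in this comparison can be checked with a short sketch. It uses only the figures quoted above; the function and variable names are hypothetical, and the 35% share is the author's stated assumption rather than a measured value.

```python
def students_in_gap(cohort_size, cohort_nongrad_rate, cps_noncompletion_rate):
    """Number of cohort members represented by the gap between the two rates."""
    return cohort_size * (cohort_nongrad_rate - cps_noncompletion_rate)


def ged_passers_in_cohort(passers, assumed_share_in_cohort):
    """GED passers attributed to the cohort under an assumed share."""
    return passers * assumed_share_in_cohort


# Figures quoted in the text: 256,000 grade 6 students in 1990-91,
# 31% cohort non-graduation vs. 20% CPS non-completion,
# 40,000 GED passers in 1997, 35% assumed to belong to the class of 1997.
gap = students_in_gap(256_000, 0.31, 0.20)
ged = ged_passers_in_cohort(40_000, 0.35)

print(round(gap))            # 28160, which the text rounds to about 28,000
print(round(ged))            # 14000
print(round(ged / gap, 2))   # 0.5
```

Under these assumptions, GED passers would cover roughly half of the gap between the two estimates.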
This suggests that while GED-taking may account for a substantial portion of the difference between estimates of non-completion of high school based on our cohort analyses (31%) and from CPS-derived estimates (20%), it may not account for all of the difference.

Before summarizing conclusions from this discussion of different sources of evidence on dropout rates in Texas, let me mention briefly two other sources of evidence, and explain why the TEA's exclusion of GED aspirants from its definition of
dropouts is misleading. The first additional source of evidence is from the Annie E. Casey Foundation and, in particular, the Casey Foundation's 2000 KIDS Count on-line data base. I was alerted to this source by Hauser (1997), who, while pointing out many limitations of CPS data for estimating dropout rates, also mentions the KIDS Count project as using CPS data in an unusual way to try to obtain relatively current evidence on dropouts across the states. Specifically, this project has compiled from CPS data three-year rolling average estimates from 1985 to 1997 of the percentage of teens ages 16-19 who are dropouts and the percentage of teens not attending school and not working.

Since the 2000 KIDS Count results are readily available on-line in table, graph and downloadable database form (www.aecf.org/kidscount/kc1999/), I do not discuss them in detail here. Suffice it to say that: 1) according to both indicators of youth welfare, between 1985 and 1997, Texas had one of the poorer records among the states, consistently showing more than 10% of teens ages 16-19 as dropouts and more than 10% of teens not attending school and not working; and 2) if one examines the standing of Texas on these two indicators relative to those of other states, conditions in Texas seemed to have worsened in the early 1990s after implementation of TAAS.

Second, in a remarkable research effort for MALDEF in the TAAS case, Mark Fassold assembled longitudinal data sets on the Texas sophomore cohorts of 1994 and 1995 (the classes of 1996 and 1997). Using these data sets, Fassold (1999) was able to calculate the cumulative rates of passing the TAAS exit test for up to ten administrations of the test for which students were eligible before their scheduled graduation. He found that the cumulative pass rates for the classes of 1996 and 1997 were 85.2% and 87.1% for White students, 62.3% and 66.1% for Blacks and 65.9% and 69.4% for Hispanics.
These results indicate that the White non-graduation rate was in the range of 13-15%, for Blacks 34-38% and Hispanics 30-34%. Fassold's results correspond reasonably well with the cohort progression analyses presented in Part 5 above--especially when two factors are noted. First, Fassold's analysis excluded students classified as special education students. As we saw in Section 5.6 above, some 5 to 7% of students taking the TAAS exit test in recent years have been classified as special education. Second, it is important to note that Fassold's analysis began with grade 10 enrollments, but we have seen that the largest numbers of students drop out between grades 9 and 10. Before leaving this brief summary of Fassold's analyses, it is worth mentioning that despite criticisms by Texas state attorneys, Judge Prado found Fassold's analyses credible and if anything "likely over-estimated the minority pass rate" (Prado, 2000, p. 16).

As mentioned, TEA's reports on dropouts can be largely discounted, as inaccurate and misleading. But one aspect of the TEA approach to defining dropouts deserves commentary. According to the TEA approach to defining dropouts, a student who leaves school to pursue a GED high school equivalency degree in a state approved program is counted as a school "leaver," but not as a dropout. This approach is potentially misleading for a number of reasons. Here I will explain two. First, the common meaning of the term "dropout" is a student who leaves school without graduating from high school. In this sense, students who leave high school without graduating, whether or not they pursue a GED high school equivalency degree, are dropouts. At the same time, there is support for Texas's practice of not counting students enrolled in secondary school programs aimed at preparing for the GED as dropouts in the NCES Common Core of Data definitions (see Winglee et al., 2000, for a recent discussion of the problem of defining dropouts).
Nonetheless, recent research suggests that despite the term "high school equivalency degree," obtaining such certification is not equivalent to normal high school
graduation and moreover, relatively lax standards for GED certification, as in Texas, can encourage students to drop out of high school before graduation. As Chaplin (1999, p. 2) recounts, "Recent evidence ... suggests that dropping out to get a GED would be a very costly decision (Cameron and Heckman, 1993; Murnane, Willett, and Tyler, 1998)." He goes on to conclude that "the most reliable evidence generally suggests that obtaining a GED instead of a regular high school degree is likely to result in substantially lower earnings later in life" (Chaplin, 1999, p. 6). (Note 25) Indeed, the earning power of GED recipients appears to be more similar to that of dropouts than to high school graduates. Moreover, Chaplin explains:

    GED policies which make it easier to get a GED are designed primarily to help high school dropouts. By doing so, however, they may have the perverse effect of encouraging additional students to drop out. This is because by making it easier to get a GED the policies may increase the expected earnings of high school dropouts and, therefore, increase dropout rates. ... In general less strict GED policies probably increase dropout rates. (Chaplin, 1999, p. 6).

Chaplin presents evidence bearing on this point nationally, but what seems clear is that this is precisely what has happened in Texas through most of the 1990s.

Conclusions regarding dropouts in Texas

It is clear that the TEA has been playing a Texas-sized shell game on the matter of counting dropouts. Every source of evidence other than the TEA (including IDRA, NCES, the Casey Foundation's KIDS Count data, Fassold's analyses and my own) shows Texas as having one of the worst dropout rates among the states. (Recall that even the Texas State Auditor's Office estimated that the 1994 dropout numbers reported by the TEA likely covered only half of the actual number of dropouts.)
If we adopt the common sense definition that a dropout is a student who leaves school without graduating from high school, analyses of data on enrollment by year, grade and ethnicity (and numbers of high school graduates each year) tell a reasonably clear story of what has happened in Texas over the last two decades. As shown in Figures 5.5 and 5.6, for the cohort classes of 1982 to 1990, the percentage of Black and Hispanic students who progressed from grade 6 to graduation six years later hovered around 65%. For White students, the corresponding percentage started at about 80% and gradually declined to about 75% in 1990. For the cohort class of 1991, the year TAAS was implemented, the percentages fell dramatically, to 55% for minorities and about 68% for White students. Between 1992 and 1996, the corresponding percentages were 60% for minorities and 75% for Whites. Only after Texas was forced by the GED Testing Service to raise its passing standard for receipt of a so-called high school equivalency degree in 1997 did the percentages persisting from grade 6 to high school graduation begin to creep back up, to 65% for minorities in the class of 1999, and for White students to 78% in the same class.

In sum, these results lead me to conclude that since the implementation of the TAAS high school graduation test in 1991, 22-25% of White students and 35-40% of Black and Hispanic students have not been persisting from grade 6 to regular high school graduation six years later. Overall, during the 1990s the dropout rate in Texas schools was about 30%.

As appalling as this result appears, in concluding this discussion of dropout evidence, I should point out that the high school completion and dropout estimates derived from cohort analyses may actually understate the extent of the problem of dropouts (or to use TEA's euphemism, "school-leaving before graduation"). Recall that one of the virtues of the IDRA attrition analyses was that they sought to adjust
estimates for net changes in school populations because of student migration. The results of the cohort progression analyses just summarized implicitly assume that between the ages of 12 (grade 6) and 18 (grade 12), there is no net change in the size of the student population in Texas because of immigration (from either other states or countries). If in fact there is a net out-migration, the dropout estimates just summarized may be too high. If there is a net in-migration into Texas, the estimates are low.

To check on this possibility, I consulted a recent book on the demography of Texas, The Texas challenge: Population change and the future of Texas, by Murdock et al. (1997). I cannot adequately summarize this interesting book here. Suffice it to say simply that these demographers conclude that between 1990 and 1995, migration into the state of Texas from other states and foreign countries increased relative to what it had been in the 1980s (see Chapter 2). They suggest that annual rates of net migration into Texas have been on the order of 1-2% in the 15 years preceding 1995. The authors do not provide direct estimates of the age distribution of immigrants into Texas, but the overall implication of their results is clear. The estimates of the dropout problem in Texas derived from cohort progression analyses are somewhat low because they fail to take into account net in-migration of school age youth into the schools of Texas.
(Note 26) But to be absolutely clear (and to avoid getting into semantic arguments about the meaning of the term "dropout"), I readily acknowledge that what the cohort progression analyses show is the extent of the problem in Texas of students failing to persist in school through to high school graduation--regardless of whether it is caused by students having to repeat grade 9, failing to pass the exit level version of TAAS, officially "dropping out," opting out of regular high school programs to enter GED preparation classes, or some combination of these circumstances.

7.2 Patterns of Grade Retention in the States

As recounted above, previous research indicates clearly that retention of students in grade, especially beyond the early elementary level, tends to increase the probability that students drop out of high school before graduation. As the recent report from the National Research Council succinctly stated, "In secondary school, grade retention leads to reduced achievement and much higher rates of school dropout" (Heubert & Hauser, 1999, p. 285). For this reason, I sought to analyze rates of grade retention across the states (as reported in Heubert & Hauser, 1999, Table 6.1 corrected) in a variety of ways and to see if there was a relationship between rates of retention at the secondary level and rates of high school completion subsequently reported by Kaufman et al. (1999).

In their Table 6.1, Heubert & Hauser (1999) reported rates of grade retention (specifically percentages of students retained in grade) for 26 states and the District of Columbia, for selected years for which such data were available (most other states do not collect grade retention data at the state level). As Heubert & Hauser (1999, p. 137) themselves observe, "Retention rates are highly variable across the states." For example, first grade retention rates are reported as varying from 20% to only 1%.
Rates of retention in the high school years are reported to vary similarly, from highs of 21-26% to lows of less than 5%. Using the approach described in Section 5.4 above, I have analyzed rates of cumulative promotion and retention. Not surprisingly, cumulative chances of retention also vary widely. For example, in Mississippi and the District of Columbia, in recent years the chance of students being retained in grades 1 through 3 is more than 20%, while in other states (such as Maryland and Arizona) the chance is less than 5%.

To explore the possible link between retention in grade 9 and high school
completion, I merged data from Heubert & Hauser's Table 6.1 with data from the recent NCES Dropout Rates in the United States: 1998 report (Kaufman et al., 1999). The resulting data set is shown in Table 7.3.

Table 7.3
Grade 9 Retention and High School Completion in the States

State                 Year     Grade 9 retention rate  High school completion rate, 18-24 year-olds, 1996-98
Alabama               1996-97  12.6%                   84.2%
Arizona               1996-97   7.0                    77.1
District of Columbia  1996-97  18.7                    84.9
Florida               1996-97  14.3                    83.6
Georgia               1996-97  13.1                    84.8
Kentucky              1995-96  10.7                    85.2
Maryland              1996-97  10.3                    94.5
Massachusetts         1995-96   6.3                    90.6
Michigan              1995-96   4.8                    91.0
Mississippi           1996-97  19.7                    82.0
New York              1996-97  19.5                    84.7
North Carolina        1996-97  15.8                    85.2
Ohio                  1996-97  11.4                    89.4
Tennessee             1996-97  13.4                    86.9
Texas                 1995-96  17.8                    80.2
Vermont               1996-97   4.8                    93.6
Virginia              1995-96  13.2                    85.9
Wisconsin             1996-97   8.5                    90.8

Sources: Heubert & Hauser (1999) Table 6.1; Kaufman et al. (1999), Table 5.

Note that from the first source I took the grade 9 retention rate for 1995-96 or 1996-97, whichever was latest. Note also that the high school completion rates suffer from the problems discussed earlier regarding CPS data as a source of evidence on high school graduation and dropouts. Nonetheless, even a casual inspection of these data reveals a clear pattern. States with the higher rates of grade 9 retention tend to have lower rates of high school completion. This pattern can be seen more clearly in Figure 7.3. (Note 27)

Interestingly, Texas with a grade 9 retention rate of 17.8% has a slightly lower high school completion rate (80.2%) than we would expect given the overall pattern among the states shown in Figure 7.3--even though, as previously discussed, this rate for
Texas may well be inflated relative to other states because of the high rate of GED taking in Texas. Obviously, such a correlation between two variables, in this case higher rates of grade 9 retention associated with lower rates of high school completion, does not prove causation, but such a relationship certainly tends to confirm the finding from previous research that grade retention in secondary school leads to higher rates of students dropping out of school before high school graduation.

7.3 SAT Scores

It is clear that a substantial portion of the increased pass rates on the TAAS exit test between 1991 and 1998 is, as mentioned previously, an illusion based on exclusion. Specifically, much of the apparent increase in grade 10 TAAS pass rates is due to increased numbers of students taking the grade 10 exit level version of TAAS being classified as special education students, and increased rates of students dropping out of high school in Texas, at least until 1997. When the low standard in Texas for passing the GED had to be raised because the GED Testing Service set a new minimum passing standard as of January 1, 1997, this seems to have had the effect of encouraging a few percentage points more students to persist in school to graduation.

Nonetheless, as best I can estimate, about half of the apparent increase in TAAS exit level pass rates cannot be attributed to such exclusions. So it is relevant to address the question of whether gains on TAAS are a real indication of increased academic learning among students in Texas or whether they represent scores inflated due to extensive preparation for this particular test.

To help answer this question, it is necessary to look at other evidence of student learning in Texas, to see whether the apparent gains on TAAS since its introduction in 1991 are reflected in any other indicators of student learning in Texas.
I now summarize evidence from the SAT college admissions test, the test that used to be called the Scholastic Aptitude Test, briefly (and redundantly) the Scholastic Assessment Test, and is now officially named SAT-I. SAT scores are reported separately for the verbal (SAT-V) and math (SAT-M) portions of this college admissions test, on a scale ranging from 200 to 800 for each
sub-test. Using data from the College Board on SAT scores for the states, I examined performance on the SAT of students in Texas compared to students nationally from a number of perspectives (state rankings on the SAT-V and SAT-M from the 1970s to the 1990s, relative performance of different ethnic groups of students, performance of all SAT-takers vs. high school senior test-takers, etc.). I will not try to summarize results of all of these analyses here. Suffice it to say that the general conclusion of these analyses is that, at least as measured by performance on the SAT, the academic learning of secondary school students in Texas has not improved since the early 1990s, at least as compared with SAT-takers nationally. (Source: College Board, State SAT Scores, 1987-1998 Number of SAT Candidates with Verbal and Math Mean Scores and Standard Deviations: National and for each State, 1972 through 1998 and Report on the Record Numbers of Students in the High School Class (press release dated August 31, 1999).)

Summary results of two sets of analyses of Texas students' performance on the SAT compared with students nationally from 1972 to 1999 are shown in Figures 7.4 and 7.5. As can be seen from these figures, the performance of Texas students on the SAT was relatively close to the national average in the 1970s, but beginning around 1980, increasingly large gaps were apparent on both the SAT-V and SAT-M between national and Texas average scores. There was a slight narrowing in the Texas-national gap on the SAT-M from about 1990 until 1993, but from 1993 to 1998 the gap increased, such that in 1999 Texas students were on average scoring 12 points below the national average on the SAT-M (499 vs. 511).

In short, the pattern of results on both the SAT-V and SAT-M for Texas secondary school students relative to students nationally fails to confirm the gains on the exit level TAAS during the 1990s.
Moreover, the pattern of results on the SAT-M indicates that, at least since 1993, Texas students' performance on the SAT has worsened relative to students nationally. (Note 28)
One possible explanation for why gains on TAAS do not show up in gains on the SAT is that increasing numbers of students in Texas have been taking the SAT over the last three decades. Not surprisingly, state officials in Texas have advanced this idea to explain the obvious discrepancy between dramatic gains on TAAS in the 1990s and the relatively flat SAT scores for students in Texas. To evaluate this possibility, we can look at numbers of students taking the SAT annually from 1972 to the present.

It is indeed true that the numbers of students taking the SAT in Texas have increased faster (from around 50,000 annually through most of the 1970s to 100,000 in 1998) than nationally (from about 1 million annually to 1.2 million recently). However, it is also true that over this period the population of Texas has been increasing far faster than the U.S. population. Murdock et al. (1997, p. 12) report, for instance, that the population of Texas grew from 11.2 million in 1970 to 18.7 million in 1995 (a 67% increase), compared to a national population increase from 203 million to 263 million (a 29% increase). They also point out that the youth population of Texas in particular has been growing faster than the national youth group.

A better way to evaluate the hypothesis that increases in SAT-taking in Texas explain the flat pattern in SAT scores is to compare the numbers of SAT-takers to the high school population. One such statistic is reported by the College Board, namely the percent of high school graduates taking the SAT. Figure 7.6 shows relevant data for the 50 states for 1999. Specifically, this figure shows state average SAT-M scores for 1999 compared with the percentage of high school graduates in 1999 taking the SAT.
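The growth comparison in the preceding paragraphs is plain percentage arithmetic, and a short Python sketch may make it easier to check. The inputs are the rounded figures quoted above; the function name `pct_increase` and the dictionary labels are illustrative choices, not taken from the College Board or from Murdock et al.

```python
def pct_increase(start, end):
    """Return the percentage increase from start to end."""
    return (end - start) / start * 100.0


# Rounded (start, end) figures as quoted in the text above.
series = {
    "Texas SAT-takers (1970s to 1998)": (50_000, 100_000),
    "U.S. SAT-takers (1970s to recent)": (1_000_000, 1_200_000),
    "Texas population (1970 to 1995)": (11.2e6, 18.7e6),
    "U.S. population (1970 to 1995)": (203e6, 263e6),
}

# Percentage growth for each series.
growth = {
    label: pct_increase(start, end) for label, (start, end) in series.items()
}

for label, value in growth.items():
    print(f"{label}: {value:.1f}% increase")
```

With these rounded inputs the U.S. population figure comes out just under 30%, in line with the 29% cited from Murdock et al., who presumably worked from unrounded counts. Raw growth rates of this kind still say nothing about the share of each graduating class that sits the test, which is why the comparison to the high school population is the more informative one.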
As can be seen in Figure 7.6, there is a clear relationship between these two variables. States with smaller percentages of high school graduates taking the SAT tend to have higher average SAT-M scores. States with larger percentages of high school graduates taking the SAT tend to have lower average SAT-M scores.

What about Texas? According to College Board data, in 1999 Texas had 50% of high school graduates taking the SAT, scoring on average 499 on the SAT-M. This means that Texas, according to the pattern shown in Figure 7.6, has a slightly lower SAT-M average than states with comparable percentages of high school graduates taking the SAT. For example, according to the 1999 College Board data, there were seven states that had between 49% and 53% of high school graduates taking the SAT (Alaska, California, Florida, Hawaii, Oregon, Texas and Washington). Among these states Texas had the lowest SAT-M average in 1999 (499), except for Florida (498). Leaving aside Florida, Texas had an SAT-M average 15-25 points below states with comparable percentages of high school graduates taking the SAT. These results clearly indicate that the relatively poor standing of Texas among the states on SAT scores cannot be attributed to the proportion of secondary school students in Texas taking the SAT.

Moreover, the College Board data may actually understate the relatively poor performance of Texas students on the SAT. This is because Texas has such a poor record regarding student progress to grade 12 and graduation. Even if we use the very conservative estimates of high school completion derived from CPS data (and reproduced in Table 7.2 above), we see that Texas has a rate of non-completion of high school among young adults of about 20%, more than 5 percentage points above the national rate (and as the discussion in Section 7.1 indicated, this figure surely underestimates the extent of the high school dropout problem in Texas).

7.4 NAEP Scores Revisited

As mentioned in Section 3.4 above, 1996 NAEP mathematics scores were released in 1997 and seemed to provide confirmation that gains apparent on TAAS were

Table 7.4
TAAS Standard Score Results, All Students Not in Special Education, 1994-99
(Mean standard scores; SD for TLI = 15; SD for exit level writing test estimated = 200)

              1994     1995     1996     1997     1998     1999       Gain      Gain/SD
                                                                      1994-99
Grade 4
  Reading     4-78.4   4-80.1   4-79.9   4-80.9   4-84.4   [garbled]  [garbled] [garbled]
  Math        4-70.5   4-74.6   4-77.3   4-79.0   4-80.0   4-80.9     10.4      0.69
  Writing     1640     1647     1646     1663     1670     1673       33        -
Grade 8
  Reading     8-77.8   8-78.0   8-79.8   8-81.8   8-83.3   [garbled]  [garbled] [garbled]
  Math        8-70.0   8-69.7   8-73.8   8-76.7   8-78.7   8-80.8     10.8      0.72
  Writing     1591     1606     1611     1631     1655     1663       72        -
Grade 10
  Reading     x-77.7   x-77.8   x-80.0   x-82.1   x-83.9   [garbled]  [garbled] [garbled]
  Math        x-69.9   x-71.2   x-72.9   x-75.2   x-77.4   [garbled]  [garbled] [garbled]
  Writing     1648     1677     1685     1719     1708     1734       86        0.43

Source: www.tea.tx.state.us/student.assessment/results/summary/

The last column in Table 7.4 shows the 1994 to 1999 gains on TAAS divided by the relevant TAAS test standard deviation (15 for the reading and math TAAS tests and 200 for the TAAS exit level writing test). These results, average test score changes divided by the relevant standard deviations, may be interpreted as effect sizes.

Before discussing the meaning of the results shown in the last column of Table 7.4, a brief summary of the idea of effect size may be helpful. (Yes, dear reader, yet another digression. But if you know about meta-analysis and effect size, just skip ahead.) The concept of effect size has come to be widely recognized in educational research in the last two decades because of the increasing prominence of meta-analysis. Meta-analysis refers to the statistical analysis of the findings of previous empirical studies.
With the proliferation of research studies on particular issues, statistical analysis and summary of patterns across many studies on the same issue have proven to be a useful tool in understanding patterns of findings on a research issue (Glass, 1976; Cohen, 1977; Glass, McGaw & Smith, 1981; Wolf, 1986; Hunter & Schmidt, 1990; and Cooper & Hedges, 1994 are some of the basic reference works on meta-analysis). In meta-analysis, effect size is defined as the difference between two group mean scores expressed in standard
score form, or (since the technique is generally applied to experimental or quasi-experimental studies) the difference between the mean of the treatment group and the mean of the control group, divided by the standard deviation of the control group (Glass, McGaw & Smith, 1981, p. 29). Mathematically this is generally expressed as:

\[ ES = \frac{\bar{X}_t - \bar{X}_c}{s_c} \]

where \bar{X}_t is the mean of the treatment group, \bar{X}_c is the mean of the control group, and s_c is the standard deviation of the control group.

Interpretation of the magnitude of effect sizes varies somewhat according to different authorities, but one commonly cited rule of thumb is that an effect size of 0.2 constitutes a small effect, 0.5 a medium effect and 0.8 a large effect (Cohen, 1977; Wolf, 1986, p. 27). As a general guideline, the Joint Dissemination Review Panel of the National Institute of Education adopted the approach that an effect size had to be one-third (0.33) or at least one-quarter (0.25) of a standard deviation in order to be educationally meaningful (Wolf, 1986, p. 27).

While meta-analysis has been applied in many areas of social science research, perhaps most directly relevant to interpretation of TAAS and NAEP score changes are studies which have employed meta-analysis to examine the effects of test preparation and coaching. Becker's (1990) analysis of previous studies of the effectiveness of coaching for the SAT is a good example of such a study. Though she used a metric for comparing study outcomes which is somewhat unusual in the meta-analysis literature, namely the standardized mean-change measure, this measure is computed in standard deviation units and is directly analogous to effect size. Becker analyzed study outcomes in terms of some 20 study characteristics having to do with both study design and content of coaching studied. Like previous analysts, she found that coaching effects were larger for the SAT-M than for the SAT-V. However, unlike some previous researchers, she did not find that duration of coaching was a strong predictor of the effects of coaching.
Instead, she found that of all the coaching content variables she investigated, "item practice" (i.e., coaching in which participants were given practice on sample test items) was the strongest influence on coaching outcomes. Overall, she concluded that among 21 published comparison studies, the effects of coaching were 0.09 standard deviations on the SAT-V and 0.16 on the SAT-M.

Against this backdrop, the gains on TAAS summarized in Table 7.4 appear quite impressive. Across all three grades and all three TAAS subject areas (reading, math and writing), the magnitude of TAAS increases ranged from 0.43 to 0.72 standard deviation units. According to guidelines for interpreting effect sizes, these gains clearly fall into the range of medium to large effects. Also, the gains on TAAS clearly exceed the gains that appear possible, according to previous research, from mere test coaching. (In one respect though, the TAAS gains do parallel results from Becker's study of test coaching: gains on math tests are larger than those on reading tests.) The gains on TAAS seem especially impressive when it is recalled that the gains summarized in Table 7.4 represent the performance of hundreds of thousands of Texas students, while most of the studies examined via meta-analysis involved mere hundreds or thousands of subjects.

Having re-examined TAAS score changes in Texas from the effect size perspective, we may now turn to revisit NAEP scores for Texas. The fundamental question we address is whether NAEP results for Texas provide confirmation of the dramatic gains apparent on the TAAS. We first consider NAEP results for Texas, overall,
for grade 4 and 8 students and then take a closer look at results disaggregated by ethnic group.

Table 7.5
Mean NAEP Scores, Texas and Nation, Grade 4 and 8, 1990-98
(Entries are mean, with standard deviation in parentheses where reported; "-" means no state assessment result)

                         1990           1992           1994           1996           1998
Reading, Grade 4
  Texas                  -              213 (34)       212 (39)       -              217 (35)
  Nation                 -              216.7 (36)     214.3 (41)     -              217 (38)
Reading, Grade 8
  Texas                  -              -              -              -              262 (31)
  Nation                 -              260.0 (36)     259.6 (37)     -              264.0 (35)
Writing, Grade 8
  Texas                  -              -              -              -              154
  Nation                 -              -              -              -              150 (35)
Mathematics, Grade 4
  Texas                  -              217.9 (30.3)   -              228.7 (29.2)   -
  Nation                 213.1 (31.8)   219.7 (31.7)   -              223.9 (31.2)   -
Mathematics, Grade 8
  Texas                  258.2 (35.4)   264.6 (36.8)   -              270.2 (34.0)   -
  Nation                 262.6 (36)     268.4 (36.3)   -              272.0 (36.4)   -
Science, Grade 8
  Texas                  -              -              -              145.1          -
  Nation                 -              -              -              148.5 (34.1)   -

Source: NAEP Data Almanac http://nces.ed.gov/nationsreportcard/TABLES/index.shtml

There are two perspectives from which to consider the NAEP results for Texas shown here. We may compare the mean scores of Texas 4th and 8th graders with 4th and 8th graders nationally, or, for NAEP reading and math state assessments (the only ones done in more than one year), we may look at how the performance of Texas students seems to have changed over time. From the former perspective, it is clear that the performance of Texas 4th and 8th graders is very similar to the performance of 4th and 8th graders nationally. In all eleven instances in which state NAEP assessments allow comparison of student performance in Texas with student performance nationally, there is not a single instance in which average NAEP scores in Texas vary from national means by as much as two-tenths of a standard deviation. Texas grade 8 students scored better than students nationally on the NAEP writing assessment in 1998, but they scored worse on the science assessment in 1996, by about the same amount (+0.10 standard deviation units in writing and -0.10 in science). It may be recalled that according to guidelines in the meta-analysis literature, differences of less than one-quarter of a standard deviation are small and not considered educationally meaningful. In reading, at grade 4 we have three
years in which we can compare Texas NAEP reading scores with the national average: 1992, 1994 and 1998. There appears to be a very slight trend for Texas grade 4 reading scores to have converged with the national average between 1992 and 1998; but note that, to begin with, in 1992 the Texas average was only one-tenth of a standard deviation below the national average: (216.7-213)/36 = 0.103. In grade 8 reading we have a Texas-national comparison for just one year, 1998. In 1998, Texas eighth graders scored on average only very slightly below the national average, but again, the difference was less than one-tenth of a standard deviation: (264-262)/35 = 0.057.

We also have three years in which we can compare national and Texas NAEP math scores: 1990, 1992 and 1996. In 1992, the Texas NAEP math score average at grade 4 (217.9) was only slightly below the national average (219.7), but by 1996 it was slightly above the national average, by an amount equivalent to about 15% of a standard deviation: (228.7-223.9)/31.2 = 0.154. For 1990, 1992 and 1996, the Texas NAEP math grade 8 average was slightly below the national average, by amounts equivalent to 12%, 10% and 5% of the national standard deviation.

Now, let us put aside national NAEP results and simply consider the gains apparent in state NAEP results for Texas. Between 1994 and 1998, the Texas NAEP grade 4 reading average increased from 212 to 217, an amount equivalent to 12% of the 1994 national standard deviation (5/41 = 0.122). At grade 8, the Texas NAEP math average increased 12 points between 1990 and 1996, an amount equivalent to 33% of a standard deviation (12/36 = 0.33). According to the guidelines cited earlier from the meta-analysis literature, this is an amount that qualifies as a small, but educationally meaningful, difference.
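The standardized differences quoted in the preceding paragraphs can be recomputed from the Table 7.5 figures. The sketch below is only a check on that arithmetic: the helper name `standardized_difference` and the 0.25 cut-off constant are illustrative labels for the guideline cited from Wolf (1986), and the year assigned to each value follows the text's description of which assessments were given when.

```python
def standardized_difference(mean_a, mean_b, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean_a - mean_b) / sd


# (Texas mean, national mean, national SD), as given in Table 7.5.
comparisons = {
    "Reading, grade 4, 1992": (213.0, 216.7, 36.0),
    "Reading, grade 8, 1998": (262.0, 264.0, 35.0),
    "Math, grade 4, 1996": (228.7, 223.9, 31.2),
    "Math, grade 8, 1990": (258.2, 262.6, 36.0),
    "Math, grade 8, 1992": (264.6, 268.4, 36.3),
    "Math, grade 8, 1996": (270.2, 272.0, 36.4),
}

# Lower bound cited in the text for an educationally meaningful effect.
MEANINGFUL = 0.25

differences = {
    label: standardized_difference(tx, us, sd)
    for label, (tx, us, sd) in comparisons.items()
}

for label, d in differences.items():
    verdict = "meaningful" if abs(d) >= MEANINGFUL else "small"
    print(f"{label}: {d:+.3f} SD ({verdict})")

# Gains over time within Texas, scaled by the national SD of the earlier year.
reading_g4_gain = standardized_difference(217.0, 212.0, 41.0)  # 1994 to 1998
math_g8_gain = standardized_difference(270.2, 258.2, 36.0)     # 1990 to 1996
print(f"Reading grade 4 gain: {reading_g4_gain:.3f} SD")
print(f"Math grade 8 gain: {math_g8_gain:.3f} SD")
```

Under these inputs every Texas-nation difference falls below the 0.25 threshold, and only the grade 8 math gain over time exceeds it.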
More germane to the question whether TAAS gains are real is consideration of the magnitude of the gains apparent on TAAS (shown in Table 7.4 above) and those apparent on state NAEP results (shown in Table 7.5). In general, the gains on TAAS between 1994 and 1999 (in the range of 0.43 to 0.72 standard deviation units) are far larger than the range of gains apparent on NAEP (in the range of 0.12 to 0.33). Unfortunately, there is only one pair of years in which we have results from state NAEP and TAAS for the same subject, namely reading. Between 1994 and 1998 the average grade 4 TLI increased from 78.4 to 84.4, equivalent to 0.40 standard deviations. Between 1994 and 1998, the average grade 4 Texas NAEP score increased from 212 to 217, equivalent to 0.12 standard deviations (5/41 = 0.122, and even if we divide by the Texas standard deviation, we get just 5/39 = 0.128). Even before we look beneath the surface of NAEP averages for Texas, these results, with gains on NAEP far less than half the size of gains apparent on TAAS (and in the single instance when a direct comparison was possible, NAEP gains of 0.12 were just 30% the size of the 0.40 gain apparent on grade 4 TAAS), suggest clearly that the bulk (at least two-thirds) of the dramatic gains on TAAS are simply not real.

Next, let us delve below the surface of the Texas state NAEP averages and consider the Texas NAEP reading and math averages separately for White, Black and Hispanic students. Table 7.6 shows relevant results for state NAEP reading and math tests.

Table 7.6
Texas Mean NAEP Scores by Ethnicity, Grade 4 and 8, 1992, 1994 and 1998
("-" means no state assessment result)

                         1990   1992   1994   1996   1998
Reading, Grade 4
  White                  -      224    227    -      232
  Black                  -      200    191    -      197
  Hispanic               -      201    198    -      204
Reading, Grade 8
  White                  -      -      -      -      273
  Black                  -      -      -      -      245
  Hispanic               -      -      -      -      252
Mathematics, Grade 4
  White                  -      228    -      242    -
  Black                  -      197    -      212    -
  Hispanic               -      207    -      216    -
Mathematics, Grade 8
  White                  273    279    -      285    -
  Black                  236    243    -      249    -
  Hispanic               245    248    -      256    -

Source: NAEP Data Almanac nces.ed.gov/nationsreportcard/TABLES/index.shtml; Reese et al., 1997; Mullis et al., 1993.

As can be seen here, the gap between the average NAEP scores of White students in Texas and those of Black and Hispanic students is fairly consistently in the range of 25 to 35 points. There is a tendency for Hispanic students in Texas to score slightly better on NAEP tests than Black students; but overall, Hispanic and Black students in Texas score on average between two-thirds and a full standard deviation below the mean of White students. Moreover, at grade 4, there is an increase in the White-minority gap in NAEP reading scores between 1992 and 1998. In 1992, the NAEP grade 4 reading average was 224 for White students, 200 for Black students and 201 for Hispanics. By 1998, the corresponding averages were 232, 197 and 204.

At this point, the reader may begin to doubt the consistency of my approach to data analysis. In Section 4.1, when discussing the issue of adverse impact, I applied three tests of adverse impact: the 80% rule, tests of statistical significance, and evaluation of practical significance of differences. The critical reader may well wonder whether, if I applied these same standards to the NAEP results for Texas, and in particular the 1996 NAEP results for math, I might so easily dismiss the significance of the gains apparent for Texas.

Apparent gains for Texas in NAEP math scores between 1992 and 1996 were indeed statistically significant.
And in terms of practical significance, critical readers may well be telling themselves that, even if the gains were not large from the standard deviation units perspective suggested in the meta-analysis literature, gains on the order of a third of a standard deviation, when apparent for a population of a quarter million students (roughly the number of fourth graders in Texas in 1996), surely are of practical significance. Also, it may be recalled from Section 3.4 above that the NAEP math gains for Texas fourth graders between 1992 and 1996 were greater than the corresponding
gains for any other state participating in these two NAEP state assessments. So any reasonable person must concede that the apparent improvement of the Texas grade 4 NAEP math average from 217.9 in 1992 to 228.7 in 1996 (a gain of about one-third of a standard deviation), if real, is indeed a noteworthy and educationally significant accomplishment.

But there is that "if." The other perspective not yet brought to bear in considering changes in NAEP test score averages is the advice offered in Part 1: when considering average test scores, it is always helpful to pay attention to who is and is not tested.

NAEP seeks to estimate the level of learning of students in the states not by testing all students in the states in a particular grade, but through use of systematic and representative samples of schools and students. Without getting into details of NAEP sampling, let us focus here on the fact that not all students sampled are actually tested. Some students selected for NAEP testing are excluded because they are limited English proficient (LEP) or because of their status as special education students, whose individualized education plans (IEPs) may call for them to be excluded from standardized testing.

NAEP researchers have long recognized that exclusion of sampled students from NAEP testing has the potential to create bias in NAEP results. Here is how one NAEP report discussed the issue:

The interpretation of comparisons of achievement between two or more assessments depends on the comparability of the populations assessed at each point in time. For example, even if the proficiency distribution of the entire population at time 2 was unchanged from that at time 1, an increase in the rate of exclusion would produce an apparent gain in the reported proficiencies between the two time points if the excluded students tend to be lower performers. (Mullis et al., 1993, p. 353).
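The mechanism described in the quotation can be illustrated with a small Python sketch. All numbers here are hypothetical, chosen only to show the direction of the effect; they are not NAEP or Texas figures, and `reported_mean` is an illustrative helper rather than any NAEP procedure. The sketch assumes the population mean is a weighted average of the tested and excluded groups, and solves that identity for the mean of the tested group.

```python
def reported_mean(population_mean, excluded_mean, exclusion_rate):
    """Mean of the students actually tested.

    Solves population_mean = (1 - e) * tested_mean + e * excluded_mean
    for tested_mean, where e is the share of sampled students excluded.
    """
    tested_share = 1.0 - exclusion_rate
    return (population_mean - exclusion_rate * excluded_mean) / tested_share


# Hypothetical values, for illustration only.
POPULATION_MEAN = 220.0  # true mean of all sampled students, same at both times
EXCLUDED_MEAN = 190.0    # assumed mean the excluded students would have earned

# Same underlying population, but the exclusion rate doubles between assessments.
time1 = reported_mean(POPULATION_MEAN, EXCLUDED_MEAN, exclusion_rate=0.05)
time2 = reported_mean(POPULATION_MEAN, EXCLUDED_MEAN, exclusion_rate=0.10)

print(f"Reported mean at time 1 (5% excluded):  {time1:.1f}")
print(f"Reported mean at time 2 (10% excluded): {time2:.1f}")
print(f"Apparent gain with no real change:      {time2 - time1:.1f}")
```

In this hypothetical case the reported average rises even though the full population has not changed at all, which is the bias the NAEP report warns about.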
Because excluding sampled students from NAEP testing has the potential for skewing results, over time NAEP has developed detailed guidelines for excluding students from testing, and has taken special steps to try to include LEP and special education students in NAEP testing, for example, by allowing accommodations to standard NAEP testing procedures to meet the needs of special education students. (See Reese et al., 1997, Chapter 4 for a discussion of efforts to make NAEP math assessments more inclusive.)

Table 7.7
Percentages of IEP and LEP Students Excluded from NAEP State Math Assessments, Texas and Nation
("-" means no result reported)

                         1990   1992   1996
Mathematics, Grade 4
  Texas                  -      8%     11%
  Nation                 -      8%     6%
Mathematics, Grade 8
  Texas                  7%     7%     8%
  Nation                 6%     7%     5%
Source: Reese et al., 1997, pp. 91, 93; Mullis et al., 1993, pp. 324-25

Given this background, let us now consider the percentages of students sampled in state NAEP math assessments who were excluded from testing. Table 7.7 shows the percentages of sampled students excluded from testing in NAEP state math assessments in 1990, 1992 and 1996 for both Texas and the nation; recall that in the original trial state NAEP assessment in 1990 only grade 8 was tested. As can be seen in this table, at the national level, between 1992 and 1996, the percentages of students excluded fell slightly (from 8% to 6% at grade 4, and from 7% to 5% at grade 8). These results at the national level were presumably a result of efforts to make NAEP more inclusive in testing LEP and special education students. However, in Texas, the percentages of students excluded from testing increased at both grade levels: from 8% to 11% at grade 4, and from 7% to 8% at grade 8. This means that some portion of the increased NAEP math averages for Texas in 1996 is illusory, resulting from the increased rates of exclusion of LEP and special education students in Texas from NAEP testing. The gaps in rates of exclusion between Texas and the nation in 1996 also mean that comparisons of Texas with national averages in that year will be skewed in favor of Texas, for the simple reason that more students in Texas were excluded from testing. In short, as with TAAS results, some portion of the apparent gains on NAEP math tests in Texas in the 1990s is an illusion arising from exclusion.

As with TAAS gains, can we estimate what portion of the apparent NAEP gains is real and what portion is artifactual, attributable to the increased rates of exclusion of Texas students from NAEP testing? Fortunately, regarding NAEP we have a clear model for estimating the impact of exclusion on NAEP scores. In NAEP 1992 Mathematics Report Card for the Nation and the States, Mullis et al. (1993, pp.
352-355) discuss the problem of excluding students from NAEP testing and apply a model for estimating the effects of exclusion on distributions of NAEP scores. What these researchers did was to recompute national NAEP results based on the assumption that "all excluded and all absent students, had they been assessed, would have scored below the 25th percentile of all students" (Mullis et al., 1993, p. 353).

Using this approach, we can recompute the NAEP math averages for Texas in 1996, assuming that the percentages of Texas students excluded from NAEP testing were at the national average (6% at grade 4 and 5% at grade 8, as opposed to the observed 11% and 8% exclusions reported for Texas in 1996). The NAEP data almanac reports that on the 1996 NAEP math assessments, the scores equivalent to the 10th percentile in Texas were 190.4 and 225.5 for grades 4 and 8, respectively. Using these figures, and assuming that the 1996 exclusion rates in Texas were the same as the national rates (and that excluded students in Texas would have scored at the 10th percentile), we may recompute the average grade 4 and grade 8 NAEP math scores for Texas as follows. The weights 0.05 and 0.03 are the differences between the Texas and national exclusion rates (11% - 6% at grade 4 and 8% - 5% at grade 8).

Grade 4: 0.95(228.7) + 0.05(190.4) = 226.8
Grade 8: 0.97(270.2) + 0.03(225.5) = 268.9

These results indicate that on the order of 20%-25% of the NAEP gains for Texas between 1992 and 1996 were due simply to the high rate of exclusion of students from NAEP testing in 1996. In other words, given these calculations to adjust for the high rates of exclusion of Texas students from NAEP testing in 1996, the gain of scores in Texas from 1992 to 1996 would be about 9 points at grade 4 and 4.3 points at grade 8. The former is still considerably above the national increase of 4 points at grade 4, but no longer highest
among the states (North Carolina showed a grade 4 NAEP math gain of 11 points between 1992 and 1996, while excluding just 7% of grade 4 students from testing in 1996). And the gain of 4.3 points at grade 8 would leave Texas very near the level of the national gain apparent between 1992 and 1996.

In summary, review of results of NAEP from the 1990s suggests that grade 4 and grade 8 students in Texas performed much like students nationally. On some NAEP assessments, Texas students scored above the national average, and on some below. In the two subject areas in which state NAEP assessments were conducted more than once during the 1990s, there is evidence of modest progress by students in Texas; but it is much like the progress evident for students nationally. Reviewing NAEP results for Texas by ethnic group, we see a more mixed picture. In many comparisons, Black and Hispanic students show about the same gain in NAEP scores as White students, but the 1998 NAEP reading results suggest that while White grade 4 reading scores in Texas have improved since 1992, those of Black and Hispanic students have not. More generally, however, the magnitudes of the gains apparent on NAEP for Texas fail to confirm the dramatic gains apparent on TAAS. Gains on NAEP in Texas are consistently much less than half the size (in standard deviation units) of Texas gains on TAAS. These results indicate that the dramatic gains on TAAS during the 1990s are more illusory than real. The Texas "miracle" is more myth than real.

Before leaving this review of state NAEP results for Texas, it may be helpful to mention Rodamar's (2000) excellent review once more. As mentioned previously, he reviewed TAAS and NAEP results for Texas not in terms of changes measured in standard deviation units, but in terms of percent passing TAAS and percent meeting the NAEP "basic" proficiency standard.
While he focused on reading and math test scores (i.e., he did not review NAEP science and writing results), Rodamar reached conclusions very similar to those derived from reviewing NAEP results in terms of effect size changes:

When it comes to educational achievement, by nearly any measure except TAAS, Texas looks a lot like America. Texas was near the national average on many measures of educational performance when TAAS was introduced, and remains there. (Rodamar, 2000, p. 27).

7.5 Other Evidence

TAAS scores, graduation rates, SAT scores and evidence from NAEP are the most obvious sources of evidence regarding education in Texas. But I have also searched for other evidence that might be available. For example, in its annual review of the "state of the states," Education Week has assembled a wide range of data on a number of dimensions of education in the states (Jerald, 2000). Since this source is widely available, I will not review it in detail. But three findings are worth mentioning. First, Texas received a grade of D in the category of Improving Teacher Quality. Second, the Lone Star state received only middling marks on dimensions of School Climate (C), Resource Adequacy (C+), and Equity (C). Finally, I was struck by the relatively low rate of going to college in Texas. In Texas, in 1996, only 54% of high school graduates were reported to be enrolling in a two- or four-year college, as compared with 65% nationally (Jerald, 2000, p. 71). (Note 30)

This led me to inquire further into another Texas testing program: the Texas Academic Skills Program, or TASP, test. The Texas Higher Education Coordinating Board describes the TASP testing program thus at its website (www.thecb.state.tx.us/):

Are You Ready for College?

Are you ready for college courses? Not sure?

In Texas, you can find out if you have the reading, writing, and math skills you need to do college-level work through the Texas Academic Skills Program, or TASP. The TASP Test, which is part of the TASP program, is required; it is not optional.

Beginning in fall 1998, you must take the TASP Test or an alternative test, before beginning classes at a public community college, public technical college, or public university in Texas. TASP Test is not an admissions test, however. You cannot be denied admission to a public institution of higher education based on your TASP Test score. If you need to improve your skills you are not alone. About one-half of students entering college need some help. Take the TASP Test while you are in high school so you can identify the skills you need to improve. You'll be confident that you are ready for college.

What have been the results of this "college readiness" testing program? I found the graph reproduced in Figure 7.7 in a report available on the same website.

[Figure 7.7] Source: Texas Academic Skills Program, Annual Report on the TASP and the Effectiveness of Remediation, July 1996.
I could not find more recent results of TASP testing on the Texas Higher Education Coordinating Board website, but Chris Patterson of the Lone Star Foundation of Austin, TX (personal communication, March 22, 2000) generously sent me a summary of TASP results from 1993 to 1997, reproduced in Table 7.8 below.

Table 7.8
Annual Texas Academic Skills Program Report of Student Performance
Pass Rates by Race/Ethnicity and Test Area
1993-1997 High School Graduating Classes

Year  Total Count  All 3 Parts  Reading    Math       Writing
                   Pass Rate    Pass Rate  Pass Rate  Pass Rate

All Groups
1993  64,662       78.0%        90.3%      86.0%      90.3%
1994  63,257       65.2%        83.2%      79.3%      82.5%
1995  73,207       51.7%        75.3%      64.3%      80.8%
1996  68,810       48.1%        74.4%      60.6%      80.0%
1997  67,833       43.3%        70.7%      55.9%      79.3%

Native American
1993  107          83.2%        92.5%      89.7%      90.7%
1994  108          64.8%        82.4%      84.3%      81.5%
1995  161          52.8%        77.0%      66.5%      89.4%
1996  136          57.5%        79.4%      69.1%      84.6%
1997  130          42.3%        64.6%      65.4%      82.3%

Asian
1993  2,424        79.5%        90.5%      95.7%      84.8%
1994  2,625        63.0%        78.6%      92.5%      69.3%
1995  3,168        53.9%        69.7%      85.2%      66.9%
1996  2,608        49.2%        68.9%      80.8%      66.2%
1997  2,392        48.5%        66.1%      78.6%      68.7%

Black
1993  5,678        57.7%        79.8%      69.0%      79.6%
1994  5,859        44.2%        70.9%      60.3%      69.3%
1995  7,015        31.2%        60.8%      43.4%      69.3%
1996  7,008        29.7%        61.1%      41.2%      69.5%
1997  7,867        24.9%        56.9%      35.9%      68.2%

Hispanic
1993  14,349       67.6%        84.2%      79.6%      85.1%
1994  15,075       52.9%        75.9%      70.9%      75.3%
1995  18,121       37.9%        65.4%      53.2%      72.2%
1996  17,926       34.8%        65.1%      49.1%      71.4%
1997  19,166       30.9%        62.3%      45.0%      70.9%

White
1993  42,104       84.2%        93.7%      89.9%      93.9%
1994  39,590       73.1%        88.2%      84.5%      87.9%
1995  44,742       60.3%        82.0%      70.7%      87.1%
1996  41,132       57.0%        81.1%      67.6%      86.4%
1997  38,278       53.0%        78.1%      64.0%      86.4%

Source: Texas Higher Education Coordinating Board
Note: These results reflect pass rates on the initial attempt on the TASP test only.

Reviewing these results from TASP testing, and comparing them with results of TAAS testing (see Figure 3.1 for example), the conclusion seems inescapable that something is seriously amiss in the Texas system of education, the TAAS testing program or the TASP testing program, or perhaps all three. Between 1994 and 1997, TAAS results showed a 20% increase in the percentage of students passing all three exit level TAAS tests (reading, writing and math). But during the same interval, TASP results showed a sharp decrease (from 65.2% to 43.3%) in the percentage of students passing all three parts (reading, math, and writing) of the TASP college readiness test.
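As an illustrative check on the decline described above (not part of the original analysis), the first-attempt pass rates for all three TASP parts in Table 7.8 can be compared between the 1994 and 1997 graduating classes for each group. The Python sketch below copies those two columns from the table; the names TASP_ALL_THREE, point_change and tasp_changes are hypothetical labels introduced only for this example.

```python
# All-three-parts TASP pass rates (percent, first attempt only),
# copied from Table 7.8 as (1994, 1997) pairs.
# The dictionary and function names are hypothetical, for illustration only.
TASP_ALL_THREE = {
    "All Groups": (65.2, 43.3),
    "Native American": (64.8, 42.3),
    "Asian": (63.0, 48.5),
    "Black": (44.2, 24.9),
    "Hispanic": (52.9, 30.9),
    "White": (73.1, 53.0),
}


def point_change(rate_1994, rate_1997):
    """Return the 1994-to-1997 change in pass rate, in percentage points."""
    return round(rate_1997 - rate_1994, 1)


def tasp_changes(table=TASP_ALL_THREE):
    """Map each group to its change in the all-three-parts pass rate."""
    return {group: point_change(*rates) for group, rates in table.items()}


if __name__ == "__main__":
    for group, change in tasp_changes().items():
        print(f"{group:16s} {change:+.1f} percentage points")
```

Under these inputs every group shows a decline between 1994 and 1997, which is the pattern the text contrasts with the reported TAAS gains. The sketch works in percentage points rather than relative percent change so that the differences can be read directly against the table.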
8. Summary and Lessons from the Myth Deflated

Before recapping the territory covered in this article and suggesting some of the broader lessons that might be gleaned from the myth of the Texas miracle in education, I pause for one more digression (readers who have made it this far likely will not be too surprised by yet another detour). The detour is to recount a small survey of scholars undertaken in the summer of 1999. After this side excursion, I summarize "the myth of the Texas miracle." Finally, in closing, I suggest some of the broader lessons that might be gleaned from this examination of the illusory Texas miracle.

8.1 The "Two Questions Survey" on School Reform

In August 1999, as I was preparing for the start of the TAAS trial in September, I re-read a number of key documents regarding the development of the TAAS testing program in Texas. One was the Minutes of the Texas State Board of Education in July 1990 (a full copy of these minutes is reproduced in appendix 8 of this article for ease of reference). It may be recalled that it was at this meeting that the Board set the passing scores on TAAS. When reviewing minutes of this meeting, I was struck by the following passage:

Commissioner [of Education in Texas] Kirby reiterated some of the information presented to Committee of the Whole during the Thursday, July 12, 1990, work session on the TAAS, noting the recommendations of the staff regarding this item. Mr. Davis asked for the rationale for the two-year phase in rather than going immediately to the 70% [passing score on TAAS] or a one-year phase in. The commissioner stated that this would give the board an opportunity to clearly set that 70% is the standard--to state the expectation and expect the schools to present the skills to the students and help the students develop those skills so that this is not an unreasonable expectation. Dr.
Kirby said that since this is a different, more difficult test the needed phase-in time is suggested at least until the results of the fall administration are known. Mr. Davis expressed concern that the test does not appear to be indicative of what is being presented in the classroom. Commissioner Kirby replied that the test is an accurate measurement of what students should be learning, but the test is moving much further in the areas of problem solving, higher order thinking skills, making inferences, and drawing conclusions. He said that it is not believed that at this point in time every student has been adequately prepared in those skills, because with the Texas Educational Assessment of Minimum Skills (TEAMS) tests, emphasis has been placed on the basic skills. The commissioner noted that the test drives the curriculum and that it
will require a year or two to make that kind of adjustment in the focus of the curriculum. (TEA, 1997, Appendix 9 of the Texas Student Assessment Program Technical Digest for the Academic Year 1996-1997, pp. 337-354)

My reaction to this record was that it is, shall we say, slightly implausible to suppose that simply changing from the basic skills TEAMS test to the more challenging TAAS test would lead to statewide changes in teaching in Texas such that within "a year or two" teachers would be focusing not simply on "basic skills" but on "problem solving, higher order thinking skills, making inferences, and drawing conclusions." To test my own reaction against the views of a broader sample of school reform observers, I undertook a "two questions survey" of school reform. So, on Monday, August 16, 1999, I sent a survey via electronic mail to sixteen people, whom I respected as knowledgeable students of school reform initiatives around the country. On August 21, I re-sent the query to an additional 11 people whose names had been suggested by respondents to my first query. As of September 6, 1999, I had received 10 responses to my questions. Though I do not know what typical response rates are to email surveys of this sort (odd questions posed to busy people in late summer, with no explanation as to their possible import), my own view is that a response rate of 37% (10/27 = 0.3704) is probably not too bad. Here is the full text of the email survey including the two questions posed:

Colleagues:

I would like to ask the favor of asking you to answer two questions. Given your professional expertise, I trust the questions will be of some interest. Also, your answers may be of some import. For now, I will not explain the exact reason for my questions, as I would not want it to influence your answers.

Imagine a very large school system that has been focusing on basic skills instruction for some years.
The focus has been spurred in part by a high stakes test of basic skills. It is assumed that 80-90% of teachers have been covering the basic skills in their instruction.

In light of current educational reform ideas, the system decides that it needs to move beyond basic skills teaching to focus in the future on problem solving, higher order thinking skills, making inferences and drawing conclusions.

In light of this situation, and your expertise in studying school reform, my two questions to you are these:

1. How long would it likely take for this large school system to shift from having 80-90% of teachers teaching basic skills, to having 80-90% of teachers teaching the more advanced skills?
2. What would be the key ingredients required to make such a shift in instruction possible in the time you envision in your answer to the first question?

Please keep your answers brief and email them to me by August 30. In exchange for your kindness in responding to my request, I will compile answers, distribute them to whomever responds, and explain the specific reason that motivates the questions.

The ten scholars who responded to the survey were (in alphabetical order): David K. Cohen, Jane David, Daniel Koretz, Henry Levin, Hayes Mizell, Fred Newmann, Stan Pogrow, Ted Sizer, Adam Stoll, and Anne Wheelock.
Before summarizing what they said in response to the survey, two prefatory points should be added. First, all of these correspondents have generously allowed me to reproduce the full text of their survey responses (see Appendix 9). Second, despite the generosity of these people in responding so quickly (all within three weeks at the end of summer 1999), we did not even attempt to use the survey results in the TAAS trial in September. Inasmuch as lawyers for the State of Texas were already trying to exclude from the trial evidence they had known about for months, Mr. Kauffman advised me that they might not entirely welcome new evidence from a survey they had not even heard about before the trial began.

As mentioned, all ten responses are reproduced in their entirety in Appendix 9. Here I simply summarize three overall patterns in the ten responses.

Gentle Chiding. Half of the respondents (Cohen, Koretz, Pogrow, Stoll and Wheelock) chided me gently for advancing something of a false dichotomy between "basic skills" and advanced or "higher order thinking" skills. I can only plead mea culpa, but given the background to the survey explained above, I trust that my oversimplification may be forgiven.

Shifting the course of large educational systems takes years. The first question asked "How long would it likely take for this large school system to shift from having 80-90% of teachers teaching basic skills, to having 80-90% of teachers teaching the more advanced skills?" Though all respondents qualified their answers in one way or another, all did provide some sort of time estimate. In brief these were: Cohen, 10 years; David, 10 to 20 years; Koretz, 3 to 4 years; Levin, 2 to 5 years; Mizell, 7 to 8 years; Newmann, at least six years; Pogrow, 2 to 4 years; Sizer, at least 5 years; Stoll, at least 20 years; Wheelock, 10 to 15 years.

Two things strike me about these responses.
First is the remarkable variance in responses, from "2 to 4 years" to "at least 20 years" (and even if we throw out these outliers, variance remains nearly as great). This suggests that even among scholars who have studied such matters, we really do not know very much about how long it takes to shift the course of large educational enterprises. Second is that the median value seems to fall somewhere in the range of 5 to 10 years. This is of course far longer than the 1 to 2 years presumed by Commissioner Kirby in Texas in 1990.

Huge resources required. The second survey question was: "What would be the key ingredients required to make such a shift in instruction possible in the time you envision in your answer to the first question?" Answers to this question were generally far longer than answers to the first question, but in general indicated that a large quantity and range of resources would be needed to change the course of a large educational enterprise, including professional development opportunities for teachers, leadership, community outreach, lower pupil/teacher ratios, more instructional resources, better social services for students, and reform of teacher education institutions. Jane David's summary answer was "massive teacher re-education and powerful recruitment strategies." Henry Levin's answer suggested that significant change in instruction could come about in two to five years, given the following ingredients:

continuous staff development, continuous support and technical assistance, administrative encouragement, intrinsic and extrinsic incentives, public information on results, and a culture of commitment. Add to this transformation of local teacher training programs, careful selection of new teachers, and a strong public relations campaign, and things will move. Every administrator will have to become a cheerleader.
He then added, "The problem is that no district has ever been able to achieve these conditions. Further, this will be competing with basic skills testing that is often high stakes and high visibility promoted by the states." Adam Stoll wrote, in part:

It's immensely hard to get a critical mass of teachers within a school, let alone a district, to significantly change their practice. I would think getting a majority to exhibit practice that is highly supportive of advanced skill acquisition would be very optimistic, but possibly attainable under optimal circumstances. I can only imagine having 80-90% of teachers place a lot of emphasis on "teaching the more advanced skills" if some pretty sweeping changes occurred. I think it would take at least 20 years for these changes to begin affecting practice on this scale.

These extracts are really an inadequate summary of the observations offered by survey respondents, so I encourage readers to review their observations, reproduced in full in Appendix 9. Nonetheless, it is clear that very few of the ingredients suggested as needed for large-scale educational reform were provided in Texas in the early 1990s. This suggests why the purported "miracle" of educational reform in Texas is not only largely illusory, but indeed has had widespread negative consequences for both students and educators in the Lone Star state. After recapping the myth of the Texas miracle, I will suggest that this is a lesson from which we should learn. Myopic accountability schemes based on high stakes testing likely will have similarly perverse consequences elsewhere if we do not learn from the unfortunate story of Texas education in the last decade of the 20th century.

8.2 Recapping the Myth

Since the territory covered in this article is extensive, let me try to sum up the journey so far.
After an introduction (pointing out among other things that this writer may not be viewed by all as a totally unbiased observer of education in Texas), I summarized the recent history of education and statewide testing in Texas, which led to introduction of the Texas Assessment of Academic Skills (TAAS) in 1990-91. Since then TAAS testing has been the linchpin of educational accountability in Texas, not just for students, but also for educators and schools.

Part 3 recounted how a variety of evidence in the late 1990s led a number of observers to conclude that the state of Texas had made near miraculous educational progress on a number of fronts. Between 1994 and 1998, the percentage of students passing the three grade 10 TAAS tests had grown from 52% to more than 70%. Also, the racial gap in TAAS results seemed to have narrowed. Statistics from the Texas Education Agency showed that over the same interval dropout rates had declined steadily. Finally, in 1997, release of results from the National Assessment of Educational Progress (NAEP) showed Texas 4th graders to have made more progress on NAEP math tests between 1992 and 1996 than those in any other state participating in state NAEP testing. These developments led to a flurry of editorial praise for the apparent educational progress of the Lone Star State. Some went so far as to suggest even that the Texas experience should serve as a model for federal education legislation.

Part 4 began a closer examination of both TAAS and what has been happening in Texas schools over the last several decades. Section 4.1 showed that by any of the
prevailing standards for ascertaining adverse impact, grade 10 TAAS results continue to show discriminatory adverse impact on Black and Hispanic students in Texas. It was also shown that use of TAAS results in isolation to control award of high school diplomas is a clear violation of professional standards concerning appropriate test use. Previously I explained how expert witnesses for the state of Texas had challenged my interpretation of the Standards for Educational and Psychological Testing sponsored by AERA, APA and NCME. In July 2000, AERA issued a statement that, at least in my view, confirms my interpretation of the Standards. (See www.aera.net/about/policy/stakes.htm)

Section 4.2 demonstrated that the passing scores set on TAAS tests were arbitrary, discriminatory and failed to take measurement error into account. Furthermore, analyses comparing TAAS reading, writing and math scores with one another and with relevant high school grades raise doubts about the reliability and validity of TAAS scores. Finally, it was demonstrated how a sliding scale approach (taking into account both test scores and grades) could be applied in a more professionally sound and less discriminatory manner.

Stepping back from the arcane technology of standardized testing, Part 5 discussed problems of missing students and other mirages in Texas. First, patterns of student enrollment in Texas between 1975 and 1999 were examined by studying rates of progress from grade 9 to high school graduation, grade to grade progression ratios, and grade 6 to high school graduation rates. Without trying to summarize results of all of those analyses here, let me mention just some of the substantive findings from these analyses.
In 1990-91, the ratio of Black and Hispanic high school graduates to the number of Black and Hispanic students enrolled in grade 9 three years earlier fell to less than 0.50, and this ratio remained just about at or below this level from 1992 to 1999 (the corresponding ratio had been about 0.60 in the late 1970s and early 1980s). This finding indicated that only 50% of minority students in Texas have been progressing from grade 9 to high school graduation since the initiation of the TAAS testing program.

Subsequent analyses of progression ratios for all the grades indicated that the rates of Texas students being denied promotion from grade 9 to 10 have changed sharply over the last two decades. From 1977 until about 1981 rates of grade 9 retention were similar for Black, Hispanic and White students, but since about 1982, the rates at which Black and Hispanic students are denied promotion and required to repeat grade 9 have climbed steadily, such that by the late 1990s, nearly 30% of Black and Hispanic students were "failing" grade 9 and required to repeat that grade.

This finding led to a third series of analyses examining rates of progress from grade 6 and grade 8 to high school graduation. It was found that the rate of progress from grade 6 to high school graduation fell from about 0.75 in 1990 to less than 0.70 for White students and from about 0.65 to 0.55 for minority students. (The rate for minority students started to climb above 0.60 only in 1997, the year in which Texas was forced to raise the passing score on the GED high school equivalency tests).

Since all this discussion of rates and ratios may well obscure what is happening, or not happening, to large numbers of children in Texas, let us take one last look at the grade enrollment data for Texas. This time I show simply numbers of students, not ratios or percentages.
Figure 8.1 shows progress from grade 6 to high school graduation 6.5 years later for the Texas high school classes of 1982 to 1999 simply in terms of numbers of students (that is, total numbers of Black, Hispanic and White students).
Also shown in this figure is the difference, that is the numbers of students who do not make it from grade 6 to high school graduation 6.5 years later. As can be seen, the numbers of children lost between grade 6 and high school graduation in Texas were in the range of 50 to 60 thousand for the classes of 1982 to 1986. The numbers of lost children started to increase for the class of 1987 and jumped to almost 90 thousand for the class of 1991. For the classes of 1992 through 1999, in the range of 75 to 80 thousand children are being lost in each cohort. (For readers who may have not waded through all of the previous parts of this very long article and simply skipped to this conclusion, it is worth noting that as discussed in Part 7, these estimates are probably conservative, since there has been a net in-migration of people into Texas in the last two decades.)

Cumulatively for the classes of 1992 through 1999, there were about 2.2 million enrolled in grade 6 (in the academic years 1984-85 through 1992-93). The total number graduating from these classes was about 1.5 million. In other words, for the graduating classes of 1992 through 1999, around 700,000 children in Texas were lost or left behind before graduation from high school.

Section 5.4 of the article examined cumulative rates of grade retention in Texas. These are almost twice as high for Black and Hispanic students as for White students. The next section (Section 5.5) reports on estimates of dropouts by grade. It was found that most dropouts occur between grade 9 and 10 (about 16% of Black and Hispanic students and 8% of White students) but that another 6 to 10 percent drop out after grade 10 and also after grade 11. This portion of the article also shows the way in which apparent increases in grade 10 TAAS pass rates tend to disappear, if they are based not on numbers of students taking TAAS in the spring of grade 10, but instead on fall grade 9 or even fall grade 10 enrollments.
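The cumulative figures quoted earlier in this recap for the classes of 1992 through 1999 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: it uses the rounded totals given in the text (about 2.2 million grade 6 enrollees and about 1.5 million graduates), so its results inherit that rounding, and the constant and function names are hypothetical.

```python
# Rounded totals quoted in the text for the classes of 1992 through 1999.
# Names are hypothetical and chosen only for this illustration.
GRADE6_ENROLLED = 2_200_000  # approximate grade 6 enrollment
GRADUATES = 1_500_000        # approximate number of graduates


def cohort_summary(enrolled, graduated):
    """Return the number of students lost and the graduation and loss ratios."""
    lost = enrolled - graduated
    return {
        "lost": lost,
        "graduation_ratio": graduated / enrolled,
        "loss_ratio": lost / enrolled,
    }


if __name__ == "__main__":
    summary = cohort_summary(GRADE6_ENROLLED, GRADUATES)
    print(f"Students lost:    {summary['lost']:,}")
    print(f"Graduation ratio: {summary['graduation_ratio']:.2f}")
    print(f"Loss ratio:       {summary['loss_ratio']:.2f}")
```

With these inputs the difference is 700,000 students and the graduation ratio rounds to 0.68, consistent with the figure of around 700,000 children lost and with a completion rate of slightly less than 70%.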
Having been alerted to the fact that some portion of the gains in grade 10 TAAS pass rates were illusory, in Section 5.6 I next sought to estimate the numbers of students taking the grade 10 tests who were classified as "in special education" and hence not counted in schools' accountability ratings. As reported in Section 5.6, the numbers of
such students nearly doubled between 1994 and 1998.

In the closing portion of Part 5, I sought to estimate what portion of apparent gains in TAAS pass rates might be due to such forms of exclusion. It was estimated that a substantial portion, but probably less than half, of the apparent increases in TAAS pass rates in the 1990s are due to such exclusions.

In Part 6 of this article, I sought to summarize the views of educators in Texas about TAAS, based on three statewide surveys of educators. These surveys were undertaken entirely independently, and surveyed somewhat different populations of educators. General findings from this review were as follows:

1. Texas schools are devoting a huge amount of time and energy preparing students specifically for TAAS.
2. Emphasis on TAAS is hurting more than helping teaching and learning in Texas schools.
3. Emphasis on TAAS is particularly harmful to at-risk students.
4. Emphasis on TAAS contributes to retention in grade and dropping out of school.

Survey results indicated that the emphasis on TAAS is contributing to dropouts from Texas schools not just of students, but also of teachers. In one survey, reading specialists were asked whether they agreed with the following statement:

It has also been suggested that the emphasis on TAAS is forcing some of the best teachers to leave teaching because of the restraints the tests place on decision making and the pressures placed on them and their students.

A total of 85% of respondents agreed with this statement. In another survey, teachers volunteered comments such as the following: "Mandated state TAAS Testing is driving out the best teachers who refuse to resort to teaching to a low level test!"

The penultimate portion of this article, Part 7, reviews a variety of additional evidence about education in Texas. Five different sources of evidence about rates of high school completion are compared and contrasted.
In an effort to reconcile sharp differences apparent in these sources, a review of statistics on numbers of students, in Texas and nationally, taking the Tests of General Educational Development (GED) was undertaken. People take the GED tests in order, by achieving passing scores, to be awarded high school equivalency degrees. The review of GED statistics indicated that there was a sharp upturn in numbers of young people taking the GED tests in Texas in the mid-1990s. This finding helps to explain why the TEA statistics on dropouts are misleading. According to TEA accounting procedures, if students leave regular high school programs to go into state-approved GED preparation programs, they are not counted as dropouts. As Greene (1998) observed:

[A]n important misleading feature of the [TEA] reported drop-out rates is that they exclude students who were transferred to approved alternate programs, including drop-out recovery programs. If the students in these drop-out or other alternative programs subsequently drop out, it is not counted against the district. This is like reporting death rates at hospitals where you exclude patients transferred to intensive care units.

If we put aside the TEA-reported dropout rates as misleading, differences in other sources of evidence on rates of high school completion in Texas appear reconcilable.
NCES reports (based on CPS surveys) indicate that the rate of high school completion among young people in Texas in the 1990s was about 80%. This would imply a non-completion (or dropout) rate of 20%. Initially this would appear markedly lower than the non-graduation rate of at least 30% derived from my analyses of TEA data on enrollments and graduates. But the CPS surveys count as high school completers both those who receive a regular high school diploma and those who receive a GED high school equivalency degree. So it seems clear that a convergence of evidence indicates that during the 1990s, slightly less than 70% of students in Texas actually graduated from high school (e.g. 1.5 million/2.2 million = 0.68). This implies that about 1 in 3 students in Texas in the 1990s dropped out of school and did not graduate from high school. (Some of these dropouts may have received GED equivalency degrees, but as discussed in Part 7, GED certification is by no means equivalent to regular high school graduation).

Section 7.2 examined patterns of retention in grade 9 and high school completion among states for which such data are available. Results indicated that there is a strong association between high rates of grade 9 retention and low rates of high school completion (specifically, results suggested that for every 10 students retained to repeat grade 9, about seven will not complete high school).

Section 7.3 examined SAT scores for Texas students as compared with national results. Evidence indicates that at least as measured by performance on the SAT, the academic learning of secondary school students in Texas has not improved since the early 1990s, at least as compared with SAT-takers nationally. Indeed results from 1993 to 1999 on the SAT-M indicate that the learning of Texas students has deteriorated relative to students nationally (and this result holds even after controlling for percentage of high school graduates taking the SAT).
Section 7.4 revisited NAEP results for Texas. Results for eight state NAEP assessments conducted between 1990 and 1998 were reviewed. Because of the doubtful meaningfulness of the NAEP achievement levels, NAEP results for Texas and the nation were compared in terms of NAEP test scores. In order to compare NAEP results with those from TAAS, the "effect size" metric (from the meta-analysis literature) was employed.

This review of NAEP results from the 1990s showed that grade 4 and grade 8 students in Texas performed much like students nationally. On some NAEP assessments Texas students scored above the national average and on some below. In the two subject areas in which state NAEP assessments were conducted more than once during the 1990s, there is evidence of modest progress by students in Texas, but it is much like the progress evident for students nationally. Reviewing NAEP results for Texas by ethnic group, we see a more mixed picture. In many comparisons, Black and Hispanic students show about the same gain in NAEP scores as White students, but the 1998 NAEP reading results indicate that while White grade 4 reading scores in Texas have improved since 1992, those of Black and Hispanic students have not. More generally, however, the magnitudes of the gains apparent on NAEP for Texas fail to confirm the dramatic gains apparent on TAAS. Gains on NAEP in Texas are consistently far less than half the size (in standard deviation units) of Texas gains on TAAS. These results indicate that the dramatic gains on TAAS during the 1990s are more illusory than real. The Texas "miracle" is more hat than cattle.

The final section of the penultimate part of this article (Section 7.5) provided a brief review of other evidence concerning the state of education in Texas. Perhaps the most striking portion of this review was the set of results from the Texas Academic Skills Program or TASP test during the 1990s.
Between 1994 and 1997, TAAS results showed a 20% increase in the percentage of students passing all three exit level TAAS tests (reading, writing and math). But during the same interval, TASP results showed a sharp
decrease (from 65.2% to 43.3%) in the percentage of students passing all three parts (reading, math, and writing) of the TASP college readiness test.

8.3 Testing and Accountability

What might be the broader lessons from the Texas myth for education elsewhere? Surely there are many different ones that might be read into this story (such as the need to be wary of the party line emanating from large bureaucracies, which education in Texas seems to have become; and the importance of comparing alternative forms of evidence in order to begin to get at the truth about large and complex enterprises). But in closing, I comment briefly on only three of what I view as the broader lessons from the Texas myth story.

Aims of Education. The Texas myth story surely helps remind us of the broader aims of education in our society. The dramatic gains apparent on TAAS in the 1990s are simply not borne out by results of other testing programs (such as the SAT, NAEP and TASP). But quite apart from test scores, surely one of the main outcomes of pre-collegiate education is how many students finish and graduate from high school. By this measure of success, surely the Texas system of education in which only two out of three young people in the 1990s actually graduated from high school should not be deemed a success, much less a miracle.

Testing and Accountability. The TAAS testing program in Texas seems to have been spawned mainly by a yen for holding schools "accountable" for student learning. It is an unfortunately common manifestation of what has come to be called in the last several decades "outcomes accountability." As suggested above, however, quite apart from test scores, surely one of the most important outcomes of public education is how many young people finish schooling and graduate from high school. And this reminds us of the broader meaning of the term accountability (Haney & Raczek, 1994).
In its broader meaning the word accountability refers to providing an account or explanation not just of consequences, but of conduct. The Texas myth story, it seems to me, reminds us of how vital it is when judging educational endeavors to return to the root meaning of the word accountability and inquire into conduct as well as consequences.

It is of course always possible to come up with some sort of bureaucratic scheme, as in Texas, for weighing various sorts of data about schools and coming up with some kind of summary judgment about their quality. But anyone who believes in the rationality of such approaches has forgotten the old paradox of value from the field of economics. The paradox refers to the fact that many obviously useful commodities, such as air and water, have very low if any exchange values, whereas much less useful ones, such as diamonds and gold, have extremely high value. According to Schumpeter's (1954) History of Economic Analysis, it was recognized as early as the 16th century, by "scholastic doctors" and natural philosophers, that the exchange value or price of commodities derived not from any inherent characteristics of the commodities themselves but from their utility or "desiredness" and relative scarcity. Without wandering into a digression on the field of economic theory (concerning which I am an absolute amateur anyway), let me simply mention how this paradox was resolved by Kenneth Arrow. In 1950, Arrow published what has come to be known as his "impossibility theorem," in an article modestly titled "A difficulty in the concept of social welfare." In this article, Arrow proved mathematically that if there are at least three alternatives which members of society are free to order in any way, any social welfare function yielding an ordering based on those preferences violates one of three rational conditions (as long as trivial and dictatorial methods of aggregation are excluded). In short, Arrow's "impossibility
theorem" extended Pareto's finding about the immeasurability of general social welfare.

Hazards of High Stakes Testing

More than anything though, the Texas miracle story shows us the hazards of high stakes testing. It is, of course, possible to impose a "whips and chains" test-based accountability system on schools (as Schrag, 2000, described the Texas approach). Yet the Texas miracle story shows us the need to return standardized testing to its rightful place, as a source of potentially useful information to inform human judgment, and not as a cudgel for implementing education policy.
Notes

1. A previous version of this paper was presented at the Annual Meeting of the American Educational Research Association, New Orleans, April, 2000. The normal legal citation for Judge Prado's decision in the TAAS case is GI Forum Image De Tejas v. Texas Education Agency, 87 F. Supp. 2d 667 (W.D. Tex. 2000). However, since this citation only recently became available, in the body of this paper I cite Judge Prado's decision as Prado, 2000.
2. The volume 2, number 2 issue of The Scholar: St. Mary's Law Review on Minority Issues has recently published major portions of the reports of eight experts who testified in the TAAS trial, including portions of both my original (Haney, 1998) and supplementary (Haney, 1999) reports concerning the case.
3. I do not know how many schools have been taken over by the state, but I am aware that the TEA took over control of the Wilmer-Hutchins district in 1996 because of poor performance (Wertheimer, 1999).
4. At least one independent analyst has found that the equating of TAAS forms has not been successful. In a study commissioned by the Tax Research Association of Houston and Harris County (TRA), Sandra Stotsky analyzed TAAS reading tests for 1995 through 1998 and found that the grade 4, 8 and 10 TAAS reading tests for these years and grades were not comparable in difficulty (see Stotsky, 1998).
5. In the second report for the TAAS case (Haney, 1999) I also applied the 80% rule to results for three different grade 10 TAAS tests (reading, writing and math). Writing test results for Blacks and Hispanics have generally not fallen below 80% of the White pass rates, but TAAS math test results consistently have.
6. The latest version of the joint test Standards was issued in 1999, after the TAAS case, and my work on it, were under way. Therefore, here I cite both the 1985 and 1999 versions of the Standards. Where pertinent, I also document how specific provisions changed between 1985 and 1999.
7. The corresponding 1985 standard read: Standard 8.12 In elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score. Other relevant information for the decision should also be taken into account by the professionals responsible for making the decision. (APA, AERA & NCME, 1985, p. 54)
8. MALDEF attorneys sought to have the Heubert & Hauser report entered as evidence in the TAAS trial, but after attorneys for the state of Texas objected, the judge refused to allow the NRC report to be entered as evidence in the case. In a symposium on the GI Forum case at the Annual Conference of the Council of Chief State School Officers, Snowbird, Utah, June 17, 2000, I asked Geoffrey T. Amsel, the lead lawyer for the State of Texas in the case, why in the world he had sought to have the NRC report excluded from evidence in the case. His public response? "I was just trying to be a pain in the ass."
9. The corresponding passages from the 1985 Standards are: Standard 6.9 When a specific cut score is used to select, classify, or certify test takers, the method and rationale for setting that cut score, including any technical analyses, should be presented in a manual or report. When cut scores are based primarily on professional judgment, the qualifications of the judges also should be documented. (AERA, APA & NCME, 1985, p. 34) And 1985 Standard 2.10 specifies that "standard errors of measurement should be reported for score levels at or near the cut score" (p. 22).
10. It is worth mentioning that since 1990 considerable literature has been published on methods for setting passing scores on tests (for example, Gregory Cizek, Setting passing scores, Educational Measurement: Issues and Practice, Summer 1996, pp. 20-31). However, in discussing the setting of passing scores on TAAS in 1990, it seems reasonable to focus on literature that was prominently available before that year.
11. Part 6.2 below provides more explanation on how this survey was undertaken.
12. I was able to assemble this data set thanks to the generous assistance of Dr. Ed Rincon of Rincon Associates and Terry Hitchcock of the Texas Education Agency (TEA).
13. When a graph like Figure 5.2 was presented during the TAAS trial (the same, except that it did not include 1998-99 data), the pattern was sufficiently startling that Judge Prado interjected exactly this question: "What happened?"
14. The original Table 6.1 in the NRC report contained several printing errors, but a corrected version has been released.
15. It should be explained that the TEA data cited show slightly different numbers of students taking the three portions of the grade 10 TAAS (reading, writing and math) in any given administration. To derive the results shown in Table 5.5, I calculated the number of special education students taking each portion of the TAAS in each year and then averaged the numbers and percentages taking each portion.
16. We should acknowledge that this response rate of less than 15% was certainly less than ideal. One likely reason for the low response rate is that we were able to mail the survey only one week before the last week of the 1998-99 Texas school year. One respondent even spontaneously chided us for sending a survey that arrived during such a hectic time in the school year. Because of this timing we were unable to send follow-up letters to non-respondents.
17. ARD stands for Admission, Review and Dismissal, the name of the Committee in Texas schools that oversees special education designations and plans.
18. Recently, thanks to a suggestion of Jeff Rodamar, I have become aware of a set of short papers on the web site of the Texas Public Policy Foundation (http://www.tppf.org/). One paper, by John Pisciotta, summarizes two 1996 surveys of satisfaction and dissatisfaction of teachers in Texas. The report states: "Overall, this report indicates that with all the Texas public education reforms of recent years, the environment for Texas professional educators has not improved. One key finding is that public school teachers did not generally believe the teaching quality in their schools was improving. When asked if quality of teaching at their school had improved compared to five years ago, only 39% of public school teachers said yes. Almost a third believed teaching quality was worse than five years ago. In contrast, 71% of private school teachers saw their schools as better than five years ago. Social promotion, passing students from one grade to the next without adequate academic achievement, was another topic of the surveys. Public school teachers viewed social promotion as a widespread problem. Over half of the public school teachers indicated that social promotion was a problem at their school, compared to 29% of private school teachers. The most central question relating to teacher attrition was: Are you seriously considering leaving the teaching profession? For public school teachers, 44% said they were. Only 28% of the private-school teachers were seriously considering leaving the profession. As the major reason for leaving, private school teachers cited inadequate financial compensation. Public school teachers cited poor working conditions as their major reason for leaving." (http://www.tppf.org/, accessed 5/7/00).
19. In trying to track down possible sources of discrepancies in Texas dropout rates, I talked with Phil Kaufman of MPR of Berkeley, California. Among other things he explained that the CPS data gathering began to use computer assisted telephone interviewing in 1994, and hence it is hazardous to compare CPS results from before and after that date.
20. In order to further explore this issue, I consulted with a number of scholars who have previously analyzed CPS data, including Robert Hauser, Phil Kaufman, Richard Murnane, Duncan Chaplin and John Tyler. What I conclude from these consultations is that, for a variety of reasons, one needs to be wary of dropout rate estimates based on CPS data. See, for example, Hauser, 1997; Chaplin, 1999.
21. The very next sentence after the passage quoted here says "Consequently, GED graduates in 1997 and beyond must meet or surpass the performance of the top two-thirds of traditional graduating high school seniors." Obviously this statement is mistaken. What was meant was that the new GED passing standard raised the minimum scores such that instead of exceeding the performance of 25% of the norm group of high school seniors, the new minimum was equal to or surpassed the performance of 33% of the norm group.
22. A minor mystery appeared when it was learned that 15 to 20% of GED takers in Texas were only 16 or 17 years of age. GED annual reports indicate that the minimum age for taking the GED in Texas is 18. So I called the Office of Continuing Education in the Texas Education Agency (512-463-9292, 6/1/00). It was readily explained that people can take the GED in Texas below age 18 if they have a letter from a parent, parole officer, or judge. In a personal communication (6/8/00), John Tyler generously told me how to solve another mystery. GED statistics from the TEA are slightly different than those reported by GEDTS, apparently because TEA tends to report GED statistics in terms of GED certificates actually awarded, whereas GEDTS also reported numbers who pass the GED tests.
23. The only jurisdiction with a larger drop in its passing rate in 1997 was American Samoa, where only 30 people were tested in 1997.
24. To be clear, the new GED passing standard in Texas was more difficult than the pre-1997 Texas standard. It appears to be much lower than the passing standard on TAAS. Though I have been unable to locate any studies comparing the difficulty of the TAAS and GED tests, according to Barasch et al. (1990, p. 9): To be successful in passing the GED in most states, a candidate must get a total minimum standard score of 225 on the five tests, with no score less than 35 on any single test. In general this means that a candidate who answers just over half of the questions in each test will get a passing score. As we have seen, in Texas until 1997, people could pass the GED with a total standard score of only 200.
25. For more recent evidence on economic returns to earning the GED, see Murnane, Willett, & Tyler, 2000.
26. It is worth noting that analyses of grade enrollment data in part 5.5 above suggest that Murdock et al.'s estimate of the 1-2% annual in-migration rate for the Texas population appears to hold for the school age population in the 1990s. For example, referring to Table 5.3, if we average the % difference between predicted and actual grade enrollments for 1996-97 for grades 2-7, where retention in grade is quite uncommon, we get a little over 1% across the three ethnic groups. Note too that to the extent that Hispanic in-migration is greater than White in-migration, as Murdock et al. indicate, so too will the Hispanic-White gap in dropout rates be underestimated.
27. The correlation between these two variables is -0.51, statistically significant at the 0.05 level, even with the small sample of states for which grade 9 retention rates are available. Also, I suspect that for Arizona, the outlier data point in Figure 7.3, data on high school completion may be unreliable. If we replace the 77.1% high school completion rate for 1996-98 for Arizona with the rate of 83.8% that Kaufman et al. (1999) report for Arizona for 1993-95, the correlation changes to -0.7. And if we simply delete the Arizona case, the correlation is -0.80. If the Arizona case is deleted, the regression of HS completion rate on grade 9 retention rate is $\mathrm{HSC} = 95.6 - 0.69 \times \mathrm{G9R}$ ($R^2 = 0.657$). This suggests that for every 10 students retained to repeat grade 9, about seven will not complete high school. Given this regression equation, the predicted rate of high school completion for Texas would be 83.3, but the actual rate is about three points lower, at 80.2.
28. This finding is particularly significant given that previous research has shown that quantitative test scores are more sensitive to school experiences than are verbal test scores (Haney, Madaus & Lyons, 1993).
29. The exception to this pattern is that since it was decided that a student must achieve at least a "2" score on the written composition in order to pass the TAAS writing test, a composition score of "1" plus 27 or more multiple-choice items is truncated to a scale score of 1499, which is one point below mastery.
30. Rodamar (2000) also has an interesting summary of how college-going in Texas has changed between 1994 and 1996 (see exhibit 15, p. 21).

References

AERA, APA, & NCME. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
Arrow, K. J. (1950). A difficulty in the concept of social welfare. Journal of Political Economy, 58, 328-346.
Barasch, S. et al. (1990). ARCO GED High School Equivalency Examination. NY: Prentice Hall.
Becker, B. J. (1990). Coaching for the Scholastic Aptitude Test: Further synthesis and appraisal. Review of Educational Research, 60(3), 373-417.
Berk, R. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56(1), 137-172.
Cameron, S. V. and Heckman, J. J. (1993). The nonequivalence of high school equivalents. Journal of Labor Economics, 11(1), Part 1, 1-47.
Cannell, J. J. (1987). Nationally normed elementary achievement testing in America's public schools: How all 50 states are above the national average. Daniels, WV: Friends for Education.
Cannell, J. J. (1988). Nationally normed elementary achievement testing in America's public schools: How all 50 states are above the national average. Educational Measurement: Issues and Practice, 7(2), 5-9.
Cannell, J. J. (1989). The 'Lake Wobegon' report: How public educators cheat on standardized achievement tests. Albuquerque, NM: Friends for Education.
Catterall, J. S. (1989). Standards and school dropouts: A national study of tests required for graduation. American Journal of Education, 98(1), 1-34.
Chaplin, D. (1999). GED Policies and High School Completion: Are there Unintended Consequences? (Paper presented at the 1998 annual meetings of the Association for Public Policy Analysis and Management, New York, N.Y.). Wash., D.C.: The Urban Institute.
Cizek, G. (1996). Setting passing scores. Educational Measurement: Issues and Practice (Summer 1996), 20-31.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Revised Ed.). New York: Academic Press.
Cooper, H. and Hedges, L. V. (1994). The handbook of research synthesis. New York: Russell Sage.
Daley, B. (1999). Embarrassed into success: Texas school experience may hold lessons for Massachusetts. Boston Globe (June 10, 1999), p. A1.
Drahozal, E. C., & Frisbie, D. A. (1988). Riverside comments on the Friends for Education report. Educational Measurement: Issues and Practice, 7, 12-16.
Dyer, H. S. (1973). Does coaching help? College Board Review, 19, 331-335.
Fassold, M. (1999). Affidavit filed in case of GI Forum Image De Tejas v. Texas Education Agency, Western District of Texas (Civil Action No. SA-97-CA-1278-EP), July 13, 1999.
Fienberg, S. (1989). The evolving role of statistical assessments as evidence in courts (Report of the Panel on Statistical Assessments as Evidence in Courts of the National Research Council). NY: Springer-Verlag.
Funkhouser, C. (1990). Education in Texas: Policies, practices and perspectives (5th Edn.). Scottsdale, AZ: Gorsuch Scarisbrick.
GED Testing Service.
(1989-1998). Statistical Reports, 1989, 1990, 1991, 1992, 1994, 1996, 1997, 1998. Washington, D.C.: American Council on Education.
Gilmore, M. E. (1927). Coaching for intelligence tests. Journal of Educational and Psychological Measurement, 19, 319-330.
Glass, G. V (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5, 3-8.
Glass, G. V (1978). Standards and criteria. Journal of Educational Measurement, 15, 237-261.
Glass, G. V, McGaw, B. and Smith, M. L. (1981). Meta-analysis in Social Research. Beverly Hills, CA: SAGE Publications.
Gordon, S. P. and Reese, M. (1997). High stakes testing: Worth the price? Journal of School Leadership, 7 (July 1997), 345-368.
Greene, J. (1998). Why do over half of the public school students in big Texas districts fail to graduate? Houston Chronicle (October 22, 1998).
Greenwald, E., Persky, H., Campbell, J. & Mazzeo, J. (1999). NAEP 1998 Writing report card for the nation and the states (NCES report 1999-462). Washington, D.C.: National Center for Education Statistics.
Grissmer, D. & Flanagan, A. (1998). Exploring rapid achievement gains in North Carolina and Texas. Washington, D.C.: National Education Goals Panel. (Available at http://www.negp.gov/reports).
Haas, N. S., Haladyna, T. M., & Nolen, S. B. (1990, April). Standardized achievement testing: War stories from the trenches. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Boston, MA.
Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized test scores and the origins of test score pollution. Educational Researcher, 20(5), 2-7.
Haney, W. (1993). Testing and minorities. In Weis, L. & Fine, M. (Eds.), Beyond silence: Class, race and gender in United States schools. Albany, NY: State University of New York Press, pp. 45-73.
Haney, W. (1998). Preliminary report on Texas Assessment of Academic Skills Exit Test (TAAS-X). Chestnut Hill, MA: Boston College Center for the Study of Testing Evaluation and Educational Policy. (Portions of this report have been published in volume 2, number 2 issue of The Scholar: St. Mary's Law Review on Minority Issues).
Haney, W. (1999). Supplementary report on Texas Assessment of Academic Skills Exit Test (TAAS-X). Chestnut Hill, MA: Boston College Center for the Study of Testing Evaluation and Educational Policy.
(Portions of this report have been published in volume 2, number 2 issue of The Scholar: St. Mary's Law Review on Minority Issues).
Madaus, G., West, M., Harmon, M., Lomax, R., and Viator, K. (1992). The influence of testing on teaching math and science in grades 4-12. (Report of a study funded by the National Science Foundation under grant SPA8954759). Chestnut Hill, MA: Boston College Center for the Study of Testing Evaluation and Educational Policy.
Haney, W., Fowler, C., and Wheelock, A. (1999). Less Truth Than Error?: An independent study of the Massachusetts Teacher Tests. Education Policy Analysis Archives, Volume 7 Number 4, February 11, 1999. Published on the WWW at: http://epaa.asu.edu. (With Damian Bebell and Nicole Malec).
Haney, W., Madaus, G., and Lyons, R. (1993). The Fractured Marketplace for Standardized Testing. Boston: Kluwer Academic Publishers.
Haney, W. & Raczek, A. (1994). Surmounting outcomes accountability in education. Report prepared for the U.S. Congress Office of Technology Assessment.
Hauser, R. M. (1997). Indicators of High School Completion and Dropout. Pp. 152-84 in R. M. Hauser, B. V. Brown, and W. Prosser (Eds.), Indicators of Children's Well-Being. New York: Russell Sage Foundation.
Heubert, J. & Hauser, R. (Eds.) (1999). High Stakes: Testing for Tracking, Promotion and Graduation. A Report of the National Research Council. Washington, D.C.: National Academy Press.
Hodgkinson, H. (1986). Texas: The State and Its Educational System. Institute for Educational Leadership.
Hoel, P. (1971). Introduction to mathematical statistics. NY: Wiley.
Hoffman, J., Pennington, J., Assaf, L. & Paris, S. (1999). High Stakes Testing in Reading and Its Effects on Teachers, Teaching, and Students: Today in Texas, Tomorrow? (Unpublished manuscript). University of Texas at Austin.
Intercultural Development Research Association. (1986). Texas School Survey project: A Summary of Findings. San Antonio, TX: Intercultural Development Research Association. (October 31, 1986).
Jaynes, G. D. and Williams, R. M., Jr. (Eds.) (1989). A common destiny: Blacks and American society. National Academy Press.
Jerald, C. (2000). The state of the states. Education Week (January 13, 2000), pp. 62-163.
Kaufman, P., Kwon, J., Klein, S. and Chapman, C. (1999). Dropout rates in the United States: 1998 (NCES 2000-022). Wash., D.C.: National Center for Education Statistics, p. 20.
Kaye, D. H. and Freedman, D. (1994). Reference guide on statistics. Federal Judicial Center.
Koretz, D. M. & Barron, S. I. (1998). The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS). (RAND report MR-1014-EDU). Washington, D.C.: RAND.
Koretz, D. M.
& Barron, S. I. (1999). Test-based accountability systems: Lessons of Kentucky's experiment. (RAND research brief RB-8017) (http://www.rand.org/publications/RB8017).
Koretz, D. M., Linn, R. L., Dunbar, S. B., & Shepard, L. A. (1991, April). The effects of high stakes testing on achievement: Preliminary findings about generalization across tests. Paper presented at the Annual Meeting of the American Educational Research
Association, Chicago, IL.
Lenke, J. M., & Keene, J. M. (1988). A response to John J. Cannell. Educational Measurement: Issues and Practice, 7, 16-18.
Linn, R. L. (1982). Ability testing: Individual differences, prediction and differential prediction. In Wigdor, A. K. & Garner, W. R., Ability testing: Uses, consequences and controversies. Part II. Washington, D.C.: National Academy Press. Pp. 335-388.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29:2, 4-15.
Linn, R. L., Graue, M. E., and Sanders, N. M. (1989, March). Comparing state and district test results to national norms: Interpretations of scoring 'above the national average'. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Linn, R. L., Graue, M. E., and Sanders, N. M. (1990). Comparing state and district test results to national norms: The validity of claims that "everyone is above average". Educational Measurement: Issues and Practice, 9(3), 5-14.
Linton, T. H. and Debiec, M. (1992). The secondary level TAAS test: An analysis of the 70% mastery level. Paper presented to the Texas Testing Conference, Austin, Texas, March 2-4, 1992.
Madaus, G., West, M. M., Harmon, M. C., Lomax, R. G., and Viator, K. A. (1992). The influence of testing on teaching math and science in grades 4-12. Report of research funded by National Science Foundation Grant SPA8954759. Chestnut Hill, MA: Center for the Study of Testing, Evaluation and Educational Policy.
Mullis, I. V. S., Dossey, J. A., Owen, E. H., & Phillips, G. W. (1993). NAEP 1992 Mathematics Report Card for the Nation and the States (Report No. 23-ST02). Washington, D.C.: National Center for Education Statistics.
Murdock, S., Hoque, M. N., Michael, M., White, S., & Pecotte, B. (1997). The Texas challenge: Population change and the future of Texas. College Station, TX: Texas A&M University Press.
Murnane, R. J., J. B. Willett, and J. H. Tyler.
(2000). Who Benefits from Obtaining a GED? Evidence from High School and Beyond. Review of Economics and Statistics, 82:1, 23-37.
National Education Goals Panel. (November 5, 1998). North Carolina and Texas Recognized as Models for Boosting Student Achievement (press release). Washington, D.C.: National Education Goals Panel.
Office of Civil Rights (1999). The use of tests when making high-stakes decisions for students: A resource guide for educators and policy-makers. (Draft). Washington, D.C.: U.S. Department of Education.
Office of Technology Assessment. (1987). State Educational Testing Practices. (Background Paper, December 1987). Washington, D.C.: Office of Technology Assessment.
Palmaffy, T. (1998). The Gold Star State: How Texas jumped to the head of the class in elementary school achievement. Policy Review (March-April, 1998), No. 88. (http://www.policyreview.com//mar98/goldstar.html)
Phillips, G. W., & Finn, C. E. (1988). The Lake Wobegon effect: A skeleton in the testing closet? Educational Measurement: Issues and Practice, 7, 10-12.
Prado, E. (2000). Order in case of GI Forum Image De Tejas v. Texas Education Agency, Western District of Texas (Civil Action No. SA-97-CA-1278-EP). Filed January 7, 2000. (GI Forum Image De Tejas v. Texas Education Agency, 87 F. Supp. 2d 667 (W.D. Tex. 2000)).
Qualls-Payne, A. L. (1988). SRA responds to Cannell's article. Educational Measurement: Issues and Practice, 7, 21-22.
RAND (1999). Test-based accountability systems: Lessons of Kentucky's experiment. (RAND research brief RB-8017) (www.rand.org/publications/RB8017). (Based on Koretz, Daniel M. & Barron, Sheila I., 1998).
Reese, C. M., Miller, K. E., Mazzeo, J. and Dossey, J. A. NAEP 1996 Mathematics Report Card for the Nation and the States. Washington, D.C.: National Center for Education Statistics. (Report available at http://www.ed.gov/NCES/naep).
Rodamar, J. (2000). Tall Tales? Texas Testing Moves from Pecos to Wobegon. Washington, D.C.: Unpublished draft v.2.1 dated Feb 29, 2000.
Schrag, P. (2000). "Too Good to be True," The American Prospect, 11:4 (January 3, 2000). (http://www.prospect.org/archives/V11-4/schrag.html).
Schumpeter, J. A. (1954). History of economic analysis. NY: Oxford University Press.
Shepard, L. (1989, March). Inflated test score gains: Is it old norms or teaching to the test? Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching to the test? Educational Measurement: Issues and Practice, 9(3), 15-22.
Shepard, L. A. & Smith, M. L. (1989). Flunking Grades: Research and Policies on Retention.
London: The Falmer Press.
Stonehill, R. M. (1988). Norm-referenced test gains may be real: A response to John Jacob Cannell. Educational Measurement: Issues and Practice, 7, 23-24.
Stotsky, S. (1998). Analysis of Texas reading tests, grades 4, 8 and 10, 1995-1998. (Report prepared for the Tax Research Association). Available: http://www.educationnews.org/analysis_of_the_texas_reading_te.htm (August 1, 2000).
Taxpayer Research Association (1999). TRA OVERVIEW, TRA Reading Tests. (http://tra.org. Accessed September 11, 1999).
Texas House of Representatives House Research Organization (1999). The Dropout Data
Debate. Austin, TX: Texas House of Representatives. (http://www.capitol.state.tx.us/hrofr).
Texas Education Agency (1997). Texas Student Assessment Program Technical Digest for the Academic Year 1996-97. Austin, TX: TEA.
Thorndike, Robert and Hagen, Elizabeth (1977). Measurement and evaluation in psychology and education (4th Edition). New York: Wiley.
University of Houston. (1998). Freshman admission requirements. (http://www.uh.edu/enroll/admis/freshman_req.html) November 3, 1998.
Wertheimer, Linda (1999). Inquiry into tampering names 4 local districts. Dallas Morning News, February 18, 1999.
Williams, P. L. (1988). The time-bound nature of norms: Understandings and misunderstandings. Educational Measurement: Issues and Practice, 7, 18-21.
Willingham, W., Lewis, C., Morgan, R., & Ramist, L. (1990). Predicting college grades: An analysis of institutional trends over two decades. Princeton, NJ: Educational Testing Service.
Winglee, M., Marker, D., Henderson, A., Young, B. A. & Hoffman, L. (2000). A recommended approach to providing high school dropout and completion rates at the state level (NCES 2000-305). Washington, D.C.: National Center for Education Statistics.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage.
The Myth of the Texas Miracle in Education: Appendices

Appendix 1: Testing & Teaching Survey Form
Appendix 2: Testing & Teaching Survey Comments
Appendix 3: Judge Prado Decision in GI Forum Case
Appendix 4: Plaintiffs' Post Trial Brief
Appendix 5: Defendants Response to Plaintiff's Post Trial Brief
Appendix 6: Summary Comment on Judge Prado's Decision in the GI Forum Case
Appendix 7: Texas Enrollments by Grade and Race (1975-1999)
Appendix 8: Minutes of the Texas State Board of Education Meeting, July 1990
Appendix 9: Responses to Two Questions Survey
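
Supplementary worked example. The regression reported in note 27 (high school completion rate regressed on grade 9 retention rate, Arizona excluded) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the coefficients are the rounded values quoted in the note, and the implied Texas grade 9 retention rate is back-calculated from the predicted completion rate given there rather than taken from the underlying data.

```python
# Rounded coefficients as quoted in note 27 (Arizona case deleted).
INTERCEPT = 95.6   # predicted HS completion rate (%) at zero grade 9 retention
SLOPE = -0.69      # change in completion rate (points) per point of grade 9 retention


def predicted_completion(g9_retention_pct: float) -> float:
    """Predicted high school completion rate (%) for a given grade 9 retention rate (%)."""
    return INTERCEPT + SLOPE * g9_retention_pct


def implied_retention(completion_pct: float) -> float:
    """Invert the regression line: retention rate (%) that yields a given predicted completion rate."""
    return (completion_pct - INTERCEPT) / SLOPE


# Ten additional retained students per hundred correspond to this many fewer completers.
non_completers_per_10_retained = -SLOPE * 10

# Texas figures quoted in note 27.
texas_predicted = 83.3
texas_actual = 80.2
texas_residual = texas_actual - texas_predicted

# Back-calculated assumption, not a figure reported in the note.
texas_implied_retention = implied_retention(texas_predicted)

if __name__ == "__main__":
    print(f"Non-completers per 10 retained: {non_completers_per_10_retained:.1f}")
    print(f"Texas residual (actual - predicted): {texas_residual:.1f} points")
    print(f"Implied Texas grade 9 retention rate: {texas_implied_retention:.1f}%")
```

With the rounded coefficients, the slope reproduces the note's reading of "about seven" non-completers per ten students retained, and the Texas residual reproduces the gap of about three points between predicted and actual completion.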