Educational policy analysis archives. Vol. 9, no. 8 (March 19, 2001).
Tempe, Ariz.: Arizona State University; Tampa, Fla.: University of South Florida. March 19, 2001.
Effects of vouchers on school improvement: another look at the Florida data / Haggai Kupermintz.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives

Volume 9 Number 8    March 19, 2001    ISSN 1068-2341

A peer-reviewed scholarly journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright 2001, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

The Effects of Vouchers on School Improvement: Another Look at the Florida Data

Haggai Kupermintz
University of Colorado at Boulder

Abstract

This report re-analyzes test score data from Florida public schools. In response to a recent report from the Manhattan Institute, it offers a different perspective and an alternative explanation for the pattern of test score improvements among low-scoring schools in Florida.

Introduction

A recent report from the Manhattan Institute think tank (Greene, 2001) examined test scores of Florida public schools in 1999 and 2000 to determine the effects of vouchers on student performance. The report ends with a conclusion: “The most plausible interpretation of the evidence is that the Florida A-Plus system relies upon a valid system of testing and produces the desired incentives to failing schools to improve their performance.” My own analyses of the Florida data lead to no such conclusion. Instead, I found the evidence telling a more interesting, and to my mind a more believable, story. I will argue that the evidence suggests that the “voucher effect” follows different patterns in the three tested subject areas: reading, math, and writing. Moreover, I will show that the most dramatic improvements in failing schools were realized by targeting and achieving a minimum “passing” score on the writing test, thereby escaping the threat of losing their students to vouchers.

Background

The Florida A-Plus school accountability program is based on tracking schools' performance and progress toward the educational goals set in the Sunshine State Standards. The main source of information on school performance is a series of standardized tests in reading, math, and writing, known collectively by the somewhat redundant name FCAT (Florida Comprehensive Assessment Tests). All elementary, middle, and high school students are tested annually (different subjects in different grades) and the results are used to assign a grade to each school, from A to F, according to a formula that weighs the number of students performing below and above pre-defined markers along the test score scales. An F grade assignment has a variety of consequences, and a great deal of attention is directed toward F schools in the Florida system.

One of the most visible and politically contested consequences of failing the State's tests is the voucher provision. If a school receives another F grade in a four-year period, its students become eligible to take their public funding elsewhere, to a private or better-performing public school. In 1999, 78 schools received an F grade. Greene's report examines the gains these schools made on the FCAT between 1999 and 2000, and the executive summary offers a précis of the evidence: “The results show that schools receiving a failing grade...achieved test score gains more than twice as large as those achieved by other schools.
While schools with lower previous test scores across all state-assigned grades improved their test scores, schools with failing grades that faced the prospects of vouchers exhibited especially large gains” (Greene, 2001, p. ii). The report itself compares the average score gains of higher-scoring F schools to those of lower-scoring D schools, which serve as a control group. Standardized group differences constitute Greene's estimated effect sizes of the “voucher effect”: 0.12 in reading, 0.30 in math, and 0.41 in writing. Other analyses in the report calculate the correlations between the FCAT and other standardized tests administered in Florida schools, to gauge the validity of the FCAT.

These findings lead Greene not only to the conclusions cited above, but also to strong public commentary in the local and national press in favor of Florida's voucher system and similar proposals in President Bush's school reform plan. The moderate “voucher effect” estimates and relatively cautious language of the report were replaced in the media by strong statements emphasizing the magnitude of the raw score gains achieved by F schools. In an interview with the St. Petersburg Times (February 16, 2001), after the release of his report, Greene asserted: “The F schools showed tremendous gains because they faced a particularly concrete outcome that they wished to avoid: embarrassment, loss of revenue, vouchers”. Even more boldly, generalizing from the Florida findings, Greene offered the following proclamation in a guest commentary in The New York Post (February 21, 2001): “So the improvement by Florida's failing schools was real. So, as debate proceeds over President Bush's education proposals, know this: Testing, accountability and choice are powerful tools to improve education and, in particular, to turn around chronically failing schools. That's not a theory, but proven fact.”

My re-analyses of the Florida data suggest that Greene might have overstated the case for the simple explanation he promoted in his report and in the press. A more careful examination of the patterns of gains reveals that failing schools responded with a more sophisticated strategy than the undifferentiated, gross “voucher effect” gave them credit for. The key element of the strategy was to achieve a particular score on the writing test, in order to elevate their grades. The strategy was extremely successful, and all failing schools were able to escape the threat of vouchers by achieving a grade of D or better in 2000.

Data

The data for the analyses are school mean scores on the FCAT reading, math, and writing tests from 1999 and 2000. They include all curriculum groups in both years (available online from the Florida Department of Education web site: http://www.firn.edu/doe/sas/fcat.htm). These data are slightly different from the data Greene used in his analyses, but as he comments (Greene, 2001, Note 10), the difference is inconsequential and similar conclusions will be reached using either dataset. The analyses below address issues that Greene either paid no attention to in his report or dismissed as unimportant. The first example of the latter is regression toward the mean.

An elusive regression artifact

On page 10 of his report, Greene alerts his readers to the potential biasing effect of regression to the mean:

    As another alternative explanation critics might suggest that F schools experienced larger improvements in FCAT scores because of a phenomenon known as regression to the mean. There may be a statistical tendency of very high and very low-scoring schools to report future scores that return to being closer to the average for the whole population.
    This tendency is created by non-random error in the test scores, which can be especially problematic when scores are "bumping" against the top or bottom of the scale for measuring results. If a school has a score of 2 on a scale from 0 to 100, it is hard for students to do worse by chance but easier for them to do better by chance. Low-scoring schools that are near the bottom of the scale are very likely to improve, even if it is only a statistical fluke.

He then dismisses the threat because "the scores of those [F] schools were nowhere near the bottoms of the scale of possible scores" (p. 10). Greene seems to confuse regression toward the mean with floor and ceiling effects, which are completely different phenomena. Scores "'bumping' against the top or bottom of the scale" colorfully characterizes ceiling and floor effects but is an inadequate description of the regression effect. Regression toward the mean operates whenever the correlation between two variables (the 1999 and 2000 test scores, in our case) is less than perfect. It influences the entire range of scores, not just the very extreme, with a force proportional to their distance from the sample mean. Therefore, the fact that F schools were far from the bottom of the score scale is a poor indication that regression effects are absent. The two relevant pieces of information are how far the group is from the sample mean and the magnitude of the correlation between the two variables involved. Knowing these two quantities allows us to forecast the expected magnitude of the pull toward the sample mean. Using standardized scores aids interpretation, as the predicted standardized Y equals Z_Y = r Z_X (X and Y are the 1999 and 2000 test scores, respectively). For example, a school 2 standard deviations below the mean in 1999 will be expected to score only .85(2) = 1.7 standard deviations below the mean in 2000, assuming a correlation of .85 (a value compatible with the typical correlation in the Florida data). That is an effect size of .3! In 1999, F schools were 1.9 SDs below the mean in reading, 1.7 SDs below the mean in math, and 1.8 SDs below the mean in writing. This simple analysis shows that the expected magnitude of the regression effect warrants serious attention.

Using a slightly more complicated formula (see, e.g., Campbell & Kenny, 1999, p. 28, Table 2.1), and the regression coefficient instead of the correlation, one can calculate the expected 2000 score or the expected score gain, given a particular level of performance in 1999. Table 1 gives the expected score gains for the three FCAT tests, if regression toward the mean were the only factor responsible for these gains, alongside the observed gains for schools with different grades in 1999 [Note 1]. Figure 1 shows the same findings graphically.

Table 1
Predicted and Observed Gains By School Grade

Grade    Reading                Math                   Writing
         Observed  Predicted    Observed  Predicted    Observed  Predicted
A          -.68      -2.29        8.62      6.11         .24       .27
B          2.24      -1.01        6.85      6.65         .27       .29
C           .15       1.13        7.83      8.47         .29       .30
D          4.37       5.12       10.47     10.90         .33       .33
F         11.64       7.81       19.18     12.42         .67       .37

Figure 1. Predicted and Observed Gains By School Grade
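
The standardized prediction Z_Y = r Z_X discussed above can be illustrated with a minimal Python sketch. It uses the illustrative correlation of .85 and the 1999 standing of F schools quoted in the text; the function names are hypothetical, and the sketch does not reproduce the more elaborate formula behind Table 1, which uses the regression coefficient rather than the correlation.

```python
def expected_z(z_1999, r):
    """Predicted 2000 z-score if regression toward the mean were the only force."""
    return r * z_1999


def regression_pull(z_1999, r):
    """Expected movement toward the sample mean, in standard deviation units."""
    return expected_z(z_1999, r) - z_1999


R_ASSUMED = 0.85  # illustrative correlation quoted in the text, not a fitted value

# 1999 standing of F schools, in SDs relative to the mean, as reported in the text
f_schools_1999 = {"reading": -1.9, "math": -1.7, "writing": -1.8}

# The worked example from the text: 2 SDs below the mean in 1999
print(round(expected_z(-2.0, R_ASSUMED), 2), round(regression_pull(-2.0, R_ASSUMED), 2))

for subject, z in f_schools_1999.items():
    print(subject, round(expected_z(z, R_ASSUMED), 3), round(regression_pull(z, R_ASSUMED), 3))
```

Under these assumptions, the pull is simply (r - 1) times the 1999 z-score, so it grows with distance from the mean and shrinks as the correlation approaches 1.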

Figure 1 portrays an interesting picture. The height of each red dot represents the observed gain in scores between the 1999 and 2000 administrations of the FCAT. The blue dots represent the predicted gains attributed to the regression effect, and the distance between the red and blue dots, connected by a dashed line, depicts the "residual gain": the amount of gain left after the regression effect has been accounted for. From Figure 1 we learn that a substantial portion (67% in reading, 64% in math, and 55% in writing [Note 2]) of the observed gains among F schools is due to regression to the mean. Note also that in reading F schools do not appear exceptional, and their residual gains are comparable to those observed in B schools, for example. These schools, however, start to stand out when we examine the patterns in math and even more so in writing. These observations agree with the order of effect sizes reported by Greene in Table 3 of his report. Unfortunately, Greene stops here to conclude: "a voucher effect." But the story has just begun to unfold.

Within-group patterns

We now direct our attention to the patterns of change within each group of schools designated by the same grade. In his second response to the potential regression threat, Greene suggested that "if the improvements made by F schools were concentrated among those F schools with the lowest previous scores, then we might worry that the improvements were more of an indication of regression to the mean (or bouncing against the bottom) than an indication of the desire to avoid having vouchers offered in failing schools". Curiously, while Greene argues for this strategy, he never conducts the analysis. Instead, he presents in Table 5 residual gains that already take the regression effect into account. Even then he ignores the large difference between lower and higher scoring F schools in writing. Ironically, this difference is 0.16, exactly equal to the "voucher effect" in writing!
Moreover, the same rationale for using residual gains here should apply with equal force to the gains reported elsewhere in Greene's report. The basic logic remains the same between tables.
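
The residual gains and the share of the observed gain attributable to regression can be checked directly against the F-school row of Table 1. The following Python sketch is only an arithmetic illustration with hypothetical function names. Because it works from the rounded table entries, it yields roughly 67%, 65%, and 55%, close to but not identical with the percentages quoted in the text.

```python
# Observed and predicted gains for F schools, copied from Table 1
table1_f = {
    "reading": (11.64, 7.81),
    "math": (19.18, 12.42),
    "writing": (0.67, 0.37),
}


def residual_gain(observed, predicted):
    """Gain left over after subtracting the gain predicted by regression alone."""
    return observed - predicted


def regression_share(observed, predicted):
    """Fraction of the observed gain that regression alone would predict."""
    return predicted / observed


for subject, (obs, pred) in table1_f.items():
    print(f"{subject}: residual = {residual_gain(obs, pred):.2f}, "
          f"share = {regression_share(obs, pred):.1%}")
```

The same two functions can be applied to any other row of Table 1 to compare residual gains across grade groups.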

Figure 2. Observed Gains by Initial Status and School Grade

Figure 2 might cause us to worry, as Greene was right to point out. The red dots are the average gains made by the lower scoring schools (below the group median [Note 3]) and the blue dots the average gains made by higher scoring schools (above the group median) in each grade group. While the differences between gains of lower and higher scoring schools are constant across grade groups for reading, they increase substantially as grades get lower for math. For writing, only D and F schools show within-group differences, and these are more pronounced among F schools. In fact, the difference between higher and lower scoring F schools in writing is 0.23, representing an effect size of 0.23/0.39 = 0.6, substantially larger than the largest voucher effect Greene reports (an effect size of 0.41 in writing, see Table 3 in Greene's report)!
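
The median-split comparison behind Figure 2 can be sketched in Python as follows. The school data in the example are hypothetical and serve only to show the mechanics. The function name is also hypothetical, and dropping schools that sit exactly at the median is an assumption, since the text only speaks of schools below and above the group median. The final line repeats the effect size arithmetic quoted above.

```python
from statistics import mean, median


def median_split_gap(schools):
    """Mean gain of schools below the group median minus mean gain of those above.

    schools: list of (score_1999, gain) pairs for one grade group.
    Schools exactly at the median are left out (an assumption of this sketch).
    """
    cut = median(score for score, _ in schools)
    lower = [gain for score, gain in schools if score < cut]
    higher = [gain for score, gain in schools if score > cut]
    return mean(lower) - mean(higher)


# Hypothetical (1999 score, gain) pairs for one grade group, not real FCAT data
toy_group = [(2.0, 0.9), (2.2, 0.8), (2.4, 0.6), (2.6, 0.5), (2.8, 0.4), (3.0, 0.4)]
print(round(median_split_gap(toy_group), 2))

# Effect size quoted in the text for F schools in writing
print(round(0.23 / 0.39, 2))  # 0.59, reported as 0.6
```

A positive gap means that the schools starting lower gained more, which is the pattern the text describes for D and F schools in writing.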

The within-group analysis needs to be refined further as we change lenses to zoom in on the details of patterns of gains within the different grade groups. Figure 3 shows the scatter plots of the 1999 and 2000 scores with the linear fits superimposed, depicting the overall trends in the data. Table 2 complements the graphs by giving the standardized regression coefficients corresponding to the trend lines.

Table 2
Standardized Regression Coefficients of Gains Predicted from 1999 Scores

Grade    Reading    Math     Writing
A         -0.23     -0.09      0.07
B         -0.26     -0.14      0.01
C         -0.27     -0.20      0.02
D         -0.28     -0.19     -0.39
F         -0.28     -0.26     -0.54

Figure 3. Gains as a Function of 1999 Scores by School Grade

The reading scores behave as expected: a moderate negative correlation in all grade groups between the score achieved in 1999 and the gain realized one year later. Consistent with the patterns we identified in the cruder comparisons of Figure 2, the link between prior scores and gains becomes stronger as grades go down, a pattern most pronounced in writing. The findings for writing are striking. The amount of gain in F schools, and to a lesser extent D schools, is strongly determined by how low their scores were in 1999; the standardized regression coefficient is -0.54, representing the effect size of the mean gain difference for schools that scored one standard deviation apart from each other in 1999 (closely resembling the effect size value for lower and higher scoring F schools we calculated before). This pattern is completely absent for A, B, and C schools, whose 1999 scores provide no information on their expected gain.
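
For readers who want to see how a coefficient of the kind reported in Table 2 is obtained, the Python sketch below regresses gains on 1999 scores for a single group. With one predictor, the standardized regression coefficient equals the Pearson correlation, which is what the code computes. The scores are hypothetical and the function name is illustrative; this is not the procedure or data used for Table 2.

```python
from math import sqrt


def standardized_slope(x, y):
    """Standardized coefficient from regressing y on x alone (equals Pearson's r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical school mean scores for one grade group, not real FCAT data
scores_1999 = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
scores_2000 = [2.9, 3.0, 3.0, 3.1, 3.2, 3.4]

gains = [after - before for before, after in zip(scores_1999, scores_2000)]
beta = standardized_slope(scores_1999, gains)
print(round(beta, 2))  # negative: schools that started lower gained more
```

A coefficient near zero, as for A, B, and C schools in writing, would mean that the 1999 score tells us nothing about the size of the gain.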

The writing on the wall

The seemingly curious pattern of gains for writing has, in fact, a simple explanation. If there was a clear mark on the writing score scale that D and F schools set out to reach, no more and no less, then lower scoring schools would have to close a wider gap to reach the mark, giving rise to a strong negative correlation between where they started and how far they had to go (their gain). Figure 4 clearly demonstrates this phenomenon. It shows, for the entire school population, the relationships between 1999 scores and 2000 mean scores and gains. The lines represent the best-fitting nonlinear trend lines (using the "loess" technique, see Chambers & Hastie, 1991, pp. 309-376).

Figure 4. Writing 2000 Scores and Gains as a Function of 1999 Scores
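
The targeting explanation can be illustrated with a small simulation. The Python sketch below is a toy model under stated assumptions, not an analysis of the FCAT data: the target mark, the range of starting scores, and the noise level are all hypothetical. It contrasts schools that all end up near one fixed mark with schools that all gain a similar amount regardless of where they started.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_targeting(n_schools=500, target=3.0, seed=0):
    """Every school ends near the same mark, so gain is roughly target minus start."""
    rng = random.Random(seed)
    start = [rng.uniform(1.5, 2.9) for _ in range(n_schools)]  # hypothetical 1999 scores
    end = [target + rng.gauss(0, 0.05) for _ in start]         # hypothetical 2000 scores
    gains = [e - s for s, e in zip(start, end)]
    return pearson(start, gains)


def simulate_common_gain(n_schools=500, gain=0.3, seed=0):
    """Every school gains about the same amount, whatever its starting point."""
    rng = random.Random(seed)
    start = [rng.uniform(1.5, 2.9) for _ in range(n_schools)]
    gains = [gain + rng.gauss(0, 0.05) for _ in start]
    return pearson(start, gains)


print(round(simulate_targeting(), 2))    # strongly negative
print(round(simulate_common_gain(), 2))  # close to zero
```

In this toy model, aiming at a fixed mark is enough by itself to produce a strong negative correlation between starting score and gain, while a uniform improvement leaves the two essentially unrelated.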

References

Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford Press.

Chambers, J. M., & Hastie, T. J. (Eds.). (1991). Statistical models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole.

Cronbach, L. J., & Associates. (1980). Toward reform of program evaluation. San Francisco, CA: Jossey-Bass.

Greene, J. P. (2001). An evaluation of the Florida A-Plus Accountability and School Choice Program. New York: The Manhattan Institute.

About the Author

Haggai Kupermintz
School of Education
University of Colorado at Boulder
Email: email@example.com

Haggai Kupermintz is an Assistant Professor of research and evaluation methodology at the University of Colorado at Boulder, School of Education. His specializations are educational measurement, statistics, and research methodology. His current work examines the structure, implementation, and effects of large-scale educational accountability systems.

Copyright 2001 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, firstname.lastname@example.org or reach him at College of Education, Arizona State University, Tempe, AZ 85287-0211 (602-965-9644). The Commentary Editor is Casey D. Cobb: email@example.com.