

Education Policy Analysis Archives
Volume 10 Number 24, May 6, 2002, ISSN 1068-2341

A peer-reviewed scholarly journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright 2002, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Lake Woebeguaranteed: Misuse of Test Scores in Massachusetts, Part I

Walt Haney¹
Boston College

Citation: Haney, W. (2002, May 6). Lake Woebeguaranteed: Misuse of test scores in Massachusetts, Part I. Education Policy Analysis Archives, 10(24). Retrieved [date] from

Misuse of test results in Massachusetts largely guarantees woes for both students and schools. Analysis of annual test score averages for close to 1000 Massachusetts schools for four years (1998–2001) shows that test score gains in one testing period tend to be followed by losses in the next. School averages are especially volatile in relatively small schools (with fewer than 150 students tested per grade). One of the reasons why scores fluctuate is that the Massachusetts state test has been developed using norm-referenced test construction procedures, so that items which all students tend to answer correctly (or incorrectly) are excluded from operational versions of the test. This article concludes with a summary of other reasons why results from state tests, like that in Massachusetts, ought not be used in isolation to make high-stakes decisions about students or schools.

Lake Wobegon is the mythical town in Minnesota popularized by Garrison Keillor in his National Public Radio program "A Prairie Home Companion." It is the town where "all the children are above average" (and "all the women are strong, and all the men, good-looking"). In the late 1980s, it became apparent that Lake Wobegon had come to schools nationwide. For according to a 1987 report by John Cannell, the vast majority of school districts and all states were scoring above average on nationally normed standardized tests (Cannell, 1987). Since it is logically impossible for all of any population to be above average on a single measure, it was clear that something was amiss, that something about nationally normed standardized tests or their use had been leading to false inferences about the test scores of students in the nation's schools. As a result, people came to refer to inflated test results as the Lake Wobegon phenomenon.

I do not try here to recap the story of Cannell's work on the Lake Wobegon phenomenon and how independent researchers came to verify the phenomenon. (The story is recounted in chapter 7 of Haney, Madaus & Lyons, 1993, for anyone interested.) Rather, my purpose is to introduce a place considerably east of Lake Wobegon; namely, Lake Woebeguaranteed. In this place, the use of state test results in isolation to make important decisions about schools and students pretty well guarantees woes will follow. For as I will explain, such uses of results from what is essentially a norm-referenced test constitute ill-conceived misuses of test results. Before proceeding to this larger story, I recap how the work reported here evolved.

After reading Kane & Staiger (2001) and Bolon (2001), I undertook an analysis of school average scores on the Massachusetts Comprehensive Assessment System (MCAS) grade 4 mathematics tests for 1998, 1999, 2000 and 2001.
After summarizing these previous works, I describe the sources of data used in the present analysis, the means by which data were merged from different sources, the analyses undertaken, and the results. The latter confirm the findings by Kane & Staiger (2001) and Bolon (2001); namely, that changes in school average test scores from one year to the next are unreliable indicators of school quality. Next I discuss three reasons this is so, and why misuse of results of the Massachusetts state test virtually guarantees woes for schools and students.²

Background

The works that prompted the analyses reported here were Kane & Staiger (2001) and Bolon (2001). The first of these works focused on state test results in North Carolina. North Carolina has an extensive system of testing students, not just with state "competency" tests in grades 3-11, but also with norm-referenced tests in grades 5 and 8 (CCSO, 1998, pp. 19, 21, 22, 24). Students must pass state competency tests in reading and math to graduate from high school. Schools in North Carolina are publicly rated in terms of student test results. However, the paper by Kane and Staiger (2001) from the National Bureau of Economic Research (http://) shows how misleading these ratings tend to be.

Kane and Staiger analyzed six years' worth of student assessment data from the entire state of North Carolina (for nearly 300,000 students in grades 3 through 5). They showed that, regardless of whether results were analyzed in terms of annual results or year-to-year changes, the test results are mainly random noise, resulting from the particular samples of students who are in tested grades in particular years and the vagaries of annual test content and administration, not meaningful indications of school quality. Kane and Staiger concluded with the following four "lessons":

1. Incentives targeted at schools with test scores at either extreme–rewards for those with very high scores or sanctions for those with very low scores–primarily affect small schools and imply very weak incentives for large schools.

2. Incentive systems establishing separate thresholds for each racial/ethnic subgroup present a disadvantage to racially integrated schools. In fact, they can generate perverse incentives for districts to segregate their students.

3. As a tool for identifying best practice or fastest improvement, annual test scores are generally quite unreliable. There are more efficient ways to pool information across schools and across years to identify those schools that are worth emulating.

4. When evaluating the impact of policies on changes in test scores over time, one must take into account the fluctuations in test scores that are likely to occur naturally. (Kane & Staiger, April 2001)

The second work prompting the analyses reported below is Bolon's "Significance of test-based ratings for metropolitan Boston schools" (2001, in Education Policy Analysis Archives v9n42/. Also see Michelson, 2002; Willson & Kellow, 2002; and Bolon, 2002, for discussion of the original Bolon article).
In this study Bolon examined 1998, 1999, and 2000 MCAS mathematics scores for 47 academic high schools in 32 metropolitan communities in the greater Boston area (vocational high schools were excluded on the grounds that they have a substantially different mission than academic high schools). Bolon found that school average grade 10 MCAS math scores generally changed little over this interval (+1.3 points from 1998 to 1999, and +5.9 points from 1999 to 2000) relative to the range in school average scores (in 1999, for example, school averages ranged from 203 to 254 on the MCAS scale of 200 to 280). Bolon does note, however, that according to data released by the Massachusetts Department of Education, between 1998 and 2000 grade 10 MCAS math scores rose substantially more than English or science scores (see Bolon's Table 1-1).

Bolon then examined the extent to which seven school characteristics, plus community income (1989), might be used to predict school average grade 10 MCAS math scores. He found that three variables (percent Asian or Pacific Islander, percent limited English proficiency, and per-capita community income) were the only ones statistically significantly related to school average scores (Table 2-12). Together these three variables accounted for 80% of the variance in school average scores. After excluding schools in Boston (for which separate community income data were not available), Bolon found that "by far the strongest factor in predicting tenth grade MCAS mathematics scores is 'per capita community income (1989).' For the schools outside the City of Boston, this factor alone performed nearly as well as all available factors combined, associating 84 percent of the variance compared with 88 percent when all available factors were used."


The study reported here builds on both of the works just discussed. For example, an analytical approach applied by Kane and Staiger to data from North Carolina, namely comparing school size with changes in annual score averages, is employed here. Additionally, while Bolon examined average MCAS scores for Massachusetts high schools, this inquiry addresses MCAS averages for elementary schools.

There are three broad reasons why elementary school average test scores might be more useful indicators of school quality than test averages for high schools. First is the simple fact that there are more elementary schools than high schools. In his study, Bolon analyzed test scores for fewer than 50 high schools. In contrast, MCAS scores are available for around 1000 elementary schools in Massachusetts. A larger sample offers greater potential to discern meaningful differences in school quality.

The second reason for hypothesizing that grade 4 test scores may be better indicators of school quality than grade 10 test scores is the extent of institutional experience that they may reflect. Children typically enter school in Massachusetts in kindergarten. This means that by spring of grade 4, they have almost five years of education in a particular elementary school (presuming, of course, they did not switch schools). In contrast, grade 10 test scores typically reflect just two years' experience in high school. So on this count, grade 4 test score averages clearly have more potential to reflect differences in school quality than grade 10 score averages.

The third reason for thinking that grade 4 test scores may be better indicators of school quality than grade 10 test scores is that by grade 10 (roughly age 16) individuals' standardized test scores have become relatively fixed, whereas test scores of young children are relatively malleable.
This may be illustrated by reference to Benjamin Bloom's classic (1964) work, Stability and Change in Human Characteristics. In this book, Bloom reviewed a wide range of evidence on how a number of human characteristics, including height, weight and test scores, tend to change as people age. He showed, for example, that height in the early childhood years tends to be a moderately good predictor of height at maturity, with correlations between height at ages 6-10 years and height at age 18 falling in the range of 0.75 to 0.85 for both males and females. Interestingly, height at ages 11-13 for females and 13-15 for males is a less good predictor of height at maturity. This is, of course, due to variation in the ages at which children experience growth spurts as they go through puberty.

In contrast to the physical characteristic of height, mental abilities of young children as measured by standardized tests show relatively little power to predict mental abilities at maturity. Not until around grade 3 or 4 (or age 8-9) do children's test scores become relatively reliable predictors of future performance. To provide one example, reading test scores at age 6 (or grade 1) correlate with reading test scores in grade 8 only about 0.65 (Bloom, 1964, p. 98). As Bloom himself put it, "We may conclude from our results on general achievement, reading comprehension and vocabulary development that by age 9 (grade 3) . 50% of the general achievement pattern at age 18 (grade 12) has been developed" (Bloom, 1964, p. 105). The relative malleability of young children's test scores suggests that there may be more potential for grade 4 test scores to be affected by school quality, as compared with high schools' effects on grade 10 test scores.

In sum, while Bolon found that school average scores on Massachusetts' grade 10 state test (MCAS) were not sound indicators of school quality, there are several reasons for hypothesizing that school average scores for grade 4 might be better indicators of school quality. To test this possibility, the data and analyses described below were employed.

Data Sources

The data used in this study were drawn from four sources. MCAS results for 1998, 1999 and 2000 were drawn from CD data disks issued by the Massachusetts Department of Education entitled "School, District and State MCAS Results, Grades 4, 8 and 10, Tests of May 1998," "School, District and State MCAS Results, Grades 4, 8 and 10, Tests of May 1999," and "School, District and State MCAS Results, Grades 4, 8 and 10, Tests of Spring 2000." The MCAS results for 2001 were drawn from an Excel file named "MCAS2001pub_g4sch01.xls" downloaded from http://bo on November 9, 2001. The files from these four sources contain MCAS results for all schools and districts in Massachusetts for 1998, 1999, 2000 and 2001. From these results, grade 4 MCAS math averages were extracted for all schools in Massachusetts.

Math rather than English Language Arts (ELA) test scores were selected for study for two reasons. First, it is reasonably well established that schools have more influence on math test scores than on English (or at least reading) test scores (Haney, Madaus & Lyons, 1993). Second, it is apparent that there have been a number of problems in past years in the scaling of MCAS grade 4 ELA scores.

The numbers of records for which MCAS grade 4 average results are available from each of the sources mentioned above are as follows:

Year    No. of Records
1998    1336
1999    1355
2000    1366
2001    1049

The reason for more records in 1999 and 2000 than in 1998 is the creation of a number of new elementary schools (mostly charter schools). The file for 2001 is smaller than those for previous years because it included only school average, but not district average, scores. Merging records from these four data files proved more difficult than anticipated.
Labels for some variables were changed across the years, and names for some schools are reported inconsistently in these four sets of data. Nonetheless, after examining pairs of records for 1998, 1999, 2000 and 2001, I was able to create a merged data file of MCAS grade 4 math results (and numbers of students tested) for 1998-2001. A copy of this data file is appended to this article for anyone interested in secondary analysis (see Appendix).

The merged data file of grade 4 MCAS math school averages, after deletion of district averages, contained records for 977 schools. Table 1 shows summary descriptive statistics for this data set. As can be seen, the numbers of fourth graders tested per school in these four years ranged from just 10 to 328. The school average MCAS scores ranged from a low of 206 in 1998 to a high of 263 in 2001. Over the four years of MCAS testing, on average, there were initially slight increases in average MCAS scores (a 1.5 point increase, on average, between 1998 and 1999, and an increase of 0.5 of a point between 1999 and 2000), but then level scores between 2000 and 2001. Changes in score averages for individual schools from year to year ranged from a low of -22 to +18 points. As Bolon found with regard to grade 10 MCAS scores, these school average changes in grade 4 MCAS scores are considerably smaller than the range in school average scores, which varied by 50 points or more in all four years of test administration.

Table 1
Summary Statistics on Grade 4 MCAS Math School Averages, 1998–2001

                              1998     1999     2000     2001
Number tested per school
  Minimum                       11       10       10       11
  Maximum                      317      309      320      328
  Mean                        72.0     72.9     73.2     72.6
  Median                        63       65       65       64
  SD                          41.1     41.4     42.3     42.5
Average MCAS score
  Count                        977      977      977      977
  Minimum                      206      208      210      213
  Maximum                      261      260      260      263
  Mean                       233.1    234.6    235.1    235.1
  Median                       233      235      236      235
  SD                          9.67     9.19     9.19     8.32

Change in MCAS Average   1998 to 1999   1999 to 2000   2000 to 2001
  Minimum                     -14            -17            -22
  Maximum                      25             17             18
  Mean                        1.5            0.5          -0.04
  Median                      1.0            1.0              0
  SD

Figure 1 shows a scatter plot of how 1999 school averages compared with those from 1998. As can be seen, there is a fairly strong relationship between 1998 and 1999 averages. Schools with higher MCAS grade 4 test score averages in 1998 tended to have higher averages in 1999. The correlation between score averages in 1998 and 1999 was 0.860. The regression relationship between score averages in 1999 and 1998 is:

Gd4MCASAvg99 = 44.03 + 0.81(Gd4MCASAvg98)


For statistically inclined readers, it may be noted that these correlation and regression relationships are statistically significant; that is, it is extremely unlikely that they might occur by chance.

Figure 1. Scatter Plot of School Average Grade 4 MCAS Math Scores 1998 vs. 1999

Figure 2 shows the relationship between grade 4 math MCAS score averages in 1999 and 2000. As can be seen, the relationship between year 2000 MCAS grade 4 math averages and those for 1999 is similar to the 1998-1999 relationship, but even slightly stronger. The correlation between score averages in 2000 and 1999 is 0.875. The regression relationship of average scores in 2000 and 1999 is:

Gd4MCASAvg00 = 29.54 + 0.88(Gd4MCASAvg99)


Figure 2. Scatter Plot of School Average Grade 4 MCAS Math Scores 1999 vs. 2000

Figure 3 shows the relationship between score averages in 2001 and 2000. As can be seen, the relationship between score averages in 2001 and 2000 is highly similar to the relationships evident in the previous two pairs of years. The correlation between 2001 and 2000 score averages is 0.866. The regression relationship of averages in 2001 and 2000 is:

Gd4MCASAvg01 = 50.8 + 0.78(Gd4MCASAvg00)
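The three regression lines can be cross-checked against the summary statistics alone, since for a simple least-squares line the slope equals the correlation times the ratio of the standard deviations, and the line passes through the two means. The following is a minimal sketch of that check in Python. It uses only the means and SDs from Table 1 and the correlations reported in the text; the function and variable names are illustrative, and because the inputs are rounded, the results should match the reported coefficients only approximately.

```python
# Means and SDs of school averages (Table 1) and the reported year-to-year correlations.
MEAN = {1998: 233.1, 1999: 234.6, 2000: 235.1, 2001: 235.1}
SD = {1998: 9.67, 1999: 9.19, 2000: 9.19, 2001: 8.32}
CORR = {(1998, 1999): 0.860, (1999, 2000): 0.875, (2000, 2001): 0.866}


def regression_from_summary(x_year, y_year):
    """Return (intercept, slope) of the least-squares line predicting y_year from x_year."""
    r = CORR[(x_year, y_year)]
    slope = r * SD[y_year] / SD[x_year]                # slope = r * (SD_y / SD_x)
    intercept = MEAN[y_year] - slope * MEAN[x_year]    # line passes through the means
    return intercept, slope


for pair in CORR:
    a, b = regression_from_summary(*pair)
    print(f"{pair[0]} -> {pair[1]}: intercept = {a:.1f}, slope = {b:.2f}")
```

Note that every slope is below 1. A school ten points above the mean in one year is predicted to be only about eight or nine points above it the next, which is the usual signature of regression toward the mean.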


Figure 3. Scatter Plot of School Average Grade 4 MCAS Math Scores 2000 vs. 2001

Next let us consider, à la Kane and Staiger, the relationship between school size and change in score averages from one year to the next. For these analyses school size has been calculated simply as the average number of students tested in the two years across which change is calculated.

Figure 4 shows the relationship between change in average MCAS grade 4 scores between 1998 and 1999 and school size (defined as the average of the numbers of students tested in the two years). As can be seen, schools with fewer than 100 or so students tested show changes in MCAS average scores of as much as 15-20 points. However, schools with more than 150 students tested per year show much smaller changes, generally less than 5 points.

Figure 4. Change in MCAS Grade 4 Math Average Score 1998 to 1999 v. School Size

Figure 5 shows analogous results for 1999 to 2000 score changes. As can be seen, the pattern shown here is similar to that shown in Figure 4. Schools with smaller numbers of students tested tended to have much more "volatility" (to use Kane and Staiger's phrase) in average scores than schools with larger numbers of students tested.


Figure 5. Change in MCAS Grade 4 Math Average Score 1999 to 2000 v. School Size

Figure 6 shows the relationship between school size and change in average grade 4 MCAS scores between 2000 and 2001. The pattern is very similar to that apparent in the previous two figures. Schools with fewer than 100 students tested showed much larger swings in test score averages than schools with larger numbers of students tested.
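The funnel shape seen in these plots is what sampling variation alone would produce. If each year's tested class is treated as a random draw of n students, the change in a school's average between two years has a standard deviation of roughly sigma * sqrt(2/n), where sigma is the spread of individual student scores. The sketch below simulates schools whose true quality never changes. It is illustrative only: the student-level SD of 25 scale points, the true mean of 235, and all function names are assumptions made for the example, not figures taken from the MCAS data.

```python
import random
import statistics

STUDENT_SD = 25.0   # ASSUMED spread of individual scores; not taken from the article
TRUE_MEAN = 235.0   # ASSUMED true school mean, identical in both years


def simulated_change(n_students, rng):
    """Year-to-year change in a school average when only the cohort of students differs."""
    year1 = statistics.fmean(rng.gauss(TRUE_MEAN, STUDENT_SD) for _ in range(n_students))
    year2 = statistics.fmean(rng.gauss(TRUE_MEAN, STUDENT_SD) for _ in range(n_students))
    return year2 - year1


def spread_of_changes(n_students, n_schools=500, seed=0):
    """Standard deviation of simulated changes across many equally sized schools."""
    rng = random.Random(seed)
    changes = [simulated_change(n_students, rng) for _ in range(n_schools)]
    return statistics.pstdev(changes)


def theoretical_spread(n_students):
    """SD of the difference of two independent sample means: sigma * sqrt(2 / n)."""
    return STUDENT_SD * (2.0 / n_students) ** 0.5


for n in (20, 60, 150, 300):
    print(n, round(spread_of_changes(n), 2), round(theoretical_spread(n), 2))
```

Under these assumed values, schools testing about 20 students swing several times more widely than schools testing 300, even though no simulated school actually improved or declined.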


Figure 6. Change in MCAS Grade 4 Math Average Score 2000 to 2001 v. School Size

Given the political prominence of high stakes testing in Massachusetts (as elsewhere), it is not surprising that various observers have tried to use changes in school MCAS scores from one year to the next to identify high quality or "exemplary" schools. For example, in a high profile ceremony at the Massachusetts State House in December 1999, five school principals were presented with gifts of $10,000 each "for helping their students make significant gains on the MCAS" ( pr.html) (accessed November 15, 2001). Though the cash awards were donated by a private foundation, the ceremony recognizing the five schools was attended by the Massachusetts Governor, Lieutenant Governor and Commissioner of Education. The press release for the event stated: "The schools were recognized as having the highest percentage improvement in overall MCAS scores between 1998 and 1999 in English Language Arts, Mathematics and Science and Technology" ( pr.html).

Anyone with even a modest knowledge of statistics will note the absurdity of this statement. Since the MCAS scale of 200 to 280 is arbitrary and has no meaningful zero point, it is meaningless to calculate percentage increases in scores. This indicates that whoever in the Massachusetts Department of Education wrote this press release is fundamentally ignorant of statistics, or to be less politically incorrect, in need of improvement in knowledge of statistics. For anyone who has not studied statistics lately and hence may not appreciate the absurdity of calculating percentage increases on arbitrary test score scales, I suggest the following exercise. Calculate the percentage increase in temperature going from 50 degrees Fahrenheit to 68 degrees Fahrenheit. Next, figure out the equivalent temperatures on the Celsius scale and calculate the percentage increase on the Celsius scale.
Finally, ask yourself which "percentage increase" is correct.

Four of the five schools receiving the so-called Edgerly awards in 1999 were elementary schools, namely, Riverside Elementary School in Danvers, Franklin D. Roosevelt Elementary School in Boston, Abraham Lincoln Elementary School in Revere, and Kensington Elementary School in Springfield. Figure 7 is a recasting of Figure 4, but with these four 1999 Edgerly award schools shown with circles.
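For readers who would rather not do the temperature exercise by hand, it can be worked out in a few lines of Python. The sketch below is illustrative: the function names are made up for the example, and the MCAS figures at the end describe a hypothetical school rather than any school in the data.

```python
def percent_increase(old, new):
    """Percentage increase from old to new, relative to old."""
    return 100.0 * (new - old) / old


def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0


f_old, f_new = 50.0, 68.0
c_old, c_new = fahrenheit_to_celsius(f_old), fahrenheit_to_celsius(f_new)

# Same physical change in temperature, two different "percentage increases".
print(percent_increase(f_old, f_new))   # Fahrenheit: 50 to 68 is a 36 percent increase
print(percent_increase(c_old, c_new))   # Celsius: 10 to 20 is a 100 percent increase

# HYPOTHETICAL school gaining 10 points on the 200-280 MCAS scale.
print(percent_increase(230.0, 240.0))   # about 4.3 percent on the reported scale
# The same gain if the scale had instead been labeled 0-80 (scores shifted down by 200).
print(percent_increase(30.0, 40.0))     # about 33.3 percent
```

Because the answer depends entirely on where the scale places its zero, neither figure is "correct"; only the 10-point difference itself is meaningful.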


Figure 7. Change in MCAS Grade 4 Math Average Score 1998 to 1999 v. School Size, with Award Schools Marked

As can be seen, the four award schools share two characteristics. First, they are all relatively small schools, each with fewer than 100 students tested. Second, they showed unusually large score changes from 1998 to 1999. This is not surprising, since large MCAS score gains from 1998 to 1999 served as the basis for their receiving awards. Figure 8 recasts Figure 5, again with the 1999 "award" schools marked with circles.


Figure 8. Change in MCAS Grade 4 Math Average Scores 1999 to 2000 v. School Size, with 1999 Award Schools Marked

As can be seen in Figure 8, three out of the four 1999 award schools showed declines in average grade 4 MCAS math scores from 1999 to 2000.

Figure 9 is a variant of Figure 5, showing the relationship between average numbers of students tested in 1999 and 2000 versus the change in average grade 4 math scores between 1999 and 2000. In Figure 9, all of the schools showing a 10 or more point gain in average MCAS scores are marked with circles.


Figure 9. Change in MCAS Grade 4 Math Average Score 1999 to 2000 v. School Size, with Schools showing Gain of 10 Points or More Highlighted with Circles

What happened to these schools the next year? Figure 10 shows change from 2000 to 2001, but with the schools having largest gains from 1999 to 2000 again marked with circles. As can be seen, there were a few schools showing largest gains from 1999 to 2000 that continued to show gains in 2001. But most of the large gain schools from 1999 to 2000 showed declines in 2001. Several of them showed declines from 2000 to 2001 that were just about as large (9-10 points) as were the gains from 1999 to 2000.


Figure 10. Change in MCAS Grade 4 Math Average Score 2000 to 2001 v. School Size, with Schools showing Gain of 10 Points or More '98 to '99 Highlighted with Circles

Note that almost all of these schools showing large gains in average scores one year, but then large declines the next year, are ones with relatively small numbers of students tested.

The relationship between changes in average scores across pairs of years can be seen more clearly in Figures 11 and 12. Figure 11 shows how change in school average grade 4 MCAS scores between 1998 and 1999 compares with the change between 1999 and 2000. Figure 12 shows how the change from 1999 to 2000 compares with the change between 2000 and 2001. As can be seen, there is a negative relationship between change in one interval and change in the next. Schools that show large gains in one interval tend to show losses in the next interval. The correlation between change from 1998 to 1999 and change from 1999 to 2000 is -0.388. The correlation for the next pair of years, that is, change from 1999 to 2000 versus change from 2000 to 2001, is -0.396. These negative correlations are both statistically significant.
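A negative correlation of about this size is what one would expect if much of each year's school average were transient noise. Successive changes share the middle year's noise with opposite signs, so if the averages were nothing but a fixed school level plus independent noise, the correlation between successive changes would be exactly -0.5. Genuine year-to-year change pulls it toward zero. The simulation below is a rough sketch of this reasoning, not a model fitted to the MCAS data. The noise and trend standard deviations, and the assumption that genuine changes are independent from one interval to the next, are hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def successive_change_correlation(noise_sd, trend_sd, n_schools=5000, seed=1):
    """Correlation between change in interval 1 and change in interval 2.

    Each observed school average is a true level plus independent yearly noise.
    trend_sd is the ASSUMED spread of genuine change in each interval.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_schools):
        g1 = rng.gauss(0.0, trend_sd)   # genuine change, year 1 to year 2
        g2 = rng.gauss(0.0, trend_sd)   # genuine change, year 2 to year 3
        e1, e2, e3 = (rng.gauss(0.0, noise_sd) for _ in range(3))
        first.append(g1 + e2 - e1)      # observed change shares e2 ...
        second.append(g2 + e3 - e2)     # ... with the opposite sign here
    return pearson(first, second)


print(successive_change_correlation(noise_sd=3.0, trend_sd=0.0))  # pure noise
print(successive_change_correlation(noise_sd=3.0, trend_sd=2.0))  # noise plus real change
```

In this setup the expected correlation is -noise_sd^2 / (2 * noise_sd^2 + trend_sd^2), so the second call should give a value near -0.41. Observed correlations close to -0.4 are therefore consistent with year-to-year noise outweighing genuine change, though this sketch cannot establish that by itself.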


Figure 11. Change in MCAS Grade 4 Math Average Score 1998 to 1999 v. Change 1999 to 2000, with Schools showing Gain of 10 Points or More Highlighted with Circles

Figure 12. Change in MCAS Grade 4 Math Average Score 1999 to 2000 v. Change 2000 to 2001, with Schools showing Gain of 10 Points or More '98 to '99 Highlighted with Circles


These results are simply a manifestation of the kind of volatility that Kane and Staiger (2001) found in school average test scores in other states. As they found for North Carolina, so we have seen above with MCAS scores: school average test scores are particularly volatile for relatively small schools. Moreover, schools that show relatively large gains in score averages from one year to the next tend to show losses the following year. Thus, it is clear that school average test scores, or changes in averages from one year to the next, represent poor measures of school quality.

Why are MCAS score averages poor indicators of school quality?

School average test results fluctuate from year to year for several reasons. The most obvious is that one year's class of students will differ from the next. Especially in relatively small schools, with fewer than 100 students tested per grade, having a few especially test savvy, or not so savvy, students may skew results from one year to the next.

A second likely cause of volatility in school average scores on the Massachusetts test is that the MCAS is of dubious technical merit. When I first examined the 1998 grade 4 English Language Arts (ELA) test, for example, I was surprised to find many poorly worded questions and reading questions for which one did not actually have to read the passage on which they were ostensibly based in order to answer the question (that is, the questions lacked passage dependency). More recently the Massachusetts DOE implicitly acknowledged defects in the 2001 grade 10 ELA and math exams when it dropped one item from each from scoring ( .html). The defective items on the 2001 test were discovered not by the test's developer or state officials but by students (Lindsay, 2001).

More recently, Gallagher (2001) undertook a review of grade 10 MCAS math questions from the 2000 and 2001 test administrations.
Gallagher, a professor of Environmental, Coastal and Ocean Sciences at the University of Massachusetts, Boston, concluded that there were serious problems with 10 to 15% of the grade 10 MCAS math questions. He identified some questions as having wrong answers, some as having more than one correct answer and some as misaligned with the Massachusetts curriculum frameworks. "Overall, my review of these tests indicates that there are serious failures in the choice and review of MCAS questions" (Gallagher, 2001, p. 5, accessed December 3, 2001).

More generally, the MCAS is not a good indicator of school quality because it has been constructed as a norm-referenced test. Many people assume that the MCAS (and other state-sponsored tests) are criterion-referenced tests, that is, tests of well-specified bodies of knowledge and skills. In a paper prepared for an October 2001 conference at the John F. Kennedy Institute at Harvard University, for example, Kurtz wrote:

    The MCAS, which is known as a "criterion-referenced exam," tests knowledge of a set curriculum and gives students scores based on their level of mastery, in contrast to national "norm-referenced tests," which grade a student's performance in relation to other students (Kurtz, 2001, p. 6).

However, examination of the technical manuals for the MCAS tests reveals that items have been selected for inclusion on MCAS tests by using norm-referenced test



