Educational policy analysis archives


Material Information

Educational policy analysis archives
Physical Description:
Arizona State University
University of South Florida
Arizona State University
University of South Florida.
Place of Publication:
Tempe, Ariz
Tampa, Fla
Publication Date:


Subjects / Keywords:
Education -- Research -- Periodicals   ( lcsh )
non-fiction   ( marcgt )
serial   ( sobekcm )

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
usfldc doi - E11-00363
usfldc handle - e11.363
System ID:



A peer-reviewed scholarly journal
Editor: Gene V Glass
College of Education
Arizona State University

Copyright is retained by the first or sole author, who grants right of first publication to the EDUCATION POLICY ANALYSIS ARCHIVES. EPAA is a project of the Education Policy Studies Laboratory. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Volume 12 Number 14
April 7, 2004
ISSN 1068-2341

How Feasible is Adequate Yearly Progress (AYP)? Simulations of School AYP “Uniform Averaging” and “Safe Harbor” under the No Child Left Behind Act

Jaekyung Lee
SUNY at Buffalo

Citation: Lee, J. (2004, April 7). How Feasible is Adequate Yearly Progress (AYP)? Simulations of School AYP “Uniform Averaging” and “Safe Harbor” under the No Child Left Behind Act. Education Policy Analysis Archives, 12(14). Retrieved [Date] from a/v12n14/.

Abstract

The No Child Left Behind Act of 2001 (NCLB) requires that schools make “adequate yearly progress” (AYP) towards the goal of having 100 percent of their students become proficient by year 2013-14. Through simulation analyses of Maine and Kentucky school performance data collected during the 1990s, this study investigates how feasible it would have been for schools to meet the AYP targets if the mandate had been applied in the past with the “uniform averaging (rolling averages)” and “safe harbor” options, which have the potential to help reduce the number of schools needing improvement or corrective action. Contrary to some expectations, applying both options would do little to reduce the risk of massive school failure due to unreasonably high AYP targets for all student groups. Implications of the results for the NCLB school accountability system and possible ways to make the current AYP more feasible and fair are discussed.

The reauthorized Elementary and Secondary Education Act (ESEA), the No Child Left Behind Act of 2001 (NCLB), requires standards-based accountability for schools receiving Title I funds. One major component of this accountability policy is to report whether the schools are making “adequate yearly progress” (AYP) based on performance targets set by their state (i.e., 100% of students become proficient within 12 years from the baseline year). Since the passage of the NCLB, much concern has been raised about the AYP mandates and their possible consequences for schools that repeatedly fail to meet their AYP target (Linn, 2003).

Previous studies pointed out that some critical problems with AYP-based school accountability policies foreshadow technical challenges that lie ahead (Hill, 1997; Kane & Staiger, 2002; Kim & Sunderman, 2004; La Marca, 2003; Lee, 2003; Lee & Coladarci, 2002; Linn & Haug, 2002; Thum, 2002). While these studies raised technical issues such as the reliability and validity of AYP measures, or pointed out policy implementation problems such as the lack of capacity and resources, the options available to schools under the NCLB have not been studied and discussed systematically. Specifically, two options are available under the current NCLB legislation: (1) uniform averaging (NCLB, 2001, Section 1111(b)(2)(J)) and (2) safe harbor (NCLB, 2001, Section 1111(b)(2)(I)). These options might not only help improve the reliability or fairness of the AYP measure but also help save schools from failing to meet the AYP target. It remains to be examined whether and how those options might affect the feasibility of AYP, which should be the most pressing issue for schools.

First, the uniform averaging procedure is designed to address a reliability issue: Does AYP measure schools’ academic progress with sufficient consistency and stability?
The typical school AYP measures tend to be highly vulnerable to fluctuation because they rely on comparisons of successive cohort groups (as opposed to tracking the same cohort of students); this is particularly problematic in small schools, which might have very few students in certain demographic categories. In light of this difficulty, the NCLB permits aggregating data from multiple years to increase the sample size for more reliable estimation of the target group’s performance. While the term “uniform averaging” has not been clearly defined in either statistical or policy terms, it has been interpreted as allowing multiple approaches to aggregating multiple years’ data and as applicable to status evaluations, improvement evaluations, or both (Marion et al., 2002). For example, schools can average test scores from the current school year with test scores from the preceding two years; this rolling average is designed to mitigate the fact that student performance can vary widely from year to year due to factors beyond a school’s control, such as changes in the demographic composition of student populations (“Raising The Bar,” 2002). While the primary purpose of the rolling average option is to make the school AYP measure more reliable, it can also help improve the fairness of the school accountability system by reducing the chance that small schools or small subgroups within schools would be left out of reporting due to the states’ minimum group size (N) requirement. Moreover, the uniform averaging option also has some potential to help struggling schools meet the AYP target when test scores are declining. Does this option really work to save a school with a downward performance trend from being identified by the state as failing AYP?

Second, the safe harbor provision is designed to address a fairness issue: Does AYP measure school progress in a way that allows different groups of students in the same school to meet the same performance target at different rates? Basically, the law requires that schools disaggregate the test results into subgroups (e.g., major racial/ethnic groups, economically disadvantaged students, students with disabilities, English Language Learners) and have all of them meet the same AYP target. This requirement carries the danger of assuming that all categories will move forward at the same rate (NECEPL, 2002). However, the NCLB also gives schools the option of a “safe harbor”, which is designed to lessen the difficulty of reaching the same AYP target for all groups of students at the same rate and to give academically viable schools a second chance. For a school where the performance of one or more student subgroups on the reading assessment, the math assessment, or both fails to meet AYP targets, the school will be considered to have reached AYP under this provision if the percentage of students in that group who failed to reach proficiency decreased by 10 percent from the preceding year and the group also made progress on another academic indicator. Is this option powerful enough to save an at-risk school from being identified by the state as failing AYP?

It was estimated that up to 80 percent of schools in some states could be targeted as needing improvement or corrective action in the first few years (Marion et al., 2002; Olson, 2002, April 18).
These earlier predictions from state simulations used only student assessment results without looking at test participation rates, other academic indicators, or “safe harbor” provisions under the NCLB (Marion et al., 2002). Since those earlier predictions came before the U.S. Department of Education’s guidance or regulations for AYP, it was pointed out that some of the interpretations states used in building their projections may not have taken advantage of all the options available (Olson, 2002, April 18). Therefore, we need new predictions with the options enabled, and the results may or may not differ from the earlier predictions.

In this paper, I focused on the issue of feasibility and investigated several “what if” questions through simulation analyses of data collected from Maine and Kentucky schools during the 1990s: how the NCLB’s AYP formula would have worked if we had applied it to past school performance data, and what would have happened if we had applied the options that the current formula permits. Specifically, the objective of this study was to (a) investigate the feasibility of the current AYP requirements for schools and (b) explore the impact of using the “uniform averaging (rolling average)” and “safe harbor” options on the AYP results. I examined whether and how applying the “rolling average” and “safe harbor” provisions improves the chance of meeting the AYP target over the long run and at the same time reduces the risk of failing to meet AYP for 2-5 consecutive years. The answers to the questions of who might win or lose in the current AYP race, and how we can make this measurement-driven accountability strategy more realistic and fair for all, may provide insight that will guide policymaking.


Data and Methods

Aggregate school performance data from all public schools in two states, Kentucky and Maine, were collected and examined. Early on, both states (a) established student assessment systems to monitor their schools’ academic progress and (b) made a greater effort to align their assessments with their content and performance standards (Lee & McIntire, 2002). Despite these common characteristics, the two states’ assessments differed significantly in the stakes attached to the assessment results: a high-stakes test in Kentucky vs. a low-stakes test in Maine. The 8th grade mathematics achievement data collected from the two states’ student assessments were used for analysis: the Kentucky Instructional Results Information System (KIRIS) for the 1993-98 period and the Maine Educational Assessment (MEA) for the 1995-98 period. Because both states have changed their state assessments since 1999, and the results are not directly comparable to the old ones, all of these analyses were restricted to the pre-1999 period. Using only data collected after the NCLB legislation was not considered a viable option, because such data were available for only one or two years and were not sufficient for estimating the longer-term consequences.

In congruence with the NCLB AYP requirements, a standards-based interpretation of the test results was applied to determine the academic performance of students against the performance standards set by the state. For the Maine data, the percentage of students scoring at or above the “Advanced” level on the 1995-1998 MEA was used; for Kentucky, the percentage of students scoring at or above “Proficient” on the 1993-1998 KIRIS was used. Both the “Advanced” and “Proficient” levels were the second highest of four achievement levels and can be regarded as meeting state performance standards.
Indeed, these two states’ proficiency standards were set at a highly comparable level (in Kentucky) or at an even higher level (in Maine) than the corresponding proficiency standard on the National Assessment of Educational Progress (NAEP). For example, the percentages of 8th grade students in Kentucky who performed at or above the Proficient level in mathematics as of 1996 were 16 on the NAEP and 14 on the KIRIS; the corresponding percentages in Maine were 31 on the NAEP and 9 on the MEA.

First of all, the current AYP rules were used to determine baseline and annual AYP targets in each state: the percentage of students proficient in the school at each state’s 20th percentile rank in the first available year was used as the baseline AYP target. On top of that baseline, equal increments were added every year so that the AYP target reaches 100 in 12 years. Therefore, the baseline AYP target for Maine schools was set to zero in 1995, and each subsequent AYP target added an increment of 8.3 to make the ultimate target equal to 100. Likewise, the baseline AYP target for Kentucky was set to 8.8 in 1993, and each subsequent AYP target added an increment of 7.6 to reach 100 in 12 years from the baseline.

Given such hypothetical AYP target lines, Figure 1 and Figure 2 show the distributions of school AYP measures (i.e., the percentage of 8th grade students deemed proficient on the state math assessment) in Maine and Kentucky, respectively. In Maine, schools made very modest gains, about 1 percent per year on average, so that they fell farther and farther behind the AYP target over time (see Figure 1). In 1996 (Year 2), more than half of the schools in Maine were already performing below the AYP target, and a large majority of schools were doing so two years later (Year 4). While schools in Kentucky made relatively larger achievement gains (on average a 3 percent gain per year) than their counterparts in Maine during the period, they also could not have caught up with the AYP target, which grew more rapidly (see Figure 2).

Figure 1. Maine Schools’ 1995-98 Performance Trends against Hypothetical AYP Targets in 8th Grade MEA Mathematics
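The hypothetical target lines plotted in Figure 1 and Figure 2 follow from simple arithmetic on the baseline values reported above. The following is a minimal sketch of that calculation, not the study's actual code; the function name `ayp_targets` and its parameters are illustrative assumptions.

```python
def ayp_targets(baseline, years=12, goal=100.0):
    """Return the AYP target for the baseline year and each of the
    following `years` years, rising by equal increments to `goal`.

    Hypothetical helper: only the baselines (0 for Maine, 8.8 for
    Kentucky) and the 12-year horizon come from the article.
    """
    step = (goal - baseline) / years
    return [baseline + step * k for k in range(years + 1)]


maine = ayp_targets(0.0)     # baseline year 1995
kentucky = ayp_targets(8.8)  # baseline year 1993

# Annual increments, rounded as in the text: 8.3 for Maine, 7.6 for Kentucky
print(round(maine[1] - maine[0], 1))        # 8.3
print(round(kentucky[1] - kentucky[0], 1))  # 7.6
```

Under this reading, a school gaining about 1 to 3 percentage points per year, as the Maine and Kentucky schools did, falls behind a line that rises 7.6 to 8.3 points per year.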


Figure 2. Kentucky Schools’ 1993-98 Performance Trends against Hypothetical AYP Targets in 8th Grade KIRIS Mathematics

Assessing the effect of the “Rolling Average” option on school AYP

Under the “rolling average” (uniform averaging procedure) provision, it is assumed that schools can average test scores from the current school year with test scores from the preceding one or two years. This works in a school’s favor when test scores decline, but it works against a school when scores rise. If the rolling average option is used every time regardless of individual schools’ variable growth patterns, it can result in a greater number of schools being identified as failing to meet AYP every year. This could have happened in both Maine and Kentucky because their schools on average made progress over the course of the 4- or 6-year periods. In this study, it was assumed that the rolling average procedure was used by schools only when they obviously benefited from the option (i.e., when school performance declined). According to Scott Marion, who co-chaired the Joint Study Group on Adequate Yearly Progress (AYP) and co-authored a report (Marion et al., 2002), this assumption may not be unreasonable: “To be fair, schools shouldn't be able pick and choose when they can use the multi-year average. However, we've suggested that the state set up an appeal process whereby schools that miss AYP because of the earlier years included in the multiyear average be granted an appeal. So it is sort of like picking and choosing when to apply multi-year averages, but it occurs through the appeal process.” (Personal communication, March 18, 2004). Nevertheless, whether states would actually allow schools to use the rolling average option in such a flexible way remains an open question (see Erpenbach, Forte-Fast, & Potts, 2003 for examples of state plans).

The following rule was employed in this simulation to determine whether the rolling average was used for the AYP calculation: if the rolling average score (i.e., the mean of the scores from the current year and the preceding two years) is greater than the current year score, then the rolling average is used; otherwise the current year score is used instead. A simple averaging method was used without any weighting.

If $(X_{t-2} + X_{t-1} + X_t)/3 > X_t$, then $\mathrm{AYP} = (X_{t-2} + X_{t-1} + X_t)/3$; otherwise $\mathrm{AYP} = X_t$,

where $X_{t-2}$ = percent proficient at year t-2, $X_{t-1}$ = percent proficient at year t-1, and $X_t$ = percent proficient at year t (current year).

Simulation analyses were conducted to estimate the share of schools that would have failed to meet the AYP target with or without this rolling average procedure. Because sanctions may apply to schools that fail to meet AYP for two or more consecutive years, the focus of this analysis was on schools that belong to this high-risk category. Schools that fail often but not in consecutive years would not be designated as “in need of improvement” according to the regulation. An odds ratio was computed to compare the relative risk of failure with vs. without the rolling average option.

Assessing the effect of the “Safe Harbor” option on school AYP

The “safe harbor” provision applies to schools in which one or more subgroups of students fail to reach the uniform, schoolwide AYP target. According to the provision, the school shall be considered to have made adequate yearly progress if the percentage of students in that group who did not meet or exceed the proficient level of achievement on the state assessments for that year decreased by 10 percent of that percentage from the preceding school year and that group made progress on one or more other academic indicators.
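The two decision rules described so far can be sketched in a few lines. This is an illustrative reading rather than the study's implementation: the function names are hypothetical, and the safe harbor check follows the statutory wording quoted above (a drop in the non-proficient percentage of at least 10 percent of the preceding year's value), with progress on the other academic indicator passed in as an assumed flag.

```python
def ayp_measure(x_t2, x_t1, x_t):
    """AYP measure under the selective rolling-average rule.

    x_t2, x_t1, x_t: percent proficient at years t-2, t-1, and t.
    The unweighted three-year mean is used only when it exceeds
    the current-year score; otherwise the current score is used.
    """
    rolling = (x_t2 + x_t1 + x_t) / 3
    return rolling if rolling > x_t else x_t


def safe_harbor_met(y_prev, y_curr, other_indicator_progress=True):
    """Safe harbor check for one subgroup (hypothetical helper).

    y_prev, y_curr: subgroup percent proficient in the preceding and
    current year. The flag stands in for the second statutory
    condition; it defaults to True here as an assumption.
    """
    nonprof_prev = 100 - y_prev
    nonprof_curr = 100 - y_curr
    # Multiply through by 10 to avoid floating-point issues with 0.10
    decreased_enough = 10 * (nonprof_prev - nonprof_curr) >= nonprof_prev
    return decreased_enough and other_indicator_progress


print(ayp_measure(30.0, 25.0, 20.0))  # declining scores: mean 25.0 is used
print(ayp_measure(10.0, 15.0, 20.0))  # rising scores: current 20.0 is used
print(safe_harbor_met(10.0, 20.0))    # non-proficient 90 -> 80: True
print(safe_harbor_met(10.0, 15.0))    # non-proficient 90 -> 85: False
```

The second pair of calls shows why the required gain is larger for lower-performing subgroups: a group starting at 10 percent proficient must gain 9 points, while a group starting at 50 percent would need only 5.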
Although this option gives some recognition to schools that have made a certain minimum level of progress for every subgroup despite uneven success among different subgroups, the amount of progress required for safe harbor varies among subgroups; the school has to demonstrate greater progress for a subgroup that performs at a relatively lower level in terms of its percentage of proficient students. While the uniform averaging procedure can also be used to combine multiple years’ data for the safe harbor review, states vary in their approaches to addressing the inherent instability of gain scores (see Erpenbach, Forte-Fast, & Potts, 2003 for examples of state plans).

To examine how the “Safe Harbor” option would work for low-income students, one subgroup, identified as students who were eligible for free or reduced-price lunch, was chosen. Before the NCLB legislation, disaggregated student performance data were hardly available. The school aggregate performance data collected from both Maine and Kentucky were no exception to this conventional reporting pattern, as they did not break down the aggregated results by demographic subgroup. In the absence of school-level data on the achievement of students in the free/reduced school lunch program, the statewide average achievement results from the NAEP 1996 8th grade state math assessment were used for estimation. At the same time, the absence of data on another academic indicator (e.g., performance on another type of test or the retention/promotion rate) precluded applying that requirement.

The percentage of students at or above the NAEP proficient level was 23 for non-eligible students and 4 for eligible ones in Kentucky. Likewise, the percentage of students at or above the NAEP proficient level was 35 for non-eligible students and 18 for eligible ones in Maine. To simplify calculations, the 21-point difference was assumed to be uniform across all schools and constant over time in Maine (see equation 1.1 below); in the case of Kentucky, 21 in equation 1.1 was replaced by 17. In addition, the entire school AYP measure was specified as the sum of each subgroup’s rolling-AYP measure weighted by the percentage of students in each category (see equation 1.2 below). The following simultaneous equations were solved together to estimate each school’s percentage of proficient free/reduced lunch students:

$X_i - Y_i = 21$ (1.1)

$(X_i P_{xi} + Y_i P_{yi})/100 = Z_i$ (1.2)

where $X_i$ = percent proficient students among those who are not eligible for free/reduced lunch in school i; $Y_i$ = percent proficient students among those who are eligible for free/reduced lunch in school i; $Z_i$ = percent proficient students in total in school i; $P_{xi}$ = percent of students who are not eligible for free/reduced lunch in school i; $P_{yi}$ = percent of students who are eligible for free/reduced lunch in school i (i.e., $100 - P_{xi}$).

In the above equations, $Z_i$, $P_{xi}$, and $P_{yi}$ are known variables available from the data, and their values are used to estimate $X_i$ and $Y_i$.
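Equations 1.1 and 1.2 have a closed-form solution: substituting $X_i = Y_i + 21$ into equation 1.2 and using $P_{xi} + P_{yi} = 100$ gives $Y_i = Z_i - 21 P_{xi}/100$. A minimal sketch of that step follows; the function name, the `gap` parameter, and the example school are illustrative assumptions, and the article does not say how estimates falling below zero were handled, so none are adjusted here.

```python
def estimate_subgroups(z, p_y, gap=21.0):
    """Solve equations 1.1 and 1.2 for one school.

    z:   percent proficient among all students in the school (Z_i)
    p_y: percent of students eligible for free/reduced lunch (P_yi)
    gap: assumed constant difference X_i - Y_i (21 in equation 1.1)

    Returns (x, y): estimated percent proficient among non-eligible
    and eligible students. Hypothetical helper, not the study's code.
    """
    p_x = 100.0 - p_y
    y = z - gap * p_x / 100.0  # from substituting X = Y + gap into (1.2)
    x = y + gap                # equation 1.1
    return x, y


# Made-up school: 20 percent proficient overall, 40 percent eligible
x, y = estimate_subgroups(20.0, 40.0)
print(round(x, 1), round(y, 1))  # 28.4 7.4
```

Plugging the result back into equation 1.2 recovers the school total: (28.4 × 60 + 7.4 × 40)/100 = 20.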
With the estimated percentage of free/reduced lunch students who are proficient in each school at year t ($Y_t$), the following safe harbor rule was applied to schools which otherwise would fail to meet the AYP target for free/reduced lunch students: If ((100 – $Y_t$) – (100 – $Y_{t-1}$)) (100 – $Y_t$)/10, then schools would be regarded as meeting the AYP target for free/reduced lunch students. It was assumed that the group made progress on another academic indicator. An odds ratio was computed to compare the relative risk of failure with vs. without the safe harbor option.

Results

When the current AYP goal and timeline (100% proficient within 12 years) were applied to retrospective school performance data (1993-98 in Kentucky and 1995-98 in Maine), the percentage of schools that would meet their AYP target overall turned out to decrease exponentially over the course of the first few years (see Table 1). In Kentucky, it was 80 percent in the first year, plummeted to 36 percent in the 4th year, and fell further to 10 percent in the 6th year. In Maine, it started at 100 percent in the first year (because the baseline AYP goal was set to 0), became 44 percent in the 2nd year, and dropped to 6 percent in the 4th year. This implies that most schools would have enormous difficulty meeting the NCLB AYP requirement, which appears to be an unrealistic expectation given a relatively high performance standard (proficient) and a relatively short timeline (12 years).

Even when the rolling average option was used, it would have only slightly increased the chance of schools’ meeting the AYP target (see Table 1). The odds of meeting the AYP target with the rolling average were only 1.06 to 1.24 times greater than the odds of meeting the AYP target without the rolling average. With the rolling average option, the percentage of schools that would meet their AYP target may increase, for example, from 44.3 to 46.0 in Maine in the 2nd year and from 35.9 to 39.5 in Kentucky in the 4th year. This implies that the rolling average has very weak potential to save schools from being identified as failing when their scores decline.

Table 1. Percentage of Maine and Kentucky Schools that would Meet AYP Target with vs. without Rolling Average Option

        Maine                           Kentucky
Year    No Rolling   Rolling   OR       No Rolling   Rolling   OR
1       100.0                           80.4
2       44.3         46.0      1.07     66.1         69.5      1.17
3       10.5         12.7      1.24     61.4         62.8      1.06
4       5.9          7.2       1.24     35.9         39.5      1.17
5                                       28.3         30.5      1.11
6                                       10.3         10.9      1.07

Note: OR is the odds ratio of given percentages, i.e., the ratio of the odds of schools meeting the AYP target for all students each year with a rolling average to their corresponding odds of passing without the rolling average option.

The percentage of schools that would fail to meet AYP for two consecutive years at least once was very high: 75 percent in Kentucky and 87 percent in Maine (see Table 2). While the risk tends to drop significantly for longer periods, it still remains a substantial threat to most schools. The failure rate for three years in a row would be as high as 57 percent in Kentucky and 52 percent in Maine. Although the failure rate for 5 consecutive years was less than 10 percent in Kentucky for the 6-year period, the risk would have been much greater over the full 12-year cycle.
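The odds ratios reported in Table 1 can be reproduced directly from the paired percentages. The sketch below assumes the standard definition given in the table note (odds = p / (100 − p)); the function name `odds_ratio` is an illustrative choice.

```python
def odds_ratio(pct_with, pct_without):
    """Ratio of the odds of an outcome with an option to the odds
    without it, given both as percentages strictly between 0 and 100."""
    odds_with = pct_with / (100.0 - pct_with)
    odds_without = pct_without / (100.0 - pct_without)
    return odds_with / odds_without


# Table 1, Year 2: percent of schools meeting AYP (rolling, no rolling)
print(round(odds_ratio(46.0, 44.3), 2))  # Maine: 1.07
print(round(odds_ratio(69.5, 66.1), 2))  # Kentucky: 1.17
```

The same function applies to failure percentages, where a value below 1 indicates that the option lowers the odds of failing.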
Table 2. Percentage of Maine and Kentucky Schools that would Fail to Meet AYP Target for 2-5 Consecutive Years with vs. without Rolling Average Option

             Maine                           Kentucky
Frequency    No Rolling   Rolling   OR       No Rolling   Rolling   OR
2 Years      87.3         86.5      .93      75.2         73.3      .91
3 Years      51.9         50.2      .93      57.1         55.9      .95
4 Years      0.0          0.0                17.7         17.1      .96
5 Years

Note: OR is the odds ratio of given percentages, i.e., the ratio of the odds of schools failing to meet the AYP target for all students for 2-5 years in a row with the rolling average to their corresponding odds of consecutive failure without the rolling average option.

The use of the rolling average procedure helps reduce consecutive failure rates in both states. As with the single-time failure rate, however, the degree of this risk reduction tends to be very small (see Table 2). The odds of failing to meet the AYP target for consecutive years with the rolling average are .91 to 1.04 times the odds of failing without the rolling average.

Applying the AYP target to a subgroup of low-income students (i.e., students who receive free/reduced lunch in this analysis) increases the risk of school failure about two to three times. The percentage of schools that would meet the AYP target for this particular disadvantaged group in Year 2 is only 6 in Maine and 32 in Kentucky (see Table 3). These figures were much smaller than the corresponding figures estimated with the entire group of students in each school (cf. Table 1).

Table 3. Percentage of Maine and Kentucky Schools that would Meet AYP Target for Low-Income Students (Eligible for Free/Reduced Lunch) with vs. without Safe Harbor Option

Columns: Year; Maine (No Safe Harbor, Safe Harbor, OR); Kentucky (No Safe Harbor, Safe Harbor, OR)
1    100.0    36.7
     10.9    20.2    2.07
6    3.4    29.1    11.66

Note: OR is the odds ratio of given percentages, i.e., the ratio of the odds of schools’ meeting the AYP target for free/reduced lunch students each year with safe harbor to their corresponding odds of passing without the safe harbor option.

Using the “safe harbor” option increases the chance that schools would meet the AYP target for free/reduced lunch students (see Table 3). The odds ratio for meeting the AYP target with the safe harbor ranges from 1.22 to 11.66.
However, this might have overestimated the effect because the requirement of making progress on another academic indicator was not considered. At the same time, using the safe harbor option reduces the risk of being identified as a failing school for consecutive years and facing undesirable consequences (see Table 4). The odds ratio for failing to meet the AYP target for 2-5 years in a row with the safe harbor ranges from .23 to .75. Even with this option, however, the risk remains high, and up to 90 percent of schools will be regarded as needing improvement. While this estimation was based on only one subgroup, economically disadvantaged students, simultaneous evaluation of other subgroups, including students with learning disabilities and LEP/ELL students, may result in greater failure rates.

Table 4. Percentage of Maine and Kentucky Schools that would Fail to Meet AYP Target for Low-Income Students (Eligible for Free/Reduced Lunch) for 2-5 Consecutive Years with vs. without Safe Harbor Option

             Maine                                    Kentucky
Frequency    No Safe Harbor   Safe Harbor   OR        No Safe Harbor   Safe Harbor   OR
2 Years      98.6             94.3          .23       94.1             86.9          .42
3 Years      93.8             91.9          .75       84.1             66.8          .38
4 Years      0.0              0.0
5 Years

Note: OR is the odds ratio of given percentages, i.e., the ratio of the odds of schools’ failing to meet the AYP target for free/reduced lunch students for 2-5 years in a row with safe harbor to their corresponding odds of consecutive failure without the safe harbor option.

Now we can compare all the results of this simulation analysis under four different scenarios: (1) applying AYP to the entire group of students schoolwide without using the rolling average and safe harbor options, (2) applying AYP to the entire group of students schoolwide with the rolling average option only, (3) applying AYP to the entire group of students schoolwide as well as to the subgroup of free/reduced lunch students with the rolling average option but without the safe harbor option, and (4) applying AYP to the entire group of students schoolwide as well as to the subgroup of free/reduced lunch students with both the rolling average and the safe harbor options.
The results that would be obtained under the above-mentioned four different scenarios are compared in Figure 3 and Figure 4 with abbreviated labels for each scenario: (1) No Rolling Average, (2) Rolling Average, (3) No Safe Harbor, and (4) Safe Harbor.


Figure 3. Percentages of schools in Maine and Kentucky that would meet AYP Target under different options.


Figure 4. Percentages of schools in Maine and Kentucky that would fail to meet AYP for 2-5 years in a row.

First of all, we apply the AYP target to the entire body of students, but not to subgroups, in each school and do not use the rolling average and safe harbor options (see “No Rolling Average” lines in Figure 3 and Figure 4). Such schoolwide application of the AYP formula without looking into subgroups was what the states typically did for evaluating school AYP before the NCLB legislation. By using the rolling average option schoolwide, we can show some improvement in the chance of schools meeting AYP each year and for consecutive years as well, but the difference is highly marginal (see “Rolling Average” lines in Figure 3 and Figure 4). When we apply AYP to a group of low-income students as the NCLB requires, we see substantial increases in the risk of school failure (see “No Safe Harbor” lines in Figure 3 and Figure 4). By and large, the comparison shows the benefit of using the safe harbor option, but it also reveals that the option is not strong enough to save many struggling disadvantaged schools from the risk (see “Safe Harbor” lines in Figure 3 and Figure 4).

Discussion

Policy implications of this study need to be discussed carefully given that the findings are based on a simulation analysis of past school performance data in a single grade and a single subject area from two selected states. It needs to be noted that the study makes some unwarranted assumptions about school AYP measures and targets within the parameters of the NCLB and that the actual results can be quite different if the two states make different choices (e.g., using an index measure of AYP, increasing the AYP target in a nonlinear, stepwise fashion). Whatever estimation methods are used, this study might underestimate or overestimate the schools’ future progress expected under the new legislation. The results may have been different if schools had faced in the past the stronger incentives embodied in current AYP rules. Moreover, the results might be different if the performance standard used in the past is significantly higher or lower than the current performance standard adopted under the new testing systems in both states. However, the comparison of Kentucky and Maine (high-stakes vs. low-stakes testing environments with commonly challenging state assessments and high performance standards) can give us insight into possible consequences of the NCLB AYP policy for schools across the nation.

With these caveats in mind, the results of this simulation analysis provide very gloomy projections of schools’ chances of meeting the AYP target, warning federal and state education policymakers of massive school failure under the NCLB. It does not appear to be feasible for many schools across the nation to meet the current AYP target within the given 12-year timeline. It is not realistic to expect schools to make unreasonably large achievement gains compared with what they did in the past. Many schools are doomed to fail unless drastic actions are taken to modify the course of the NCLB AYP policy or slow its pace. Contrary to some expectations, using both the rolling average and safe harbor options does not work to reduce the risk of massive school failure.
Although the rolling average can help produce more stable estimates of school performance, it hardly reduces the risk of school failure. The safe harbor option also fails to provide a strong safety net to at-risk schools despite what its name implies.

When a majority of schools fail, there will not be enough model sites for benchmarking nor enough resources for capacity building and interventions. This situation can raise a challenging question for policymakers: is it the school or the policy that is really failing? The validity of the NCLB school accountability policy is ultimately threatened if such prevailing school failure occurs as an artifact of policy mandates with unrealistically high expectations that were not based on scientific research and empirical evidence.

One approach that policymakers can consider to make the AYP targets more realistic and fair might be to use an effect size measure for guidance. For example, one might reasonably expect that schools should make progress every year by, say, 20% of the standard deviation of the school-level percent proficient measure; this amounts to about 2.5–3.0 percent in Kentucky and 1.5–2.0 percent in Maine. This amount of progress may be regarded as small by conventional statistical standards (Cohen, 1977), but it is exactly what an average school in both states managed to accomplish in the past. In a similar vein, one can consider setting the safe harbor threshold for a subgroup at a certain percentage of the standard deviation (e.g., reduce the percentage of non-proficient low-income students by 10% of the standard deviation). A similar suggestion along with the use of scale score rather than percent proficient was
made by other analysts (Linn, Baker, & Betebenner, 2002). While using an effect size metric with scale scores may help set more realistic performance targets and better recognize schools’ academic progress, it is not permissible under the current law. This idea also raises questions as to whether to use the standard deviation of student-level test scores or school-level average test scores, and whether to derive the standard deviation from original test score variance or residual variance with adjustments for demographic differences among students and their schools. In Maine and Kentucky, the school-level standard deviation was only 40 percent of the student-level standard deviation of mathematics achievement scores. Once the differences among schools in their students’ racial and socioeconomic background characteristics are taken into account, the adjusted school-level variance of residuals is reduced further to half of the original school-level variance (see Lee & Coladarci, 2002 for the analysis of within-school vs. between-school math achievement distributions in Maine and Kentucky).

Using different methods with different measures would produce different results and, consequently, different conclusions. Whether one prefers a criterion-referenced or norm-referenced approach to setting AYP targets and evaluating school progress, the ultimate concern is not simply improving the feasibility of schools’ meeting their AYP targets in the short term but rather enhancing the schools’ capacity for sustained academic improvement over the long haul. Given the limited amount of resources available from the federal government and the limited capacity of the state agencies as well, reducing the identification of schools in need of improvement would help states provide more targeted assistance to a smaller number of disadvantaged schools which have a large number of at-risk students.
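The effect-size guideline discussed above reduces to simple arithmetic, which the following sketch illustrates. The annual gains of 2.5–3.0 percentage points (Kentucky) and 1.5–2.0 (Maine) come from the text. The function names, the hypothetical baseline of 30 percent proficient, and the constant yearly gain are assumptions introduced here purely for illustration.

```python
import math


def annual_gain_target(school_sd, effect_size=0.20):
    """Yearly gain in percent proficient, set as a fraction of the
    school-level standard deviation (both in percentage points)."""
    return effect_size * school_sd


def years_to_reach(baseline, goal, annual_gain):
    """Whole years needed to climb from `baseline` to `goal` percent
    proficient at a constant `annual_gain` (a simplifying assumption)."""
    return math.ceil((goal - baseline) / annual_gain)


# Annual gains cited in the text for a 20%-of-SD target (percentage points).
cited_gains = {"Kentucky": (2.5, 3.0), "Maine": (1.5, 2.0)}

# Hypothetical school starting at 30 percent proficient, aiming for 100.
for state, (low, high) in cited_gains.items():
    slowest = years_to_reach(30, 100, low)
    fastest = years_to_reach(30, 100, high)
    print(f"{state}: {fastest} to {slowest} years")
```

Under these assumptions, a school progressing at the historically typical pace would need well over the 12 years the law allows, which is consistent with the article's argument that the current targets outpace past rates of improvement.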
Nevertheless, the application of AYP options such as rolling averages and safe harbor should not be compromised by the prospect of limited support and short-term interests in reducing school identifications. The long-term success of a school accountability system does not depend on the number of passing schools but on the results of student achievement.

Note

This article is based upon work supported in part by the National Science Foundation under Grant No. 9970853. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This study simply uses past school performance data from Maine and Kentucky for simulation analyses; the assumptions, results, and interpretations given in the article have nothing to do with the two states’ current AYP policies and outcomes. An earlier version of this paper was presented at the 2003 AERA annual meeting in Chicago. E-mail JL2 for correspondence about this manuscript.

References

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Erpenbach, W. J., Forte-Fast, E., & Potts, A. (2003). Statewide Educational Accountability Under NCLB: Central Issues Arising from an Examination of State Accountability Workbooks and U.S. Department of Education Reviews Under the No Child Left Behind Act of 2001. Washington, DC: Council of Chief State School Officers. Available at
Hill, R. (1997). Calculating and reducing errors associated with the evaluation of adequate yearly progress. Paper presented at the annual assessment conference of the Council of Chief State School Officers (ERIC Publication No. ED 414307).
Kane, T. J., & Staiger, D. O. (2002). Volatility in school test scores. In D. Ravitch (Ed.), Brookings Papers on Education Policy 2002 (pp. 235-284). Washington, DC: Brookings Institution.
Kim, J., & Sunderman, G. L. (2004). Large Mandates and Limited Resources: State Response to the No Child Left Behind Act and Implications for Accountability. Cambridge, MA: The Civil Rights Project at Harvard
La Marca, P. M. (2003). Factors affecting the statewide implementation of an adequate yearly progress model. Paper presented at the annual meeting of the American Educational Research Association in Chicago.
Lee, J. (2003). Evaluating Rural Progress in Mathematics Achievement: Threats to the Validity of “Adequate Yearly Progress.” Journal of Research in Rural Education, 18(2), 67-77.
Lee, J., & Coladarci, T. (2002). Using Multiple Measures to Evaluate the Performance of Students and Schools: Learning from the Cases of Kentucky and Maine. Orono, ME: University of Maine. Available at
Lee, J., & McIntire, W. (2002). Using National and State Assessments to Evaluate the Performance of State Education Systems: Learning from the Cases of Kentucky and Maine. Orono, ME: University of Maine. Available at
Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31, 3-16.
Linn, R. L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(7), 3-13.
Linn, R. L., & Haug, C. (2002). Stability of school-building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(1), 29-36.
Marion, S. F., White, C., Carlson, D., Erpenbach, W. J., Rabinowitz, S., & Sheinker, J. (2002). Making valid and reliable decisions in the determination of adequate yearly progress. A Paper in the Series: Implementing the State Accountability System Requirements Under the No Child Left Behind Act of 2001. Washington, DC: Council of Chief State School Officers. Available at
New England Center for Educational Policy and Leadership (2002). Implementing the No Child Left Behind Act of 2001: A tool kit for New England state policy makers. Storrs, CT: Author.
No Child Left Behind Act of 2001, Pub. L. No. 107-110.
Olson, L. (2002, April 18). ‘Inadequate’ yearly gains are predicted. Education Week. Available at yp.h21.
Raising the bar: The complexities of “adequate yearly progress.” (2002). Education Assessment Insider, 1(5), 5.
Thum, Y. (2002). Design of School Performance and School Productivity Indicators: Measuring Student and School Progress with the California API. Working draft.

About the Author

Jaekyung Lee
Graduate School of Education
SUNY at Buffalo
E-mail: JL224@buffalo.edu

Jaekyung Lee is an assistant professor of education at University at Buffalo, the State University of New York. He was a National Academy of Education/Spencer Postdoctoral Fellow and Principal Investigator of an NSF Statewide Systemic Initiatives (SSI) study. His current research focuses on the issues of educational accountability and equity.

The World Wide Web address for the Education Policy Analysis Archives is

Editor: Gene V Glass, Arizona State University
Production Assistant: Chris Murrell, Arizona State University

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411. The Commentary Editor is Casey D. Cobb.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
David C. Berliner, Arizona State University
Greg Camilli, Rutgers University
Linda Darling-Hammond, Stanford University
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Gustavo E. Fischman, Arizona State University
Richard Garlikov, Birmingham, Alabama
Thomas F. Green, Syracuse University
Aimee Howley, Ohio University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Ontario Institute of Technology
Patricia Fey Jarvis, Seattle, Washington
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Les McLean, University of Toronto
Heinrich Mintrop, University of California, Los Angeles
Michele Moses, Arizona State University
Gary Orfield, Harvard University
Anthony G. Rud Jr., Purdue University
Jay Paredes Scribner, University of Missouri
Michael Scriven, University of Auckland
Lorrie A. Shepard, University of Colorado, Boulder
Robert E. Stake, University of Illinois-UC
Kevin Welner, University of Colorado, Boulder
Terrence G. Wiley, Arizona State University
John Willinsky, University of British Columbia

EPAA Spanish and Portuguese Language Editorial Board

Associate Editors for Spanish & Portuguese

Gustavo E. Fischman, Arizona State University, fischman@asu.edu
Pablo Gentili, Laboratório de Políticas Públicas, Universidade do Estado do Rio de Janeiro, pablo@lpp-uerj.net

Founding Associate Editor for Spanish Language (1998-2003)

Roberto Rodríguez Gómez, Universidad Nacional Autónoma de México

Adrián Acosta (México), Universidad de
J. Félix Angulo Rasco (Spain), Universidad de
Teresa Bracho (México), Centro de Investigación y Docencia Económica-CIDE, bracho
Alejandro Canales (México), Universidad Nacional Autónoma
Ursula Casanova (U.S.A.), Arizona State
José Contreras Domingo, Universitat de Barcelona
Erwin Epstein (U.S.A.), Loyola University of
Josué González (U.S.A.), Arizona State
Rollin Kent (México), Universidad Autónoma de Puebla
María Beatriz Luce (Brazil), Universidad Federal de Rio Grande do
Javier Mendoza Rojas (México), Universidad Nacional Autónoma de México
Marcela Mollis (Argentina), Universidad de Buenos
Humberto Muñoz García (México), Universidad Nacional Autónoma
Angel Ignacio Pérez Gómez (Spain), Universidad de
Daniel Schugurensky (Argentina-Canadá), OISE/UT
Simon Schwartzman (Brazil), American Institutes for Research–Brazil (AIRBrasil)
Jurjo Torres Santomé (Spain), Universidad de A
Carlos Alberto Torres (U.S.A.), University of California, Los

EPAA is published by the Education Policy Studies Laboratory, Arizona State University