Educational policy analysis archives


Material Information

Educational policy analysis archives
Physical Description:
Arizona State University
University of South Florida
Place of Publication:
Tempe, Ariz
Tampa, Fla
Publication Date:


Subjects / Keywords:
Education -- Research -- Periodicals   ( lcsh )
non-fiction   ( marcgt )
serial   ( sobekcm )

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
usfldc doi - E11-00318
usfldc handle - e11.318
System ID:



A peer-reviewed scholarly journal
Editor: Gene V Glass
College of Education
Arizona State University

Copyright is retained by the first or sole author, who grants right of first publication to the EDUCATION POLICY ANALYSIS ARCHIVES. EPAA is a project of the Education Policy Studies Laboratory. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Volume 11 Number 20
July 8, 2003
ISSN 1068-2341

A Multilevel, Longitudinal Analysis of Middle School Math and Language Achievement

Keith Zvoch
Albuquerque (NM) Public Schools

Joseph J. Stevens
University of New Mexico

Citation: Zvoch, K. and Stevens, J. J. (July 8, 2003). A multilevel, longitudinal analysis of middle school math and language achievement. Education Policy Analysis Archives, 11(20). Retrieved [date] from

The performance of schools in a large urban school district was examined using achievement data from a longitudinally matched cohort of middle school students. Schools were evaluated in terms of the mean achievement and mean growth of students in mathematics and language arts. Application of multilevel, longitudinal models to student achievement data revealed that 1) school performance varied across both outcome measures in both subject areas, 2) significant proportions of variation were associated with school-to-school differences in performance, 3) evaluations of school performance differed depending on whether school mean achievement or school mean growth in achievement was examined, and 4) school mean achievement was a weak predictor of school mean growth. These results suggest that assessments of school performance depend on choices of how data are modeled and analyzed. In particular, the present study indicates that schools with low mean scores are not always “poor performing” schools. Use of student growth rates to evaluate school performance enables schools that would otherwise be deemed low performing to demonstrate positive effects on student achievement. Implications for state accountability systems are discussed.

With the enactment of the No Child Left Behind (NCLB; No Child Left Behind Act, 2002) legislation, states are now required to develop content-based standards in mathematics and reading or language arts and have tests that are linked to those standards in grades 3 through 8. The new legislation also requires states to set a proficiency standard for performance on those content-aligned tests. The proficiency standard will enable states to identify “probationary” schools, monitor their performance, and intervene if adequate yearly progress toward the standard does not occur. The increased emphasis that NCLB brings to the assessment of state content standards and the measurement of school effectiveness is intended to ensure that all students have access to an equitable and comprehensive education. However, the assessment of student performance and the measurement of school effectiveness are neither simple nor straightforward (Linn, 2000; Stevens, 2000). There are many complex issues involved in the development and implementation of accountability systems that are not acknowledged or considered in the fervor of political and public dialogue and policy discussion on educational reform. One issue of substantial importance is how the analytic methods used in an accountability system may impact the evaluation of school effectiveness.
At its heart, the measurement of student learning and school effectiveness poses some challenges in research design that must be met if the effects of teachers and schools are to be validly estimated. One of the most difficult challenges in evaluating school performance is to separate the effects of schooling from the intake characteristics of the students who attend the school (Raudenbush & Willms, 1995; Willms, 1992). The evaluative challenge stems from the manner in which students come to attend particular schools. Nonrandom selection processes sort families into neighborhoods and students into schools. The unequal distribution of student characteristics that follows tends to give schools with challenging intakes a competitive disadvantage in most accountability systems. Schools with disadvantaged intakes are at particular risk of unfavorable evaluation if the state accountability system fails to use statistical methods that properly account for student background and the hierarchical nature of school accountability data.

State accountability models that use school means or medians as the primary or only indicator of school effectiveness are particularly problematic. Common practice is to aggregate student data to the level of the school in these models. However, if relevant data occur at different levels or for different sampling "units" as in the measurement of both students and schools, then aggregating data to the school level may be inappropriate. This issue is often referred to as a "unit of analysis" problem. In statistical terminology, students are nested within schools and analysis should incorporate the nested structure of the data into the design through the use of multilevel analysis methods (see Aitken & Longford, 1986; Brown, 1994; Burstein, 1980; Cronbach, 1976; Cronbach & Webb, 1975; Goldstein, 1988; Raudenbush & Willms, 1995). Yet, few state systems appear to use multilevel methods. Accountability systems that fail to properly model the nested structure of data tend to confound school intake with school practice and policy and are probably biased in their estimation of school effects.

Another issue of importance in the measurement of school performance is the over-reliance in accountability systems on the comparison of successive cohorts of students as a measure of “change”. States that use the successive cohort approach (e.g., the mean performance of 6th graders in 2001 is compared to the mean performance of 6th graders in 2002) to measure and evaluate school performance attempt to mitigate school differences in student intake by focusing on the year-to-year change in student achievement scores. The comparison of successive student cohorts enables states to evaluate schools in terms of proficiency gains instead of absolute performance levels. However, the use of different cohorts of students to measure school progress or school improvement is problematic for evaluative purposes. Recent investigations of the successive cohort approach demonstrate that estimates of year-to-year gains in proficiency are affected in large part by sampling variation, measurement error, and unique, non-persistent factors that are not associated with school size or school practice (Linn & Haug, 2002; Kane & Staiger, 2002). The lack of systematic variation in the successive cohort change score puts states at risk of assessing school performance on the basis of fluctuations in student cohorts or test administration conditions instead of actual changes in student performance (Linn & Haug, 2002).
Evidence that school performance cannot be estimated without bias when student test scores are aggregated at a single point in time or with precision when successive student cohorts are compared has led a number of authors to argue for the use of longitudinal analyses of individual student performance as a more direct and accurate estimate of school effects (Barton & Coley, 1998; Bryk & Raudenbush, 1988; Linn & Haug, 2002). For example, Goldstein (1991, p. 14), describing school effectiveness studies in Britain, stated that “...It is now recognised...that intake achievement is the single most important factor affecting subsequent achievement, and that the only fair way to compare schools is on the basis of how much progress pupils make during their time at school.”

Student progress can be measured by comparing year-to-year differences in individual performance, but the most appropriate methodology for measuring changes in student achievement is through estimation of individual growth trajectories by means of the multilevel model (Bryk & Raudenbush, 1987, 1992; Willett, 1988; Willms, 1992). In this approach, student test scores are linked across time. A regression function is then fit to the outcome data obtained on each student. The resulting growth trajectories index the rate at which students acquire certain academic competencies. A measure of school performance follows from averaging the individual growth trajectories within each school.

Multilevel, longitudinal analyses of individual student performance may allow the conceptualization of the most relevant and direct outcome measures of school effectiveness by facilitating estimation of the added benefit or “value” that students receive by attending a particular school (Boyle & Willms, 2001; Bryk & Raudenbush, 1988; Willms, 1992).
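The two-step idea described above (fit a line to each student's linked scores, then average the lines within each school) can be illustrated with a minimal Python sketch. This is a simplified ordinary least squares version, not the multilevel estimator used in the study: it applies no weighting or shrinkage, the function name `school_mean_growth` is hypothetical, the scores are invented, and coding the years as 0, 1, 2 (so that the intercept is the first-year status) is an assumption.

```python
import numpy as np

def school_mean_growth(scores, school_ids, years=(0, 1, 2)):
    """Average per-student OLS intercepts and slopes within each school.

    scores: one row per student, one column per year (linked across time).
    school_ids: school label for each student row.
    Returns {school: (mean_initial_status, mean_growth_rate)}.
    """
    scores = np.asarray(scores, dtype=float)
    years = np.asarray(years, dtype=float)
    labels = np.asarray(school_ids)
    # polyfit fits every column of a 2-D y at once; row 0 holds slopes,
    # row 1 holds intercepts (highest power first).
    slopes, intercepts = np.polyfit(years, scores.T, deg=1)
    result = {}
    for school in sorted(set(school_ids)):
        mask = labels == school
        result[school] = (float(intercepts[mask].mean()),
                          float(slopes[mask].mean()))
    return result

# Invented scale scores for four students in two hypothetical schools.
example_scores = [[650, 670, 690], [640, 650, 660],
                  [660, 672, 684], [655, 665, 675]]
example_schools = ["A", "A", "B", "B"]
growth = school_mean_growth(example_scores, example_schools)
print(growth)
```

In this toy data, school A starts lower than school B but grows faster, which is the kind of contrast that a school mean alone would hide.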
The multilevel, longitudinal model facilitates value-added school performance estimates by providing a degree of control over a wealth of confounding factors that otherwise complicate the evaluation of school effectiveness. When longitudinal models are used, each student serves as his/her own control for confounding factors that are stable characteristics of the student over time (Sanders & Horn, 1994; Stevens, 2000). Therefore, the confounding effects of factors like socio-economic status, limited English proficiency, and ethnic and cultural differences may be largely controlled through the application of a matched longitudinal design.

Despite the potential of using multilevel, longitudinal models to measure and evaluate school performance (Boyle & Willms, 2001; Teddlie & Reynolds, 2000), only a few reported studies have applied these models to student achievement data (e.g., Bryk & Raudenbush, 1988; Willms & Jacobsen, 1990). Given the lack of published examples, the purpose of the present study was to provide a demonstration of the use of multilevel, longitudinal models to estimate school effectiveness using a sample of middle school students. We were interested in examining the following research questions: 1) How does student achievement performance vary over a three year period? 2) How much of the variation in performance is associated with individual differences among the students and how much with differences from school to school? and, 3) How does the evaluation of school performance differ based on an examination of school mean achievement vs. an examination of school average rates of growth in achievement?

Method

Participants

Standardized test data from middle school students in a large urban school district located in the southwestern United States were analyzed in the present study. The school district that provided the data has over 100 schools and serves close to 90,000 students annually. The district has a diverse student body. In recent years, the student population has been approximately 46% Hispanic, 44% Anglo, 4% Native American, 3% African American, 2% Asian and 1% other. The district serves many students who are not fully English proficient.
On average, twenty percent of the students who attend district schools are classified as Limited English Proficient (LEP). The district is also impacted by widespread poverty. In any given year, approximately 35% of the district’s middle school students receive a free or a reduced price lunch.

At the middle school level, the school district has 24 schools that serve over 20,000 students in grades 6 through 8. All sixth, seventh, and eighth grade students are tested annually on a norm-referenced achievement test, the TerraNova/CTBS5 Survey Plus (CTB/McGraw-Hill, 1997). Approximately 6,500 students in each grade take the test each spring. Achievement data from students who were in sixth grade in 1998-99, seventh grade in 1999-00, and eighth grade in 2000-01 were analyzed in the present study. Middle school students were used because they provided the only cohort on which three consecutive years of data were available. Mathematics and Language scores were used to provide a demonstration using core subject areas and those content areas required in the NCLB legislation. All students who completed an examination in all three study years were selected (N = 4,918). Since the purpose of the study was to examine school effects, 800 students who did not attend the same middle school in all three years were excluded, resulting in a sample of 4,118 students. Any student who did not have a mathematics or language composite score in all three study years was excluded, as well as any student who received a modified test administration in any of the three years. This resulted in a final sample of 3,299 students nested within 24 middle schools.

The sample consisted of almost equal numbers of males and females. Fifty-one percent of the sample were female (N = 1,698); forty-nine percent were male (N = 1,601). Representation of ethnic groups was more variable. Forty-six percent (N = 1,524) of the sample were Anglo, 45% (N = 1,495) were Hispanic, 3% (N = 87) were African American, 3% (N = 86) were Native American, and 2% (N = 67) were of Asian descent. The ethnic background of 40 students was not identified. Thirty-five percent (N = 1,152) of the sample received a free or a reduced price lunch, 12% (N = 390) were classified as LEP, and 3% (N = 98) were special education students. In most respects, the backgrounds of students in the analytic sample were representative of the students who attend district middle schools. However, the exclusion of students who did not participate in all three test administrations, students who transferred schools, and students who received at least one modified test administration did lower the percentage of free lunch recipients and the percentage of LEP and special education students below district averages by 1%, 8%, and 18%, respectively. Nonetheless, the disproportionate exclusion of students from special populations did not affect the pattern of school mean achievement. Correlations between school mean achievement in grade 6 for the original and the analytic sample were .98 for mathematics and .99 for language.

Measures

Achievement data used in the study were student scores on the TerraNova/CTBS5 Survey Plus, a standardized, norm-referenced achievement test (CTB/McGraw-Hill, 1997). The Survey Plus is a test battery that spans grades 2 through 12. All test items are selected-response.
The Survey Plus tests students in Reading, Language Arts, Mathematics, Science, Social Studies, Word Analysis, Vocabulary, Language Mechanics, Spelling, and Mathematics Computation. CTB/McGraw-Hill calculates an IRT derived score for each student in each subject area. CTB/McGraw-Hill also provides each student with a weighted composite score in Reading, Mathematics, and Language.

Student scale scores on the Mathematics and Language composites were analyzed in the present study. The Mathematics composite score is derived from the 31-item Mathematics and the 20-item Mathematics Computation subtests. According to the publisher, the Mathematics subtest measures a student’s ability to apply grade appropriate mathematical concepts and procedures to a range of problem-solving situations. The Mathematics Computation subtest measures a student’s ability to perform arithmetic operations on grade appropriate number types. CTB/McGraw-Hill reported a KR-20 reliability estimate of .86 for the Mathematics subtest in the 6th, 7th, and 8th grade norming samples. For Mathematics Computation, KR-20 was .83 in grade 6, .80 in grade 7, and .85 in grade 8. For the Mathematics composite, KR-20 was reported at .91 in grade 6, .90 in grade 7, and .92 in grade 8 (CTB/McGraw-Hill, 1997).

The Language composite was also derived from a weighted combination of two subtests. The 22-item Language subtest is intended to measure a student’s ability to understand the structure and usage of words in standard written English. The 20-item Language Mechanics subtest is designed to measure a student’s ability to edit and proofread standard written English. For the Language subtest, CTB/McGraw-Hill reported KR-20 reliability estimates of .86 in grade 6, .84 in grade 7, and .81 in grade 8 in the norm groups. For Language Mechanics, KR-20 was reported as .84 in grades 6 and 7 and .85 in grade 8. For the Language composite, KR-20 was .91 in grades 6 and 7 and .90 in grade 8 (CTB/McGraw-Hill, 1997).

Analytic Procedures

Multilevel modeling techniques were used to estimate a mean achievement score and a mean growth trajectory for each school. The Hierarchical Linear Modeling (HLM) program, version 5.04 (Raudenbush, Bryk, Cheong, & Congdon, 2001) was used to estimate three-level longitudinal models. Level-1 was composed of a longitudinal growth model that fitted a linear regression function to each individual student’s achievement scores over the three years studied (grades 6, 7, and 8). Equation 1 specifies the level-1 model:

(1) $Y_{tij} = \pi_{0ij} + \pi_{1ij}(\text{Year}) + e_{tij}$

As written, $Y_{tij}$ is the outcome (i.e., mathematics or language achievement) at time t for student i in school j, $\pi_{0ij}$ is the initial status of student ij (i.e., 6th grade performance), $\pi_{1ij}$ is the linear growth rate across grades 6-8 for student ij, and $e_{tij}$ is a residual term representing unexplained variation from the latent growth trajectory. Levels 2 and 3 in the HLM model estimate mean growth trajectories in terms of both initial status and growth rate across students (equations 2a and 2b) and across schools (equations 3a and 3b):

(2a) $\pi_{0ij} = \beta_{00j} + r_{0ij}$
(2b) $\pi_{1ij} = \beta_{10j} + r_{1ij}$
(3a) $\beta_{00j} = \gamma_{000} + u_{00j}$
(3b) $\beta_{10j} = \gamma_{100} + u_{10j}$

The initial status and growth of student achievement in equations 2a and 2b is conceived as a function of the school average achievement or school average slope and residual. Similarly, the initial status and growth by school in equations 3a and 3b is conceived as a function of the grand mean achievement or the grand mean slope and residual.
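As a worked step, the three levels can be collapsed into a single equation by substituting the level-3 equations into the level-2 equations and those into the level-1 model. The Greek symbols below follow the conventional HLM notation ($\pi$ for level-1, $\beta$ for level-2, and $\gamma$ for level-3 coefficients), which is assumed here; the combined form is a sketch for orientation and is not an equation given in the article.

```latex
\begin{align*}
Y_{tij} &= \pi_{0ij} + \pi_{1ij}(\text{Year}) + e_{tij} \\
        &= (\beta_{00j} + r_{0ij}) + (\beta_{10j} + r_{1ij})(\text{Year}) + e_{tij} \\
        &= \underbrace{\gamma_{000} + \gamma_{100}(\text{Year})}_{\text{fixed part}}
         + \underbrace{u_{00j} + u_{10j}(\text{Year})}_{\text{school deviations}}
         + \underbrace{r_{0ij} + r_{1ij}(\text{Year})}_{\text{student deviations}}
         + e_{tij}
\end{align*}
```

Read this way, each observed score is the grand-mean trajectory plus a school-specific shift in level and slope, a student-specific shift in level and slope, and occasion-level error.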
Equations 3a and 3b were used to calculate estimates of school mean achievement and school mean growth reported in the present study.

Results

Model Assumptions

Visual examination of univariate frequency distributions and a check of summary statistics revealed that mathematics and language achievement scores were distributed normally (i.e., skew and kurtosis values < 1) in all three study years. A check of within-subject bivariate plots revealed linear relationships between achievement scores across the three study years in both mathematics and language. After checking model assumptions, three SPSS data files were transferred to the HLM program for analysis. The Level-1 data file contained student and school identifiers, three years of student mathematics and language composite scale scores, and a field for year. This file contained 9,897 records (i.e., three records for each of 3,299 students). The Level-2 data file contained student and school identifiers (N = 3,299). The Level-3 data file contained only a school identifier (N = 24).

Mathematics

Table 1 presents the results of the three-level HLM model for mathematics. In the upper portion of the table, the results of the fixed effects regression model are presented. The first estimate shown, the grand mean ($\gamma_{000}$), is the intercept or the average 6th grade mathematics scale score for all students in the sample. The second estimate, the grand slope ($\gamma_{100}$), is the average yearly growth rate of the students. Thus, in this sample, the average mathematics score is estimated as 659.43 and on average, student mathematics achievement is expected to increase by 18.40 scale score points per year.

Table 1
Three-Level Unconditional Model for Mathematics Achievement

Fixed Effect                                Coefficient   SE      t
School Mean Achievement, $\gamma_{000}$     659.43        2.97    222.20*
School Mean Growth, $\gamma_{100}$          18.40         0.93    19.86*

Random Effect                               Variance Component   df      Chi-square
Individual Achievement, $r_{0ij}$           766.00               3275    8828.40*
Individual Growth, $r_{1ij}$                26.66                3275    3791.30*
Level-1 Error, $e_{tij}$                    310.87
School Mean Achievement, $u_{00j}$          203.03               23      704.70*
School Mean Growth, $u_{10j}$               19.11                23      359.31*

Level-1 Coefficient                         Percentage of Variation Between Schools
Individual Achievement, $\pi_{0ij}$         21.0
Individual Growth, $\pi_{1ij}$              41.8

Note. Results based on data from 3,299 students distributed across 24 middle schools.
* p < .001

Figure 1. Relationship between mathematics achievement and mathematics growth by school.

Estimates of student and school-level parameter variance are presented next in Table 1. Chi-square tests of the hypotheses that students and schools differ in level and growth of mathematics achievement indicate that there was statistically significant variation across all parameters. Both students and schools differ significantly in initial achievement levels and the rate of achievement growth. This indicates that there are individual differences from one student to another in mathematics achievement initially in grade 6 as well as in the rate of growth in mathematics achievement throughout middle school. In addition, inspection of the variance components presented at the bottom of Table 1 shows that the amount of between school variance in mathematics mean achievement (21.0%) and mean achievement growth (41.8%) is also relatively large and statistically significant.
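The between-school percentages can be checked against the variance components with a short Python sketch. The formula used here, school-level variance divided by the sum of school-level and student-level variance for the same parameter, is an assumption: the article does not print it, but it reproduces the reported percentages. The function name `between_school_share` is hypothetical; the numeric inputs are the variance components reported in Tables 1 and 2.

```python
def between_school_share(school_var: float, student_var: float) -> float:
    """Percentage of a parameter's variance that lies between schools.

    Assumed formula: 100 * school_var / (school_var + student_var).
    """
    return 100.0 * school_var / (school_var + student_var)

# Variance components as reported in the article's Tables 1 and 2.
math_status = between_school_share(203.03, 766.00)   # initial status
math_growth = between_school_share(19.11, 26.66)     # growth rate
lang_status = between_school_share(151.66, 699.35)
lang_growth = between_school_share(3.51, 0.68)

for label, value in [("math status", math_status),
                     ("math growth", math_growth),
                     ("language status", lang_status),
                     ("language growth", lang_growth)]:
    print(f"{label}: {value:.1f}% between schools")
```

Note that the large between-school share for language growth arises because the student-level growth variance (0.68) is very small, not because school growth rates in language vary widely.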


Thus, over and above individual differences, there are systematic differences from one school to another in mean mathematics achievement initially in grade 6 and in the school average rate of growth in mathematics achievement from the 6th to the 8th grade.

To further illustrate these observed school level differences in mathematics achievement, Empirical Bayes (EB) estimates of the 24 middle-school mathematics mean achievement and mean growth rates are presented in the scatterplot in Figure 1 on the vertical and horizontal axes, respectively. The horizontal line in the interior of the figure represents the grand mean achievement in mathematics. The vertical line in the interior of the figure represents the grand mean growth in mathematics. The two grand mean reference lines are used to classify schools into four quadrants of school performance. The upper right quadrant contains schools with above average mean achievement in grade 6 and above average growth from grades 6 to 8. The lower right quadrant contains schools with below average mean scores, but above average growth. The two quadrants on the left side of the figure contain schools with below average growth and either high or low mean achievement.

A number of interesting results can be seen in Figure 1. First, two schools (22 and 13) with low mean scores record the highest growth in the district. Strong growth is also evident in high scoring school 7. Also evident in Figure 1 is a school (8) with a low mean score and very poor mathematics growth. Relatively poor mathematics growth also occurs in two schools with high 6th grade mathematics achievement. Schools 21 and 18, second and third in 6th grade mean achievement, are noticeably below the district average in mathematics growth. Overall, a slight positive relationship exists between mathematics mean achievement and mathematics mean growth ($\tau_b$ = .14). On average, schools with low mean scores record less growth than schools with high mean scores. Appendix A presents the individual school mean and school growth estimates on which Figure 1 is based.

Figure 2 displays the school estimates in growth trajectory form. Each line in Figure 2 shows the average mathematics achievement at one of the 24 middle schools. As can be seen, there is a good deal of variation from school to school both in initial status (i.e., grade 6 mean achievement) and in the average rates of growth over time. The variation in average mathematics growth rates is indicated in part by the number of crossing lines in the figure. Alternative line styles are used to highlight schools with exceptionally high or low growth rates. Schools with a high growth rate are represented by the broken dot pattern. Schools with a low growth rate are represented by the broken line pattern.
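The four-quadrant reading of Figure 1 can be sketched in a few lines of Python. The reference values default to the grand mean (659.43) and grand slope (18.40) reported in Table 1; the function name `quadrant` and the example school values are hypothetical and are not the estimates from Appendix A.

```python
def quadrant(mean_score: float, growth: float,
             grand_mean: float = 659.43, grand_growth: float = 18.40) -> str:
    """Classify a school against the two grand-mean reference lines."""
    level = "high" if mean_score > grand_mean else "low"
    pace = "high" if growth > grand_growth else "low"
    return f"{level} achievement, {pace} growth"

# Invented (grade 6 mean, yearly growth) pairs for illustration only.
hypothetical_schools = {
    "P": (640.0, 25.8),
    "Q": (675.0, 14.0),
    "R": (670.0, 21.0),
    "S": (645.0, 7.2),
}

for name, (mean_score, growth) in hypothetical_schools.items():
    print(name, "->", quadrant(mean_score, growth))
```

A school such as the hypothetical "P" would look weak on a status-only indicator yet fall in the above-average growth half of the plot, which is the pattern the article highlights for schools 22 and 13.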


Figure 2. Mean mathematics achievement as a function of grade level and school location.

In Figure 2, the strong growth of two of the schools with low 6th grade achievement levels can be clearly seen. The school with the lowest 6th grade mean score (24th in rank) shows average achievement growth 1.4 times the overall average for mathematics growth. By 8th grade, the rank of the school has changed from 24th to 18th in average mathematics achievement. Similarly, the school that ranks 21st in mean performance in 6th grade also shows average achievement growth 1.4 times the average and moves to a rank of 16th by the end of 8th grade. Strong growth is also apparent in some of the schools with high 6th grade mean scores. For example, the 10th ranked school in 6th grade mathematics achievement becomes the 6th ranked school in 8th grade mathematics achievement. Schools with lower than average mathematics growth are also readily apparent in Figure 2. The third ranked school in 6th grade mathematics performance falls to seventh ranked in 8th grade performance. In addition, the 22nd ranked school in 6th grade achievement not only becomes the lowest ranked school by the end of 8th grade, but by achieving at only 39% of the overall average for mathematics growth, also falls far behind the achievement level of all other middle schools in the district.

Language

The same three-level, longitudinal HLM model was applied to the language achievement scores of the same sample of students. Table 2 presents these results. As can be seen in Table 2, model results for language achievement were similar to those for mathematics achievement. Except for variation in student growth rates, all parameters of the three-level HLM model were statistically significant. The average language achievement for all students across the 24 middle schools was 661.43 in grade 6 and the average yearly growth in language achievement was 12.30 score points.
Inspection of t he variance components from the language model shows that while there is statis tically significant individual


11 of 21 variation in students’ initial language achievement in grade 6, individual language growth rates do not differ statistically. Table 2 a lso shows that school growth rates in language are less variable than school growth ra tes in mathematics. Of the variation that does exist in language growth rates, 84% was between school variance. Thus, in the case of language achievement students differed in their average language achievement at grade 6 but showed rates of growth in language achievement that did not differ significantly. At t he school level, there were statistically significant differences in average ac hievement at grade 6 and in average rates of growth in language achievement thr ough grade 8.Table 2 Three-Level Unconditional Model for Language Achiev ementFixed EffectCoefficientSEtSchool MeanAchievement, 000661.432.58256.55* School Mean Growth,10012.300.4527.44* Random Effect Variance Component dfChi-square Individual Achievement, r0ij699.3532758836.36* Individual Growth, r1ij0.6832753226.22+ Level-1 Error, etij332.11 School Mean Achievement, u00j151.66 23588.34* School Mean Growth, u10j3.51 23 91.84* Level-1 Coefficient Percentage of Variation Between Schools Individual Achievement, 0ij17.8 Individual Growth, 1ij83.8 Note. Results based on data from 3,299 students dis tributed across 24 middle schools.* p < .001Empirical Bayes estimates of the 24 middle-school l anguage means and growth


12 of 21 rates are displayed in Figure 3. Instances of all f our patterns of achievement described for the mathematics achievement results a re also present in Figure 3. School 22 demonstrates high growth relative to its 6th grade mean language achievement while growth is low for school 8 in lan guage as it was in mathematics. As with the mathematics results, school 18 again de monstrates low growth relative to 6th grade mean achievement while school 7 shows a high growth rate in language achievement. Overall, the relationship bet ween language mean achievement and language mean growth is positive (tb = .41). On average, schools with low language mean scores showed less growth th an schools with high language mean scores. School language achievement m eans and growth rate estimates are presented in Appendix A. Figure 3. Relationship between language achievement and language growth by school. Figure 4 displays these results in growth trajector y form. Alternative line styles are again used to represent schools with exceptionally high or low growth rates. The figure shows that, while middle schools differ subs tantially in mean language achievement at grade 6, the rate of language achiev ement growth is more similar across schools as evidenced by the parallel pattern of the growth trajectories. Relative to mathematics, fewer schools change rank position. Some possibly important differences in growth rate still exist ho wever. The third ranked school in 6th grade language performance increased its relati ve standing over other schools in the district. Conversely, the 23rd ranked school in 6th grade language


13 of 21 performance becomes the lowest ranked school by the end of 8th grade. Figure 4. Mean language achievement as a function o f grade level and school location.Discussion The purpose of the present study was to apply multi level, longitudinal models in an analysis of school effectiveness in mathematics and language achievement and to demonstrate how assessments of school performance c an differ based on how data are modeled and analyzed. The present study de monstrated that assessments of school performance varied across mat hematics and language achievement measures. Estimates of the proportions of variance in achievement associated with individual students and with middle schools showed that significant proportions of variation were associated with schoo l-to-school differences in performance. In the current sample, 21 percent of t he unadjusted variation in mathematics achievement and 42 percent of the unadj usted variation in mathematics growth were attributable to between-sch ool differences. For language, 18 percent of the unadjusted variation in school ac hievement means and 84 percent of the unadjusted variation in school growt h trajectories were associated with school-to-school differences.The present study also showed that evaluations of s chool performance differ depending on whether school mean achievement or sch ool mean growth in achievement are examined. There was significant var iation in the mean achievement of students in mathematics and language both from student-to-student and from school-to-school. The a nalyses also showed that there was significant variation in the rate of achievemen t growth from student-to-student and from school-to-school for mathematics and from school-to-school for language. Using results from the multilevel, longitudinal mod els, mean achievement and


mean growth were estimated for each middle school in mathematics and language. Evaluation of these estimates showed that the school mean level of performance was not strongly predictive of the school mean rate of growth. Correlations of school mean and school growth estimates were only .14 for mathematics and .41 for language. Inspection of Figures 1 and 3 showed that characterization of school performance is substantially different depending on whether mean achievement or mean growth is examined. Schools with low mean scores were not always “poor performing” schools. In fact, schools with low mean scores were in many cases the schools with the largest growth rates. Conversely, a high mean achievement score was not always a clear indicator of “good performance”. In several instances, schools with high mean scores had growth rates below the district average.

The demonstration that school performance can vary on the basis of the analytic model applied suggests that it is essential to use evaluative models that do not unfairly reward or penalize schools for factors that are beyond the control of school personnel (Hanushek & Raymond, 2001; Ladd, 2001). Current evaluative practice often falls short of this goal. The accountability systems now in use in many states apply evaluative methods that cannot separate school level variation from student level variation or validly disentangle school effects from factors that are outside the control of educational policy and practice at the school (Stevens, Estrada & Parkes, 2000). One of the most common approaches in state accountability systems is the use of the school mean as the only or the key component in the evaluation of school effectiveness. As an evaluative measure, the school mean has widespread appeal. School means are easily calculated and are readily understood. However, the school mean is also a biased indicator of school performance (Heck, 2000). School means reflect all influences on student performance, including those exogenous to the school (e.g., family background, prior achievement, community context). As a result, the school mean often provides a misleading picture of school performance. Schools with advantaged intakes tend to be evaluated more favorably than schools with disadvantaged intakes, regardless of the impact the school has on students over time (Stevens et al., 2000).

Another option for assessing school performance is available to those states or districts that collect comprehensive data on student background. School means can be adjusted on the basis of student characteristics, prior achievement levels, and community characteristics in an attempt to arrive at a mean value that isolates the contribution of school practice and policy (Clotfelter & Ladd, 1996; Raudenbush & Willms, 1995; Willms, 1992). However, these data are often difficult for states and districts to adequately and accurately collect and analyze. Adjusted school means also present states and districts with two unwanted concerns. One concern has to do with public response to having a lower standard of performance for certain special student populations. The second stems from the difficulty of having to convey the meaning of complex statistical adjustments to parents, teachers, and school administrators (Clotfelter & Ladd, 1996; Elmore et al., 1996).

An alternative to the adjusted school mean is a measure based on changes in students’ academic achievement over time. As a measure of school performance, school mean growth may offer a more tractable and accurate method of adjusting for socio-demographic characteristics by providing control over confounding influences associated with the stable characteristics of students (Haertel, 1999;


Lane & Stone, 2002; Stevens et al., 2000). Repeated measurement of individual students provides control over the background and intake characteristics that strongly impact the level at which a student performs by shifting the measurement process from indexing student performance at a single point in time to tracking the rate of pupil progress over time (Sanders & Horn, 1994). The calculation of student growth rates thus enables schools to be evaluated in terms of the gains students make instead of the level at which students start, thereby enabling a more valid comparison of schools that differ in the intake characteristics of their student bodies. Despite the promise of using multilevel, longitudinal models to measure school performance, very few states have accountability systems that track individual students over time or use analytic methods that account for the hierarchical structure of accountability data (Council of Chief State School Officers, 2001; Education Week, 2001, 2002). However, the importance of basing an accountability system on an outcome measure that can be impacted by school practice and policy cannot be overstated. If school effectiveness is not evaluated in a way that actually reflects school practices and policies but instead reflects student intake characteristics, the state system for evaluating school performance can become a source for flawed decision-making and a target of criticism and possible litigation by disgruntled stakeholders (Parkes & Stevens, in press). Misguided assessment policy can thus stall attempts at constructive school-based change and effectively undermine the intent of the accountability system.

The present study demonstrated that assessments of school performance vary on the basis of the analytic methods applied to the data. Depending on whether schools were evaluated in terms of mean achievement or mean growth, assessments of school performance were shown to differ dramatically. In some cases, schools with low mean scores had high growth rates and schools with high mean scores had low growth rates. These results suggest that states should not rely on the school mean as the sole indicator of school effectiveness. Instead, consideration should be given to incorporating into school accountability systems measures that track student learning or growth over time. The importance of assessing student growth is further underscored by the amount of variation in growth rates that can be attributed to school-to-school differences. In the present study, school differences in mean growth were two times greater than school differences in mean achievement in mathematics and over four times greater than school differences in mean achievement in language. Identification of large amounts of school level variation in growth rates suggests that schools can have substantial influence on student achievement. Future research that examines the influence of school demographic factors, school climate, and school practice and policy on school growth trajectories will begin to facilitate our understanding of why some schools are more effective at promoting student growth in achievement than others.

References

Aitkin, M. & Longford, N. (1986). Statistical modeling issues in school effectiveness studies. Journal of the Royal Statistical Society, Series A, 149, 1-43.
Barton, P. & Coley, R. (1998). Growth in school: Achievement gains from the fourth


to the eighth grade. Princeton, NJ: Educational Testing Service.
Boyle, M.H., & Willms, J.D. (2001). Multilevel modeling of hierarchical data in developmental studies. Journal of Child Psychology and Psychiatry, 42, 141-162.
Brown, S. (1994). School effectiveness research and the evaluation of schools. Evaluation and Research in Education, 8, 55-68.
Bryk, A.S., & Raudenbush, S.W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
Bryk, A.S., & Raudenbush, S.W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. In R.D. Bock (Ed.), Multilevel analysis of educational data (pp. 159-204). San Diego, CA: Academic Press.
Bryk, A.S., & Raudenbush, S.W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park: SAGE.
Burstein, L. (1980). The analysis of multilevel data in educational research and evaluation. In D.C. Berliner (Ed.), Review of research in education (vol. 8). Washington, DC: American Educational Research Association.
Clotfelter, C.T., & Ladd, H.F. (1996). Recognizing and rewarding success in public schools. In H.F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 23-63). Washington, DC: The Brookings Institution.
Council of Chief State School Officers (2001). Annual Survey: State Student Assessment Programs, Vols. 1 and 2 (1999-2000 data). Washington, D.C.: Council of Chief State School Officers.
Cronbach, L.J. (1976). Research on classrooms and schools: Formulation of questions, design, and analysis. An Occasional Paper, Stanford, CA: Stanford Evaluation Consortium.
Cronbach, L. & Webb, (1975). Between and within class effects in a reported aptitude by treatment interaction: Reanalysis of a study by G.L. Anderson. Journal of Educational Psychology, 6, 717-724.
CTB/McGraw-Hill (1997). TerraNova Technical Bulletin 1. Monterey, CA: Author.
Education Week. (2001). Quality counts 2001: A better balance (Volume 20, Number 17, January 11, 2001). Bethesda, MD: Author.
Education Week. (2002). Quality counts 2002: Building blocks for success (Volume 21, Number 17, January 10, 2002). Bethesda, MD: Author.
Elmore, R.F., Abelmann, C.H., & Fuhrman, S.H. (1996). The new accountability in state education reform: From process to performance. In H.F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 65-98). Washington, DC: The Brookings Institution.


Goldstein, H. (1991). Better ways to compare schools? Journal of Educational Statistics, 16(2), 89-92.
Goldstein, H.I. (1988). Comparing schools. In H. Torrance (Ed.), National assessment and testing: A research response. London: BERA.
Haertel, E.H. (1999). Validity arguments for high-stakes testing: In search of the evidence. Educational Measurement: Issues and Practice, 18(4), 5-9.
Hanushek, E.A., & Raymond, M.E. (2001). The confusing world of educational accountability. National Tax Journal, 54, 365-384.
Heck, R.H. (2000). Examining the impact of school quality on school outcomes and improvement: A value-added approach. Educational Administration Quarterly, 36, 513-552.
Kane, T.J., & Staiger, D.O. (2002). Volatility in school test scores: Implications for test-based accountability systems. Brookings Papers on Educational Policy, 1, 235-283.
Ladd, H.F. (2001). School-based educational accountability systems: The promise and the pitfalls. National Tax Journal, 54, 385-400.
Lane, S., & Stone, C.A. (2002). Strategies for examining the consequences of assessment and accountability programs. Educational Measurement: Issues and Practice, 21(1), 23-30.
Linn, R.L. (2000). Assessments and accountability. Educational Researcher, 29, 4-16.
Linn, R.L., & Haug, C. (2002). Stability of school-building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(1), 29-36.
No Child Left Behind Act of 2001, Pub. L. No. 107-110 (2002).
Parkes, J. & Stevens, J.J. (in press). Legal issues in school accountability systems. Applied Measurement in Education.
Raudenbush, S.W., Bryk, A.S., Cheong, Y.F., & Congdon, R.T. (2001). HLM 5: Hierarchical linear and nonlinear modeling. Chicago: Scientific Software International.
Raudenbush, S.W., & Willms, J.D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20, 307-335.
Sanders, W.L., & Horn, S.P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed-model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8, 299-311.
Stevens, J.J. (2000). Educational accountability systems: Issues and recommendations for New Mexico (Technical Report). New Mexico State Department of Education.


Stevens, J.J., Estrada, S., & Parkes, J. (2000). Measurement issues in the design of state accountability systems. Paper presented at the annual meeting of the American Educational Research Association,
Willett, J.B. (1988). Questions and answers in the measurement of change. In E. Rothkopf (Ed.), Review of research in education 1988-89 (pp. 345-422). Washington: American Educational Research Association.
Willms, J.D. (1992). Monitoring school performance: A guide for educators. Washington, DC: Falmer Press.
Willms, J.D. & Jacobsen, S. (1990). Growth in mathematics skills during the intermediate years: Sex differences and school effects. International Journal of Educational Research, 14, 157-174.

About the Authors

Keith Zvoch
Albuquerque Public Schools
Albuquerque, NM
Email: zvoch@aps.edu

Keith Zvoch earned a doctorate in Quantitative Methods from the Educational Psychology Program at the University of New Mexico in 2001. He is currently a research scientist for the Albuquerque Public School District. Dr. Zvoch also teaches research methods and statistics at the University of New Mexico on an adjunct basis. His current research interest is the measurement and assessment of school effects.

Joseph J. Stevens
University of New Mexico
College of Education
Educational Psychology Program
Simpson Hall
Albuquerque, NM 87131
Email: jstevens@unm.edu

Joseph J. Stevens is a Professor of Educational Psychology at the University of New Mexico. Dr. Stevens' research concerns applications and validity of large-scale assessment systems, the evaluation of accountability systems and school effectiveness, and the assessment of cognitive diversity.

Appendix A

Sample Sizes and Empirical Bayes Achievement Mean and Slope Estimates by School and Content Area

          Mathematics        Language
School    Mean     Slope     Mean     Slope     N
  1       653.40   17.34     656.67   12.65     71
  2       642.78   16.15     649.29   10.26    104
  3       645.24   16.32     649.52   13.27    115
  4       668.92   16.16     667.71   11.04    165
  5       639.67   18.12     645.18    9.83    168
  6       647.62   13.98     656.45   13.50     68
  7       670.82   24.22     678.12   15.16    178
  8       640.74    7.15     641.76    9.39    129
  9       691.48   19.83     687.51   14.15    168
 10       652.49   11.90     651.85    9.22     88
 11       672.49   21.99     678.75   13.31    232
 12       665.55   16.95     662.57   13.09    131
 13       636.90   25.33     641.20   12.83    103
 14       662.64   23.15     667.02   13.62    216
 15       660.36   17.55     659.82   10.82    118
 16       672.47   22.75     672.41   13.21    242
 17       663.63   20.41     664.22   13.61    109
 18       678.35   14.64     674.60   11.00    125
 19       663.06   22.02     669.66   13.31    172
 20       657.59   18.23     653.24   11.69    136
 21       679.50   16.27     673.83   12.30    164
 22       642.46   24.94     654.66   14.40     97
 23       656.64   19.08     660.70   12.91    123
 24       661.63   17.07     657.63   10.57     77
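
The comparison of school mean achievement with school mean growth can be retraced from the Appendix A estimates. The following is a minimal sketch, not the authors' analysis (the models in the study were fit with HLM). It rests on three assumptions that the text does not state outright: that the reported "tb" is Kendall's tau-b, that each Mean is the estimated 6th grade status, and that each Slope is the estimated gain per grade, so that a projected 8th grade mean is Mean + 2 x Slope. The names APPENDIX_A, kendall_tau_b, and rank_of are illustrative only.

```python
from itertools import combinations
from math import sqrt

# Appendix A: school -> (math mean, math slope, language mean, language slope)
APPENDIX_A = {
    1: (653.40, 17.34, 656.67, 12.65), 2: (642.78, 16.15, 649.29, 10.26),
    3: (645.24, 16.32, 649.52, 13.27), 4: (668.92, 16.16, 667.71, 11.04),
    5: (639.67, 18.12, 645.18, 9.83), 6: (647.62, 13.98, 656.45, 13.50),
    7: (670.82, 24.22, 678.12, 15.16), 8: (640.74, 7.15, 641.76, 9.39),
    9: (691.48, 19.83, 687.51, 14.15), 10: (652.49, 11.90, 651.85, 9.22),
    11: (672.49, 21.99, 678.75, 13.31), 12: (665.55, 16.95, 662.57, 13.09),
    13: (636.90, 25.33, 641.20, 12.83), 14: (662.64, 23.15, 667.02, 13.62),
    15: (660.36, 17.55, 659.82, 10.82), 16: (672.47, 22.75, 672.41, 13.21),
    17: (663.63, 20.41, 664.22, 13.61), 18: (678.35, 14.64, 674.60, 11.00),
    19: (663.06, 22.02, 669.66, 13.31), 20: (657.59, 18.23, 653.24, 11.69),
    21: (679.50, 16.27, 673.83, 12.30), 22: (642.46, 24.94, 654.66, 14.40),
    23: (656.64, 19.08, 660.70, 12.91), 24: (661.63, 17.07, 657.63, 10.57),
}


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, with the standard correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both denominator terms
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


def rank_of(school, scores):
    """1-based rank of a school when scores are ordered from highest to lowest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(school) + 1


rows = list(APPENDIX_A.values())
math_tau = kendall_tau_b([r[0] for r in rows], [r[1] for r in rows])
lang_tau = kendall_tau_b([r[2] for r in rows], [r[3] for r in rows])

# Assumption: Mean is 6th grade status and Slope is growth per grade.
lang_grade6 = {s: r[2] for s, r in APPENDIX_A.items()}
lang_grade8 = {s: r[2] + 2 * r[3] for s, r in APPENDIX_A.items()}

print(f"mean vs. growth, mathematics: tau-b = {math_tau:.2f}")
print(f"mean vs. growth, language:    tau-b = {lang_tau:.2f}")
for school in (7, 8):
    print(f"school {school}: language rank {rank_of(school, lang_grade6)} at grade 6, "
          f"{rank_of(school, lang_grade8)} at projected grade 8")
```

Under these assumptions the projection agrees with the description of Figure 4: the third ranked school in 6th grade language (school 7) moves up, and the 23rd ranked school (school 8) has the lowest projected 8th grade mean. Whether the computed coefficients match the reported correlations depends on whether tb is in fact Kendall's tau-b and on rounding in the published estimates.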


