E11-00504
Educational policy analysis archives. Vol. 14, no. 31 (November 20, 2006).
Tempe, Ariz.: Arizona State University; Tampa, Fla.: University of South Florida. November 20, 2006.
No More Aggregate NAEP Studies? / Sherman Dorn.
Arizona State University.
University of South Florida.
Education Policy Analysis Archives (EPAA)
ISSN 1068-2341. Volume 14, issue 31. Date issued: 2006-11-20.
Readers are free to copy, display, and distribute this article, as long as the work is attributed to the author(s) and Education Policy Analysis Archives, it is distributed for noncommercial purposes only, and no alteration or transformation is made in the work. More details of this Creative Commons license are available at http://creativecommons.org/licenses/by-nc-nd/2.5/. All other uses must be approved by the author(s) or EPAA. EPAA is published jointly by the Mary Lou Fulton College of Education at Arizona State University and the College of Education at University of South Florida. Articles are indexed by H.W. Wilson & Co. Please contribute commentary at http://epaa.info/wordpress/ and send errata notes to Sherman Dorn (email@example.com).

EDUCATION POLICY ANALYSIS ARCHIVES
A peer-reviewed scholarly journal
Editor: Sherman Dorn
College of Education, University of South Florida
Volume 14 Number 31, November 20, 2006, ISSN 1068-2341

No More Aggregate NAEP Studies?
Sherman Dorn, Editor
Education Policy Analysis Archives

Citation: Dorn, S. (2006). No more aggregate NAEP studies? [editorial]. Education Policy Analysis Archives, 14(31). Retrieved [date] from http://epaa.asu.edu/epaa/v14n31/.

Abstract
This editorial reviews recent studies of accountability policies using National Assessment of Educational Progress (NAEP) data and compares the use of aggregate NAEP data to the availability of individual-level data from NAEP. While the individual-level NAEP data sets are restricted-access and do not give accurate point-estimates of achievement, they nonetheless provide greater opportunity to conduct more appropriate multi-level analyses with state policies as one set of variables. Policy analysts using NAEP data should still look at exclusion rates and the non-longitudinal nature of the NAEP data sets.

Keywords: accountability; multi-level analysis; multiple imputation; National Assessment of Educational Progress (NAEP).

Resumen (Spanish abstract)
This editorial examines recent studies of accountability policies that use data from the National Assessment of Educational Progress (NAEP) and compares the use of aggregate NAEP data with individual-level data from the same assessment. Although the individual-level NAEP data sets are restricted-access and do not provide precise point estimates of academic achievement, they offer a good opportunity to conduct multi-level analyses of state education policies treated as one set of variables. Those who analyze policy using NAEP data should always take care to examine exclusion rates and the non-longitudinal nature of the NAEP data sets.

Palabras clave (keywords): accountability; multi-level analysis; multiple imputation; National Assessment of Educational Progress (NAEP).

With Marchant, Paulson, and Shunk's (2006) analysis of National Assessment of Educational Progress (NAEP) results aggregated at the state level, Education Policy Analysis Archives publishes its tenth article that analyzes education accountability policy using state-level NAEP data (see Amrein & Berliner, 2002; Amrein-Beardsley & Berliner, 2003; Braun, 2004; Camilli, 2000; Klein, Hamilton, McCaffrey, & Stecher, 2000, 2005; Nichols, Glass, & Berliner, 2006; Rosenshine, 2003; Toenjes, 2005). Until recently, individual-level data were unavailable, and aggregate NAEP data have served as a fundamental basis for policy discussions of high-stakes accountability. Research using the aggregate-level data has expanded both our knowledge of accountability's effects and the questions that are worth investigating. While test scores do not capture all the consequences of high-stakes accountability, analyzing student achievement is important in deciding whether the policies have face validity: does high-stakes accountability influence what its advocates think is important?

Carnoy and Loeb (2002) and Grissmer, Flanagan, Kawata, and Williamson (2000) used aggregate-level NAEP data to claim beneficial effects for high-stakes accountability. Klein, Hamilton, McCaffrey, and Stecher (2000) focused specifically on Texas, suggesting that Grissmer et al.'s analysis overestimated the effects. Amrein and Berliner (2002) argued that quasi-longitudinal measures of achievement on NAEP with two-group measures of stakes did not suggest positive consequences of high-stakes accountability. Rosenshine (2003) disagreed, and the original study authors responded (Amrein-Beardsley & Berliner, 2003).
Nichols, Glass, and Berliner (2006) and now Marchant, Paulson, and Shunk (2006) suggest that national evidence of the effects of high-stakes accountability is relatively weak, especially for reading, and that the only NAEP aggregate evidence supporting effects from high-stakes accountability (either for raising achievement in general or for closing the achievement gap) appears for math (also see Hanushek & Raymond, 2006, for math only).

There are four sticking points with the NAEP research cited above. One methodological and substantive issue is the definition and measurement of high-stakes accountability. Amrein and Berliner (2002), Carnoy and Loeb (2002), Clarke et al. (2003), Pedulla et al. (2003), Swanson and Stevenson (2002), and Nichols et al. (2006) have worked with different measures of accountability's consequences for students and educators. State accountability policies are shifting, complex entities; any measure of stakes will always involve a qualitative judgment that combines both written policies and evidence of pressures perceived by educators (as street-level bureaucrats; Weatherley & Lipsky, 1977). The most intensive effort, by Nichols et al. (2006), used Torgerson's (1960) method of distilling comparative judgments into a single scale. While they had the resources to calculate such judgments for a set of state policies, they did not have long-range, year-by-year judgments. Nichols et al. then used the judgments of an expert whose general ratings by state correlated highly with the Torgerson measure of accountability pressures. Given the methodological difficulties and multiple perspectives, Nichols et al. replicated a portion of their analysis using Carnoy and Loeb's (2002) scale, a step that responsible researchers in this area should follow.

A second sticking point is the non-longitudinal nature of NAEP.
The National Assessment of Educational Progress samples students in each state, and there is no follow-up with individual students from assessment to assessment. Analysts have tackled this problem in various ways. The approach of Marchant et al. (2006) is perhaps typical, looking at single cross-sections, changes in a single grade from assessment to assessment, and quasi-cohort measures from fourth grade to eighth grade four years later. The implicit reasoning of multiple approaches is that if multiple slices of NAEP lead to similar results, then those different slices provide confirming evidence for a conclusion. None of those approaches has the advantages of a longitudinal sampling design, but NAEP does not afford that luxury.

A third sticking point is the differential rate of exclusions from NAEP samples. To some extent, differences in aggregate achievement measures are an artifact of changing exclusion rates (Carnoy & Loeb, 2002). This conflation of exclusion rates with underlying achievement makes comparisons more difficult, whether between states, between years within a state and an individual grade, or between years and grades within an individual state (the quasi-cohort approach). Whether via multiple imputation (Rubin, 1987) or through econometric selection models, modeling the selection bias of differential rates of exclusion depends on individual-level data, which are not accessible for state-level analyses.

The fourth sticking point is the aggregate nature of freely accessible NAEP data. The only unit of analysis available (the state) may not be appropriate either for the most commonly implied research question or for more sophisticated policy analyses. While the main research question of this growing body of research is at the state level (do state-level high-stakes testing policies lead to higher achievement?), the context of the research does not make clear whether the key measure of interest should be at the state level (aggregate achievement or some summary of the achievement gap) or whether it should be at the individual level, whether individual student achievement in itself or measures of achievement gaps at the individual level.
State-level analysis phrased in terms of individual achievement (whether high-stakes testing leads to higher achievement or lower gaps for individual students) would be perhaps an expected slip but an ecological fallacy nonetheless. In addition, recent research on accountability strongly suggests that the local context is crucial in determining educators' responses to high-stakes accountability (e.g., Carnoy, Elmore, & Siskin, 2003; Mintrop, 2004; Mintrop & Trujillo, 2005). State-level analyses, which are important for questions about overarching policy, cannot address local context.

The last two sticking points are directly related to the aggregate nature of the existing analyses, tied to the sampling design of NAEP assessments and the perception that such sampling restricted the relevant unit of analysis to the state level. Such restrictions no longer exist. The National Assessment of Educational Progress now makes individual-level data available for restricted access by researchers. While point estimates of individual achievement are not available, the data still are useful:

To reduce the test-taking burden on individual students, NAEP administers only a subset of items to each student. Hence, individual students' achievement is not measured reliably enough to be assigned a single score. Instead, using Item Response Theory (IRT), NAEP estimates a distribution of plausible values for each student's proficiency, based on the student's responses to administered items and other student characteristics. When analyzing NAEP achievement data, separate analyses are conducted with the five plausible values assigned to each student. The five sets of results are then synthesized, following Rubin (1987) on the analysis of multiply-imputed data. (Lubienski, 2006, p. 8).
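
The synthesis step described in the quotation can be sketched in a few lines. What follows is a minimal illustration of the standard combining rules from Rubin (1987), not the procedure of any particular NAEP study or software package: the function name `combine_rubin` and all numbers are hypothetical, and each of the five analyses is assumed to have already produced a point estimate and a sampling variance (in real NAEP work that variance would come from the survey's own variance-estimation procedure, which is not shown here).

```python
from math import sqrt
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m analyses (one per plausible value) with Rubin's combining rules.

    estimates: the point estimate from each of the m analyses
    variances: the sampling variance of each of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two analyses")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-analysis variance
    between = variance(estimates)   # variance across analyses (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical results: the same regression coefficient estimated five times,
# once with each plausible value. These numbers are invented for illustration.
estimates = [2.0, 2.4, 1.8, 2.2, 2.1]
variances = [0.25, 0.25, 0.25, 0.25, 0.25]

est, se = combine_rubin(estimates, variances)
print(f"pooled estimate = {est:.3f}, standard error = {se:.3f}")
```

The pooled standard error is larger than the standard error of any single analysis because the between-analysis term carries the measurement uncertainty that a single assigned score would hide.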

While securing access to and working with restricted-access data is more onerous and requires greater infrastructure support than working with aggregate data, recent research in other areas suggests the viability of using the new individual-level data for policy research (e.g., Lubienski, 2006). New software, such as AM (American Institutes of Research, n.d.), has the facility to work with the new individual-level sets.
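
The ecological fallacy named under the fourth sticking point can be made concrete with a toy data set. The sketch below uses invented numbers for three hypothetical states and a hypothetical student-level "pressure" index; it is not NAEP data and makes no claim about real effects. It only shows that a relationship computed on state means can have the opposite sign from the relationship among individuals within every state.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation computed from first principles."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented (pressure, score) pairs for students in three hypothetical states.
states = {
    "A": [(0, 15), (1, 10), (2, 5)],
    "B": [(1, 25), (2, 20), (3, 15)],
    "C": [(2, 35), (3, 30), (4, 25)],
}

# Aggregate view: one mean pressure and one mean score per state.
mean_pressure = [mean(p for p, _ in rows) for rows in states.values()]
mean_score = [mean(s for _, s in rows) for rows in states.values()]
aggregate_r = pearson(mean_pressure, mean_score)

# Individual-level view: the relationship among students inside each state.
within_r = {
    name: pearson([p for p, _ in rows], [s for _, s in rows])
    for name, rows in states.items()
}

print(f"state-level r = {aggregate_r:+.2f}")
for name, r in within_r.items():
    print(f"within state {name}: r = {r:+.2f}")
```

In this constructed example the state-level correlation is perfectly positive while the correlation within each state is perfectly negative, so a conclusion about individual students drawn from the state means alone would point in the wrong direction.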

Using the individual-level plausible-value data sets for NAEP would address the ecological problems of existing analyses. To some extent, the individual-level data may also address selection problems and contextual effects by allowing more sophisticated modeling and multi-level analyses.

The existence of individual-level data is not a panacea. Modeling the exclusion bias will still be difficult, and the sampling design of NAEP makes identifying the proper level of contextual analysis difficult. Nor do individual-level data solve the non-longitudinal nature of NAEP assessments; in some ways they make the problem worse by reducing most (but not all) analyses to cross-sections. I will leave the solutions to such problems to more sophisticated researchers. In addition, the availability of individual-level data sets does not address the question of how one measures high stakes.

Nonetheless, regardless of the questions and problems involved, the existence of individual-level data for NAEP creates a burden of proof for researchers who continue to rely on aggregate data. As an editor, I will look for manuscripts that use the new form of NAEP data as an opportunity to conduct more sophisticated analyses. This desire to see quantitative policy researchers use individual-level data does not imply that Education Policy Analysis Archives will only publish individual-level analyses in the future, but it does mean that the editor and reviewers will be looking for an acknowledgment of individual-level data and a justification for why aggregate-level analyses are superior. I suspect editors and reviewers of other journals will have similar reactions.

References

American Institutes of Research. (n.d.). AM statistical software [program]. Washington, DC: Author. Retrieved October 30, 2006, from http://am.air.org/
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v10n18/
Amrein-Beardsley, A., & Berliner, D. (2003). Re-analysis of NAEP math and reading scores in states with and without high-stakes tests: Response to Rosenshine. Education Policy Analysis Archives, 11(25). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v11n25/
Braun, H. (2004). Reconsidering the impact of high-stakes testing. Education Policy Analysis Archives, 12(1). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v12n1/
Camilli, G. (2000). Texas gains on NAEP: Points of light? Education Policy Analysis Archives, 8(42). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v8n42.html
Carnoy, M., Elmore, R., & Siskin, L. S. (Eds.). (2003). The new accountability: High schools and high-stakes testing. New York: RoutledgeFalmer.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305-331.
Clarke, M., Shore, A., Rhoades, K., Abrams, L., Miao, J., & Li, J. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from interviews with educators in low-, medium-, and high-stakes states. Boston, MA: Boston College, National Board on Educational Testing and Public Policy. Retrieved October 30, 2006, from http://www.bc.edu/research/nbetpp/statements/nbr1.pdf
Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: RAND.
Hanushek, E. A., & Raymond, M. E. (2006). Early returns from school accountability. In P. E. Peterson (Ed.), Generational change: Closing the test score gap (pp. 143-166). Lanham, MD: Rowman & Littlefield Publishers, Inc.
Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Education Policy Analysis Archives, 8(49). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v8n49/
Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2005). Response to "What do Klein et al. tell us about test scores in Texas?" Education Policy Analysis Archives, 13(37). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v13n37/
Lubienski, S. T. (2006). Examining instruction, achievement, and equity with NAEP mathematics data. Education Policy Analysis Archives, 14(14). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v14n14/
Marchant, G. J., Paulson, S. E., & Shunk, A. (2006). Relations between high-stakes testing policies and student achievement after controlling for demographic factors in aggregated data. Education Policy Analysis Archives, 14(30). Retrieved November 15, 2006, from http://epaa.asu.edu/epaa/v14n30/
Mintrop, H. (2004). Schools on probation: How accountability works (and doesn't work). New York: Teachers College Press.
Mintrop, H., & Trujillo, T. M. (2005). Corrective action in low performing schools: Lessons for NCLB implementation from first-generation accountability systems. Education Policy Analysis Archives, 13(48). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v13n48/
Nichols, S. L., Glass, G. V, & Berliner, D. C. (2006). High-stakes testing and student achievement: Does accountability pressure increase student learning? Education Policy Analysis Archives, 14(1). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v14n1/
Pedulla, J. J., Abrams, L. M., Madaus, G. F., Russell, M. K., Ramos, M. A., & Miao, J. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey of teachers. Boston, MA: Boston College, National Board on Educational Testing and Public Policy. Retrieved October 30, 2006, from http://www.bc.edu/research/nbetpp/statements/nbr2.pdf
Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v11n24/
Rubin, D. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Swanson, C. B., & Stevenson, D. L. (2002). Standards-based reform in practice: Evidence on state policy and classroom instruction from the NAEP state assessments. Educational Evaluation and Policy Analysis, 24(1), 1-27.
Toenjes, L. A. (2005). What do Klein et al. tell us about test scores in Texas? Education Policy Analysis Archives, 13(36). Retrieved October 30, 2006, from http://epaa.asu.edu/epaa/v13n36/
Torgerson, W. S. (1960). Theory and methods of scaling. New York: John Wiley.
Weatherley, R., & Lipsky, M. (1977). Street-level bureaucrats and institutional innovation: Implementing special education reform in Massachusetts. Cambridge, MA: Joint Center for Urban Studies of the Massachusetts Institute of Technology and Harvard University.

About the Author

Sherman Dorn
University of South Florida
Email: firstname.lastname@example.org

Sherman Dorn is editor of Education Policy Analysis Archives and a member of the social foundations faculty in the University of South Florida College of Education.

EDUCATION POLICY ANALYSIS ARCHIVES http://epaa.asu.edu
Editor: Sherman Dorn, University of South Florida
Production Assistant: Chris Murrell, Arizona State University
General questions about appropriateness of topics or particular articles may be addressed to the Editor, Sherman Dorn, email@example.com.

Editorial Board
Michael W. Apple, University of Wisconsin
David C. Berliner, Arizona State University
Robert Bickel, Marshall University
Gregory Camilli, Rutgers University
Casey Cobb, University of Connecticut
Linda Darling-Hammond, Stanford University
Gunapala Edirisooriya, Youngstown State University
Mark E. Fetler, California Commission on Teacher Credentialing
Gustavo E. Fischman, Arizona State University
Richard Garlikov, Birmingham, Alabama
Gene V Glass, Arizona State University
Thomas F. Green, Syracuse University
Aimee Howley, Ohio University
Craig B. Howley, Ohio University
William Hunter, University of Ontario Institute of Technology
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Les McLean, University of Toronto
Heinrich Mintrop, University of California, Berkeley
Michele Moses, Arizona State University
Anthony G. Rud Jr., Purdue University
Michael Scriven, Western Michigan University
Terrence G. Wiley, Arizona State University
John Willinsky, University of British Columbia

EDUCATION POLICY ANALYSIS ARCHIVES
English-language Graduate-Student Editorial Board
Noga Admon, New York University
Jessica Allen, University of Colorado
Cheryl Aman, University of British Columbia
Anne Black, University of Connecticut
Marisa Cannata, Michigan State University
Chad d'Entremont, Teachers College Columbia University
Carol Da Silva, Harvard University
Tara Donahue, Michigan State University
Camille Farrington, University of Illinois Chicago
Chris Frey, Indiana University
Amy Garrett Dikkers, College of St. Scholastica
Misty Ginicola, Yale University
Jake Gross, Indiana University
Hee Kyung Hong, Loyola University Chicago
Jennifer Lloyd, University of British Columbia
Heather Lord, Yale University
Shereeza Mohammed, Florida Atlantic University
Ben Superfine, University of Michigan
John Weathers, University of Pennsylvania
Kyo Yamashiro, University of California Los Angeles

Archivos Analíticos de Políticas Educativas
Associate Editors: Gustavo E. Fischman & Pablo Gentili, Arizona State University & Universidade do Estado do Rio de Janeiro
Founding Associate Editor for Spanish Language (1998-2003): Roberto Rodríguez Gómez

Editorial Board
Hugo Aboites, Universidad Autónoma Metropolitana-Xochimilco
Adrián Acosta, Universidad de Guadalajara, México
Claudio Almonacid Avila, Universidad Metropolitana de Ciencias de la Educación, Chile
Dalila Andrade de Oliveira, Universidade Federal de Minas Gerais, Belo Horizonte, Brasil
Alejandra Birgin, Ministerio de Educación, Argentina
Teresa Bracho, Centro de Investigación y Docencia Económica-CIDE
Alejandro Canales, Universidad Nacional Autónoma de México
Ursula Casanova, Arizona State University, Tempe, Arizona
Sigfredo Chiroque, Instituto de Pedagogía Popular, Perú
Erwin Epstein, Loyola University, Chicago, Illinois
Mariano Fernández Enguita, Universidad de Salamanca, España
Gaudêncio Frigotto, Universidade Estadual do Rio de Janeiro, Brasil
Rollin Kent, Universidad Autónoma de Puebla, Puebla, México
Walter Kohan, Universidade Estadual do Rio de Janeiro, Brasil
Roberto Leher, Universidade Estadual do Rio de Janeiro, Brasil
Daniel C. Levy, University at Albany, SUNY, Albany, New York
Nilma Limo Gomes, Universidade Federal de Minas Gerais, Belo Horizonte
Pia Lindquist Wong, California State University, Sacramento, California
María Loreto Egaña, Programa Interdisciplinario de Investigación en Educación
Mariano Narodowski, Universidad Torcuato Di Tella, Argentina
Iolanda de Oliveira, Universidade Federal Fluminense, Brasil
Grover Pango, Foro Latinoamericano de Políticas Educativas, Perú
Vanilda Paiva, Universidade Estadual Do Rio De Janeiro, Brasil
Miguel Pereira, Catedratico Universidad de Granada, España
Angel Ignacio Pérez Gómez, Universidad de Málaga
Mónica Pini, Universidad Nacional de San Martin, Argentina
Romualdo Portella do Oliveira, Universidade de São Paulo
Diana Rhoten, Social Science Research Council, New York, New York
José Gimeno Sacristán, Universidad de Valencia, España
Daniel Schugurensky, Ontario Institute for Studies in Education, Canada
Susan Street, Centro de Investigaciones y Estudios Superiores en Antropologia Social Occidente, Guadalajara, México
Nelly P. Stromquist, University of Southern California, Los Angeles, California
Daniel Suarez, Laboratorio de Politicas Publicas-Universidad de Buenos Aires, Argentina
Antonio Teodoro, Universidade Lusófona, Lisboa
Carlos A. Torres, UCLA
Jurjo Torres Santomé, Universidad de la Coruña, España