Educational policy analysis archives

Educational policy analysis archives

Material Information

Educational policy analysis archives
Arizona State University
University of South Florida
Place of Publication:
Tempe, Ariz
Tampa, Fla
Arizona State University
University of South Florida.
Publication Date:


Subjects / Keywords:
Education -- Research -- Periodicals ( lcsh )
non-fiction ( marcgt )
serial ( sobekcm )

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
E11-00485 ( USFLDC DOI )
e11.485 ( USFLDC Handle )

Postcard Information



This item has the following downloads:

Full Text
xml version 1.0 encoding UTF-8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam a22 u 4500
controlfield tag 008 c20069999azu 000 0 eng d
datafield ind1 8 ind2 024
subfield code a E11-00485
0 245
Educational policy analysis archives.
n Vol. 14, no. 12 (May 02, 2006).
Tempe, Ariz. :
b Arizona State University ;
Tampa, Fla. :
University of South Florida.
c May 02, 2006
Gathering evidence on an after-school supplemental instruction program: design challenges and early findings in light of NCLB / Madhabi Chatterji, Young Ae Kwon [and] Clarice Sng.
x Research
v Periodicals.
2 710
Arizona State University.
University of South Florida.
1 773
t Education Policy Analysis Archives (EPAA)
4 856

xml version 1.0 encoding UTF-8 standalone no
mods:mods xmlns:mods http:www.loc.govmodsv3 xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govmodsv3mods-3-1.xsd
mods:relatedItem type host
mods:identifier issn 1068-2341mods:part
mods:detail volume mods:number 14issue 12series Year mods:caption 20062006Month May5Day 22mods:originInfo mods:dateIssued iso8601 2006-05-02


Readers are free to copy display, and distribute this article, as long as the work is attributed to the author(s) and Education Policy Analysis Archives, it is distributed for noncommercial purposes only, and no alte ration or transformation is made in the work. More details of this Creative Commons license are available at http:/ / ses/by-nc-nd/2.5/. All other uses must be approved by the author(s) or EPAA EPAA is published jointly by the Colleges of Education at Arizona State University and the Universi ty of South Florida. Articles are indexed by H.W. Wilson & Co. Comment on this article at http:/ / and send errata notes to Sherman Dorn ( EDUCATION POLICY ANALYSIS ARCHIVES A peer-reviewed scholarly journal Editor: Sherman Dorn College of Education University of South Florida Volume 14 Number 12 May 1, 2006 ISSN 1068–2341 Gathering Evidence on an After-School Supplemental Instruction Program: Design Challenges and Earl y Findings in Light of NCLB1 Madhabi Chatterji Teachers College, Co lumbia University Young Ae Kwon Kwon Learning Center Clarice Sng Teachers College, Co lumbia University Citation: Chatterji, M., Kwon, Y.A., & Sng, C. (2006). Gathering evidence on an afterschool supplemental instruction program: Design challenges and early findings in light of NCLB. Education Policy Analysis Archives, 14 (12). Retrieved [date] from http://epaa.asu.ed u/epaa/v14n12/. Abstract The No Child Left Behind (NCLB) Act of 2001 requires that public schools adopt research-supported programs and practice s, with a strong recommendation for randomized controlled trials (RCTs) as th e “gold standard” for scientific rigor in empirical research. Within that policy fram ework, this paper compares the relative utility of federallyrecommended RCT versus the demonstrated extended term mixed-method (ETMM) designs as options fo r monitoring effects of novel 1 The empirical study embedded in this paper was cond ucted at the request of the supplemental instruction program provider. The first au thor thanks the program providers and school leaders for their support and facilitation during the conduct of the study. An earlier version of this paper was presented at the Annual Meeting of the American Educational Research Association held at San Diego, CA on April 13, 2004. Name s are not released to honor client confidentiality.


Education Policy Analysis Archives Vol. 14 No. 12 2 programs in real-time field settings. Guided by the prog ram’s theory of action, a year-long, two-phase study was conducted to monitor the context, processes and early outcomes of an after-school supplem ental program in a New York elementary school. In both phases, the design comb ined a matched-groups, quasi-experiment with qualitative classroom observations and descriptive surveys. Early findings showed some positive, albeit “gross” pr ogram effects. Although findings are tentative, the ETMM approach enhanced interpretations by shedding light on relevant environmental variables, cause s for program instabilities and sample attrition, and factors affecting treatmen t fidelity and scalingup of the program beyond the pilot year. Keywords: research evidence; supplemental instructional programs; rigorous evaluation methods. The No Child Left Behind (NCLB) Act of 2001 requires that public institutions adopt research-supported programs, practices and policies, with a strong recommendation for the use of randomized controlled trials (RCT) as the “gold stan dard” for attaining scientific rigor in empirical research efforts (U.S. Department of Education, 2003). Within that policy framework, this paper compares the relative utility of federally -recommended RCT versus the demonstrated extended term mixed-method (ETMM) design (Chatterji, 2005) as options for monitoring effects of novel programs in real-time field settings. To demonstrate the me rits and demerits of the alternate ETMM approach, this article details the design concepts and empirical procedures employed to monitor early processes and effects of a supplemental program in reading and mathematics, as implemented in one elementary school in New York City. Design challenges that were faced along the way and modifications made to the original design are discussed against the body of information that was obtained on conclusion of the two-phase, mixed-met hod investigation. To allow for a comparative appraisal of the utility of the demonstrated ETMM approach against RCT by readers, a federallyfunded national evaluation of another supplemental instruction program, the 21st Century Community Learning Centers (21st CCLC) (Mathematica Policy Research, Inc. & Decision Information Resources, Inc., 2003) is used as a benchmark for discussion. Theoretical Framework As a preamble to a detailed presentation of the specific ETMM application with the New York City supplemental program evaluation, the ar ticle begins with a discussion of three topics. First, it examines the NCLB context for school s to adopt programs and practices supported by research evidence, the meaning of “scientific ri gor” as given in federal documents, and the difficulties in implementing sound experiments in field settings. Second, it describes a concurrent federal recommendation that emerged under the a uspices of the NCLB, namely, that failing schools should utilize supplemental instruction services and extended day schooling to improve student achievement outcomes. Lastly, it describes the 21st CCLC study, which was supported by federal funds and where evaluation researchers attempt ed to implement RCTs on a national scale.


Gathering Evidence 3 Federal Mandate for “Scientific Rigor” and Di fficulties in Mounting Rigorous Experiments Soon after the passage of NCLB in 2001, th e Coalition for Evidence-based Policy under the DOE’s Institute of Education Sciences (IES) re leased formal guidelines on identifying and implementing evidence-based practices in K-12 systems. Calling on educational practitioners to comply with the NCLB mandate for using “scientifically-based research” to guide their decisions about programs and interventions to implement (U.S. Department of Education, 2003), the document identified randomized controlled trials (R CTs) as the “gold standard” for obtaining strong and rigorous evidence on the effects of field-ba sed programs and interventions. RCTs were defined as empirical studies that measure comparative e ffects of an intervention by randomly assigning individuals to the new program and to a control condition. Several providers, independent researchers and research agencies have since made valiant attempts to respond to the federal requirement for executing randomized experiments on educational and other programs in public institut ions. However, barriers in field settings have been numerous. Due to organizational, political, and day-to-day operational complexities in schools and districts, true experiments are difficult to mo unt—whether in the case of supplemental or mainstream school innovations (see Cook, 2002, for a list of barriers). Quasi-experimental, timeseries, and regression discontinuity designs have been suggested as alternatives for making generalized causal inferences on educational prog rams (Shadish, Cook & Campbell, 2002). Some quasi-experimental designs have limited applicability to particular classes of problems (for example, regression discontinuity approaches are best applied when differential placement of subjects is a part of the treatment program design). All experimental designs, however, tend to emphasize outcomes. Further, they assume that “treatments” can ea sily be standardized in and across field sites, and that effects can be fairly measured and compared once “treatment fidelity” is obtained and inter-pupil differences equalized in treatment and c ontrol groups, holding all else constant in the environment as long as the experiment continues. In actuality, it is not easy to gather defini tive empirical evidence of treatment fidelity in typical school settings, because educational treatmen ts are not singular, narrowly-scripted entities. Even when gathered, qualitative differences in day-to-day operational definitions of a program make it difficult to draw conclusive causal infere nces between a program and measured outcomes, particularly when a program is new. Further, wh ile effective random assignment of subjects (the sine qua non of the “true” experiment) may statistically e qualize pre-existing differences in pupils, the procedure cannot erase interfering effects of potenti al contextual contaminants. Multiple and often dissimilar initiatives are commonly in operation in open, complex, hierarchical systems that schools represent, all often targeting the same outcomes in the same groups of children. Control conditions often overlap and are not markedly dissimilar in operation from the treatments in early implementation phases. In cases where similar groups of pupils can be assigned to treatment and control conditions and the treatment delivered in a stable manner, two added sets of factors must be taken into consideration when designing school-based studie s on supplemental or mainstream services. The first deals with the time needed for the critical, operational components of a program to settle down and for the program to take shape at a given site. The second deals with environmental dynamics during the course of a study that may alter the operational definitions of treatment, control, and other confounding conditions in complex orga nizations. Because they are added instructional opportunities appended to an array of regular-day in itiatives, the design challenges are particularly acute when studying effects of supplemental in struction programs on student achievement levels.


Education Policy Analysis Archives Vol. 14 No. 12 4 Chatterji (2004, 2005) thus recently asserted that comparative experiments by themselves are inadequate designs for studying school-based initiatives and proposed broader ETMM designs as an alternative. ETMM designs complement experimental designs with other methods, and use a phased approach in executing the research in order to better study environmental, treatment and control variables in situ while allowing the program to take hold. Historically, methodological scholars have given ample attention to the need for more comprehensive and systemic designs to properly study the effects of complex interventions in school settings. Recommendations of Donald Ca mpbell (1981) and Lee Cronbach and associates (1980), in particular, speak to the utility in mixing various research methods, and in employing “before” and “after” studies that build on one another over time to address questions of program impact. Such writings point to a clear need for researchers to judiciously combine comparative, qualitative or descriptive research methods to properly answer questions on how a novel program might work, what it looks like in operation in early and later stages of implementation, the conditions under which it influences particular outcome measures, and the likelihood that it will work in the same way with other students, across settings and over time. Federal Recommendations for Schools to Us e Supplemental and Extended Day Services Supplemental programs. The U.S. has had a long history of providing supplementary education via schools, community organizations, chur ches, for-profit education providers and other agencies to students in all achievement and so cio-economic brackets. However, the press for schools to use supplemental instruction as a strategy to be nefit economically disadvantaged, low-achieving minority students heightened in the past decade of standards-based education reforms in the U.S. The No Child Left Behind Act of 2001 (PL 107–110) expanded the range of service options for parents whose children attended Title 1 schools that were flagged as needing improvement. NCLB defines supplemental educational services as tutoring and “research-based” academic enrichment programs that supplement, but do not replace, instruction provided by schools during the school day. Among the choices offered under the law, children from low-income families enrolled in schools not making adequate yearly progress (AYP) for two consecutive years are eligible to receive supplemental educational services, including tutoring, remediation, and other academic instruction. Under the NCLB Act, supplemental education service provision is to be overseen by states. To facilitate state-level implementation in 2002–03, the U.S. Department of Education (DOE) issued non-regulatory guidelines to assist schools an d school districts in selecting and monitoring supplemental service providers as well as in gath ering evidence of program/provider effectiveness ( ) NCLB’s broader strategy for fostering school improvement and accountability calls for under-performing schools to offer “supplemental ed ucational services” for students failing to meet standards on external accountability tests admini stered by states. Approved programs, funded through Title I and provided to students in schools that do not make AYP for three consecutive years, are required to show increases in student achievement levels, with schools attaining correspondingly higher performance standards set acco rding to state criteria (P.L 107–110, 115 Stat. 1425, 2002). A recent federal report released data on the implementation status of supplemental instruction programs by states under the NCLB Act (Anderson & Weiner, 2004). The study used a telephone survey method and found that generally, st ates were complying with DOE guidelines in selecting supplemental providers; districts and sc hools were making strides towards implementation;


Gathering Evidence 5 but little evidence was found of any systematic effort s to monitor provider effectiveness at either the state, district, or school level. Other than the NCLB, a spotlight on supplemental education is also found in recent recommendations of the National Task Force on Minority High Achievement convened by The College Board (1999). The Task Force’s report ca rries a clear message that a viable means for poorly-achieving minority students to improve thei r academic achievement is by employing afterschool supplemental strategies that have proven su ccess with “educationally sophisticated or savvy” parents and student groups (p.18). Schools have se veral options when it comes to commerciallydistributed supplemental instruction products, including the one investigated in the present study. Extended day programs An associated reform initiative prompted by NCLB is extended-day schooling. Extended-day programs generally take the form of schools adding an hour or two of supervised schooling during which all or selected grou ps of students are provided with after-school care and/or tutoring services in academic subj ects. Based on the Schools and Staffing Survey data collection conducted by the National Center for Educ ational Statistics between 1990–94, DeAngelis and Rossi (1997) reported that extended-day progra ms have increased greatly in U.S. elementary schools over time and are now serving greater numbers of minority and high-poverty students. However, such programs were fewer in number in rural than in urban schools, and among private institutions, their availability is greater in Catholic schools. Not all extended day programs provide supplemental instruction, devoting time instead to supervised extra-curricular activities. There is so me descriptive evidence fr om a number of large efforts, including the Big Brothers and Big Sis ters of America mentoring program, that show improved academic achievement on standardized tests such as the Stanford Achievement Tests (9th Edition), better school attendance, and improved psychological and behavioral outcomes for at-risk youth, such as reduced gang-related behaviors, vi olence, or drug use (University of California at Irvine, 2001; Aguirre International, 2000; Huang et al., 2000; Grossman et al, 2000). To achieve success on academic outcomes, Owens and Vallercam p (2003) isolated the following five major factors that extended day programs should embody: addressing identified needs within a school; building on a shared vision among the school and larger community; fostering staff ownership; having ties to state curriculum standards; and me asuring and sharing results across the community. Available evidence on the effectiveness of va rious supplemental instruction programs and the best models for their delivery in urban school s and large city school systems is still somewhat sparse. Few rigorous evaluations exist, according to a recent report of a national Task Force on promotion of minority achievement (The College Board, 1999). The success of supplemental programs, according to Cohen (2003), is predicated on several factors, such as a strong parent, tutor, and teacher connection; experienced providers and developers; proven methods of instruction; customized instruction; measurable results based on time on task; and positive learning environments. Although choices exist, availabl e information on program efficacy is still mostly anecdotal, with formally-gathered research evid ence limited on effects of various supplemental programs in different populations. One large-scale federally-supported study, discussed next, is an exception. The 21st Century Community Le arning Centers (21st CCLC) Evaluation Evaluation design and findings To raise achievement levels in disadvantaged and struggling students, the Elementary and Secondary Educ ation Act supported supplemental center-based programs in over 360 rural and inner city school s in 34 states in 1998. Labeled as the 21st Century Community Learning Centers (21st CCLC) initiative, this program of supplemental education was


Education Policy Analysis Archives Vol. 14 No. 12 6 reauthorized under the auspices of NCLB in 20 02, with an additional one billion dollars. In 2003, DOE released its first year findings from the 21st CCLC national evaluation examining program characteristics and outcomes (Mathematica Po licy Research, Inc. & Decision Information Resources, Inc., 2003). This study, although labele d as “first year findings” was conducted after the initiative received three years of funding. The national evaluation of the 21st CCLC utilized a randomized experimental design to ascertain effects in some if not all centers (Mathematica Policy Research, Inc. & Decision Information Resources, Inc., 2003). The evaluation’s design incorporated separate studies with middle and elementary school students. The elementary study used random assignment of students to treatment and control groups in 14 school districts with 34 centers; the first year study focused on data from 7 of the districts grantees that could implement the experimental design; data from 1000 randomly assigned students were analyzed (Mathematica Policy Research, Inc. & Decision Information Resources, Inc., 2003, p.13). Th e middle school study used matched samples of students in treatment and comparison groups; it focused on 62 centers in 34 school districts. Evaluators collected baseline and follow-up data on 4400 middle school students from 32 of the district grantees. In addition, 2–4 day site visits were conducted to gather supporting data on program profiles in both elementary and middl e school studies. Outcomes were measured on students’ perceptions of safety, attendance, test sc ores and grades in academic subjects, and teacher satisfaction with homework or class work completion. Implementation findings showed that progra ms were staffed by school-day teachers on additional pay and offered 4–5 days a week but lacked in academic content. Markedly, programs posted low student attendance rates (an average of 2 days per week) and were limited by inadequate plans for sustainability, according to the authors. Little or no differences were found between the treatment and comparison students on any of th e outcomes at both elementary and middle school levels at the end of the third year of implementation. The 21st CLCC evaluation design and interp retive constraints with results The authors of the 21st CCLC report describe their study as “one of th e few” that are consistent with NCLB criteria for scientific rigor because of their use of random ized trials (Mathematica Policy Research, Inc. & Decision Information Resources, Inc., 2003, p.xi v). At the same time, they admit to many shortcomings of even their elementary-level in vestigation where they reported the use of RCTs. Among others, their reported concerns surround th e lack of sample representativeness, limited generalizability of results, cohort differences by year over the period of implementation, and student similarities/dissimilarities stemming from nes tedness in school-based centers across multiple districts (Mathematica Policy Research, Inc. & D ecision Information Resources, Inc., 2003, p.13). Other methodologists or stakeholders coul d raise additional questions. First, because selection of control students was dependent on surplus enrollments at funded centers--a logistical barrier—the researchers could only employ RCT at the elementary level. Second, there were no significant effects after 3 years of program impl ementation nationally, but interpretations of the effects were difficult to make based on the limited information collected on ongoing program inputs, processes, local environmental dynamics and variables. Finally, the effort sought definitive information on effects without any built-in attempt at providing formative feedback to strengthen program delivery as the centers became established. Thus, while the scope of the information targeted by the study as a whole was huge and the costs of a multi-site, multi-year national evaluation enormous, the evidence obtained within and across sites was superficial at best—constrained by the scale of the effort. Too much faith had been placed on the “magic” of randomization in the 21st CLCC elementary level investigation. There was no empiri cal verification of sample equivalence over time nor of contextual irregularities or variability in treatment and control conditions within and across


Gathering Evidence 7 sites over three years of implementation. Multiple cohorts appeared to be mixed up in that study. Data on program characteristics were gathered post-hoc through brief site visits. No first-hand documentation or data existed on qualitative differe nces in various models of program delivery as they emerged in actual school environments; no di rect links could be made between particular program characteristics and particular outcomes. So me centers may have been more effective than others, and some may have had better attendance than others, but such differences were clouded in the results. While the researchers did a good job of docume nting several limitations in their procedures; randomization as a procedure got severely compromised in the field application and did not help them in their cause to gather high quality evidence on program effectiveness. Besides the documentation that participation rates had been uneven and low—other factors that may have explained the disappointing results remained in a “black box”. Almost immediately after the release of the study, federal funding for the 21st CCLC was cut by 40%. The drastic action catalyzed interest in developing a stronger “research and evaluation agenda” that allows for continuous improvement of similar innovations as well as accountability to funders (Harvard Family Research Project, 2003, p. 1). Essential Elements of the ETMM Approach While RCTs (like the one described) often target multiple sites across the nation to obtain statistically desirable sample sizes for hypothesi s testing, they give minimal attention to program processes and environmental factors in their desi gn. ETMM designs, in contrast, are guided by a program’s theory of action and mix research methods. They complement field experiments with ongoing observations, interviews or survey research to better gauge how relevant variables might affect outcomes. The aim of such designs is to docu ment relevant facets of a program as it operates in its natural environment, as systemically and co mprehensively as resources will allow. The research plan in ETMM designs deliberately targets a signif icant portion of the life of an intervention for study, incorporating two self-contained phases of work: an exploratory, formative investigation, followed by a confirmatory, summative investigation. The formative phase is used to provide feedback to program participants to shape progra m delivery, to better study the treatment, control conditions and the environment, as well as to im prove the research design as more is learned empirically about the larger context in which a new program operates. The summative phase incorporates more formal experimentation. To gether, the two phases in an ETMM design are intended to yield a comprehensive body of evidence that permit researchers to make sound determinations of impact with knowledge of cond itions under which the effects were manifested (see Chatterji, 2004, 2005, for design principles). A Demonstration of the ETMM Approa ch with a Supplemental Program Evaluation The present ETMM application was constrained by limited resources and is thus a less than “ideal” implementation example. However, it still yiel ded a corpus of evidence that facilitated a more holistic appraisal of likely effects of the supplemen tal program under similar conditions than would a traditional RCT. The research involved a year-l ong study and combined a matched-groups, quasiexperiment with classroom observations and sur veys. This design was implemented in two successive phases of research. A 14-week fo rmative phase explored the program and its


Education Policy Analysis Archives Vol. 14 No. 12 8 environment in depth and was aimed towards prov iding feedback to developers, program personnel and school staff so as to stabilize treatment deliver y and improve fidelity. That was followed by a 16week summative study of short-term and very early impacts, where findings of the first phase were used to tighten the data-gathering and analytic design in specific ways. Details of the context, methodology and findings follow. Context of the Evaluation Study The present study was conducted during the 20 01–02 academic year and was a pilot of the program in New York City schools. The treatment program was delivered as a component of the extended time schooling initiative already under way at the school site. The school, located at Harlem, had been marked as a school under review by the city board of education in the previous year. The school administration hoped to improve stud ent performance on state and city tests in all grades from Pre-K through 5. The program was one of several reform initia tives concurrently being implemented by the school to achieve this objective. The research was initiated in response to a request from the program developer. The broader stakeholder group included the principal, teachers, students and parents of the school, all of whom were engaged in the deliveryor utilization of evaluati on results to some degree during the pilot year, along with the provider. The primary goal of the research was to comprehensively examine how well the program performed in a New York public school environment. The more typical setting for the treatment program consisted of after-school community centers, where participating children were from the middle to high socioeconomic brackets, an d active parent volunteers ran the program. For the first time, the program was being tested with ethnic minorities in New York City, all of whom were enrolled in the free or reduced lunch program at the Harlem public school (i.e., in the low socioeconomic bracket). Most were struggling in reading, mathematics, or both subjects. Treatment Program Characteristics The program (referred to as the treatment program hereafter) is described by the developers as being among the world’s largest providers of supplemental education materials. The method emphasizes computation in mathematics and basi c reading skills, the development of speed and accuracy skills through practice and repetition, in dependent learning, and self-paced mastery of graduated materials in basic mathematics and reading. The program incorporates some characteristics associated with potentially su ccessful supplemental programs mentioned by Cohen (2003), in that it attempts to involve both pa rents and teachers in school-based delivery models, allocates blocks of work time for students, and matches student levels to materials through initial placement testing. Others have noted that th e program aims to make basic skills, such as computation, automatic by promoting over-learnin g shaped by feedback, and uses timed conditions that mimic conditions of standard ized testing (Weischadle, 2002). The supplemental curriculum in reading and mathematics was delivered in 20-minute work blocks in each subject, three days per week, during the extended hour of the school day in treatment classrooms of the school site. That is, it was sele ctively delivered as a co mponent of the extended day schooling initiative already in operation at the sc hool, in particular treatment classes. Teachers in treatment classes volunteered to participate during the pilot year following schoolwide training and orientation activities that occurred in the preced ing summer. In comparison classes, by contrast,


Gathering Evidence 9 students did not receive the supplemental program during the extended hour of schooling or at any other time. The supplemental curriculum consisted of sequenc ed sets of multi-item worksheets (referred to as assignments by the developers), founded on the philosophy of its developers. To start, children were given placement tests and started by the developers at levels that matched their ability levels on specific subjects. Children were expected to progre ss at individualized paces through the leveled assignments on their own, with minimal guidance from teachers/ facilitators. They followed a set daily routine, where they were expected complete assignments under timed conditions. Before each session, they reviewed their homework, re-did or corrected missed problems from the previous session, and moved on to the new worksheet assigned. Per program theory—or the underlying assumptions on which the program was built--expected outcomes were higher levels of reading and mathematics achievement, self-efficacy as evid enced in their self-reports and confidence in attempting more tasks/items, better completi on times, and independent work habits. Nine classrooms, ranging from Pre-K through Grade 5 and including one, mixed-grade special education class, participated in the program during the year of the study. Treatment Program’s Underlying Theory The design of the study began with an analysis of the supplemental program’s theory of action or the set of explicit or implicit assumpt ions that suggested how the desired outcomes would be affected by variables in their context and th e program inputs and processes (after Bickman, 2000). The major components of the supplemental pr ogram’s theory were extr acted by the research team based on a qualitative review of the program materials, videos, documentation supplied, and ongoing consultations with staff of the curricu lum corporation. These findings were organized under Program Inputs (resources and services allo cated to set up and run the program at the site), Program Processes (activities that were expected to occur as a result of the inputs), and student and program outcomes that were expected to ensue. The logic model (Figure 1) depicting the treatment program ’s theory shows that the supplemental program aimed for the same achi evement outcomes as the regular school-day’s programs in reading and mathematics. Critical cont ext variables to consider in the design, delivery and analysis of the supplemental program were stud ent characteristics and the urban location of the school, along with its status as a school under revi ew in the city system. As shown, multiple schoolwide initiatives were concurrently in effect to raise student achievement at the school when the study commenced. The key ones included smaller class sizes (a structural/organizational intervention), the regular-day reading ( Success for All ) and mathematics curriculum (curriculum/instruction interventions), school-wide parent involvement in centives and an after-school snack program for children during the extended hour of school (student services/support interventions). In terms of inputs, the additional total cost of the treatment program in a given subject area per child was reported to be approximately $300 in a 9-month school year. More specifically, inputs during the after-school sessions for children receiving supplemen tal education could be classified under five major headings. Placement testing To begin the program, students were placed at a level in which they were most likely to succeed in a particular subject ar ea supplemental curriculum. Placement tests were administered to each participating student and scored by the developer’s staff to achieve this purpose. Materials The program in each subject area consisted of assignments focusing on leveled basic skills. These assignments were kept in storag e shelves provided by the developer, and housed


Education Policy Analysis Archives Vol. 14 No. 12 10 in a resource room provided by the school. Additional supplies included posters, number games, and other materials intended for skil l-building relevant to the supplemental curriculum. Periodic achievement tests were admi nistered to students focusing on blocks of completed worksheet skills. Student performance reports, prepared by the corpor ation, were supplied back to teachers, parents, and students following achievement testing. Rewa rds and recognition systems were implemented to keep students motivated.


Gathering Evidence 11 School Context Variables Inputs and Processes Outcomes Figure 1 After-School Supplemental Program Theory Model Urban, large city school All minority, high poverty students School “under review” by city system School resources for: Smaller classes Multiple reforms in curriculum and instruction Regular-day reading and mathematics programs Teaching staff and qualifications Extended-time schooling Supplemental program inputs Placement testing, materials, training, supports, aides Supplemental program delivery processes Time on task, teacher facilitation, parent involvement, orderly classroom and program routine. High reading and mathematics achievement scores on: Skills tests matched to supplemental curricula Standardized Achievement tests District tests Higher independence and confidence in subject, as evidenced in: Self-efficacy measures Better completion times Attempts at more items/tasks


Education Policy Analysis Archives Vol. 14 No. 12 12 Training and support services Developers provided school administrators and teachers in all participating classrooms with training and materia ls before the program began. The corporationÂ’s staff provided ongoing assistance to teachers an d helped with program organization and delivery throughout the first semester and for much of the second. Aides/assistants : The corporation also provided aides/assistants to assist with the daily grading of assignments and management of materials in treatment program classrooms. As evident in Figure 1, several treatment program processes were expected to occur as a result of the inputs. Among the critical ones were the following. Student time-on-task For participating classrooms, the after-school hour was broken down into 20 minute work blocks in reading and math ematics, respectively. Children were expected to follow a structured routine to complete assignments for at least this period of time on days with supplemental instruction. Teacher-facilitated delivery Following the diagnostic testing individual classroom teachers were responsible for program delivery based on the prescribed program philosophy and daily regimen. Once trained, teachers were expected to allow individual children to complete each dayÂ’s assignments as independently as possible. Although not expected to score student assignments, teachers were expected to provide the feedback an d coaching needed to help individual children begin their work each day, or correct mistakes fr om the previous dayÂ’s work. Teachers were also expected to manage studentsÂ’ classroom behaviors during the supplemental hour, including keeping children occupied once worksheet activi ties were completed for the day. Parent involvement The program aimed to actively involve parents in their childrenÂ’s learning. To that end, the corporationÂ’s staff held parent orientation meetings, sent homework sheets home with particular children, and prepared student reports for parents. Orderly classroom environment Videos of ideal classrooms depicted an environment that was quiet, organized, and orderly, with children n eeding very little one-on-one guidance. When the program operated according to guidelines, teach ers/facilitators were minimally involved, and students progressed from level to level guided by their own high motivation and engagement levels. The classrooms were expected to be distraction-free and conducive to independent learning. Other treatment program assumptions were implicit. The after-school curriculum was intended as a supplement to the regular curricula in reading and mathematics, emphasizing state content standards. Thus, there was an implicit a ssumption that the embedded skills would be aligned with and complement those typically covered by teachers in Pre-K through Grade 5 classrooms during the regular school day. The regular-day curr iculum was also expected to affect children in treatment program and comparison classes uniformly. Once inputs were allocated, it was assumed that there would be consistent levels of support an d buy-in from teachers, school leaders, parents, and students, so that the program ran smoothly, as designed. Because of the emphasis on parent involvement, more parents were expected to be involved in their childrenÂ’s education in the supplemental program classrooms than in classrooms without these services. Evaluation Questions Given the programÂ’s theory of action, questions that guided the design and data gathering procedures were classified under four headings: treatment fidelity (both formative and summative phases), teacher perceptions and buy-in (both phases), initial process-outcome relations and moderator effects (formative phase only), and ea rly treatment impact (summative phase only). Questions are listed below.


Gathering Evidence 13 Treatment fidelity To what extent were inputs and proce sses observed during the pilot year, consistent with theory in treatment classrooms? Were program inputs and processes observed in treatment classrooms changing over time in directions expected per program theory? Teacher perceptions/buy-in Did participating teachers report satisfaction with the program products and services in the early and later phases of program implementation? Initial process-outcome relations and moderator effects Did the treatment yield better achievement outcomes for comparable groups of children in the formative phase? Did children’s achievement vary in treatment versus comparison classrooms where teacher perceptions on selected environmental variables varied (i.e., were high versus low)? These variables included perceptions of alignment of the supplemental program with th e regular-day curriculum, observed parent involvement levels, and observed levels of student independence. Short-term treatment impact Controlling for mid-year achiev ement, were there short-term effects of the supplemental program in reading and mathematics on key outcomes in comparable treatment versus comparison children? Methods Because the supplemental program was individua lly adapted, students at a given grade level were permitted to start at different points and move at varying paces through the after-school curriculum. To target both the primary and inter mediate groups, parallel forms of multi-level achievement tests were designed in each subject area to serve as outcome measures. These tests were expected to be more sensitive to early effects of supplemental services. Methods for observing and recording all input, process and outcome variable s described next were the same in both phases of the research Formative Phase—The “Before” Study The formative study of the program began s oon after the summer teacher orientation. It yielded documentation of the extent to which the observed program processes, inputs, and outcomes were consistent with the program’s underlying theory and philosophy in the very early life of the program (semester 1). Process data were ga thered using classroom observations and teacher surveys, along with outcome data on multi-level reading and mathematics tests focusing on skills reinforced through the treatment program Matched samples of treatment and comparison group students by primary (Grades Pre-K-1) and interme diate level (Grades 4–5) were identified at the start of the school year. All children were first-time enrollees at the particular grades and not in special education. A Grade 3 class with retained st udents and a special education class did not have matches by grade and were treated separately to im prove internal validity of the comparative design (descriptive data were collected for them). In the comparative design, thus, the primary and intermediate samples were essentially independ ent samples matched by grade; demographic equivalence of the within-grade samples was examined at the start, but could not be sustained due to student mobility (detailed next). Descriptive analysis of the qualitative and teach er survey data were complemented with twoway ANOVAs that examined early process-outcome relationships by grade, with appropriate moderators as independent factors (e.g., effects of high and low levels of teacher-perceived curriculum alignment with the supplemental prog ram by treatment versus comparison group). The outcome analyses used grade-free multilevel skills te sts as the main achievement outcome measures


Education Policy Analysis Archives Vol. 14 No. 12 14 in reading and mathematics. The multi-factor ANOVAs helped examine and as necessary, rule out effects of extraneous environmental factors on studen t achievement and select an optimal statistical design in the summative phase. In addition to in formal exchanges that occurred regularly between the teachers, researchers, the developers and sc hool personnel, results of the formative study were formally fed back to program developers, s ponsors, and on-site participants as program implementation continued in mid year. Summative Phase—The “After” Study At the request of the sponsor, the summative phase of the evaluation was implemented during the last 16 weeks of the school year as prog ram implementation continued. It was also guided by the program theory model. Data collection continued with classroom observations and surveys to document changes on program inputs and processe s over time in matched classrooms by grade. Using the end-of-first semester scores on different subject area tests as the covariate, ANCOVA and effect size comparisons were now used to draw conclusions on early program effects in the previously identified treatment and comparison stud ents within independent, primary (Grades Pre-K through 1) and intermediate level (Grades 4–5) sub-samples. Student mobility and attrition rates that the school and researchers were unable to control, reduced sample sizes in the summative phase. Corrective actions included the use of the midyear covariate to equalize pre-existing domainspecific student differences in the summative analyses. The data were checked to see if homogenei ty of regression assumptions for conducting ANCOVA were met (i.e., there was no interaction between the covariate and treatment conditions). Independent factors in the first analysis were treatment versus comparison conditions. Dependent or outcome measures were reading and mathemat ics scores on the multi-level tests. Effect sizes were computed using Glass’ formula to understand the direction and magnitude of initial effects. Additional analyses compared means descriptively on other outcomes in treatment/comparison groups. Changes in Comparative Research Design The present ETMM application incorporated a comparative design that has been characterized as a quasirather than a trueexperiment. While students in the school were “randomly assigned” to teachers in the beginning of the school year because of an administrative policy of heterogeneous grouping, 9 of the teachers (classrooms) volunteered to participate in the treatment program across grade levels—this resulted in un controlled conditions with respect to teacher equivalence in treatment and control conditions. In matched classes by grade, howev er, equivalence of students from treatment and comparison conditions was attempted and periodically checked on four background characteristics: ethnicity, gender, membership in free lunch program, and native language spoken at home (Limited English Proficiency status). Initial equi valence was established within grades. To obtain higher sample sizes by level, a decision was made to separately study primary (PreK-1) and intermediate (Grade 4–5) samples using students from combined grades at each level. Grade-level breakdowns were examined descriptively prior to initiation of the formative study, and grade was used as a control variable in later statistical analyses. Because the primary matching variable was grade level, the samples were treated as independent samples in statistical comparisons


Gathering Evidence 15 and hypothesis tests, with covariates included in the summative analyses. Due to small numbers, nestedness of students in classrooms was not taken into account in the analysis. Subject Characteristics Table 1 shows treatment group statistics on mean number of assignments completed as an index of program exposure. Tables 2 and 3 show the characteristics and numbers of students in samples at the point of commencement of the formative study, and in mid year before the summative phase began. During the course of the investigations, attrition due to student mobility, inadequate exposure to the treatment due to irregular attendance, or missing data on critical outcome variables resulted in changes in sample composition and fewer cases for particular summative analyses. These changes to sample size reduced power of the statistical tests in the summative phase, but did not markedly alter the comparability or representativeness of the original matched samples on background characteristics deemed relevant for the investigation (this was checked, and proportions were comparable in different ethnic and gender grou ps). Regardless, because of sample attrition, summative analyses incorporated a covariate to adj ust for mid-year differences in academic skills in both subject areas and used the adjusted Sums of Squares (Type III) for calculation of variances because of unequal Ns in cells. Table 1 Mean Treatment Exposure by Grade Subjec t Area and Level: Number of Students Grade and Subject Mean # of Assignments SD N Pre-K Reading 510.77 155.64 13 Math 703.77 170.20 13 Kindergarten Reading 654.67 208.52 15 Math 682.67 245.40 15 Grade 1 Reading 706.33 210.32 15 Math 931.47 165.11 15 Grade 4 Reading 700.00 167.52 20 Math 673.50 182.56 20 Grade 5 Reading 706.50 177.56 20 Math 733.75 161.71 20


Education Policy Analysis Archives Vol. 14 No. 12 16 Table 2 Demographic Equivalence in Initial Treatment and Compar ison Samples by Level Demographic Variable (Level) Treatment Group Comparison Group % N Gender (Primary) Male 21 21 45 Female 26 26 56 94 Gender (Intermediate) Male 15 15 48 Female 16 16 52 62 Ethnicity (Primary) Black 31 31 66 Hispanic 8 8 17 Other/Unknown 8 8 17 94 Ethnicity (Intermediate) Black 28 28 90 Hispanic 3 3 10 62 Free/reduced lunch (Primary) 47 47 100 94 Free/reduced lunch (Intermediate) 31 31 100 62 English speakers (Primary) Non-native Englis h speakers 7 7 15 94 Native English speakers 40 40 85 English speakers (Intermediate) Non-native English speakers 3 3 10 62 Native English speakers 28 28 90 Table 3 Sample Sizes by Level in MidYear prior to Summative Study Level (Outcome Measure) Tr eatment Comparison Total N Primary (Reading) 35 33 68 Primary (Math) 35 33 68 Intermediate (Reading) 30 31 61 Intermediate (Math) 29 30 59 Breakdowns by grade available on request.


Gathering Evidence 17 Sampled Observation Notes Pre-K classroom: Children seem to know what to do. Children in groups of 5–6 at table with adult–aide or teacher. Each focused on worksheet. Engaged in worksheet. Aide guidin g student to write– “down up, down up...” Grade 4 classroom: One girl has finished her worksheet, She says to observer “ XXX has he lped me in math.” She walks to desk, ch ecking her sheet. She discovers she has missed 3 items. She returns and begins to do them Codes Consistent with Program Theory [Ss following program protocol] [Ss and aide following program protocol] [aide assisting Ss; po sitive environment] [Ss following program protocol; positive comment on program] Sampled Observation Notes Grade 1 classroom: (Developer) giving directions for XXX routine to students.. .”be quiet, get your packet, get ready for XXX. But no one seems to pay attention to him, except for a few kids. They are extremely noisy.... Grade 3 classroom : Only 5 students in class; 4 of whom are on task. Teacher working hard–trying to keep them seated. R says–“Shouldn’t the sheets be matched to their levels of comfor t?” Teacher responds– “Yes, but we moved them up faster and they are discouraged” Codes Inconsistent wi th Program Theory [Ss loud; developer managing Ss’ behavior] [Supplemental progra m attendance low] [T managing Ss’ behavior] [Ts not following program protocol] Figure 2 Classroom Observation Records an d Line-by-line Coding Procedures Data Sources, Measures and Data Collection Details on the development and validation procedures for three newly developed instruments are given under particular sub-heading s. The appendix provides additional details on assessment specifications and items with early validity and reliability data. Classroom observations of program inputs and processes Narrative running records of treatment classroom activities were sampled during the supplemental hour by observers at both


Education Policy Analysis Archives Vol. 14 No. 12 18 primary and intermediate levels. For the formativ e study, a total of 20 such observations were conducted for 30 minute periods each and distributed equally in intermediate and primary classrooms. Likewise, in the summative phase, 11 observations were conducted (5 were in primary classrooms, 5 in intermediate classrooms, and 1 in the grade 3 class). The text data were coded line by line, using classical content analysis proced ures (Ryan & Benard, 2001) and codes were clustered under general themes. A sample of coded observation data is shown in Figure 2 and illustrates how codes extracted from each line of text data were classified under broader themes to evaluate their consistency with expectations given by the program theory model in Figure 1 (results reported in Table 4). To examine changes over time, the proportions and rank-order of counted codes by theme category were compared in the first and second semesters of program implementation, the formative and summative phases of the research. Teacher self-report surveys in part icipating and comparison classrooms In both semesters, treatment teachers were asked to rate the quality of diffe rent aspects of the supplemental program. At the end of the each semes ter, teachers in both participating and non-participating classrooms matched by grade level (N=20, 8 in paired cla ssrooms by grade plus others) were also asked to respond to items tapping three key moderator vari ables: perceived alignment of the supplemental program with the regular curriculum in reading and mathematics, perceived parent involvement levels in their classes, and perceived levels of student independence. To check for their perceptions on the degree of regular curriculum alignment with the supplemental program objectives, skills were ex tracted through a content analysis of the supplemental materials, and presented to treatment and comparison teachers in the survey (the complete instrument appears in the appendix). Item responses and means on survey indices were compared descriptively in treatment and comparison classes in both semesters to obtain a sense of the differences on contextual variables under the two conditions. Table 4 shows sample items from each sub-domai n of the survey. As is evident, CronbachÂ’s alpha reliability estimates were found to range from .73-.89 (greater than th e acceptability criterion set for .70) on all teacher survey indices. Student outcome measures Student achievement scores, time taken, and number of items attempted, on the specially designed multi-level skills tests in reading and mathematics were the outcome measures used to evaluate short-term effects of the program. The domain for each test was ordered, and represented by progre ssively complex groups of skills, st arting at the beginning of prekindergarten levels and going to a few levels beyond the maximum achievement expected at the highest grade. Test specifications, shown with sample items in the appendix, were developed with the involvement of staff from the curriculum cor poration. Items matched to each skill area were then selected from the existing pool of publishe d curriculum materials. Because of the volume of assignments and items published, prior exposure to i tems was not considered to be a major threat to student performance measures obtained. Two parallel forms of each multi-level test were prepared at each level and subject area, for separate use in the formative and summative phases of the study. Split-half reliability of the forms, based on a separate pilot study with a center-based sample, ranged from .67 to .72 in the primary group, and .78 to .82 in the intermediate group in reading and mathematics, respectively. Convergent validity coefficients with supplemental program ex posure, items attempted, and speed of completion were moderate to high and consistent with theoreti cal expectations (reported in the appendix).


Gathering Evidence 19 Table 4 Teacher Perceptions Survey: Results in Treatment and Comparison Classes Treatment Control Dimension Sample item Total Items Phase of Study MeanSD Mean SD CronbachÂ’s alpha Formative 15.173.13 15.11 2.47 Curriculum alignment, reading To what extent is the following skill/area addressed in your regular curriculum: Identifying main ideas? 6 Summative 15.103.14 15.51 2.92 .89 Formative 14.503.73 14.00 4.27 Curriculum alignment, math To what extent is the following skill/area addressed in your regular curriculum: Sequencing numbers? 11 Summative 24.116.08 22.61 5.44 .85 Formative 6.54 1.94 6.89 2.13 Parent involvement (In your class) To what extent are your studentsÂ’ parents/guardians involved this year in: Helping students with homework? 3 Summative 6.42 1.81 7.10 1.86 .74 Formative 7.67 1.97 8.56 1.88 Student Independence and Goal-directedness (In your class) To what extent are your studentsÂ’ showing independent, selfdirected behaviors in Mathematics?..In Reading? 4 Summative 8.72 1.22 9.51 1.61 .74 Formative Phase N (teachers)=15; Summative Phase N (teachers)=19 Evidence of content-based validity (match of testsÂ’ content with teachersÂ’ regular-day curricula) of the skills sampled on the multilevel te sts in reading and mathematics was obtained by semester through the teacher survey, and is shown in the appendix. As is evident, teachers in both treatment and comparison classrooms saw greater alignment of the reading skills with their regular curriculum than with mathematics skills; however, as the school year progressed, more of the mathematics skills were covered by teachers in bot h conditions, improving content validity by the end of the summative phase (see increase in composite score mean on curriculum alignment in Table 3).


Education Policy Analysis Archives Vol. 14 No. 12 20 Test administration conditions were un-timed. Each child started at several levels lower than their assessed ability level and was asked to go as far as he or she could. Starting and ending times were recorded. Scoring was standardized with the help of a key, and included partial credit scoring on a few items. Scorers were formally trained in a practice session until they were found to agree on their scoring decisions. Levels of scorer agreemen t in scoring of particular items was found to exceed 70% with practice tests. Other outcome measures Two self-efficacy scales (see the appendix), focusing on reading and mathematics respectively, were developed and validated for use in the summative phase of the study. Based on indicators drawn from the theoretical liter ature on self-efficacy, these instruments included 13–16 self-report items with 3 point Li kert scales. A typical item asked, Can you do the math problems your teacher gives you? The primary level instruments were designed as interview-based assessments, while at the intermediate level the same instruments were administered as teacherguided paper and pencil questionnaires. The intermediate level self-efficacy scales were contentvalidated against theoretically derived indicators by external experts and the research team. The scales showed adequate Cronbach’s alpha reliability (.74 in math and .77 in reading). The primarylevel instrument was tested during the formative investigation but not used in the summative study due to unacceptable reliability. Finally, scaled scores from the state and city standardized achievement test, CTB-4, were also used as additional measures of achievement outcomes in the second phase at the intermediate level. For primary children, teacher ratings from the Early Childhood Language Arts Scale locallydeveloped in the New York City system were used to compare treatment and comparison students. Program Fidelity in Formative and Summativ e Phases: Changes in Treatment Definitions In the formative study, potency of the treatment was operationally defined based on the number of after-school sessions attended by treatment children, with data collected on number of worksheets completed to supplement that inform ation. However, site observations during the formative phase revealed that not all students attended the after-school supplemental sessions regularly. Further, they were often pulled out earl y by their parents who took the assignments home for completion. The school principal added Saturday sessions to the extended hours on school days. The providers allowed this to happen, as it fi t their program theory calling for greater parent involvement and task engagement. A change was thus made to the summative study to improve validity of the design. An a priori decision was made in consultation with the pr oviders and school stakeholders to set a cut-off for student exposure to treatment at a minimum of 100 assignme nts in a subject area and to a minimum of 200 assignments over two semester s. Thus, the “treatment condition” was now operationally defined in a broader way based on task completion both in and out of the after-school classroom environment. This resulted in a small chan ge in the composition of the original samples at the primary and intermediate levels in the summative phase (fewer than 10 students were excluded, and most of these had moved away from the school ). Instead of imposing a standardized model that could not be sustained in real school environments, this alternate program model was collaboratively considered a more realistic operational definition of the supplemental program. As indicated earlier, to enhance internal va lidity of the quasi-experiment, key extraneous variables identified in the environment were examined statistically and ruled out as possible threats before the comparative summative study was undertaken. Grade-retained students without similar matches received year-long supplemental services in Grades 2 and 3 (N=11 in each). Likewise, a mixed-grade special education cl ass without matching pairs of children were in the supplemental


Gathering Evidence 21 program (N=7). These students were studied as se parate samples using one-group, pre-test to posttest change designs. The analyses were treated as descriptive, because of the lack of matching comparison children and small sample sizes. The summative study of prel iminary program effects thus focused on a primary sample (Grades PreK-1) and an intermediate sample (Grade 4–5) and used a comparative design, matched by grade level, and controlling for mid-year achievement on multi-level math and reading tests as the covariates. Results Extent of Treatment Fidelity: Classroom Observations At the end of the formative phase, classroom observation results were mixed (see the left panel of Table 5 showing frequencies). However, classroom processes changed in positive directions by the end of the year (Table 5, right-hand panel showing frequencies). The percentages in Table 5 refer to proportions of the total coded text data in different thematic categories by semester. Examples of text segments under each theme ar e provided as quotes in the extreme left-hand column. Themes have been logically grouped under broader “input” and “process” categories. Results from the formative phase in Table 5 can be compared on common thematic categories with results of the summative phase using rank-orders, rather than the absolute frequencies, as the number of observations lessened by about 1/3 in the second semester. The summary results reflect activities documented in classrooms sampled by semester; primary and intermediate level data are combined in the table. Table 5 (left) shows that progra m inputs were largely consistent with theory in the formative phase–with both the developers and the school prin cipal jointly investing considerable resources. The principal and corporation staff were documented to be highly involved with program delivery. Most teachers and aides were involved in classroom practices that were consistent with the program theory, although some of their actions were di rected towards arresting student misbehaviors. Classroom processes were uneven, however, particular ly in intermediate classrooms (grades 4–5, not isolated in the table). In all, there were 240 (4 1%) coded occurrences of student unruliness and 61 (11%) associated classroom management behavi ors. Such observations were classified as inconsistent with the theoretical expectations of a smoothly operating and quiet classroom. Among other inconsistent findings, parents were often observed pulling their children out during the supplemental hour and teachers tended to let them take assignments home. At the end of the summative phase (right hand panel of Table 5), observational records showed patterns suggesting that the program wa s being implemented in a manner that complied more with the major program guidelines. Notably, behaviors of students and teachers, at both primary and intermediate levels, were more consis tent with program expectations, and ongoing program inputs expected per theory were found to increase proportionally in classes observed. There was some continuing evidence of unruly student conduct (again, mostly at higher grade levels). However, compared to the first semester, th e high rank and frequency of this irregularity had reduced reflecting only 16% of coded observations.


Education Policy Analysis Archives Vol. 14 No. 12 22 Participant Teacher Pe rceptions and Buy-in Because the number is small, participant teacher survey results are not reported in a table. In the formative phase, only six of 9 participating tea chers responded to program-related questions on the teacher survey (in the appendix, item-sections 36 and 38 ). Particularly, when asked if the program had any instructional value, all responding teachers opted to leave that item blank in the first semester. At the end of the summative phase, there appeared to be greater acceptance of the treatment program by a majority of participating teachers co mpared to mid-year ratings. Notably, all the teachers responded to the survey. In all, 8 (89%) indicated that time for program management was “reasonable”, given the supports they received; 7 (78%) indicated the content of the assignments was “effective”; 6 (67%) endorsed the “instructiona l value” of the program and found the worksheet format to be “effective”; and 7 (78%) indicated that time and other resource demands were “reasonable”. Smaller numbers (1–5 of 9) of teach er participants chose “ineffective” responses to two questions or left them blank (11–56% responde nts). These items dealt with time for providing individualized feedback, consistency of the suppl emental program with regular curriculum (5, 56% positive responses in each), and other resource needs (4, 44% positive responses). Table 5 is presented overleaf.


Gathering Evidence 23 Table 5 Themes from Classroom Observations: Summary of Results from Formativ e and Summative Phases Formative Phase Summative Phase Themes Frequency (%) Rank Frequency (%) Rank 1.0 Observed Inputs Consistent with Program Theory Materials — “diagnostic testing”; “tests” ; “testing materials”; “placement materials”; “daily assignments”; “su pplies” “blue bins with materials organized”; “program shelves” “s tacked materials”;“game board” 61 (11%) 3 20 (9%) 5 Supports (developer/administration) — “s taff” helping teachers assess; “staff” reading test dir ections; “staff” providing training in “school auditorium”; “aides ” grading in progra m “resource room” ; “principal” stops in class; “principal inviting parents to orientation on phone”; “aides” helping teacher at table. 28 (4.8%) 7 34 (15%) 4 2.0 Helpful Inputs but Inconsistent with Program Theory “snacks” for children; “small class” sizes 5 (.09%) None recordedN/A 3.0 Observed Processes Consistent with Program Theory 3.1 Positive Environment —teacher giving “positive reinforcement”; “children seated in small desk clusters as aide encourages one child to answer”; “diverse groups of children ”; program “staff encouraging child”; “staff “helping”; teacher/aide “assisting” children. 84 (14%) 2 21 (9%) 5 3.2 Teacher/Aides Following Protocol — “giving directions”; “keeping time”; “grading assignments”; “monitoring make-up assessments”; “checking answers” “placing packets in bin”; “w alking around”: “looking at child’s work”. 56 (10%) 5 41 (18%) 2 3.3 Students Engaged/following Protocol — “self-checking an swers” “placing packets back in bin”; “on-task”; “ask ing math questions”; “asking reading question”; “counting out loud when fi nishing sheet”; doin g ”assignments for the day”. 32 (5%) 6 54 (24%) 1


Education Policy Analysis Archives Vol. 14 No. 12 24 Formative Phase Summative Phase Themes Frequency (%) Rank Frequency (%) Rank 4.0 Observed Processes Inconsistent with Program Theory 4.1 Teacher/Aide/Other Engaged in Student Behavior Management — teacher/aide asking “students to be quiet”; “reprima nding students”; “sending students home”; “suspending students”; “lou dly asking students to sit down”. 61 (11%) 3 20 (9%) 5 4.2 Student Misbehaviors during Supplemental Hour —“playful”; “wandering around”; “noisy”; “distracted”; “compl aining” “complaining loudly”; “not responding to directions”; “looking at others-not concentrating on work”; “chatting with others”. 240 (41%) 1 35 (16%) 3 4.3 Miscellaneous Activi ties outside Protocol and Program Plan —“teacher moving children to a high er level worksheet before they reach mastery ”; “Saturday sessions”; “parents picking up children before the supplemental hour has ended”; “teacher/aide lettin g children take wo rksheet home”; “children being moved out of XXX for disciplinary reasons”. 15 (2%) 8 None recordedN/A Ncodes (sem 1) = 582; Ncodes (sem 2) = 225. Percentages are rounded.


Gathering Evidence 25 Process-Outcome Relations: Formative Phase Initially (Table 6–7), achievement outcomes were better for treatment children at the primary level rather than at the intermediate. Better outcomes were likewise found in reading than in mathematics, using the multi-level tests as mid-year outcome measures. The combined primary level treatment group (Table 6) was 0.50 standard deviation (SD) units ahead of matched peers in mathematics perf ormance, and 0.58 SD units ahead in reading performance. Although this difference was not statis tically significant at the 5% error level, gradelevel interactions were non-significant showing that the early influence of the supplemental program was similar in all primary grades. With grade level increases scores improved significantly in both groups. In the combined intermediate grades (Table 7), treatment students were trailing behind their matched counterparts by -0.40 SD units in mathem atics scores. This difference was significant at 10% error level ( p =.08). In reading, Grade 5 students wer e 0.86 SD units ahead of matched peers while grade 4 students were -0.86 SD units below ma tched peers, generating an overall effect size of 0.035. The opposite results in Grades 4–5 yielded a significant interaction effect, showing that children in these two grade levels responded to the program differently ( p <.01). The mixed achievement outcomes at the intermediate level coul d be stacked against observations gathered from the intermediate classrooms (Table 5) and attrib uted to the high levels of behavior problems documented. Table 6 Results in Formative Pha se: Reading and Mathematics Performa nce in Primary Students Receiving Supplemental Instruction Outcome Variable Mean SD R2 Effect Size Reading (Primary Level) Treatment 26.77 15.15 .342 0.58 NS Comparison 22.22 7.83 Mathematics (Primary Level) Treatment 83.68 27.76 .602 0.50 NS Comparison 69.17 28.88 ANOVA tables available on request. NS not significant at 5% alpha level; F 1, 56 =2.42, p=.125 NS not significant at 5% alpha level; F 1, 52 =1.75, p=.192


Education Policy Analysis Archives Vol. 14 No. 12 26 Table 7 Results in Formative Phase: Re ading and Mathematics Performa nce in Intermediate Students Receiving Supplemental Instruction Outcome Variable Mean SD R2 Effect Size Reading (Intermediate Level) Treatment 69.60 9.57 .237 0.04 NS Comparison 69.29 8.78 Mathematics (Intermediate Level) Treatment 54.32 23.11 .537 -0.40 Comparison 64.21 24.89 ANOVA tables available on request. NS not significant at p = 05; F 1, 45 =0.085, p=.772 p < .10; F 1, 45 =3.305, p=.076 Teacher Perceptions and Treatment-Mo derator Effects: Formative Phase Table 4, referred to earlier, also showed the results on teacher-perceived levels of curriculum alignment, parent involvement and student independence in the classroom in the first and second phases of the investigation, based on means on teacher survey indices (see also the appendix for ratings on items 4–36). Findings were not very different over time or between treatment and comparison classroom teachers on composite survey indices. When means increased as they did on curriculum alignment with mathematics as the school year progressed, both treatment and comparison classroom teachers provided similar rating s on items, yielding comparable means. Comparison teachers reported marginally greater levels of Parent Involvement and Student Independence in their classrooms than treatment teachers. Survey item-level ratings from the summative phase on skill-alignment (evidence of content validity of outcome measures in the appendix) were similar in both participating and nonparticipating classrooms, with greater levels of fit reported with reading curricula. In the reading area, close to 2/3 of 19 teachers in both progra ms indicated matches to a “great extent” between the supplemental program’s reading skills and their cu rricula. In the math area, matches to a “great extent” were reported on recognizing numbers, reci ting numbers, sequencing numbers, addition, and word problems (1/3 to 2/3 of teachers). The remaining math skill areas, such as subtraction, multiplication and division, generated very low propor tions of positive ratings, even at the end of the year. To check for moderating effects of differenti al levels of curriculum alignment, parent involvement or student independence in treatment and comparison classes, factorial ANOVAs showed that teacher-perceived curriculum alignment levels in reading in the primary sample had significantly different achievement effects in the formative phase ( p =.05). Other results—a sampling of which is shown in the appendix—were non-significant for all other moderators in combined samples (primary and intermediate).The analyses were repeated in the summative study and the decision to use ANCOVAs was made after modera tor effects were found to be non-significant.


Gathering Evidence 27 Early Treatment Effects: Summative Phase. Table 8–9 and Figure 3 show the results of the ANCOVAs. Overall, the treatment primary group was 0.45 standard deviation units ahead of comparison children in reading performance on skills/areas covered in the supplemental curriculum, unadjusted for mid-year performance (Table 7 and top two panels of Figure 3). Adjusted for mid-year scores, the treatment group was 3.4 raw units ahead. In combined primary grades, the treatment group was 0.58 standard deviation units ahead in mathematics performance. Adjusted for mid-year performance, the treatment students were still 5.09 raw score units higher than their matched counterpart s. Although not statistically significant at the 5% error level, these effects may be classified as moderate in magnitude. Table 8 Results in Summative Phase: Reading and Mathem atics Performance in Primary Students Receiving Supplemental Instruction Outcome Variable Source of Variance (ANCOVA) Mean Square df F p Regression (covariate)6136.43 1 96.00 .000 Treatment 172.29 1 2.70 .106 Reading at Primary Level Error 63.90 58 Regression (covariate) 1941.38 1 100.8 .000 Treatment 116.58 1 0.6 .441 Math at Primary Level Error 194.04 57 Descriptive Statistics on Groups Mean SD Adjusted Mean R2 Effect Size Reading Treatment 31.1 13.1 30.7 .65 .45 Comparison 25.6 12.1 27.3 Math Treatment 108.06 20.11 103.37 .70 .58 Comparison 93.20 26.20 98.28 Covariate for both reading and math was the midyear reading score; F for covariate* treatment interaction=1.93, p=.17 (readi ng);.F=0.12, p=.726 (math).


Education Policy Analysis Archives Vol. 14 No. 12 28 Table 9 Results in Summative Phase: Reading and Mathem atics Performance in Intermediate Students Receiving Supplemental Instruction Outcome Variable Source of Variance (ANCOVA) Mean Square df F p Regression (covariate)2501.27 1 15.83 .000 Treatment 3.24 1 0.02 .887 Reading at Intermediate Level Error 157.99 44 Regression (covariate) 9433.75 1 51.60 .000 Treatment 1953.23 1 10.68 .002 Math at Intermediatey Level Error 182.82 50 Descriptive Statistics on Groups Mean SD Adjusted Mean R2 Effect Size Reading Treatment 57.8 12.7 56.34 .27 .08 Comparison 56.5 17.2 57.15 Math Treatment 84.1 21.3 84.17 .54 .65 Comparison 71.9 18.6 71.94 Covariate for both reading and math was the midyear reading score; F for covariate* treatment interaction=0.03, p =.863 (reading); F =.01, p =.968 (math). Treatment students were clearly ahead of their matched counterparts in the combined grade analysis at the intermediate level in mathematic s (Table 8 and bottom panels of Figure 3), as evidenced in a positive effect size of 0.65 ( p =.002). Adjusted for mid-year performance, the treatment students were still 12.23 raw units higher than their matched peers. In reading, however, there was a no discernable effect evident at the inte rmediate level (effect size of +0.08). Adjusted for mid-year scores, the treatment group was just 0.81 raw units below their matched peers.


Gathering Evidence 29 15 17 19 21 23 25 27 29 31 33 ObservedAdjusted Treatment Comparison Reading Means in Primary Students 40 43 46 49 52 55 58 ObservedAdjusted Treatment Comparison Reading Means in Intermediate Students 80 85 90 95 100 105 110 ObservedAdjusted Treatment Comparison Math Means in Primary Students 60 65 70 75 80 85 90 ObservedAdjusted Treatment Comparison Math Means in Inte rmediate Students Figure 3. Summative ANCOVA Results: Effects of Supplemen tal Program on Achievement


Education Policy Analysis Archives Vol. 14 No. 12 30 Other Effects Performance on distri ct and state tests On the Language Arts scale at the primary level, slightly higher proportions in the treatment group received teacher ratings of 5–6 (on a scale of 1–6) on Phonemic Awareness. In the other three areas, higher proportions of comparison students received ratings of 5–6. These differences were not statistically significant. On the CTB-4 math and reading test, the numbers of intermediate students with complete data changed from 2001 to 2002; thus these results could only be compared descri ptively with 14 unmatched cases. They are not reported here due to instability of findings. Test completion rates and time taken Controlling for ranges of scores by quartile on the multilevel tests, a preliminary comparison of average time taken by students in treatment and comparison group suggested a pattern showing students who received supplemental services typically took 6–10 minutes less time to complete the tests. Fo r example, the mean time taken in reading for students in the bottom quarter of the distribution was as follows at the primary level: Table 10 Primary Time Required for Test Comp letion, Bottom Quartile, by Group Group Mean time taken (minutes) SD Treatment 25.4 3.4 Comparison 31.3 8.4 Controlling for grade level and given similar testing conditions, the mean number of items attempted by students was also higher in treatment classes in mathematics, and significantly different from comparison students (F 1, 125= 11.69, p<.001). Typically, the treatment students attempted 2– 6 more items at each grade in reading; in mathematics the average differences were approximately 8– 20 more attempted items. Self-efficacy measures In the combined 4th and 5th grade samples, the treatment students had a mean Math Self-efficacy score of 23.0 (SD=4.0). The Compa rison children had a mean of 24.3 (SD=3.8). This yielded an Effect Size of -.034, favoring the students without the Supplemental program. With the Reading Self-efficacy measure, the treatment students’ mean was 18.6 (SD=3.3). The comparison children had a mean of 18.4 (SD=4.4), yielding an Effect Size of +.045, barely favoring the treatment students. Preliminary effects on self-efficacy were either absent or on the negative side. To sum up, the early effects of the supplemental program were evident on skills tests aligned with the supplemental curriculum, but not on other measures. The developer and the school personnel were reminded that observed positive ef fects were “gross effects” and tentative; that is, results depicted the effects of the supplemental pr ogram as operationalized at the site and necessarily confounded with those of other reforms and supports concurrently aiming to raise student achievement. Confounders could not be teased out, as the program by its very definition was an add-on to the regular day programs in the same subject areas. However, the potential effects could still be broadly gauged in comparable groups to whom supplemental services were provided or withheld.


Gathering Evidence 31 Discussion The paper began with an aim to demonstrate and appraise a complete empirical application of the ETMM design for gathering research evidence on school-based programs and policy initiatives, in light of NCLB requirements callin g for schools to implement programs supported by scientific evidence and the federal recognition of RCTs as the “gold standard” for scientific rigor. The focus was on a supplemental instruction program. The studies were done at one pilot site—an elementary school in Harlem. At the outset, the reader should be reminded that the present ETMM application was limited by several field constraints and lack of resources, particularly, a time limit of one academic year. However, given these realities, what were the key advantages and disadvantages of the ETMM approach as compared to RCTs, had the latter been a design option under the same conditions? In the present application, the ETMM study was akin to small-scale, multi-method case study, focusing in-depth on implementation of a supplemental prog ram at a particular site, and following the progress of the program as it matured and settled into a routine. It made inferences about possible early effects in treatment and comparison setti ngs at one site only. A quasi-experiment was embedded in the design from the start, but formal linkages of program processes to outcomes were emphasized in the confirmatory phase of the research. Despite the time limit, there were before and after studies included in the investigation, driven by different purposes. Within the boundaries of one school, the study attempted a systemic approach to the design, making a formal effort to map and attend to the possible interacti ve/mediating effects of various context input process variables in the larger environment of a new program on outcomes(CIPO) An analysis of a program’s theory of action in terms of CIPO variables was thus th e starting point of the design process. As documented, several design challenges were faced once the studies were begun in the Harlem school. This is not uncommon in pilot e fforts in real time school settings. Lessons were learned. Design changes were made— most design alterations were based on interactions with key stakeholders, formally gathered empirical evidence, and documented observations in situ Because of the use of comprehensive, mixed method approaches, there was better documentation of the various problems that aros e in both treatment and comparison environments and the larger organization: sample attrition, emer ging definition of the supplemental treatment in classrooms and the school, extent of treatment fidelity and stability as time passed, potential contaminants in the environment of both treatment and comparison students, such as student behavior problems. On all these, empirical data ge nerated from the formative phase informed design decisions and changes. Because there were two separate phases of the research design, instrumentation issues could be tackled in the first phase with analyses of early impact held off until some evidence of validity and reliability was at hand on major variable measures. Stakeholders could look at the findings themselves and use the first pha se results to alter program delivery; before-after comparisons could be made more meaningfully with an array of data from multiple sources. Teachers, leaders, parents gained more owners hip of the new program by the second phase, improving delivery and fidelity. Was it reasonable to incorporate a summative study within the pilot year of a new program? Ideally, the formative phase would last at least 1–2 years, with the summative phase starting soon after. Preferably, trained personnel would contin ue program implementation in the summative phase, either with cohorts students in the original treatment group continuing to receive services for studies of longitudinal effects, or with scaling up and expansion of the program to other, carefully selected sites to maximize generalizability and ecolo gical validity of the confirmatory phase results. Scaled-up experiments using RCTs are best deferre d until the second phase in ETMM studies; had


Education Policy Analysis Archives Vol. 14 No. 12 32 this been possible in the case presented here, it might have strengthened the quality of evidence (other things remaining constant). Feasible progra m models that emerged from the first phase could then be subjected to formal effectiveness testing in the second, using a tighter design that combined RCTs with other methods. Questions may be raised about the ad-hoc instruments developed for the present ETMM application. A limitation was that early e ffects were evidenced only on specially-designed assessments specific to the supplemental curriculum and using the developer’s item pool, rather than on independent, broader and standardized measures of achievement. Supplemental programs have narrower foci than regular curricula. When in pre-adoption stages, over-reliance on external standardized achievement measures may generate invalid findings due to issues of nonalignment/poor content validity. For optimizing lo cal validity, instruments and data-gathering methods may thus need to be customized for smallscale testing and monitori ng of novel programs, as shown here. At the same time, resources have to be dedicated to gathering sufficient evidence of validity and reliability for results to be defensible. Several recommendations were made to devel opers and school personnel, with cautionary pointers on limitations. The developers were informed that increased alignment of a supplemental program with the regular-day curriculum’s research base, content, and philosophy would likely improve outcomes as well as teacher and parent buy-in (as seen in teacher survey and student outcome data). The study also did not examine the quality of curriculum materials vis--vis the state’s content standards and standards for best pr actices set by national subject area associations such as the National Council for Teachers in Math ematics and the National Council for Teachers of English. As necessary, developers were encouraged to examine the content of curricular products and their consistency with credible research, best practices, broader subject area domains tapped by national standardized achievement tests. Developers and school-based personnel were advised to plan program tryouts, replications, and related re search with a longer term view, incorporating an understanding of the types of resources and conditions necessary for maximal success on particular outcome measures. To compare the costs of the ETMM approach versus randomized field trials, the reader could weigh the breadth and quality of evidence ge nerated from the present application versus the costs with RCT studies such as the 21st CLCC evaluation (described in the literature review). A main distinction is that the ETMM studies attend to program-development issues within particular environments while attempting to map a program’s processes and effects over time. As shown in the present case, the smaller-scale ETMM design permitted more inclusiveness and participation of stakeholders and better relationship-building with researchers, making program improvements more likely. Despite the limitations, thus, the full-array findings were better understood through the documentation; stakeholders and researchers could appraise the results in a more informed manner—building trust amongst each other. In terms of disadvantages, the major design barrier of the ETMM application had to do with the high demands on resources and commitments of the developer, researchers, and sponsor to the project. Larger scale efforts could not be considered because of the intense human resource and material demands at a single site. These drawbacks must be weighed against the depth, meaningfulness and lo cal utility of the body of information obtained. How much better would the quality of evidence be if a traditional RCT had been implemented instead at the school described? Even if students had been randomly assigned to the supplemental services and control conditions at the start, the original RCT design would have been severely compromised because of factors such as tea cher volunteers and high student mobility. With school-based innovations, thus, the answer may lie in carrying out a small number of in-depth, siterestricted, formative ETMM-type studies first. Once the first phase points to logistically feasible and promising program models ,a confirmatory phase co uld be initiated to scale up and test the models


Gathering Evidence 33 with experiments Such an approach may in fact be more cost-efficient in the long run than large scale randomized experiments (or quasi-experimen ts), without preparatory program-testing in natural settings. Compared to national implementations of RCTs, more limited and carefullymonitored ETMM-type field trials might better pr edict likely program impacts, and inform actions on subsequent program development and expansion. In the end, the question as to how well ETMM designs compare with the federallyrecommended gold standard must be left to the reader, other researchers, and users of research information. Further discussions should continue on alternate methods for improving scientific rigor of field studies and evaluations, particularly as successful instances of ETMM-type studies are documented in education and other fields.


Education Policy Analysis Archives Vol. 14 No. 12 34 References Aguirre International (2000). Save the children web of suppor t initiative: Annual report 1999– 2000 San Mateo, CA: Author. Anderson, L. M. & Weiner, L. (2004). Early implementation of suppl emental educational services under the No Child Left Behind Act: Year One Report Washington, D.C.: U.S. Department of Education, Office of the Under Secretary. Bickman, L. (2000). Summing up program theory. New Directions in Evaluation, 87 103–112. Campbell, D. T. (1981). Introduction: Getting re ady for the experimenting society. In L. Saxe & M. Fine. Social experiments: Method s for design and evaluation (pp. 13–18). Beverly Hills, CA: Sage Publications. Chatterji, M. (2004). Evidence of “What Wo rks”: An argument for extended-term mixed method (ETMM) evaluation designs. Educational Researcher, 33 (9), 1–13. Chatterji, M. (2005). Reprint. Evidence of “What Works”: An argume nt for extended-term mixed method (ETMM) evaluation designs. Educational Researcher, 34 (6), 13–24. Cohen, J. (2003). Supple mental education: Six essential components. Principal, 82 (5), 34–37. College Board. (1999). Reaching the top: A report of the Na tional Task Force on minority high achievement New York, NY: The College En trance Examination Board. Cook, T.D. (2002). Randomized experiments in educational policy research: A critical examination of the reasons the evaluation community has offered for not doing them. Educational Evaluation and Policy Analysis, 24 (3), 175–199. Cronbach, L. J., & Associates. (1980). Toward reform in program evaluation San Francisco, CA: Jossey-Bass Publishers. DeAngelis, K., & Rossi, R. (1997). Schools serving family needs: Ex tended day programs in public and private schools Issue Brief. Washington, DC: Am erican Institutes for Research. (ERIC Reproduction Docu ment No. ED 406 022.) Eisenhart, M. & Towne, L. (2003). Contestation an d change in national pol icy on “scientificallybased” education research. Educational Researcher, 32 (7), 31–38. Harvard Family Research Project (2003). Issues and opportunities in out-of-school time evaluation: Why, when and how to use evaluation—experts speak out Cambridge, MA: Harvard Family Research Project, Harvard Graduate School of Education. Huang, D., Gribbons, B., Kim, K.S. Lee, C., Baker, E.L. (2000). A decade of results: The impact of the LA’s BEST after school enrichment initiative on subsequent student achievement and performance Los Angeles, CA: UCLA Center for the Study of Evaluation.


Gathering Evidence 35 Mathematica Policy Re search, Inc., & Decision Inform ation Resources, Inc. (2003). When schools stay open late: The na tional evaluation of the 21st Century Community Learning Centers Program Jessup, MD: U.S. Department of Education, ED Pubs. No Child Left Behind Act of 2001 (NCLB), Public Law No 107–110, 115 Statut e 1425 (2002). Ryan, G. W., & Bernard, H.R. (2000). Data manage ment and analysis met hods. In N.K. Denzin and Y.S. Lincoln (Eds.), Handbook of Qualitative Research (pp. 769–802). Thousand Oaks, CA: Sage. U.S. Department of Education. (2002). No Child Left Behind : The facts about 21st Century Learning Washington, DC: Author. Retr ieved November 19, 2004, from University of California at Irvine Department of Education (2001). Evaluation of California’s after-school learning and safe neighborhoods partnerships program: 1999–2000 Preliminary report CA, Irvine: Author. Vallercamp, N., & Owens, D. (2003). Eight ke ys to a successful extended-day program. Principal, 82 (5), 22–25. Weischadle, D.E. (2002). Extended learning op portunities: Some lesso ns from the field. Education, 123 (1), 73–81. U.S. Department of Education. (2003). Identifying and implementi ng educational practices supported by rigorous evidence: A user friendly guide Washington, D.C.: Institute for Education Sciences.


Education Policy Analysis Archives Vol. 14 No. 12 36 About the Author Madhabi Chatterji Teachers College, Columbia University Young Ae Kwon Kwon Learning Center, Charlotte, North Carolina Clarice Sng Teachers College, Columbia University Email: Madhabi Chatterji, Ph.D., is Associate Professor of Measurement, Evaluation and Education at Teachers College, Columbia University. Her research interests are in evidence-gathering and evaluation of field inter ventions with systemic designs, designing classroom and institutional as sessment systems, and in th e design and validation of construct measures with classica l and Rasch measur ement methods. Young Ae Kwon, Ed.D. is the owner and director of the Kwon Learning Center at Charlotte, North Carolina. Her research interes ts are in effective af ter-school programs for minority children and advanced statistical models. Clarice Sng, Ed.D. is Associate Director, Office of Accreditation and Assessment, Teachers College, Columbia University. Her research interests are in instrument design and construct validation, mixed-m ethod and longitudinal research.


Gathering Evidence 37 Appendix: Supplemental Data Supplemental Program Evaluation: Teacher Survey Below is the text of the survey, with so me ratings choices indicated in brackets. Purpose : This survey is intended for teache rs whose classes are receiving the supplemental instruction progra m, as well as teachers of classes who are not. The purpose of this survey is to gather information on your classroom curriculum and environment, parent invo lvement levels, and if a pplicable, your current perceptions of the effecti veness and utility of th e supplemental program. Time: The survey should take on ly 10–15 minutes to comp lete. Please respond to the questions as honestly and as thoughtfully as you can. Confidentiality : The results will be used in the study in aggregated form only. Although we are asking for individual te acher names or classroom identifiers for matching student names to correct classrooms, all the information will be coded anonymously and kept strictly confidential. THANKS FOR YOUR TIME Teacher Name: Classroom: Grade: Number of students: Room #: Questions: 1. Does your classroom participate in the exte nded day supplemental program? [Yes/No] 2. Are students in your class repeating a gr ade and/or in a special education program? [Yes/No] 3. What innovative programs are in effect duri ng the regular day in your classroom in reading and mathematics? (E .g., Success for all) List up to 3 key programs: Curriculum Focus [reading]: To what extent are the following skills/areas addressed in the regular READING curriculum in your classroom? Use these responses( Great Extent Moderate Extent or Little or Not at all ): 4. Reading comprehension in leveled passages 5. Listening comprehension in leveled passages 6. Identifying main ideas 7. Identifying details 8. Sequencing main ideas/details 9. Making connnections among ideas (e.g., cause and effect): Curriculum Focus [Math]: To what extent are the following skills/a reas addressed in the regular MATH curriculum in your classroom? Use these responses( Great Extent Moderate Extent or Little or Not at all ):


Education Policy Analysis Archives Vol. 14 No. 12 38 10. Recognizing numbers 11. Reciting numbers 12. Sequencing numbers 13. Adding/subtracting 1–4 digit numbers in horizontal or vertical notation 14. Adding/subtracting 1–4 digi t numbers with place value 15. Multiplication tables 16. Multiplication problems with 1–4 digits 17. Simple division 18. Long division 19. Word problems with above operations 20. Fractions 21. Adding fractions; subtracting fractions 22. Drawing lines/writing skills (Motor skills; hand-eye coordination) 23. List a maximum of 5 areas that you do emphasize that are not listed above: Parent Involvement: Think of your class as a whole. To what extent are your students’ parents/guardians involved in th eir student’s education this year in the areas listed? Use these responses( Great Extent Moderate Extent or Little or Not at all ): 24. Helping student with homework or academics 25. Responding to teacher requests/needs 26. Attending orientat ions/trainings 27. Attending school functions Perceptions of Student Performanc e: Think of your class as a wh ole. Compared to the beginning of the year, to what extent are your students showing gains in these areas? Use these responses( Great Extent Moderate Extent or Little or Not at all ): Use these responses(A-C): 28. Mathematics 29. Reading 30. Writing (words, composing sentences, stories, themes) 31. Other subjects Perceptions of Student Independence: Think of your class as a whole. Compared to the beginning of the year, to what extent are your students show ing signs of self-directed and independent learning beha viors? Use these responses(Great Extent Moderate Extent or Little or Not at all ): 32. Mathematics 33. Reading 34. Writing (words, composing sentences, stories, themes) 35. Other subjects Supplemental Program Perceptions (RESPO ND ONLY IF YOUR CLASS IS RECEIVING SUPPLEMENTAL SERVICES.) Rate your perceptions of the effectiveness of the program in these areas. [Scale: Effective/Reasonable or Not Effective/Unreasonable ; comments also allowed.] 36. Quality of Worksheet Assignments 36.1 Content 36.2 Presentation/ format 36.3 Consistency with regular curriculum 36.4 Instructional value 37. Resource Needs 37.1 Time for gradin g worksheets 37.2 Time for program management


Gathering Evidence 39 37.3 Time for providing in dividualized feedback 37.4 Other resource needs. 38. Program Support during pilot, COMMENT ON WHAT WOULD NEED TO HAPPEN FOR YOU TO ADOPT THE PROGRAM. Thanks again for your time! Convergent Validity Evidence Table A-1 Correlations of Multi-Level Read ing and Math Composite Scores Other variable Reading Test Score Math Test Score Program Exposure (# of worksheets completed) .62 .41 Number of Items attempted .97 .70 Completion time .92 .57 N=66; Split-half reliability ranges .6 7-.72 (primary); 78-.82 (intermediate) Evidence of Content-validi ty of Multi-level Tests Table A-2 Teacher Ratings of Curriculu m Alignment of Supplement al Program (B y Program) Supplemental Program Comparison Program Overall Program Extent: Raw (%) Extent: Raw (%) Extent: Raw (%) TeachersÂ’ Survey Item Great Moderate Little /NoneGreat Moderate Little /NoneGreat Moderate Little /None 20. Fractions 5 (56)a 0 3 (33) 2 (20) 2 (20) 6 (60) 7 (37)a 2 (11) 9 (47) 21. Adding/ subtracting fractions 1 (11) c 1 (11) 4 (44) 0b 2 (20) 6 (60) 1 (5)d 3 (16) 10 (53)22. Drawing lines/writing skills 4 (44) a 3 (33) 1 (11) 5 (50) a3 (30) 1 (10) 9 (47)b 6 (32) 2 (11) a One survey with no rating for this item; b Two surveys with no rating; c Three; d Five.


Education Policy Analysis Archives Vol. 14 No. 12 40 Table A-3 Teacher Ratings of Curriculu m Alignment of Supplement al Program (B y Program) Supplemental program Compariso n program Both programs Extent: Raw (%) Extent: Raw (%) Extent: Raw (%) Teachers’ survey item Great Moderate Little /None Great Moderate Little /None Great Moderate Little /None Curriculum alignment with Reading 4. Reading comprehension 5 (56) 3 (33) 1 (11) 7 (70) 2 (20) 1 (10) 12 (63) 5 (26) 2 (11) 5. Listening comprehension 6 (67) 3 (33) 0 7 (70) 3 (30) 0 (0) 13 (68) 6 (32) 0 (0) 6. Identifying main ideas 5 (56) 4 (44) 0 7 (70) 2 (20) 1 (10) 12 (63) 6 (32) 1 (5) 7. Identifying details 5 (56) 4 (44) 0 6 (60) 3 (30) 1 (10) 11 (58) 7 (37) 1 (5) 8. Sequencing main ideas 4 (44) 5 (56) 0 (0) 6 (60) 3 (30) 1 (10) 10 (53) 8 (42) 1 (5) 9. Connections among idea s 5 (56) 3 (33) 1(11) 6 (60) 4 (40) 0 11 (58) 7 (37) 1(5) Curriculum alignment with Math 10. Recognizing numbers 8 (89) 1 (11) 0 9 (90) 1 (10) 0 17 (89) 2 (11) 0 11. Reciting numbers 6 (67) 2 (22) 1 (11) 7 (70) 3 (30) 0 13 (69) 5 (26) 1 ( 5) 12. Sequencing numbers 7 (78) 1 (11) 1 (11) 5 (50) 3 (30) 2 (20) 12 (63) 4 (21) 3 (16) 13. Adding 1–4 digit numbers 4 (44) 3 (33) 2 (22) 4 (40) 3 (30) 3 (30) 8 (42) 6 (32) 5 (26) 14. Adding 1–4 digit with place value 2 (22) 5 (56) 2 (22) 4 (40) 3 (30) 3 (30) 6 (32) 8 (42) 5 (26) 15. Multiplication tables 3 (33)a2 (22) 4 (45) 0 a 2 (20) 7 (70) 3 (16) a 4 (21) 11 (58) 16. Multiplication with 1–4 digits 2 (22) a 3 (33) 3 (33) 1(10) 2 (20) 7 (70) 3 (16) a 5 (26) 10 (53) 17. Simple division 1 (11) a 4 (44) 3 (33) 2 (20) 3 (30) 5 (50) 3 (16) a 7 (37) 8 (42) 18. Long division 0 a 2 (22) 6 (67) 1 (10) 1 (10) 8 (80) 1 (5) a 3 (16) 14 (74) 19. Word problem with division 1 (11) a3 (33) 4 (44) 5 (50) 0 5 (50) 6 (32) a 3 (16) 9 (47) a—One survey with no rating for this item.


Gathering Evidence 41 Self-efficacy Assessment Specifications and Sample Items Table A-4 Theoretical Indicators an d Matching Items in Matha Domain Indicator Sample Items Response Scale Individual reports or displays a: 1. “Can do” Spirit and Belief in being Successsful in Subject 1. Can you finish your math work by yourself? a) Yes, all or most of the time b) Yes, some times c) No, very rarely or never 2. Can you complete the math work your teacher gives you? 3. When you see a new math problem, do you like to solve it by yourself? 4. Do you try to do your math work yourself before you ask for help? 2. (Positive)Attitude towards Subject (no anxiety or fears) 5. Is doing math (number work) fun for you? a) Yes b) Unsure c) No 6. Do you like playing number games? 7. Do you like to work hard on math problems? 8. Do you think learning math will help you later? 3. (Positive) Self-concept related to Subject 9. Are you good at math? a) Yes b) Unsure c) No 10. Have you al ways done well in math? 11. Do you think you get good grades in math? 12. Do you think you are just as good at math as your classmates? a Parallel Items were written for Reading and Mathematics.


Education Policy Analysis Archives Vol. 14 No. 12 42 Multi-level Test Specif ications (Excerpts) Table A-5 Ordered Content Indicators and Matching Items in Matha Content Domain Indi cator Sample Items Domain Weight Level 1 Counting/sequencing 15% Ordered indicators: Counting up to 10, counting to 20, counting to 100, recognizing and ordering object sets up to 5, up to 10, up to 20–30, sequencing numbers up to 10, up to 100, up to 200. Item shows 5 objects on a page (e.g., pictures of 5 airplanes). Prompt says: Count the pictures while pointing to each one. Box provided for student to fill in number. Least difficult -Primary 2 Addition and Subtraction 25% Ordered indicators: Adding single digit numbers, adding two digit numbers, adding with numbers up to 100 with place value, adding of 2and 3-digit numbers with place value (same indicators for subtraction) 185 + 325 ----------7-2 = Least difficult – Primary More difficult Intermediate 3 Multiplication an d Division 25% Repeated addition, multiplication up to 3, multipli cation up to 12, digits* 1-, 2-, 3digits, 3digits*3digits (similar range of indicators for division, dividing with and without remainder) ___ x 8=48 185 x 5 ----------Least difficult – Primary Least to More difficult Intermediate 4. Fractions 25% Simple reduction, rewriting improper fractions, adding fractions with same or different denominators, Subtracting fractions 49/ 5= 2/5 + 3/5= Least to More difficult Intermediate 5. Word problems 10% Problems using simple operations; problems using more difficult operations Tom had one cookie. Then Sue gave him 5 more. How many cookies did Tom have altogether? Least difficult – Primary More difficult Intermediate a Similar test design for reading domain.


Gathering Evidence 43 Testing of Treatment x Moderators Effects on Outcomes: A Sampling of Results from the Formative Phase Table A-6 Interaction Effects of Curriculum Alig nment and Treatment on Mathematics Low Alignment High Alignment Group Mean SD Mean SD N Treatment 71.94 26.48 68.74 31.14 53 Comparison 67.66 28.16 66.16 26.25 54 Source of Variance Type III Sums of Squares df Mean Square F p Treatment 297.751 1 297.751 0.367 0.546 Curr. Alignment 139.08 1 139.081 0.171 0.680 Treatment x Curr. Align. 18.35 1 18.359 0.023 0.881 Error 83615.54 104 Table A-7 Interaction Effects of Parent Involvem ent and Treatment on Mathematics Low Involvement High Involvement Group Mean SD Mean SD N Treatment 80.10 17.77 63.09 33.61 53 Comparison 76.44 25.93 58.79 25.66 54 Source of Variance Type III Sums of Squares df Mean Square F p Treatment 412.75 1 412.75 0.560 0.456 Parent Involvement 7828.47 1 7828.47 10.62 0.002 Treatment x Parent involvement 2.71 1 2.71 0.004 0.952 Error 75921.44 104 See Table 3 and the survey described at the beginning of the appendix for survey indices and descriptive statistics. Median splits on survey indices were used to create sub-groups for both Tables A-6 and A-7. LeveneÂ’s test for equality of variances was non-sig nificant in all cases.


Education Policy Analysis Archives Vol. 14 No. 12 44 Table A-8 Interaction Effects of St udent Independence and Treatment on Mathematics Low Independence Hi gh Independence Group Mean SD Mean SD N Treatment 65.79 36.44 73.17 22.17 53 Comparison 55.42 26.82 76.20 23.82 54 Source of Variance Type III Sums of Squares df Mean Square F p Treatment 357.16 1 357.16 0.476 0.492 Student Independence 5247.88 1 5247.88 6.99 0.009 Treatment x Student independence 1188.42 1 1188.42 1.58 0.211 Error 77292.73 104 See Table 3 and the survey described at the beginning of the appendix for survey indices and descriptive statistics. Median splits on survey in dices were used to create sub-grou ps. LeveneÂ’s test for equality of variances was non-significant in all cases.


Gathering Evidence 45 EDUCATION POLICY ANALYSIS ARCHIVES Editor: Sherman Dorn, University of South Florida Production Assistant: Chris Murre ll, Arizona State University General questions about ap propriateness of topics or particular articles may be addressed to the Editor, Sherman Dorn, Editorial Board Michael W. Apple University of Wisconsin David C. Berliner Arizona State University Robert Bickel Marshall University Gregory Camilli Rutgers University Casey Cobb University of Connecticut Linda Darling-Hammond Stanford University Gunapala Edirisooriya Youngstown State University Mark E. Fetler California Commission on Teacher Credentialing Gustavo E. Fischman Arizona State Univeristy Richard Garlikov Birmingham, Alabama Gene V Glass Arizona State University Thomas F. Green Syracuse University Aimee Howley Ohio University Craig B. Howley Ohio University William Hunter University of Ontario Institute of Technology Daniel Kalls Ume University Benjamin Levin University of Manitoba Thomas Mauhs-Pugh Green Mountain College Les McLean University of Toronto Heinrich Mintrop University of California, Berkeley Michele Moses Arizona State University Anthony G. Rud Jr. Purdue University Michael Scriven Western Michigan University Terrence G. Wiley Arizona State University John Willinsky University of British Columbia


Education Policy Analysis Archives Vol. 14 No. 12 46 EDUCATION POLICY ANALYSIS ARCHIVES English-language Graduate -Student Editorial Board Noga Admon New York University Jessica Allen University of Colorado Cheryl Aman University of British Columbia Anne Black University of Connecticut Marisa Cannata Michigan State University Chad d'Entremont Teachers College Columbia University Carol Da Silva Harvard University Tara Donahue Michigan State University Camille Farrington University of Illinois Chicago Chris Frey Indiana University Amy Garrett Dikkers University of Minnesota Misty Ginicola Yale University Jake Gross Indiana University Hee Kyung Hong Loyola University Chicago Jennifer Lloyd University of British Columbia Heather Lord Yale University Shereeza Mohammed Florida Atlantic University Ben Superfine University of Michigan John Weathers University of Pennsylvania Kyo Yamashiro University of California Los Angeles


Gathering Evidence 47 Archivos Analticos de Polticas Educativas Associate Editors Gustavo E. Fischman & Pablo Gentili Arizona State University & Universidade do Estado do Rio de Janeiro Founding Associate Editor for Spanish Language (1998—2003) Roberto Rodrguez Gmez Editorial Board Hugo Aboites Universidad Autnoma Metropolitana-Xochimilco Adrin Acosta Universidad de Guadalajara Mxico Claudio Almonacid Avila Universidad Metropolitana de Ciencias de la Educacin, Chile Dalila Andrade de Oliveira Universidade Federal de Minas Gerais, Belo Horizonte, Brasil Alejandra Birgin Ministerio de Educacin, Argentina Teresa Bracho Centro de Investigacin y Docencia Econmica-CIDE Alejandro Canales Universidad Nacional Autnoma de Mxico Ursula Casanova Arizona State University, Tempe, Arizona Sigfredo Chiroque Instituto de Pedagoga Popular, Per Erwin Epstein Loyola University, Chicago, Illinois Mariano Fernndez Enguita Universidad de Salamanca. Espaa Gaudncio Frigotto Universidade Estadual do Rio de Janeiro, Brasil Rollin Kent Universidad Autnoma de Puebla. Puebla, Mxico Walter Kohan Universidade Estadual do Rio de Janeiro, Brasil Roberto Leher Universidade Estadual do Rio de Janeiro, Brasil Daniel C. Levy University at Albany, SUNY, Albany, New York Nilma Limo Gomes Universidade Federal de Minas Gerais, Belo Horizonte Pia Lindquist Wong California State University, Sacramento, California Mara Loreto Egaa Programa Interdisciplinario de Investigacin en Educacin Mariano Narodowski Universidad To rcuato Di Tella, Argentina Iolanda de Oliveira Universidade Federal Fluminense, Brasil Grover Pango Foro Latinoamericano de Polticas Educativas, Per Vanilda Paiva Universidade Estadual Do Rio De Janeiro, Brasil Miguel Pereira Catedratico Un iversidad de Granada, Espaa Angel Ignacio Prez Gmez Universidad de Mlaga Mnica Pini Universidad Nacional de San Martin, Argentina Romualdo Portella do Oliveira Universidade de So Paulo Diana Rhoten Social Science Research Council, New York, New York Jos Gimeno Sacristn Universidad de Valencia, Espaa Daniel Schugurensky Ontario Institute for Studies in Education, Canada Susan Street Centro de Investigaciones y Estudios Superiores en Antropologia Social Occidente, Guadalajara, Mxico Nelly P. Stromquist University of Southern California, Los Angeles, California Daniel Suarez Laboratorio de Politicas Publicas-Universidad de Buenos Aires, Argentina Antonio Teodoro Universidade Lusfona Lisboa, Carlos A. Torres UCLA Jurjo Torres Santom Universidad de la Corua, Espaa


Download Options

Choose Size
Choose file type
Cite this item close


Cras ut cursus ante, a fringilla nunc. Mauris lorem nunc, cursus sit amet enim ac, vehicula vestibulum mi. Mauris viverra nisl vel enim faucibus porta. Praesent sit amet ornare diam, non finibus nulla.


Cras efficitur magna et sapien varius, luctus ullamcorper dolor convallis. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Fusce sit amet justo ut erat laoreet congue sed a ante.


Phasellus ornare in augue eu imperdiet. Donec malesuada sapien ante, at vehicula orci tempor molestie. Proin vitae urna elit. Pellentesque vitae nisi et diam euismod malesuada aliquet non erat.


Nunc fringilla dolor ut dictum placerat. Proin ac neque rutrum, consectetur ligula id, laoreet ligula. Nulla lorem massa, consectetur vitae consequat in, lobortis at dolor. Nunc sed leo odio.