A peer-reviewed scholarly journal
Editor: Gene V Glass
College of Education
Arizona State University

Copyright is retained by the first or sole author, who grants right of first publication to the EDUCATION POLICY ANALYSIS ARCHIVES. EPAA is a project of the Education Policy Studies Laboratory. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Volume 11 Number 34    September 24, 2003    ISSN 1068-2341

Constraining Elementary Teachers' Work: Dilemmas and Paradoxes Created by State Mandated Testing

Sandra Mathison
University of Louisville

Melissa Freeman
University at Albany, SUNY

Citation: Mathison, S. and Freeman, M. (2003, September 24). Constraining Elementary Teachers' Work: Dilemmas and Paradoxes Created by State Mandated Testing. Education Policy Analysis Archives, 11(34). Retrieved [Date] from 34/.

Abstract

There are frequent reports of the challenges to teacher professionalism associated with high stakes and mandated testing (McNeil, 2000). So, we were not surprised in this year-long study of two elementary schools in upstate New York to hear teachers talk about the many ways the 4th grade tests in English Language Arts, Mathematics and Science undermine their ability to do their jobs with integrity. We came to understand in more nuanced ways the ongoing tension created by teachers' desires to be professionals, to act with integrity, and at the same time to give every child a chance to succeed. What we found in these schools is that the high stakes tests continually forced teachers to act in ways they did not think were professional and often resulted in creating instructional environments that teachers did not think were conducive to student success.


The teachers at these elementary schools are not radicals. They do not seek complete autonomy, they do not eschew the need for accountability (even bureaucratic accountability), they find some virtue in state mandated tests, they are content within centralized systems that proscribe some aspects of their work. But, they also perceive themselves as professionals with both the responsibility and capability of doing their jobs well and in the best interests of their students. New York State's outcomes based bureaucratic accountability system tests their resolve, makes them angry or frustrated, and requires unnecessary compromises in their work.

Most of our time in fourth grade is spent test-prepping
There is very little of the extra projects
The extra fun kinds of activities
That we used to be able to do
That goes by the wayside
Because we need to test prep
Being in fourth grade is almost an advantage
If I need materials I say
Oh it's test related
Then I can get them
If I have a child that I need to have looked at
Oh it's fourth grade
There's more of an emphasis on something
Whether that's good or bad
I'm most uncomfortable at the mid-year
When it's time for us to decide
Is this child going to meet the criteria
to move on to the next grade?
You take a child to a retention committee
This child might not necessarily be ready for the next grade
But professionally I know retention is not the answer
That is no longer weighted very heavily
When you as a professional say
I know the solution for this child is not retention
What this test is testing is good
Kids should be able to read a passage
And respond to it in writing
There's nothing wrong with that
What's wrong is the way the adults in the world
Take the scores and report them
It's a benchmark
If a child can't do it in fourth grade
And they can get it in fifth grade
Why should we penalize them?
And if it takes them an extra year to master something
That's okay
We are not financial planners, where we are judged
In how many millions of dollars we brought in
We are not Wal-Mart
In how many sales we made
We are a service industry
So stop comparing success
With scores, growth, end products
What if you have a kid who got a two on the ELA
But was a knucklehead
An emotional disaster
Disruptive
But during the course of the year
In behavior
In courtesy and respect
Improved tremendously
Are you not a success then?
Did that kid not improve?
Are they measuring that?

[poetic transcription of a group interview of Willow Valley teachers]

Introduction

The current accountability strategies of school reform rely heavily on measuring outcomes, especially student achievement, and attaching consequences, either positive or negative, to various levels of performance. These accountability strategies affect everyone and every aspect of schools and schooling at local, regional, national and international levels. This article examines the ways state mandated testing, the primary vehicle of accountability, affects teachers' work and, in particular, how their professionalism is seriously challenged by this testing. There are frequent reports of the challenges to teacher professionalism (Note 1) associated with high stakes and mandated testing (McNeil, 2000). So, we were not surprised in this year-long study of two elementary schools in upstate New York to hear teachers talk about the many ways the 4th grade tests in English Language Arts, Mathematics and Science undermine their ability to do their jobs with integrity. What we came to understand in a more nuanced way is the ongoing tension created by teachers' desire to be professionals, to act with integrity, and at the same time to give every child a chance to succeed.
What we found in these schools is that state mandated tests continually forced teachers to act in ways they did not think were professional, and that, in fact, this was often necessary in order to give every child an opportunity to succeed.

Context and Methodology


This year-long ethnographic field study of two schools was conducted during the 2001-02 school year in two school districts in upstate New York, and is part of a larger study of the relationships among teaching, learning and state mandated testing in four upstate New York school districts. These school districts are different from many, at least at the moment, since each is participating in a National Science Foundation funded teacher enhancement project. This project is aimed specifically at providing professional development in science to elementary and middle school teachers with a pointed emphasis on helping teachers better prepare their students for the New York State 4th and 8th grade science tests. The 2001-02 school year was the third year of this professional development project.

Our research postulates that teachers in these districts might be better able to cope with the demands of state mandated testing, certainly in science but perhaps in other subjects as well, as a result of teachers' potentially greater access to professional development. This paper does not address this issue directly, but at this stage of our research project we are doubtful that this relationship holds. This is true in part because the science tests are significantly less important to teachers, school administrators, and the New York State Education Department than are the English Language Arts and Mathematics tests at the elementary and middle school levels. Having considered this possibility, our research focuses holistically on the interactions among teaching and learning across all subject matter. Indeed, as we will discuss here, the relative importance of the tests and when they are administered are key factors in decisions about curricular emphasis across the school year.
Our long-term goal is to understand the complex interactions at the classroom, building and system levels among the many demands the state accountability system places on the educational enterprise.

In New York State, "outcome-based bureaucratic accountability" prevails (O'Day, 2002). This is a form of accountability that holds teachers and schools accountable to state education authorities for producing "specific levels or improvements in student learning outcomes" (p.8). These student learning outcomes are manifest in performance on state mandated tests beginning in 4th grade on through Regents Examinations required now of all students in New York's high schools. Such an outcome based bureaucratic accountability strategy focuses teachers (and students) on specific forms of limited knowledge and skills and in so doing focuses pedagogical and curricular decision-making.

The fieldwork for this study involved at least one day per week in each school--observing classrooms, talking with teachers and administrators, and attending school meetings and events. (Note 2) A great deal of our fieldwork focused on 4th grade classrooms (since this is where the testing burden primarily lies) but we observed classrooms and talked with teachers at every grade level. Additionally, a focus group interview with teachers and a focus group interview with parents were conducted, as were individual interviews with building and district administrators. Throughout the data analysis, we engaged a number of teachers and the principal at each school as peer debriefers, continually checking our understandings and reading our case studies.

Table 1 summarizes descriptive information about the schools and districts and Table 2 indicates the schools' pass rates on the ELA, mathematics, and science tests for the past three years. Table 3 illustrates the range of state mandated tests given in New York State elementary schools. Included in this table are the dates the tests are administered and the format. Both are critical elements in teachers' decisions about what to teach, how, and when. Additionally, but not part of this study, New York has adopted, under the leadership of Commissioner Richard Mills, the "Regents for All" Plan, which will require that all students pass a minimum number of courses and Regents Examinations in five subjects to receive a State recognized high school diploma.

Table 1
Description of Schools and Districts

Hemlock Elementary*
  District: 17 buildings; urban; overall 69% of students are on free/reduced lunch; drop out rate 7%; 9000 students
  # of students: 395
  # of teachers: 30
  Free/reduced lunch: 90%
  Race/ethnicity: 52% white, 35% Black, 12% Hispanic, 1% other
  English Lang Learners: 0%
  Grade levels: PreK-5

Willow Valley Elementary*
  District: 2 buildings (elementary & ms/hs); working class, predominately white; 1500 students
  # of students: 818
  # of teachers: 52
  Free/reduced lunch: 46%
  Race/ethnicity: 93% white, 5% Black, 1% Hispanic, 1% other
  English Lang Learners: 2%
  Grade levels: K-6

Source: 2002 New York State School Report Cards
* pseudonyms are used for schools

Table 2
Test Scores (% of students "passing" 4th Grade State Tests)

School            Year      ELA    Math   Science
Hemlock*          1998-99   15%    48%
                  1999-00   40%    53%    62%
                  2000-01   50%    63%    63%
Willow Valley**   1998-99   44%    71%
                  1999-00   48%    72%    56%
                  2000-01   55%    77%    77%

*This school did not meet the state standard in ELA but made adequate yearly progress (AYP) in 2000-01.
**This school met the state standard and made adequate yearly progress (AYP) in 2000-01.


Table 3
New York State Mandated Elementary Tests (2001-02)

4th grade
  Spring: English Language Arts (early Feb)
    Test format: Reading & 28 mc questions; Listening & written responses; Reading & written responses; Independent writing prompt
  Spring: Mathematics (early May)
    Test format: 30 mc questions; Short and extended responses
  Spring: Science (May)
    Test format: 45 mc questions; Performance--5 stations, 4 questions/station

5th grade
  Fall: Social Studies (Nov)
    Test format: 45 mc questions; 3-4 constructed responses; 1 document based question

Contexts for Teachers' Dilemmas

Teachers may never have had much autonomy and the professional status of teaching cannot be taken for granted. Teachers' work has historically received low pay, been perceived as relatively low status, and often operated within authoritarian and often petty school cultures (Katz, 1971). "Education has not suffered from any freedom granted teachers to run schools as they see fit; it has suffered from the suffocating atmosphere in which teachers have had to work" (p.131). Still, much educational research demonstrates the centrality of teachers in educational reform (Elmore, 1996); they are "curricular-instructional gatekeepers" (Thornton, 1991). Schools have also been the locus of almost every social change effort, placing ever more demands on teachers (e.g., drug education, sex education, values education, environmentalism, bus duty, data management) with no reprieve from prior demands. The current standards based reform movement with its clear specification of content, pedagogy, and assessments adds to these demands, increases authoritarianism, and further erodes teachers' sense of professionalism (Madaus, 1998; Mathison, 1991; Noble & Smith, 1994; Ross, 2000; Vinson, Gibson & Ross, 2001).
In a study of Kentucky teachers after the implementation of the Kentucky Education Reform Act, Kannapel, Coe, Aagaard, Moore & Reeves (2000) conclude that "the educators we spoke with resented the accountability measures as an insult to their professionalism."

There is ample research describing how state mandated tests, particularly high stakes tests, challenge and compromise the professionalism of teachers. McNeil's (2000) research in Texas illustrates a range of constraints on teachers' work, constraints that lead them to "exclude their richest knowledge from their lessons" (p.192). These constraints spring from the increased standardization and specification of important knowledge as that which is on the test. As a result, teachers adopt generic forms of content and presentation; develop a "test based curriculum"; separate content "for the test" and "real content"; further fragment knowledge; and even retire. Testing leaves little time for "real instruction" (Hoffman, Assaf & Paris, 2001). In some cases, when a mandated test demands something that has not previously been a routine part of the curriculum, such as writing or problem solving, there is refocusing, although in ways driven pointedly by the test (Hillocks, 2002; Kannapel et al., 2000).

Teachers do not feel good about the constraints that testing places on their work. McNeil (2000) describes teachers moving away from particularized child centered teaching to teacher centered generic teaching, because the latter reflected state mandated curriculum and assessments. Dramatically, she concludes: "The reforms required that they choose between their personal survival in the system or their students' education" (p.192).

The schools in this study reflect findings of other researchers. Teachers at Hemlock and Willow Valley Elementary Schools perceived their professionalism to be diminished. Through outcomes-based bureaucratic accountability teachers' work has come to be defined by the state-mandated tests, especially in English Language Arts, as well as district directives geared to improve state test scores. But for these teachers it is not an either-or choice between personal survival and the students' education. These teachers confront the dilemma of being a good teacher, a professional, and helping kids to succeed, which is marked by performance on state tests. What we saw repeatedly was that this dilemma is almost always solved in favor of the students, that teachers sacrifice their professional integrity in order to help every child be as successful as s/he can be on the tests, even when they lack faith in the indicator. This resolution plays itself out in the classroom as well as around the administration and scoring of the state tests.
The following sections elaborate how teachers experience and come to uneasy resolutions of the dilemmas they face.

Faith in Children

The popular media and politicians often portray teachers as contributing to the low achievement of children, especially children of color, by having low expectations and lacking the faith that all children can learn. The political slogan, "No Child Left Behind," which titles the current Elementary and Secondary Education Act is a manifestation of this belief. However, the teachers in these schools, both in word and deed, challenge this representation, although like teachers everywhere they talked of the overwhelming social forces on children's lives outside the school building. And, they did not always feel they were able to compensate for a lack of experiences (such as rich early literacy experiences) or life circumstances (poverty, violence, homelessness).

This was especially true at Hemlock Elementary, a school where most children are on free or reduced lunch and many are African American. "These are not children that don't learn. These are children that do learn--slowly." "We are being judged on something that is largely out of our control," Hemlock teachers explain as they relate stories of student absenteeism, high mobility, and academic need. "And what does it do to the individual kid? If we have a child who's a slow learner, that is a huge concern that is being left out of this testing thing by the media and politicians and the Regents. They don't want to know that there is such a thing as a slow learner. And to tell a child, who gets to this higher level in a school year that they are a failure because they didn't reach this goal is horribly wrong, horribly wrong for that child."

There is less confidence in the children and teachers have a more limited sense of efficacy at Willow Valley. Willow Valley Elementary is a huge school, a consolidation of three elementary buildings into one, occupying an office building complex the school district acquired from a downsized business. Students here are white, working class and poor, and living in a neighborhood enclave cordoned off by industry and freeways. Teachers here frequently characterize the school's students as a high needs population: "It's hard and with the special ed kids… we need consistency and structure. As soon as the tiniest, tiniest thing changes, they're very needy in that sense. That so terrifies me about them going into fourth grade because their independence to be able to, even on a very simple task, read the directions and complete it… As long as everything is being modeled step-by-step or very guided or very structured, they're fine. But as soon as you look for that independence, they struggle." Parents are aware of the characterizations of their children: "Labels [that they give our kids]--you go to the school board meeting and you hear this, go to a PTA meeting, go to a committee meeting, and it's the socio-economic background, it's the transient populations. So, because of this we can't expect a good education for our children?" The principal is aware of the strong tendency to view high needs students as somehow less able than others and feels it is his role to continuously stress that teachers need to learn to work with what the students bring with them, not what they aren't bringing. Willow Valley teachers don't give up on children, but they often express reaching the limits of their capabilities. "We are doing what good 4th grade teachers are supposed to do, we're teaching the students the curriculum.
You can't ask us to make up for the fact that this child is deficient in this skill and has been since kindergarten. There are just, I don't know how to describe it, there are just certain things that are beyond the 4th grade classroom teacher's control and yet we are being asked what are we going to do about this child? I can't do anything more. I've done everything I can do. You have to pass it off to somebody else now." But still, teachers worry about what will happen to children, "it still eats us," and repeatedly we saw teachers making school instructive and enjoyable for their students.

In the classroom

Teaching to the test

The many meanings of 'teaching to the test' and the validity of the test itself conspire to create anxiety about the right thing to do. The basic tenet seems to be: if a test measures what is important then teaching to the test is okay, but if the test is misdirected or poorly constructed or only a partial picture of what is important then teaching to the test is not okay (Heubert & Hauser, 1999; Smith, 1991). The difficulty for teachers is that they often hold both views simultaneously. The 4th grade ELA encourages them to teach more writing than they have before and the 4th grade Math Test encourages them to teach more problem solving--so teaching to the test (in the sense of taking curricular cues from the test content) is good. But, the reading and writing on the 4th grade ELA is formulaic and focuses on syntax, and discourages creativity, exploration of language and discussion--so teaching to the test is bad. Couple this with a context that defines these tests as high stakes tests, especially the ELA, with serious consequences for schools (threats of state intervention), for teachers (shame and rewards), and students (possibilities of retention in grade, labeling), and teachers are left with little choice. They teach to the test. At Hemlock this is a highly structured, orchestrated effort while at Willow Valley this is a more haphazard, individual response.

Content

"I'm finding that I used to read stories for enjoyment. And now when I'm reading a story I'm trying to think, 'Alright, now how am I going to use this?' And I'm trying to get the contrast and compare. And trying to do author studies. And I almost find that I'm not enjoying it. I'm enjoying it, but it's not like it used to be when we could read a story put it aside and maybe do a tracing and cutting activity to go with that story. I'm not doing so much cutting anymore. I'm doing a lot more I'm trying to do critical thinking and we're writing in journals. It's not a fun thing anymore. I'm trying to always get two jobs done as one. How can I use this twice? How can I really push this?" [Willow Valley teacher]

Teachers value what they perceive to be positive changes the tests have instigated in their teaching, resent what they have had to give up to make these changes, and sometimes defiantly teach what they think is important even though it may not help the students do well on the test.

At both Hemlock and Willow Valley schools teachers believe the state mandated tests have changed what they teach, and often for the better. Teachers believe the ELA "is a good test. It tests listening, reading and writing at an appropriately high level." The Hemlock reading teacher describes the test as focusing on "higher order thinking skills, therefore in our program the emphasis has been changed from the lower level thinking skills such as recall and detail to the higher level skills.
That's a benefit. Another benefit is that we focus on writing much earlier… due to the nature of the test we've gone from filling in the missing word, which is a former emphasis, to understanding main idea, inference, conclusions, predicting and those are all higher level skills. So the result for the students is that they are getting really a much higher level instruction now than they used to." And a 3rd grade Willow Valley teacher now does "a lot more note taking, lots of graphic organizers, and I don't think if it wasn't for the test that I would use them in such detail." In math she teaches the concepts and skills the 4th grade teachers say the kids need, but she goes on, "I feel like I'm very much rushing to say 'we've covered it and they've at least seen it' but not giving them the practice they need." Teachers identify positive changes in their curriculum because of the ELA and math tests, but seldom mention the social studies or science tests.

Recognizing the ELA test required more of their students than they had expected in the past, the Hemlock teachers used Title I money to develop a curricular strategy to prepare their students to do as well as possible on the ELA test. "We spent a lot of time analyzing tests. At the same time we were making a huge effort to integrate. We were choosing materials and making selections do double duty with science or social studies, working around the themes so there was a whole integrated package." Teachers used trade children's literature magazines (Ladybug in 3rd grade and Spider in 4th grade) as the texts and developed multiple choice and short answer questions (like those on the ELA) for each story. As the teachers were developing this trade magazine based curriculum, the district curriculum committee adopted a basal reading series (Scott-Foresman's Reading) that Hemlock teachers are required to use. Language arts instruction now consists of the regular classroom teachers teaching the basal reading series while the reading teachers travel from classroom to classroom armed with magazines and packets of ELA test-like questions providing fast paced, no nonsense instruction of material that resembles that on the test.

Teachers at Hemlock think adoption of the basal readers is an insult and a distraction. In conjunction with the textbook adoption (in both language arts and math) are messages from the district office that all teachers should be on the same page at the same time. "A lot of teacher hours went into the curriculum that they produced and then when we got our new reading series it was imposed on us… it is a mandate that you're on a certain page in a certain week across the district [and this] is unrealistic depending on the kids' abilities. So the teachers here just feel like all we're doing is frustrating our children. We are not teaching them the way that we as professionals should be allowed to help all of our children learn." The teachers have more confidence in their own trade magazine based curriculum to prepare the students.

Even though teachers feel the tests, especially the ELA, have challenged them to teach more and better, they stick very closely to the forms of knowledge on the test. And so there is a question about whether students are engaged in higher order thinking or merely the appearance of such.
The ELA and math tests are scored as a 1 (serious academic deficiencies), 2 (needs extra help), 3 (meets the standards) and 4 (exceeds the standards) and these levels have become an organizing structure for teaching. In fact, some form of this scoring rubric is posted in every classroom in both of these schools. This excerpt from a 4th grade classroom observation illustrates how being pushed by the test to have higher expectations is simultaneously dulled by the test.

This class is reading Velveteen Rabbit. The teacher passes out a worksheet and tells the students she is going to give a response that is a 4, or a 3, or a 2. She directs them to put a 4 on the back of their worksheet and an arrow next to it.

T: If I'm going to write an answer that is going to score a 4, what does it need?
S: Answer complete.
S: Neat.
T: I agree, but I wouldn't worry about neatness first.
S: Topic sentence.
T: Yes, you need to have some sort of topic sentence. You need to remember to restate the question. What else?
S: Details.
T: YES, details, details, details. Where do you get the details?
S: In the book.
T: Ok, it's complete and it has a topic sentence. What else will people scoring be looking for?


She reminds them about the 'Daily Language Activity' hints she gave them in the morning--punctuation, spelling, capital letters, and correct grammar.

T: Leave a space and put a 3. What is going to be the difference between a 4 and a 3?
S: One of those things is not included.
T: Everything needs to be there. It will be mostly complete. Will it be perfect?
S: No.

This lesson continues until they have gone through 4, 3, 2, 1 and then the teacher shares some examples of responses to the question, "Why does the Velveteen Rabbit feel plain and ordinary?"

T writes: "He feels plain." The students give it a 1 because it is too short. The teacher comments that we don't know who 'he' is and comments on the need for more details.

T writes: "The Velveteen Rabbit feels plain and ordinary." The students give it a 3. The teacher disagrees and gives it a 2. She says it is missing details from the story--have you proven it from the story?

T writes: "The Velveteen Rabbit feels plain and ordinary because all of the toys make fun of him. For example, the expensive toys snub him and make him feel commonplace." The teacher tells them this response is a 4. One girl copies the answer but pauses to say she disagrees, that not all the toys make fun of him because one doesn't. The teacher agrees and changes the word all to most.

In spite of the pressures of the tests, teachers do exercise their professional judgment, almost with an air of defiance, and do what they think is right by the children even though it isn't consistent with the district's curricular mandates and may not be directly tied to the test. These acts of defiance are frequently tied to helping children feel successful, encouraging them, giving them an opportunity to have fun. One district uses Everyday Math and this teacher describes "absolutely breaking the rules of Everyday Math." "All of the [students] failed the multiplication test. They didn't know how to do the partial products algorithm.
They felt stupid, they felt incompetent, and they failed it miserably because their brains couldn't process all those steps at one time. So I've gone back now, I've spent two class days teaching them, doing a task analysis first, which comes pretty naturally after you've taught the multiplication algorithms. I carefully added each step, if you skip one of those steps kids like this will not be able to make that mental jump, they can't do it, you have to go in a methodical way, they have to master each step, and then they feel good about themselves. They were begging me for harder problems. They get turned on by that. They love it. Now they're going to go home, they're going to do this homework they made up and they are all going to know how to do partial products algorithms, which I guarantee will be on the test." But she adds weakly, "Not partial products, but multiplication problems."

These teachers struggle with the fear of falling behind in a system that frowns on those who do. Instead of comfortably working on what they perceive their students need to better understand the material, they push ahead until it is obvious that pushing ahead is causing their students to fall further behind. The curricular calendar and the testing schedule do not stop for make-up time and so the pressure is to catch up by covering material superficially.

Textbook Adoption

District textbook adoption occurred in both of these districts as a result of the state standards and tests. And textbooks are chosen to match the tests, not a difficult thing to do given that the textbook and test publishers are often one and the same. While these new textbook adoptions filled a void where there previously had been few resources, they also create chaos and conflict. In the case of Hemlock, the adoption of a basal reader diverted teachers from a curriculum they had created. At Willow Valley, some teachers did find the time to do "double entry teaching." "What I end up doing is double teaching because I'm teaching the series and I'm also teaching using the strategies and the plans that I had when I taught novels. I'm basically double dipping for them, but you have to in order for them to get all of the skills. And I can't teach skills in isolation. What good is teaching them the 'short a' sound in ten words if they are not going to use it within a story and be able to read it. You look for stories like Little Bear that would have that 'short a' sound within it, so now they can apply the skill they learned."

The other consequence of district wide textbook adoptions is a perceived added difficulty in integrating the curriculum. Because time is a scarce commodity, and teachers understand the priority of language arts, they would like a curriculum that provides language arts skills through math, science and social studies content. The Hemlock teachers had selected trade magazine stories with science and social studies content for precisely this reason. They now have textbooks that are a giant step backward in terms of integration. "As happy as I am to have a standardized curriculum across the district, this new reading program has no fourth grade social studies content and no fourth grade science content. None."
In many ways, these teachers are faced with a richness of resources but lack the time, guidance, and support for creating an integrated curricular whole out of the textbooks, trade materials, math series, science kits, newspapers, and test preparation materials. One teacher summed up this frustration: "You have to wonder, do you do the math in the reading series or the reading in the math series?"

Pedagogy

The Hemlock Elementary plan to better prepare their students also dealt with how language arts would be taught and incorporated more ELA focused instruction by reading teachers in all 3rd and 4th grade classrooms. The ELA curriculum included blocking off specific times in each week at each grade, breaking students into four homogeneous groups and having four teachers working with each group in a different spot in the building. Groups were based on Terra Nova test scores, teacher judgments of reading ability and students' potential performance on the ELA tests--as solid 3s, 3s but potential 4s, 2s but potential 3s, and 1s and 2s. Teachers are confident that small homogeneous groups working closely with a teacher is the best way to meet the students' individual needs and capitalize on their strengths. "[Teachers] who had the higher groups could do a lot more of the advanced higher order thinking skills, whereas my kids would be doing a lot more of the decoding, word recognition and basic lower level comprehension skills."


This plan was thwarted by the superintendent, who decreed that children could no longer be pulled out or grouped in preparation for the ELA test, and this decree left teachers feeling betrayed and undermined. The district is attempting to promote inclusion and to disrupt a tracking system that takes root in the early years of schooling. The district response was totally unexpected and seems illogical to the teachers--they are still permitted to group and use pull out strategies in math. The school's ELA scores had gone up dramatically with the teachers' plan and they have profound confidence in the power of grouping and pull out strategies. Because they expected recognition, the blow is huge. "Now I wouldn't dare pull a student out to help them improve. We were told in no uncertain terms that we had to follow policy. The removal of the principal [because she permitted teachers to use this strategy] was a message to staff. First, we got the news of how well we had done. We were shocked and ecstatic, and then totally demoralized. We were stunned." Whether grouping and pull out programs are a good or bad idea, the dynamics here suggest an undermining of teacher professionalism even though all parties are driven by an effort to help the kids do well on the indicator that matters most, the ELA test.

The Hemlock strategy of dedicating the reading teacher to do the "ELA curriculum" and the classroom teacher to teach the basal reader created additional challenges to teachers' sense of being a good teacher. New teachers are especially frustrated: "We don't decide what is taught during that time. It's all reading teacher." Teachers' professionalism is compromised in two ways by this test score improvement strategy. First, classroom teachers are left standing around watching while reading teachers use direct instruction techniques (which some do not agree with), thus wasting valuable resources that could be used to help children.
Second, this strategy leaves teachers in a bind if the reading teacher is absent or late. Sometimes they find themselves singing songs or having students read quietly, not wanting to start something new until they know what is going on. And if the reading teacher does not show up they do not have the ELA materials and have to substitute other content. On one such occasion the teacher remarked that he had been promising the kids they would do social studies and the absence of the reading teacher is what made that possible.

District textbook adoption, common curricula, and standardization weigh heavily on teachers, challenging the fundamental notions of individualized education and child centered teaching. Teachers acknowledge they need to measure students’ reading, comprehension, and so on but feel they are caught on the horns of a dilemma of standardization and individualization. They are forced to ignore individual strengths and needs in an attempt to get all children ready to tackle the same test at the same time. "There are deep contradictions in the messages we are getting. Every kid is supposed to have and indeed we are supposed to encourage them to build on their individualized learning styles. The district actively supports individualized educational programs for children and then we are supposed to cram them through the test using the same approach for all children. Give me a break!"

Splitting the Curriculum

McNeil (2000) describes teachers' use of "double entry lessons" that split the curriculum into the real content and the official (tested) content. Such a strategy would be seen as a luxury by the teachers at Hemlock and Willow Valley, where time is a scarce commodity and teaching the official (tested) content takes all the time there is, and more. The strategy that has evolved in these schools is a splitting of the curriculum according to the relative importance of the test and the time of year the test is administered. Although there are 4 tests given at the elementary level in New York, everyone implicitly understands that the ELA is what matters. Reading and language arts are seen as the basis for all other subjects (and, in fact, a common criticism of all other tests is that they test reading as much as science or math or social studies) and so take precedence. It is the ELA scores that have been used for decisions about remediation, retention in grade, and teacher quality.

Table 3 indicates when each test is administered in 4th grade--ELA in early February, followed by math and then science in the spring. So, in primary grades and especially 4th grade the school curriculum is language arts intensive until February, followed by a couple of months of concentration on math, and much more limited emphasis on science. And 5th grade teachers should not expect that students will be prepared during 4th grade for the social studies test, which is given in November of the following year for a 4th grade cohort of students--there simply is no time.

"We structure our whole day in 4th grade right up through January, our whole day is structured towards the ELA, and then after that, after the ELA, there will be a shift in focus and then we will be structuring our entire day to focus on math and science." For about 4 hours each day from September to January, the teachers prepare students specifically for the three days of ELA testing, for the moment in time when teaching and learning stop, when Hemlock stands still for the test. And the same rhythm repeats itself at Willow Valley Elementary. "So I find that I often put social studies and science on the back burner to get through the reading and the writing. And I find that I'm spending a good 2 1/2 to 3 hours a day on language arts and I'd rather not. I'd rather be able to teach every subject every day and that doesn't often happen in my class. I wish it did, but it doesn't. Right now we are under the gun, we are under pressure. You hear it from the administration, you hear it from colleagues, 'Do you think they are ready?' and they don't do it to nag you, it’s a concern." Another teacher anthropomorphizes science: "Poor science--it's really been pushed aside. How am I going to get [the students] ready for the science test in two weeks?"

Two days after the ELA test at Hemlock, the teachers are smiling; the pace is more relaxed, the discipline looser. In a 4th grade class, students are tackling a deductive reasoning problem. They are given clues and use them to deduce the correct answer. The lesson is interactive. There is talking among the students, and questioning and sharing between students and teacher. The students are engaged and interested. This is a welcome respite before serious preparation for the state math test begins.

These classrooms are unlike our traditional images of elementary school classrooms that focus on language arts, especially reading, in the morning while children are fresh and attentive, and then move to mathematics and finally science and social studies in the afternoon, with special subjects interspersed throughout the week. Because of the testing, the curriculum has been split across the school year, not across the school day or week. And, although language arts has always consumed most of the time in elementary classrooms, it is even more so in these schools.


The Test, Itself

During testing

When the tests arrive at schools the tension rises. Teachers must watch their students take these tests and adhere to New York State Education Department instructions about test administration. Sorting through how to administer the test, what questions the teacher can answer, and how the accommodations for special education students should be implemented is a dance the teachers do throughout the testing. And, while teachers are mindful of following the rules, they interpret the directions differently. Some teachers are adamant about not answering any questions and watch in silence as some students struggle, others simply sit, and many work diligently on the test. Others encourage students to ask questions, hoping they will be ones teachers can answer: "Today when you are doing your questions, get your hand up and ask. Most of the time we could answer your question."

During the days of a test, teachers do quick checks on student scores, analyze the test questions, check up on students, talk with them about their perceptions, give them moral support and reprimands, and teach cram sessions based on the teacher's preview of the test. In one 4th grade class after the first session of the math test, the teacher asks two boys, "How was it?" The students respond, "easy," "fun," "boring." And then two boys ask the teacher if 50 × 50 = 250. She has them figure it out and they find the answer is 2500. She shows them another way to solve the equation. The teacher laughs, grateful that the boys thought the test was easy, oblivious to the fact "they have no clue." And she goes on, "Are they trying to use something I taught, then that's important to me, not so much that they got it right."

In another classroom just before the second day of the math test, the teacher is more focused. She hands pencils to students that say "4th graders are #1" and tells them, "These are special pencils that only work on this portion of the test."
But before they begin the test she gives the students a quick refresher on parallel lines, perpendicular lines, trapezoid, parallelogram, hexagon. And she makes a last minute plea that they remember what they have learned about probability and fractions. As the students take a bathroom break, this teacher looks over the test and her mood sinks noticeably--too many fractions, decimals, but then a sigh of relief, a graphing problem. "We've done at least 5 of these in our graphing unit."

In another 4th grade class after the first day of ELA testing, students color, play board games, and play on the computers while the teachers gather the tests and make charts and record student scores. Teachers compare notes on how hard they felt the test was and how well their students did. Question by question, teachers analyze the test. One teacher does an item difficulty analysis. With this information they hope they will be better prepared next year.

In another class, after weeks of intense preparation for the ELA, a teacher watches silently as her students finish the second day of the test. Once the test booklets are collected she tells the students to sit down and listen because she is going to yell at them. And she does. "I know that was a long test. But I cannot believe--I was ready to scream when I saw you sitting there staring into space. Don't tell me you couldn't have found one run-on sentence, a spelling mistake, or checking bullets against your answers to make sure you covered everything. Two half-hour sessions is not too much to ask of a 4th grader. We've worked all year on this. You can put 10 minutes more effort. Please tomorrow, don't just sit there. Find something to fix. I saw someone spell first, f-r-s-t. If I go to read them and I find that I was wrong, I'll take it all back. But if I find that I am right, I'll be even madder than I am now. Tomorrow you have another writing session. Only tomorrow you will use all your time."

Teachers know these testing moments cannot judge the quality of their work, but they find themselves acting as if this were so, and sometimes acting in ways that may not make them proud of themselves as teachers. "I have to come to balance in my own head, about how to keep the kids just as short of being over the line with stress themselves. They are children, they have to play and have fun. They are nine."

Scoring the tests

Schools in New York are responsible for scoring their own state tests. A number of teachers indicated that scoring the tests is a critical experience for understanding the content of the test and what constitutes a 4, 3, 2, and 1 response. (This experience has been important in the past because all elementary and intermediate tests were secure, but beginning in 2002 schools may keep the tests and use them to prepare for the upcoming year's test.) It takes a small group of teachers a full day's work to score any given test, a hidden cost of the state's accountability system. The New York State Education Department provides training videos and materials to be used in every scoring situation, and the scoring session begins with a review of the rubrics and then the scoring of a sample of responses. Once they begin scoring, teachers discuss disagreements or questions.

During the math test scoring the questions and concerns teachers have stem from an interest in being fair to the student.
In this session the first issue that arises is around responses that give an answer but do not show any work. The rubric clearly indicates that students should receive NO points if they answer the question correctly but do not show their work when it is required. One teacher sees that the student did the work, but erased it. If you can still see it, does it count as shown work? The teachers agreed that it does. And the discussion among the teachers and the facilitators deviates from the rubric and resolves the meaning of "shown" work. And the resolution favors students on both counts.

T: Answer correct, but no work?
F: Give partial credit.
T: But what if the work is there, only erased?
F: Full credit, as long as you can see it.

The next issue to arise is in scoring a graphing problem--students can get a 3, 2, or 1. A teacher asks about the meaning of a 3 score, which the rubric says is a complete and correct answer, and a 2, which is given if some information is missing. The answer that sparks the discussion is a student's graph that is complete except for one unlabeled axis. There is a title, one axis is labeled and numbered, the names for each bar are given (e.g. horses) but the axis label (e.g. animals) is missing. "Obviously he knows how to make a graph, why should he be penalized? Does he have a complete understanding of what goes into a bar graph? Yes." Another teacher sympathizes, "We had that problem in the past and we had to give them a 2." But the teacher is not mollified and his fellow scorer says, "If you feel so strongly about it, do what you want to and give it a 3. If you go by what [the State's rubric] says, give it a 2." The facilitator intervenes, trying to calm the outraged teacher, and eventually he gives the student a 2 and turns to the next student response to find exactly the same scenario. But this time the student gets a 2 because they had all the correct labels even though the bars in the graph were incorrect. "This child obviously did not understand the concept of making a graph but because she was able to follow the directions and knew enough to label, she gets the same points as the other who obviously understands how to make a graph but forgets one label. That's not right." Much as teachers redirect students to focus on preparing for taking the test, this teacher is redirected to get on with the scoring.

F: That's why you can't compare answer to answer. You have to go by the rubric.
T: OK, then you can't compare scores. You can compare scores between schools, yet we can't compare one answer to another? You're telling me that that child has the same comprehension as the other one? Right now I could fight with the state!
F: Stay on task, we have only an hour.

Another teacher interjects with a new question,

T2: If answers are completely wrong, but the process is correct?
F: It's a partial--1.

Scoring the tests leads teachers to question their judgment, the judgment of others and especially the possibility that they may have scored too harshly. These moments of uncertainty arise especially when scoring items that require students to show their work or write an explanation. Teachers agonize over finding something salvageable even in the most incomplete answers.
Again, while scoring the math test teachers have to work through what it means for a student to show 'at least the beginning of a process.' The New York State Education Department help line provides them with no guidance and they conclude:

F: If we can defend our score and our interpretations then let's do it. We can give credit for the start of a correct process if it ultimately leads to the correct answer.
T: When in doubt err on the side of the student.

With this exchange, it became clear how to resolve many uncertainties--when in doubt err on the side of the student. And this is what the teachers did, and the scenario repeated itself when teachers scored the science test, although always with much discussion. Interestingly, this is an issue that is specifically addressed in an informational Q and A memo from the New York State Education Department that says:

Q: On borderline calls, when deciding between adjacent score points, should the scorer always give the "benefit of the doubt" to the student and award the higher score?

A: No. Such a practice can result in scoring "drift." After scoring a number of responses, a scorer may gradually, even unconsciously, begin to accept less (or demand more) than is appropriate in awarding a particular score point. Scoring "drift" can create an unfair situation where a student response could receive a different score from the same scorer depending on when the response was scored. To prevent "drift" and maintain the consistency and accuracy of all scores, it is helpful to refer occasionally to the student responses used in the training materials as examples of the various score points. These responses are often called "anchor papers" because they help to fix the acceptable range within a score point and prevent the scorer from "drifting" higher or lower in their expectations for awarding a score point. Scorers should also be encouraged to consult their Table Facilitators and Scoring Leaders with responses that seem on the line between two score points.

Even at this last moment, when teachers can help students be as successful as they possibly can be on the state tests, they do so. They follow the rubrics as well as they can because they believe a great deal of effort has gone into creating them, but they are willing to "give the student the benefit of the doubt."

Conclusions

The teachers of Hemlock and Willow Valley are forced into untenable situations fraught with dilemmas that are difficult to resolve while both maintaining teacher professionalism and helping all children to succeed to the best of their ability. Repeatedly we saw teachers put in lose-lose situations. They act in ways that are inconsistent with what they believe to be best teaching practice in order to increase the likelihood that students will succeed as measured by the state tests, which at least for many teachers are a poor indicator of the achievement and success of children. Teachers must often do the wrong thing in order to do the right thing, sort of.

It is essentially a utilitarian ethic that underlies test driven curricular reform, one based on means--ends arguments (Mathison, 1991).
The New York State Education Department adopts the view that the ends justify the means, and teachers too are drawn into this logic. The means are approaches to teaching and content that teachers might not choose--that do not represent good professional practice--and the state’s desired ends (high test scores) are a poor but powerful proxy for the teachers’ desired ends (the contextually appropriate success of every child).

The experiences of these two schools tell us a great deal about the impact of state mandated, high-stakes testing, and this paper has specifically focused on how these tests challenge teachers' professionalism, especially with regard to how they treat children. Of course, this is an interesting argument only if these things matter. These teachers wonder if policy makers and politicians have any sense of children's individual differences and the centrality of that concept to teaching and learning. Current state standards based reform and assessment policies and practices would suggest that policy makers and practitioners either have no sense of this, or maybe they don't care, or maybe they are trying to redefine these ideas. Through the currently proffered solutions to problems of education, policymakers/politicians/corporate CEOs eschew what teachers know about human learning and cognition, and much of what teachers know to be helpful or harmful to children's achievement.

Are policy makers and politicians unaware that outcome based bureaucratic accountability driven by state mandated tests will reduce teacher professionalism and autonomy? That some research (see O'Day, 2002) suggests lower performing schools will actually lose ground? And that these accountability strategies do relatively little to alter the fundamental injustices in schools and society, such as racism and classism? We don't know for sure, but we think probably not. There is a fundamental disagreement about what kind of work teachers and students should be doing in schools--work that requires real critical thinking that may contribute to the evolution of a just and equitable society or work that has the appearance of critical thinking and will contribute to oppression. (These are not simple political disagreements; they are disagreements connected with power and money. For a more detailed discussion of this argument see Mathison, Ross & Vinson, 2001; Vinson, 1999.)

"By insisting that legitimate learning necessarily presents itself in and on the basis of test scores, such testing refuses to admit and accept differences (individual as well as cultural) in knowledges, values, experiences, learning styles, economic resources, and access to those dominant academic artifacts that ultimately contribute to both the appearance of achievement and the status of cultural hegemony upon which standards-based reforms depend. In effect, standardized testing encourages a singular and homogeneous public schooling--one antithetical to such contemporary ideals as diversity, multiculturalism, difference, and liberation--vis-à-vis an underlying and insidious mechanism or technology of oppression, one in which the interests of society’s most powerful (the minority) are privileged at the expense of those of the less powerful (the majority)" (Vinson, Gibson & Ross, 2001).

The teachers at Hemlock and Willow Valley Elementary Schools are not radicals.
They do not seek complete autonomy, they do not challenge the need for accountability (even bureaucratic accountability), they find some virtue in state mandated tests, they are content within centralized systems that proscribe many aspects of their work. But they also perceive themselves as professionals with both the responsibility and capability of doing their jobs well and in the best interests of their students. New York State's outcomes based bureaucratic accountability system tests their resolve, makes them angry, and requires unnecessary compromises in their work. These teachers are more angry or frustrated than better, and with little indication that student achievement is advancing in genuine ways or that schools are being reformed.

References

Darling-Hammond, L. (1990). Teacher professionalism: Why and how? In A. Lieberman (Ed.), Schools as collaborative cultures: Creating the future now (pp. 25-50). Bristol, PA: Falmer Press.

Elmore, R. F. (1996). Getting to scale with successful educational practices. In S. Fuhrman & J. A. O'Day (Eds.), Rewards and reform: Creating educational incentives that work (pp. 294-329). San Francisco: Jossey-Bass.

Heubert, J. P. & Hauser, R. M. (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, D.C.: National Academy Press.

Hillocks, Jr., G. (2002). The testing trap: How state writing assessments control learning. New York: Teachers College Press.

Hoffman, J. V., Assaf, L. C. & Paris, S. G. (2001). High stakes testing in reading: Today in Texas, tomorrow? Reading Teacher, 54(5), 482-92.

Kannapel, P. J., Coe, P., Aagard, L., Moore, B. D. & Reeves, C. A. (2000). Teacher responses to rewards and sanctions: Effects of and reactions to Kentucky's high-stakes accountability program. In B. Whitford & K. Jones (Eds.), Accountability, assessment, and teacher commitment: Lessons from Kentucky's reform efforts. Albany, NY: SUNY Press.

Katz, M. B. (1971). Class, bureaucracy, and schools: The illusion of educational change in America. New York: Praeger.

Little, J. W. (1990). The persistence of privacy: Autonomy and initiative in teachers' professional relations. Teachers College Record, 91, 509-536.

Madaus, G. (1998). The distortion of teaching and testing: High-stakes testing and instruction. Peabody Journal of Education, 65, 29-46.

Mathison, S. (1991). Implementing curricular change through state-mandated testing: Ethical issues. Journal of Curriculum and Supervision, 6, 201-212.

Mathison, S., Ross, E. W. & Vinson, K. D. (2001). Defining the social studies curriculum: The influence of and resistance to curriculum standards and testing in social studies. In E. W. Ross (Ed.), The social studies curriculum: Purposes, problems, and possibilities. Albany, NY: SUNY Press.

McLaughlin, M. W. & Talbert, J. E. (2001). Professional communities and the work of high school teaching. Chicago: University of Chicago Press.

McNeil, L. M. (2000). Contradictions of school reform: Educational costs of standardized testing. New York: Routledge.

Noble, A. J., & Smith, M. L. (1994). Old and new beliefs about measurement-driven reform: "Build it and they will come." Educational Policy, 8(2), 111-136.

O'Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard Educational Review, 72(3).

Ross, E. W. (2001). Diverting democracy: The curriculum standards movement and social studies education. In D. W. Hursh & E. W. Ross (Eds.), Democratic social education: Social studies for social change. New York: Falmer Press.

Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 15(2), 4-14.

Smith, M. L. (1991). Meanings of test preparation. American Educational Research Journal, 28(3), 521-42.

Strike, K. A. (1993). Professionalism, democracy, and discursive communities: Normative reflections on restructuring. American Educational Research Journal, 30(2), 255-275.

Thornton, S. (1991). Teacher as curricular-instructional gatekeeper in social studies. In R. Shavelson (Ed.), Handbook of research on social studies teaching and learning. New York: Macmillan.

Vinson, K. D. (1999). National curriculum standards and social studies education: Dewey, Freire, Foucault, and the construction of a radical critique. Theory and Research in Social Education, 27(3), 296-328.

Vinson, K. D., Gibson, R., & Ross, E. W. (2001). High-stakes testing and standardization: The threat to authenticity. Monographs of the John Dewey Project on Progressive Education, 3(2).

About the Authors


Sandra Mathison
College of Education and Human Development
University of Louisville
Louisville KY 40292

Sandra Mathison is Professor of Education at the University of Louisville. She teaches evaluation and qualitative research methods and her research focuses on democratic and fair evaluation practices in schools.

Melissa Freeman
School of Education
University at Albany, SUNY
1400 Washington Avenue
Albany NY 12222

Melissa Freeman is project manager of an interpretive study of the impact of high stakes testing in upstate New York. Her interests include theoretical and methodological issues in interpretive research and evaluation, democratic practices in schools, and critical social theories.

Acknowledgment

This publication is based on research supported by the National Science Foundation (Grant # ESI-9911868). The findings and opinions expressed herein do not necessarily reflect the position or priorities of the sponsoring agency.

Notes

1. While there is ample debate about whether teaching is a profession or not, and whether it ought to be considered a profession (see Strike, 1993), there are strong arguments for labeling teaching a profession (Darling-Hammond, 1990; Little, 1990; McLaughlin & Talbert, 2001). We adopt the view that teaching is a profession because it requires specialized knowledge and skills, especially as manifest in Shulman's notion of pedagogical content knowledge (1987) and contemporary theories of child development. In addition, teachers, just as all other professionals, are concerned simultaneously with both means and ends.

2. We wish to thank Kate Abbott and Kristen Campbell-Wilcox, our research collaborators on this project.


