E11-00105
Educational policy analysis archives.
Vol. 6, no. 14 (July 21, 1998).
Tempe, Ariz. : Arizona State University ; Tampa, Fla. : University of South Florida.
July 21, 1998
Some comments on assessment in U.S. education / Robert Stake.
Arizona State University.
University of South Florida.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives
Volume 6 Number 14, July 21, 1998, ISSN 1068-2341

A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411.

Copyright 1998, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.

Some Comments on Assessment in U.S. Education

Robert Stake
University of Illinois

Abstract

We do not know much about what assessment has accomplished but we know it has not brought about the reform of American Education. The costs and benefits of large scale mandated achievement testing are too complex to be persuasively reported. Therefore, educational policy needs to be based more on deliberated interpretations of assessment, experience, and ideology. Evaluation of assessment consequences, however inconclusive, has an important role to play in the deliberations.

During the last half of the Twentieth Century in America, the traditional quality controls of schooling, i.e., informal management (by teachers as well as administrators), board oversight, parent complaint, state guideline and regional accreditation, have continued to be prominent in school operations. But because the perceived quality of public education has fallen off, other means have been added to evaluate and to improve teaching and learning. For thirty years, assessment has been a significant means of quality control and instrument of educational reform.

Earlier, in the Century's third quarter, the impetus for changing American schooling was the appearance of Sputnik. It was reasoned that American schools were unsuccessful if the Soviets could be first to launch spacecraft.
College professors and the National Science Foundation stepped forward to redefine mathematics education and the rest of the curriculum, creating a "new math," inquiry teaching, and many courses strange to the taste of most teachers and parents. According to Gallup polls year after year, citizens expressed confidence in the local school but increasingly worried about the national system. In the 1960s, curriculum redevelopment was the main instrument of
reform but, in the 1970s, state-level politicians, reading the public as unhappy both with tradition and federalized reform, created a reform of their own. Their reform spotlighted assessment of student performance. The term "assessment" then came to be taken to mean the testing of student achievement with standardized instruments. Student performance goals were made more explicit so that testing could be more precisely focused, and efforts were made to align curricula with the testing. Schooling includes many performances, provisions, and relationships which could be assessed but attention came down predominantly on the students: "If they haven't learned, they haven't been taught."

Now for at least two decades, in almost every school, at every grade level and in each of the subject matters, student achievement has been assessed. And every year, it has been found largely unchanged from previous testing. Over the same periods, teaching, on the whole, appears to have been little changed, certainly not restructured. Explication of goals appears not to have set more achievable targets. The last decade has seen efforts to set standards particularly for levels of student performance needed to restore American Education to a leading, world position. From time to time, gains occurred, but small and not sustained--losses also occurred. Instead of reading this lack of sustained progress as pointing to need for a different grand strategy, the clearest summons has been for additional assessment.

Purposes and Expectations of Assessment

Goal statements are simplifications. The felt purposes of education, aggregated across the profession, across researchers, the public and the primary beneficiaries, are far more complex than those represented in goal statements and formal assessments. Facts, theories, and reasoning are needed not just in isolation but interactively, innovatively, in a range of contexts.
We hold a vast inventory of expectations, beyond catalogue, partly ineffable, often only apparent in disappointments as students fall short. That immense inventory is approximated by the informal assessments by teachers much better than by explicated lists of goals.

The grand manifold of purposes of Education held by any one person at any one time also is complex, and situational and internally contradictory. People, even those specially trained, are not very good at speaking of "what all they expect" of an educated person. Again, the complexity shows most forcefully when the person does not perform well. Any one shortfall tells little about the array of purposes. Any one assessment, however precise and valid, does not sample well the manifold of purposes. Broad and attentive use of assessments, formal and informal, evokes realization that what we expect of students and the uses to be made of a graduate's education extend far beyond formal goals, standards and lesson plans. Formal representations of aim and accomplishment provide flimsy accounts of the real thing.

This is not to suggest it useless to record educational purposes and student performance. It is useful to categorize them, to illustrate and prioritize them, sometimes by abilities and subject matters--but always at a risk. The subsets or domains are artificial. Needed in the anticipation and provision of Education, they often serve poorly to represent the education a student is attaining. Assessment based strongly on goals or domains is likely to tell more about the territory of teaching than the territory of learning.

Procedurally, Education is organized at the level of courses and classrooms, then lessons and assessments. Actually, education occurs in complex and differentiated ways
in each child's mind. Assessments tuned to management levels cannot be expected to mirror the complexity of learning and diversity of learners. However carefully named and designed, mean scores do not necessarily indicate basic accomplishments for a group of learners. Each testing needs empirical validation.

Validation of Assessment

Standardized test development is one of the most technically sophisticated specialties within Education. Definitions and analytic procedures, at least at the major testing companies, are scrutinized, verified, codified and reworked. The traditional ethics of psychometrics call for extensive construct validation of the measurements to be used in schooling. And it is not enough that the instruments and operations be examined for accuracy, relevance and freedom from bias; independent measurements must also be used to confirm that scores indicate what we think they indicate. Sound test development is a slow and expensive procedure.

In the development of assessment instruments by the 50 states, adequate validation has seldom taken place. Instruments have been analyzed statistically to see that they are internally consistent but not to see that they mean what users think they mean. Presumption that assessments indicate quality of teaching, appropriateness of curricula, and progress of the reform movement--commonplace presumptions in political and media dialogue--is unwarranted. Proper validation would tell us the strength or weakness of our conclusions about student accomplishment. Those studies have not been commissioned. The most needed validation of statewide assessment programs has not taken place.

The question of whether or not the assessment legislation, as opposed to the assessment scores, is having a good effect on student education is a separate question. Assessment changes instruction. Reformists expect assessment will force teachers to teach differently, and, in various ways and to various extents, they do.
Each assessment effort will have both positive and negative consequences. The design and promulgation of an assessment program is only an approximation of what actually occurs. The operation described in any report is a partial misrepresentation of institutional initiative and measurement integrity. For a reader, it is an opportunity to misperceive what is happening in the schools and the lives of youngsters. We need better descriptions, better evidence, of those consequences of assessment. And partly because we construct nuances of meaning faster than we invent measurements, we need to understand that we will never have a clear enough picture of the consequences of assessment. All findings should be treated as partial and tentative.

Value Determination

Not only has there been an increase in the amount of formal educational assessment but assessment has been applied increasingly to influence the well-being of students, schools and systems. The "stakes" have risen. Funding, autonomy and privilege have been attached to levels of scoring. The intention has been to get students and teachers dedicated to their tasks, and this sometimes happens, but there have been costs as well as benefits. Among the reported negative consequences of raising the stakes of assessment are:
  instruction is diverted,
  student self-esteem is eroded,
  teachers are intimidated,
  the locus of control of education is more centralized,
  undue stigma is affixed to the school,
  school people are lured towards falsification of scores,
  some blame for poor instruction is redirected toward students when it should rest with the profession and the authorities, and
  the withholding of needed funding for education appears warranted.

The most obvious consequence of increased assessment is that teachers increase preparation for test taking, including test-taking skills and greater familiarization with the anticipated content of testing. Also, topics tested are considered of higher priority and topics untested slip in priority. Assessments are not diagnostic. There is little strategic theory fitting pedagogy to assessment so that few teachers know how to respond to poor student performance, other than to try harder. Thus, over-emphasis on assessment erodes confidence in legitimate teaching competence.

As the stakes rise, the central authorities are both pressured and authorized to intervene more in teaching responsibilities. A widespread public perception of legislators and school authorities is that they are not knowledgeable or competent in matters of the classroom. With ever-confirming evidence that students continue to test poorly, the public is tempted to withhold funds for needed improvement in instruction. There is good evidence that increased funding alone will not greatly change the quality of teaching. But at the same time, by investing in the assessment of students without investing in more direct evaluation of teacher and administrative performance, the professional people and the elected overseers are partly "off the hook." In summary, the consequences of assessment are complex, extending far beyond the redirecting of instruction toward state goals.
It is too much to expect that we soon will clearly discern the consequences of assessment and, even less soon, what caused them. Both the consequences and the causes are complex, both as to constituents and as to conditions. Lacking an adequate research base, curricular policy needs to be based on deliberations, long and studied interpretation of assessment, experience, and ideology. That is unlikely when professional wisdom is getting little respect. Often the public presumes that educators put their own interests above those of students. But good deliberations are not uncommon. Evaluation of the consequences of assessment has an important role informing those deliberations.

Even if we were able to improve determination of the consequences of assessment, we lack theory and management systems that guide us in applying that information to the improvement of teaching and learning. We need not wait for politics or the profession to be reformed. We can rely on the political, intuitive, and leadership processes we now have to make assessment more a positive and less a negative force within education.

As indicated before, people do have different purposes for education and for assessment. And for any one purpose, they value the results differently. That is just part of the reality, neither excusing nor facilitating the assessment of assessment.

The assessment practice that does the most measurable, immediate good is not necessarily the practice that has the best long range effect. For example, using testing
time entirely for easily measured skills instead of partly for "ill-defined" interpretive experience increases precision and predictive validity but discourages well-thought-out advocacies to include problem-solving experience throughout elementary school. Value trade-offs need to be considered for long-term as well as short-term effects.

Curriculum and Instruction

Management of teaching and the curriculum cannot be effective without assessment. The best and the worst assessment we have is informal and teacher-driven, sometimes capricious and sometimes more aimed at avoiding embarrassment than maximizing services to children. Yet, it works pretty well, sensitive to what individual children are doing, viewed favorably by a substantial proportion of parents and citizens, especially those people who interact themselves, even in small ways, with the academic program.

Still, instructional assessment could be much, much better, and too little professional development is so aimed. The present informal assessment system is little engaged with the formal management information system of school districts and even less with the state's student achievement testing apparatus.

The most successful school improvement efforts have been those that decentralize and protect authority so that a match can be made between what the teachers want to teach and what the parents and immediate community want taught. The present decade's "standards movement" was a step in the wrong direction, a further imposition of external values. Assessment was used to nullify decentralization efforts. The state does have a stake in what every child is learning but the state is poorly served by having each child trying to learn the same things. Accountability of the schools is in no way dependent on having each child tied to a core curriculum and tested on the same items. A single test for all is cheaper, but not a service to a diverse population of children.
State assessment is not wrong in its most general finding that teaching and learning in the American schools are mediocre, and that the range across districts is huge. The spread of achievement scores is stable and predictable, more a function of a child's lifetime educational opportunity than of what happens during a year in a classroom. Massive changes, whether at home or in the classroom, are unlikely to result in substantial gains on current assessment instruments.

As stated earlier, the validity of measurement of achievement is not the same as validity of those same scores as an indicator of quality of teaching and learning conditions. Teaching can be changed in a number of important ways within a school or classroom without change in achievement means. Using those scores as a measure of school improvement has not been validated. No accumulation of evidence shows assessment to be an indicator of good schooling. In spite of the absence of validity, assessment means continue to be the primary criterion for reform in a vast number of school districts. Given vigorous school improvement efforts over 20-30 years within countless districts, essentially all of them unaccompanied by substantial change in assessment results, what should be concluded is that testing is insensitive to important changes in teaching or that schools cannot be improved. The latter is untenable.

Uses and Stakes

The uses to which assessment information will be put vary not just across assessment approaches but greatly within approaches as well. Different school systems,
teachers, and children, even those greatly alike, will be affected differently. It is not reasonable to suppose that the stakes of assessment are unimportant if they have little impact upon the majority. Special attention needs to be given to how assessment consequences affect the least privileged families and most vulnerable children.

One of the primary stakes of testing is the well-being of teachers. Teachers have much to lose in a high stakes assessment system. Assessment should not be avoided just because teachers protest but their working conditions and professional wisdom should not be trivialized. Teaching quality should be scrutinized. Student performance should be considered but it should not be a primary determinant of teaching competence. There is only a small connection between how well a teacher teaches and how well a child performs on a test.

One of the consequences of high stakes testing is the manipulation of rosters to excuse poor scoring children from participation. The most common way at present appears to be to have children classified as "special education" students, but a good bit of ingenuity has been shown in optimizing rosters.

High stakes assessment often does result in raised scores but the validity of widespread gains, locally or across the country, has not been established. No one wants to challenge the gains that appear, but presently emphasis on small changes serves to orient the school to the assessments rather than to education. Many of the consequences of assessment are best learned from the people who administer the tests, even though they have a self-interest. Many are quick to acknowledge that the assessment enterprise is flawed.

Good research can help but it is mostly a professional and political matter. Until community attitude sets out to make the best of the schools, less to blame them (however much they deserve the blame), not much good will happen.
This is not a nation dedicated to the best possible education system. There are lots of people who would rather have lower taxes than extend educational benefits. Higher taxes do not assure better opportunities but an interest in finding better opportunities is not a national purpose.

Looking at it simplistically, support for assessments appears to be a step toward improving education, but the quarter-century record shows that assessment-driven reform has not worked. Why does it continue to be politically popular? The main consequence of assessment-based reform is that education has not substantially improved. We do not lack evidence of that.

About the Author

Robert E. Stake
University of Illinois--Urbana, Champaign
Email: email@example.com

Robert Stake is professor of education and director of CIRCE at the University of Illinois. Since 1963 he has been a specialist in the evaluation of educational programs, moving from psychometric to qualitative inquiries. Among the evaluative studies he has directed are works in science and mathematics in elementary and secondary schools, model programs and conventional teaching of the arts in schools, development of teaching with sensitivity to gender equity, education of teachers for the deaf and for
youth in transition from school to work settings, environmental education and special education programs for gifted students, and the reform of urban education. Stake has authored Quieting Reform, a book on Charles Murray's evaluation of Cities-in-Schools; two books on methodology, Evaluating the Arts in Education and The Art of Case Study Research; and Custom and Cherishing, a book with Liora Bresler and Linda Mabry on teaching the arts in ordinary elementary school classrooms in America. Recently he led a multi-year evaluation study of the Chicago Teachers Academy for Mathematics and Science. For his evaluation work, in 1988, he received the Lazarsfeld Award from the American Evaluation Association, and, in 1994, an honorary doctorate from the University of Uppsala.

Copyright 1998 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is http://olam.ed.asu.edu/epaa

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, firstname.lastname@example.org or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411 (602-965-2692). The Book Review Editor is Walter E. Shepherd: email@example.com. The Commentary Editor is Casey D. Cobb: firstname.lastname@example.org.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Andrew Coulson, email@example.com
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, firstname.lastname@example.org
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Marshall University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Richard M. Jaeger, University of North Carolina--Greensboro
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Rocky Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary P. McKeown, Arizona Board of Regents
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, email@example.com
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, Arizona State University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, Ann Leavenworth Center for Accelerated Learning
Jay D. Scribner, University of Texas at Austin
Michael Scriven, firstname.lastname@example.org
Robert E. Stake, University of Illinois--UC
Robert Stonehill, U.S. Department of Education
Robert T. Stout, Arizona State University