E11-00092
Educational policy analysis archives.
Vol. 6, no. 1 (January 02, 1998).
Tempe, Ariz. : Arizona State University ; Tampa, Fla. : University of South Florida.
January 02, 1998
Includes EPAA commentary.
Political legacy of school accountability systems / Sherman Dorn.
Arizona State University.
University of South Florida.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives
Volume 6 Number 1, January 2, 1998. ISSN 1068-2341
A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411
Copyright 1998, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.
The Political Legacy of School Accountability Systems
Sherman Dorn
University of South Florida
Abstract
The recent battle reported from Washington about a proposed national testing program does not tell the most important political story about high-stakes tests. Politically popular school accountability systems in many states already revolve around statistical results of testing with high-stakes environments. The future of high-stakes tests thus does not depend on what happens on Capitol Hill. Rather, the existence of tests depends largely on the political culture of published test results. Most critics of high-stakes testing do not talk about that culture, however. They typically focus on the practice legacy of testing, the ways in which testing creates perverse incentives against good teaching. More important may be the political legacy, or how testing defines legitimate discussion about school politics. The consequence of statistical accountability systems will be the narrowing of purpose for schools, impatience with reform, and the continuing erosion of political support for publicly funded schools. Dissent from the high-stakes accountability regime that has developed around standardized testing, including proposals for professionalism and performance assessment, commonly fails to consider these political legacies. Alternatives to standardized testing which do not also connect schooling with the public at large will not be politically viable.
Introduction
The short-term question about high-stakes testing is not whether it shall prevail but who shall control it. The president of the United States advocates the use of standardized testing developed by the federal government (Note 1). Conservatives who vigorously oppose nationalized curriculum and testing agree that testing should exist, but organized at the state and local level instead (see Diegmueller and Lawton 1996; Lawton 1997). The recent compromise between Rep. William Goodling and the
White House left the long-term fate of a truly national testing program unresolved (Hoff 1997). Nonetheless, what is not at stake is the existence of high-stakes testing. Recent polling suggests that the idea of national testing is very popular (Rose, Gallup, and Elam 1997), and that popularity reflects the past twenty years' growth of standardized testing. The debate over the control of testing takes for granted the existence of standardized testing because of its recent history. States for many years have been accumulating testing requirements which their legislatures, state officials, or local administrators have chosen. Despite considerable evidence that high-stakes testing distorts teaching and does not give very stable information about school performance, test results have become the dominant way states, politicians, and newspapers describe the performance of schools. Some have continued to note the problems of high-stakes standardized testing (e.g., Madaus 1991; McGill-Franzen and Allington 1993; Neill 1996; Noble and Smith 1994; Shepard 1991; Smith 1991; Smith and Rottenberg 1991; Wirth 1992: Chap. 7). Others try to accommodate some measure of standardized testing while building what they see as safeguards against obvious abuses. Still others (administrators in systems or schools with above-average test scores) use results as part of a marketing or public relations strategy. Few critics of high-stakes testing, however, have explicitly noted the way in which the public use of accountability systems shapes the politics of education writ large.
Statistical accountability systems are important because numbers have visible power in public debate. Anyone who listens to or reads politicians, journalists, and social critics will hear statistical references. Slowly over the last century, statistics have taken a prominent place in political culture.
Whether the statistic is the official unemployment rate, poverty rates, poll results, or SAT scores, a specific number fills a niche in discussion. As Carol Weiss (1988: 168) wrote,
The media report the proportion of the population that has been out of work for fifteen weeks or more, characteristics of high schools which have the highest drop-out rates, reasons given by voters for choosing candidates. These kinds of data become accessible and help to inform policy debates.
A number connotes objectivity or, at the very least, legitimacy. Because we perceive numbers and statistics as having a certain force on their face (just by being quantitative), we allow statistics to shape our perception of the world and the issues we perceive as important. They present selective information and thus center discussion around specific topics (silencing others). Nonetheless, we often yearn for the end of political uncertainty through statistics. Partisans in a conflict may heatedly argue that their methods are better, or that their opponents' use of statistics is politically motivated, yet behind the veneer of cynicism lurks a desire for unquestionable statistics that will end debate. Maybe the official poverty line is arbitrary, but others have calculated alternative poverty estimates (Axinn and Stern 1988: 73-77; Ruggles 1990). The portrayal of a "rising tide of mediocrity" in schools was an alleged lie, but then the critics presented their own statistics as counter-evidence (Berliner and Biddle 1995; Bracey 1991, 1992, 1993, 1994, 1995a, 1996, 1997; National Commission on Excellence in Education 1983).
The production and presentation of statistics is part of the fabric of public debate, and public policy that involves the heavy use of statistics must consider the long-term consequences of that use. At least two such consequences are important, what I will call the practice and political legacies of statistics.
The distinction between the two revolves around related but heuristically distinct issues:
How do policies based on statistics shape practice?
How do policies based on statistics shape future public policy debate?
The practice legacy of statistics is the nuts and bolts of how statistics shape government and private action. For example, the official U.S. consumer price index determines cost-of-living indices for Social Security, government pay schedules, and the behavior of many private organizations. Census population counts determine state representation in the U.S. House of Representatives and some federal spending patterns. This practice legacy can, by itself, engender vivid disagreement about statistical mechanisms. In 1997, several so-called deficit hawks suggested changing the calculation of the consumer price index to lower cost-of-living indices deliberately. While they claimed that the official inflation statistics misrepresented the "true" amount of inflation, reporters and groups such as the United Auto Workers clearly understood that the argument was not about the most accurate picture of inflation but was, in large part, about the practice legacy of inflation statistics for the U.S. federal budget, entitlement programs, and private company wages and benefits (e.g., "Will Washington Cut Our COLA?" 1997). Similarly, debate about the conduct of the decennial U.S. census in the past ten years has revolved not around accuracy but policy consequences. If, as some have proposed, the Bureau of the Census augments its population count with samples to measure undercounting and adjusts the official counts with the help of samples, the distribution of federal aid to cities and states as well as Congressional representation will change according to adjustment for undercounting. Politicians in jurisdictions with alleged undercounting have an interest in supporting such adjustment based on sampling because adjusted population counts would give their constituencies higher federal aid.
Other politicians have an equally intense incentive to oppose the use of sampling to prevent the loss of federal aid (Mears 1997; Roush 1996). The practice legacy of statistics is an obvious consequence of tying statistics to public policy. The examples above show specific practice legacies, in which statistics are mechanisms of what Paul Starr (1987: 55-57) calls "automatic pilots." Practice legacies may be less obvious when they arise from systems of incentives, such as those some argue high-stakes testing environments create. Whether the result is from explicit formulae or a consequence of incentives, a practice legacy is the influence of policy on short-term behavior.
What is less clear, but equally important, is the political legacy of statistics, the way that the use of statistics by itself shapes public debate (Note 2). Discussion about teenage pregnancy is a good example of how the existence and distribution of statistics shapes debate. In the late 1960s and early 1970s, as teenage birth rates were decreasing, the Alan Guttmacher Institute and others began publicizing estimates of teen fertility statistics to illustrate what they termed an epidemic of teenage pregnancy. The social construction of teen pregnancy as a growing problem contributed to political support for policies such as family planning and has been critical in debates over the consequences of family planning policies, even when the statistics were questionable (Vinovskis 1988). Feminism also contributed to changing attitudes towards family planning policies, but the paradox for social scientists is that demographic trends did not affect perceptions of the levels of teen pregnancy. Academic researchers on teen pregnancy have recognized the incongruity that the definition of teen pregnancy as a social problem coincided with a decrease in birth rates (e.g., Furstenberg 1991).
Still, gross numbers (for example, total births to teen mothers) created the popular perception of a crisis. Statistics help define perceptions of social realities and possibilities. Starr (1987: 54) has noted,
An average is not just a number; it often becomes a standard. ... Many regularly reported social and economic indicators have instantly recognizable normative content. The numbers do not provide strictly factual information. Since the frameworks of normative judgment are so widely shared, the numbers are tantamount to a verdict.
The existence and frequent public reporting of teen pregnancy statistics by themselves created public debate that led to policies attempting to limit teen pregnancies. Much other public reporting of statistics likewise shapes public debate: newspapers and broadcast news regularly report unemployment and inflation figures, crime rates, and school test scores.
The distinction between practice and political legacies of statistics is useful in explaining why accountability practices are so popular and what the potential consequences of the most commonly discussed accountability systems might be in the long term for school politics. Most critics of high-stakes standardized testing point to the practice legacy, the way that high-stakes testing may narrow the focus of teaching and provide perverse incentives within schools and school systems. However, the political legacy is as important as, and in some important ways dovetails with, the practice legacy. High-stakes testing narrows how we judge schools as institutions and whose school success is important. Moreover, opponents of high-stakes testing rarely consider the political legacy of proposed alternatives. The most prominent alternative vision of accountability revolves around the outdated model of ascendant professionalism. A consideration of accountability's political legacy would require different alternatives to high-stakes testing, ones that would cultivate deliberate political connections between schools and communities.
Table of Contents
The Importance of Political Legacies
The Popularity of School Accountability
Unexamined Assumptions of Accountability
The Political Costs of Accountability
The Political Weaknesses of Professionalism
The Ground We Stand on
Where To Go
References
The Importance of Political Legacies
I choose the term political legacy for statistics because statistical systems constitute a special example of how public policy creates long-term consequences for public debate. Those who study government from a variety of disciplines recognize that public policies set in motion political dynamics that shape the contours (and sometimes define the limits) of accepted political debate. Two parts of the original Social Security Act of 1935, pension insurance and Aid to Dependent Children (the federal program most call welfare), demonstrate the way that policies can define the political landscape. The pension insurance part of Social Security is a universal program; anyone who pays into Social Security as a wage-earner (as well as a beneficiary defined by law) is eligible for payments when older. The universality of the Social Security pension has made its basic features unassailable politically. By contrast, federal welfare was a means-tested program. Only poor people (and not all poor people) were ever eligible for federally supported welfare programs. Unlike Social Security pension insurance, welfare was politically vulnerable because of its means testing. Since most people would like to live long, they think of Social Security as an
important safety net. But most people do not want to be poor and, as critically, may not think they ever will be poor enough to be on welfare. The universality of Social Security has protected it politically. Thus, when President Ronald Reagan suggested changing the pension program in the early 1980s, politicians rallied to support the system. However, without universality, federal welfare had a much less powerful base of support, and the Republican Congress and President Bill Clinton ended the federal welfare guarantee in 1996. The original outlines of the two programs shaped future debate over them (Skocpol 1991).
The different histories of school desegregation in the South and elsewhere since 1954 are also results of a political legacy. The fundamental paradox of desegregation is that the South (including border states) had the most integrated schools in the country by the late 1980s (Orfield 1993). Southern schools have been more integrated because of two policies vigorously pursued by white, racist politicians and officials before 1954: state laws mandating segregation and policies of school and government consolidation. Because state law and intentional acts by school officials were an obvious cause of school segregation, federal courts after 1954 had clear and convincing evidence of unconstitutional segregation in Southern systems and were willing to order far-reaching remedies in the late 1960s and early 1970s. In addition, Southern school systems are usually much larger than systems in many other states because of consistent success in consolidating school systems this century. For example, Mecklenburg County, North Carolina, has one school administration, so the suburbs of Charlotte are in the same school system as the city. In contrast, the suburbs of Boston are in school systems separate from the central city. Desegregation advocates in the South had two advantages stemming from consolidation.
First, courts were more willing to order metropolitan desegregation plans in the South after the Milliken v. Bradley (1974) decision required that judges find specific evidence of discriminatory intent to remedy metropolitan segregation in fragmented urban areas. Second, large systems made white flight more difficult. Because the South had both a history of state-directed discrimination and also large school systems, desegregation efforts in the region in the late 1960s and early 1970s were more vigorous and far-reaching than in the rest of the U.S. (Douglas 1995; Orfield, Eaton, and the Harvard Project on School Desegregation 1996). The political legacy of statutory segregation and school consolidation made extensive desegregation more feasible in the South.
These stories, of government pension and welfare programs in one case and desegregation in the other, demonstrate the relationship between the structure of public policy and later political decision-making. To be sure, that influence is not one-way. A government is not an empty vessel easily manipulated by electoral and other political forces. Instead, government agencies have their own interests, and officials often act in their organizational interests (Balogh 1991b; Galambos 1970). Schools, like other public bodies, have their own professional and organizational dynamics that mediate, rather than automatically reflect, outside influences. Thus, when we speak of a political legacy of school policies (including statistical systems), that legacy is part of a larger negotiation over the role of public schools. Two facets of that constant bargaining are particularly relevant to understanding the current school accountability regime: the limits of educators' professional authority and the local nature of schooling. First, as explained in the next paragraph, school administrators have tried to claim both bureaucratic autonomy and public acknowledgement of the expertise involved in running schools.
They have been far more successful in the former task than in the latter. In addition, schooling is a local, public service. Local political control of schools, and the close watch that one can theoretically keep over such institutions, may be
one reason why school administrators garnered autonomy earlier in this century. One can thus view statistical accountability systems as one way to resolve the dilemma between granting autonomy and authority to educators and keeping them under some political control.
The political legacy of statistical accountability systems is important because support for publicly controlled schools is fragile. School administrators deliberately built a set of bureaucratic institutions in the early twentieth century to buffer themselves politically, in part by claiming the need for autonomy to exercise professional judgment and wield their expertise (Tyack 1974; Tyack and Hansot 1982). That autonomy, and the justification for publicly controlled schooling, has been on the wane since mid-century for several reasons. First, the civil rights movement targeted schools as one public institution that was treating poor and minority children unequally. The attack on school inequalities undermined support both from those who thought that inequality is morally wrong and also from those who had relied on state and local control of education to preserve bastions of private privilege (Kozol 1991). Second, the credibility of public institutions as a whole has deteriorated. In part, the Vietnam War and Watergate created a credibility gap between what public leaders said and what most citizens saw happening (Schell 1975); in addition, the internal politics of public agencies have damaged their ability to wield professional consensus as a political force (Balogh 1991a). Third, schools have been the target for half a century of accusations of ineffectiveness and soft standards. All of these events undermined the legitimacy of school administrators as autonomous professionals and public schools as worthy of financial and political support (Tyack and Hansot 1982).
Privatization, through charter schools or vouchers, represents one potential result of declining support for school systems as publicly financed and controlled organizations. The political legacy of current educational reforms, including growing development of statistical accountability systems, will define in some measure the future debates about schooling.
The Popularity of School Accountability
The public judging of schools by test scores is relatively new in the United States. School statistics have existed since the late 19th century and claims to objective measurement of student achievement from the turn of the 20th, but until recently achievement scores were typically only for internal consumption within school bureaucracies. In the wave of school criticism after World War II, ideological debates over progressive education and the needs of the Cold War were the explicit points of conflict; statistical evaluations were invisible in the 1940s and 1950s debates over schooling (Ravitch 1983: 71-80, 228-32; Spring 1989: 10-33). The public debate over Scholastic Aptitude Test (SAT) score trends did not exist until the mid-1970s, even though the decline in mean scores began in the early 1960s. The New York Times, for example, did not start reporting SAT scores annually until 1976 (Maeroff 1976). No network news broadcasts between 1968 (when the Vanderbilt Television News Archive began recording and indexing network news) and 1974 reported test scores as the substance of the story; the first networks to do so after 1967 were ABC and CBS on October 28, 1975 (Note 3). The popular reporting of periodic student data, therefore, is of relatively recent vintage.
One may consider statistics as one of many types of evidence and reasoning in public debate, such as the following list (meant to be an illustrative rather than a comprehensive typology):
Ideology: Debates can focus on the purposes of schools and the perspectives offered in the curriculum or in teaching techniques. The attack on what progressivism had become by the 1940s is an example of ideological debate, as was the attack on outcome-based education in the early 1990s in Pennsylvania and elsewhere.
Representative Story: Debates can center on real or apocryphal stories about education that represent the issue at hand. Anecdotes about high school graduates who cannot read (and the argued need for higher graduation standards) are an example of argumentation from representative story.
Statistics: Debates over the quality of education in the 1980s, following the Nation at Risk report (National Commission on Excellence in Education 1983), are an example of discussion focused on statistics.
Direct Observation: Debates can also focus on what individuals have seen, first-hand, in schools. I do not know of any national debate relying on directly observed evidence.
The self-evident explanation of the last statement suggests, in part, that we focus on statistics because having a "national discussion" based on personal, direct observation of schools is a contradiction in terms: we cannot each observe the nation's schools, and our judgment of "the nation's schools" will depend on second- or third-hand information. Still, most discussion of schools, and even school statistics, is local. Only thirteen network news broadcasts in the twenty-year period 1968-1987 reported statistical test score trends (Note 3). Most reporting on education, and most of what individuals hear and read from popular media sources, is still in local news broadcasts and local newspapers. Why, then, have local educational debates generally assumed the importance of statistics, something that makes more sense for a national debate?
The common use of statistical mechanisms to gauge school effectiveness, including the power of standardized test scores, owes its existence to the tension between the development of a national debate over education in the twentieth century and the continuation of local decision-making. The result is a set of themes that dominate discussion in cities and states across the country and that borrow much of their character and assumptions from the national debate. In many cities and towns, for example, newspapers and local news broadcasts describe similar issues such as discipline problems and whether high school graduates are ready for the workplace. Several changes in schooling since the early 19th century have encouraged a national debate. First, educational reformers have typically borrowed from each other's ideas, spreading them from region to region. Second, professional educators and muckraking journalists in the late 19th and early 20th century explicitly campaigned in nationally distributed journals against school corruption and the decrepit conditions in urban schools, on the one hand, and for professional autonomy on the other. Their campaign nationalized the Progressive Era education debate. Third, administrative progressives (as David Tyack has termed them) were successful in creating standard institutional routines in the first half of the 20th century, so that many school experiences adults remember now are much more similar across the country than adult memories of childhood were 150 years ago. We thus have a common set of experiences
nationally, making the terms of debate familiar. Finally, the nationalization of politics more generally after World War II encouraged the debate over Cold War schooling described earlier. The civil rights movement and desegregation consolidated that national framework for discussion.
Still, the national educational discussion is a layer on top of and filtering down through older, local politics of schooling. Localism has remained a powerful force. It has controlled the politics of local and federal educational programs. For example, Southern members of Congress were critical in supporting federal vocational education programs early in the century because the federal government allowed Southern states to distribute funds disproportionately to white vocational programs and create different curriculum programs by race. The result was that vocational education programs served to reinforce the Southern caste structure (Werum 1997). Traditional federal deference to state action also modified and limited Title VI of the Civil Rights Act of 1964, whose implementation still helped force school desegregation in the South (Orfield 1969). Opposition to federal intrusion has limited national action to the present, including President Clinton's desire for tests created and organized by the federal government. Politicians are willing for schools to buy textbooks from national publishers, accepting a tacit national curriculum (Miller 1997). Federal government decision-making, however, threatens more than local control of curriculum; it threatens local political networks and ways of doing business. Local political control of school policies and funding thus vies with the national debate. The result is frequently a set of variations on common practices, resulting in the illusion of local control in many school matters. Standardized testing and accountability systems are one example of that limited variation.
States are free to choose commercial tests, develop their own, or not to engage in high-stakes testing at all. Today, however, most local school systems or states test children in the spring using multiple-choice tests with scores that schools can compare (using the publisher's data) against a norming population of children in the same grade. In the past decade, many states and local districts have added real consequences for the tests, including publicly releasing score data. The result is a patchwork of high-stakes testing that covers most of the nation. Despite theoretical local choice about standardized testing, one way of publicly judging schools has become dominant.
The emergence of contemporary school "accountability" dependent on test score results combined an existing set of practices (standardized testing) with the judgment of local schools within a national framework. Within a decade, public judgment of schools by test statistics became common, after the College Board publicized the decline in mean SAT scores, states began instituting minimum competency tests, and the National Commission on Excellence in Education published A Nation at Risk in 1983. Two historical perspectives underline the importance of understanding the political implications of school accountability systems.
Accountability has turned the use of educational statistics upside-down. Statistics bolstered the claims of administrators to expertise early in this century, but politicians and popular news media now use statistics to judge school systems. This reversal shows the weakness of local school administrators in claiming professional authority. Autonomy within bureaucratic organization, not public respect for their expertise, is the primary power of school officials.
The popularity of published test scores obscures alternative ways of judging schools. In less than twenty-five years, statistical accountability has become so ubiquitous that it appears inevitable.
The change has been, in retrospect, both breathtaking and alarming in its speed. Political debate over the meaning of statistics has largely eclipsed other ways of describing what happens in classrooms.
The dominance of educational test scores today hides the fact that we did not have to use statistics as the dominant way of describing schools and their problems, and that in the past we have used many other means. Even when we evaluate local schools using nation-wide questions, we can use many sources of information. Assuming we must use primarily statistics is dangerous. We must remember that the evaluation of schools by test score statistics is one among many possible ways of seeing education through both national and local perspectives. Whether we made that choice consciously or wisely is a different question.
Unexamined Assumptions of Accountability
One consequence of public policy is the definition of legitimate debate and, by extension, what is not part of mainstream public discussion. Often, the assumed axioms underlying policies silence other relevant concerns (Fine 1991: 32-34). Despite more than twenty years of debate about the statistical performance of students in the U.S. and the proper direction for school reform, remarkably few voices in public have questioned the primary assumptions behind the move towards accountability. This silencing shows what we are avoiding when we speak glibly of a political consensus around school accountability. While we are agreeing to high-stakes testing, what uncomfortable issues are we not discussing? The broad political legacy of statistical accountability systems is the narrowing of legitimate topics for public debate. We do not often discuss the purpose of accountability or who will be making the key decisions to keep schools accountable.
Accountability for what purposes?
The dominant discussion of accountability leaves vague the goal of accountability mechanisms.
The improvement of schools is an insufficient goal because accountability is fundamentally a political and not a technical process. Accountability has multiple meanings, in both a general sense and also the current sense in education of statistical judgment (Darling-Hammond and Ascher 1991). The apparent consensus for "accountability" hides the differences (and the conflicts) among the following meanings of statistical systems.

Judging public schools as institutions. One may use test score statistics to judge schools as a set of institutions. This sense of accountability (judging the worth of schools in general by test scores) is one of the most widely used tools in school politics. The annual release of average SAT scores in the late 1970s prepared the ground politically for the claim of declining school effectiveness made by the National Commission on Excellence in Education (1983). One political legacy of judging public schooling by test scores is the assumption that schooling is a monolithic entity that fails or succeeds as a single body. What this myth of a monolithic system hides is wide variations in schooling, especially between poor and wealthy schools (Kozol 1991). Another political legacy is that, after intense media focus on statistics that suggest poor schooling, citizens may face difficulty reconciling popular conceptions of failing schools with information gathered in other ways. Polls consistently show that parents' perceptions of their local schools are more positive than their perceptions of schooling nationwide (e.g., Rose, Gallup, and Elam 1997). In addition, private interests
may subvert policies based on the gross judgment of schools. For example, some wealthy parents in one Michigan district deliberately pulled their children out of high-stakes standardized testing when they perceived that it might hurt their children (Johnston 1997). They may well have been willing to have high-stakes testing for "other people's children" (to borrow from Lisa Delpit's 1995 book title) but not theirs. This consequence is the educational equivalent of urban development NIMBY (Not in My Back Yard) syndrome.

Judging teachers and other educators. One may also justify accountability as a way to raise (or clarify) expectations and goals for teachers and administrators. An explicit part of accountability systems in the last few years has been the evaluation of teachers, principals, and other administrators. For example, the Tennessee Value-Added Assessment System, passed in 1992, originally mandated statistical measures of student gain as part of personnel evaluation (Educational Improvement Act of 1992). An earlier variant of judging teachers, schools, and school systems by comparative statistics was the U.S. Department of Education's "Wall Chart," instituted by Terrel Bell as an attempt to spur reform (Ginsburg, Noell, and Plisko 1988). This use of accountability, focusing on teachers and administrators, is the one most criticized as encouraging teaching to the test and "gaming" test results (Cannell 1989; Glass 1990; Madaus 1988, 1991; McGill-Franzen and Allington 1993; Merrow 1997; Shepard 1991; Smith 1991; Smith and Rottenberg 1991). The political legacy, however, may be even more harmful: By setting up a system based on the distrust of teachers, we make alternative ways of judging teachers and schools more difficult (Fisher 1996; Sizer 1992: 188-89).
Judging students. In many states and school systems, standardized tests have high stakes not only for educators but also for individual students, as scores can be among the criteria for entrance to academic programs, grade promotion, or other real rewards and punishments in schooling. The use of tests to sort students in the U.S. began with monitorial schools in the early nineteenth century and admissions tests to early public high schools (Kaestle 1973; Labaree 1988; Reese 1995). More recently, the use of so-called minimum competency tests emerged in the late 1970s as a response to allegedly lowered standards of public schools (Bracey 1995b). The rationale of using tests to make students accountable is that, having test scores as a clear goal, students and schools would meet the expectations (Ravitch 1995). One potential legacy of such high stakes, however, is the rhetorical scapegoating of students. Calhoun (1973: 70-72) describes one purpose of testing in schools as displacing blame for ineffective teaching onto students. If a student fails a test, one may reason, the failure lies in the student's intelligence and lack of diligence. That consequence is already evident in many states with high-stakes testing. In Tennessee, for example, the teachers' union pressed to exempt scores of students with disabilities from teacher value-added statistics ("Sanders model to measure 'value added'" 1991). One might presume that children with disabilities are those on whom we should most focus attention in evaluating teaching effectiveness. Yet teachers asked for the exclusion of scores because, the union argued, including such scores would be unfair to teachers.
The displacement of blame for failed schooling onto students is a legacy of testing that existed well before high-stakes standardized testing, but accountability systems may exacerbate such tendencies (e.g., McGill-Franzen and Allington 1993; McGrew, Vanderwood, Thurlow, and Ysseldyke 1995; National Center on Educational Outcomes 1994).

Judging public policy. One might use standardized test scores (like other information) to evaluate public policies. The National Assessment of Educational Progress (NAEP), begun in 1969, is theoretically a means for using non-high-stakes testing to evaluate public
school policy with objective data. NAEP data is at the heart of some recent debate about school and student performance (see Berliner and Biddle 1995, 1996; Stedman 1996a, 1996b). However, demands to use the NAEP to judge educators and students in high-stakes systems are threatening to compromise NAEP's use as a lower-stakes way to gather information about student performance (Jones 1996; Koretz 1992a). One problem is the technical and fiscal demands of high-stakes versus low-stakes systems. A further problem is the ideological debate about the use of information. Can one maintain a low-stakes statistical system in the face of political pressures for high-stakes accountability?

Building organizations. In a broad sense, standardized testing supports the determination or control of curriculum content at the state and national levels. Some, such as Ravitch (1995), explicitly advocate curriculum content standards and see teaching to the test as valid with appropriate testing and content. One consequence of statistical accountability, however, is the creation of new public and private organizations producing educational statistics. Publicly, states now have accountability or evaluation offices whose job is to provide the technical expertise in analyzing test data, and the federal government has the National Center for Education Statistics, which contracts out NAEP and compiles and disseminates a wide variety of educational statistics. Private organizations supported by testing are the companies that write and sell tests or contract with agencies for the creation of specific tests. With each public release of test score statistics, popular news sources, politicians, administrators, and the public rely more on relatively anonymous technocrats to explain what is happening in schools. Other new professions this century, such as nuclear science, have also staked their claim to expertise on political factors (Balogh 1991a).
The fact that this reliance on statisticians stems from political pressure for school reform usually escapes notice.

Marketing. Schools occasionally use student statistics as part of public marketing strategies, either to attract students who have choices (as in selective colleges) or to bolster public support. One of the largest metropolitan school systems in the country recently produced a pamphlet boldly titled, "Our Students' Test Scores Reflect Academic Achievement" (Hillsborough County Public Schools 1997). While one paragraph cautions that test scores are not the sole basis for evaluating students or schools, the rest of the pamphlet trumpets above-average achievement. Public relations was a strong motivation behind what Cannell (1989) called the "Lake Wobegon effect" of claiming high test scores in public reporting through the use of outdated norms. The use of accountability data for marketing is an open secret among administrators. As Dennie Wolf said in the John Merrow documentary Testing... Testing... Testing (1997), "Districts sell real estate based on test scores." With the decline of administrative authority described elsewhere in this article, superintendents have considerable interest in boasting about their systems using any tools at their command.

These varied purposes of accountability are not necessarily congruent. The use of test scores to bash public schools is not compatible with a nuanced debate over public policy, and students and teachers may have conflicts of interest when tests have high stakes for both. In addition to inconsistent purposes, the aims of accountability do not easily include other issues relevant to education: equity, the direction of curriculum, or the purposes of education more broadly in a changing world (Darling-Hammond 1992).
One dominant assumption of accountability systems is that the goals of education are agreed upon and we need only establish a system to measure whether schools and students meet those goals. The creation of statistical accountability systems may freeze the assumption of a single purpose
of statistical accountability into a framework for the politically accepted discussion in education for years hence.

Who keeps schools accountable?

A second unexamined assumption is that central bureaucracies and popular news media are the logical, natural places for holding schools accountable for performance. In most school testing regimes, central offices (at the state or local level) are responsible for the general logistics of testing and compiling results. Results at some level are then available to administrators, public boards of education, and media organizations. In many states and regions, newspapers publish test score statistics, often ranking schools or systems based on the scores. But who is not among the direct targets of test score dissemination is as important as who is.

Judges and advocates monitoring school system compliance in discrimination cases. Judges and advocates overseeing compliance with nondiscrimination orders (such as desegregation) generally are not intended users of "accountability" information. Despite promises by school systems to pay closer attention to achievement in desegregation cases, local systems have a very spotty record in demonstrating success after the end of desegregation orders. Orfield, Eaton, and the Harvard Project on School Desegregation (1996) have compiled evidence that, in several of the major cases this past decade, school districts released from desegregation monitoring by the courts experienced not only resegregation but also growing achievement gaps between white and minority students. The new accountability system does not appear geared to keep systems accountable in this respect. Many advocates appointed to monitoring and advisory commissions have reported to Orfield and his associates that local systems have either denied information (such as disaggregated test scores) outright or made the gathering of data extremely difficult. In addition, the Supreme Court decision in Missouri v.
Jenkins (1995) declared that district court judges should consider test scores as marginally important (at most) as a measure of compliance with racial equity requirements. The only major case where a court has continued to monitor standardized test scores as part of a major equity lawsuit has been in New Jersey, where the state's supreme court continues to criticize inequalities between the education offered children in the wealthiest and poorest systems of the state (Abbott v. Burke 1997). In the past five years, the court has broadened its focus from just monetary support of schools to include measurable outcomes. The New Jersey Supreme Court has been a lonely exception to the general rule, especially in the federal judiciary: Accountability does not appear to require even reasonably equitable outcomes.

Parents and the general public. Parents receive test scores of their children, but rarely do they or the general public have direct access to test score results or their limitations. Popular news sources (television, radio, and newspapers) mediate the transmission of information, often deleting information critical to understanding the limits of such data or transforming the statistics in ways that are either incomprehensible to readers or that create invalid statistical comparisons. The reporting of high-stakes test data by Nashville metropolitan newspapers forms a case in point. Beginning in 1993, the state of Tennessee reported test results of schools and districts using a complex statistical system called the Tennessee Value Added Assessment System. The state's newspapers have rushed to print school-by-school scores including rankings, even where schools many rankings apart had negligible differences in scores (in other words, when the rankings were unjustified by the statistics). For example, in 1996 the Nashville Tennessean transformed the value-added scores into percentile rankings, even though the technical documentation for value-added scores would
not support such an interpretation (Bock and Wolfe 1996: Chaps. 5-6; Klausnitzer 1996; Tennessee Department of Education 1996). Why did the Tennessean transform value-added scores that were the result of a prior statistical manipulation, and why did the paper then rank schools? One reporter explained:

We chose to report in percentile ranks because it helps people see how their school stacks up against the rest of the state, and because this information is not available anywhere else. It was calculated by The Tennessean ... [because] we wanted to offer something unique. We also wanted to answer our readers' number one question about the test scores: How does my child's school compare to the other schools? (Lisa Green, e-mail to author, December 5, 1996)

In addition, the newspaper reported percentile rankings by tenths (for example, 50.1 instead of 50th percentile). The same reporter acknowledged that the newspaper staff did not consciously justify that apparent precision:

There's really no need to report these numbers down to the tenth of a percentile. However, the programming for the site was written last year ... so the computer automatically included the decimal place, and we didn't think it was necessary to take it off. (Lisa Green, e-mail to author, December 5, 1996)

In this case, a metropolitan newspaper's desire to have "something unique" conflicted with its readership's interest in having clearly understandable information to interpret independently, or even information with a justifiable level of detail. Even if one assumes that the value-added scores are comprehensible, transforming those into percentile rankings was neither valid nor necessary for rankings (itself a method of reporting scores which the state's external evaluators recommended against).
In no case did the newspaper note what the evaluators clearly stated: that school scores were unstable and could not be relied on for clear distinctions in performance (Bock and Wolfe 1996: Chaps. 5-6). The dissemination of information through two intermediaries (the state government and news sources) in essence created one dominant way to analyze scores in the metropolitan Nashville area: how did schools "stack up" in competition with each other? The false precision in percentile rankings suggested that readers could rely on the numbers as rigorous, objective facts. The accuracy of newspaper reporting is also questionable; the Tennessean had to reprint its comparative tables in 1994 because of acknowledged gross errors in reporting ("How Midstate Schools Stack Up" 1994a, 1994b). While comparisons among schools may be appropriate in some ways, the presentation of school scores suggested a certainty which was incompatible either with the statistical calculations or the mediation of state agencies and newspapers in transmitting test scores.

Moreover, the dissemination and discussion of today's school accountability systems strip parents and the general public of control and ownership of information. In the case of Nashville, a reporter reduced parental evaluation of schools to examining rankings in a table, akin to sports league rankings (see Wilson 1996). One might contrast the typical method of disseminating accountability statistics with two alternative local methods of accountability: the "visiting committee" of town elders in the eighteenth- and early-nineteenth-century district schools, on the one hand, and the calculation of dropout statistics by a Hispanic activist organization in Chicago in the 1980s, on the other. In many district schools, a small committee of citizens held the power of hiring and firing over schoolteachers and could visit the school at any time (e.g., Cohen 1973: 407).
Accountability in district schools was a rough-and-tumble affair, often unfair to teachers, but local citizens could form judgments in
a simple way: watching classrooms. Independent gathering of data today is also possible. In the 1980s, Aspira, Inc., a Hispanic activist organization, suspected that official dropout statistics from the Chicago public schools were inaccurate or fraudulent and conducted its own research. Activists then used the independent statistics to help prod Chicago towards urban school reform (Hess 1991: 7-21; Kyle and Kantowicz 1991). In both cases, individuals at the local level produced and acted on their own judgments of schools. Reliance on centrally-calculated statistics in accountability systems often overrides local, independent judgment of schools.

The fundamental issue of control is directly connected to the purposes of accountability: Individuals in different roles would ask different questions of accountability mechanisms. Politicians might ask whether schools "measure up" to some standard (such as a national norm). Business leaders might ask about workplace-related skills and behavior. College faculty would want students to have some intellectual foundation. Parents might ask whether their children are getting enough individual attention. Who should be asking the hard questions about schools? The history of the Common Core of Data (a set of education data collected by the federal government since the early 1970s) illustrates the difficulties of creating an explicit consensus. Because of pressures within government, doubts about its utility and cost, and disagreements about what it should measure, the Common Core of Data for many years gathered relatively innocuous information in a history Janet Weiss and Judith Gruber (1987) described as "managed irrelevance." Of all the information used by the National Commission on Excellence in Education (1983) to lambaste the condition of schools, none came from the official federal education database (Weiss and Gruber 1987: 370).
What we face is not an explicit consensus but a hidden one, never debated clearly, founded on the spread of standardized test scores. Statistical accountability systems suggest an objectivity and universality of coverage which is impossible. As Sizer (1995: 34) noted with regard to the debate about educational standards, "The word system has come up again; ... Essentially, it implies a technocratic approach." We should not evade the political question of the purposes of schools through the production of statistics. The current penchant for statistical accountability systems diverts resources to a mechanism that hinders discussing the nuts and bolts of schooling. We hide behind the apparently objective notion of an accountability system.

The Political Costs of Accountability

The political legacy of statistical accountability systems is complex because of the different possible aims of (and justifications for) accountability and also because statistical systems will vary among different states and districts. Nonetheless, one can identify several broad patterns which stem at least in part from the proliferation of statistical accountability systems. Two legacies have seriously damaged our collective ability to have reasoned, broad discussion about the aims of schooling and reasonable public policy. Statistical judgment of schools has narrowed the basis on which we judge schools and has also encouraged impatience with school reform.

Narrowed Judgment of Schools

Technocratic models of school reform threaten to turn accountability into a narrow, mechanistic discussion based on numbers far removed from the gritty reality of classrooms.
Over the past twenty years, the dominant method of discussing the worth of schools in general has been the public reporting of aggregate standardized test score results. Popular news sources typically distort and oversimplify such findings (Berliner and Biddle 1995; Darling-Hammond 1992; Koretz 1992b; Koretz and Diebert 1993; Shepard 1991). The recent public debate over schools is not rich, reliant on multiple sources, or nuanced. Nor is the reliance on statistics inevitable in national discourse, despite recent history. Prior waves of reform, such as concerns about math and science education in the 1940s and 1950s (whether one agrees with their goals or not), did not need test score data as motivation or evidence (Ravitch 1983).

Test-score data and its use have pushed other issues to the margins. The aftermath of the 1983 report A Nation at Risk eclipsed two major policy initiatives of the first Reagan administration. The early 1980s saw dramatic cutbacks in the support of the federal government for state and local public schools. At the same time, social conservatives both in and out of the Reagan White House were arguing for the creation of vouchers to support parents sending their children to private schools. Neither of these issues, however, was part of the central discussion of education policy after the release of A Nation at Risk. The dominant discussion in popular news media revolved instead around declining test scores, the presumed responsibility of schools for national economic decline, and how to tighten academic standards (Berliner and Biddle 1995; Bracey 1995b). Few mentioned changes in the federal budget or privatization proposals, even though one was a concrete policy of the Reagan administration and the other was a radical proposal for changing the governance of schools. Ironically, the dominant discussion suppressed issues which concerned both liberals (upset at budget priorities) and social conservatives (wanting vouchers).
More recently, New Jersey Governor Christine Todd Whitman tried to argue that a standards-based accountability system alone could improve the state's schools. Her department of education responded to the state Supreme Court's call for equity with state-level achievement standards but no added resources, despite the state's history of vividly unequal funding among school systems. The argument by the executive branch was that standards, by themselves and despite existing funding inequities, would create school improvement. Whitman's assumption is that test-based school accountability, as a technocratic mechanism with threatened sanctions, is sufficient to change schools, even schools with the worst records. The state court agreed with the governor that New Jersey could have state-level standards but disagreed with the argument that funding was irrelevant. It then ordered the state to improve its funding of poor schools (once again) (Abbott v. Burke 1997). New Jersey is fortunate in having one branch of government able and willing to articulate a complex view of what school reform requires. In general, however, extending public discussion of schools beyond test-score statistics is difficult.

Impatience with Reform

On a political level, impatience with reform and the cyclical reporting of statistics encourage the dominant myth of contemporary educational politics, that schools continue to decline in quality (Note 4). That myth encourages a cynicism towards reform strategies. We should not be surprised that we have witnessed several "waves" of reforms since the regular publishing of SAT scores began in the 1970s. The mundane details of statistical accountability systems encourage fads. Without a concrete sense of what children and teachers should be or are doing, the public compares statistics against a set of arbitrary benchmarks.
On a practical level, statistical accountability produces both undue impatience with reform and laxity towards incompetence. The yearly reporting of test scores creates an artificial schedule for judging schools: Do they improve by the next set of annual tests? The periodic nature of reporting school statistics drives the disposal of reform writ large, because policy changes cannot change classroom practices on a deep and fundamental level or become institutionalized in a short time (Lipsky 1980; Tyack and Cuban 1995). Yet, paradoxically, the annual time-frame of standardized testing gives too much time for weak teachers to flounder without guidance or correction. Pinning personnel practices to annual testing may undermine the obligation of fellow teachers and administrators to keep a close eye on teachers without the necessary classroom skills. Principals may feel inclined to give poor teachers until the following cycle of annual tests to improve. For children, however, a year of being with an incompetent teacher can be extremely destructive. The problem is in part one of inappropriate time scales. Annual tests are too infrequent for appropriate guidance of instruction or evaluation of teaching, while they are too frequent to measure broader changes in schools.

In addition, standardized test accountability discourages the evaluation of what happens in the classroom. As long as a school or teacher has adequate test scores, what happens in the classroom is irrelevant. Similarly, poor test scores indicate needed change, no matter what happens in the classroom. The philosophy behind such practice-blind evaluation is putatively to give teachers autonomy. As the designer of one state's accountability system explained, accountability statistics allow teachers to make their own choices (Sanders and Horn 1994). Ultimately, however, this diminution of practice undermines teacher and school power, for several reasons.
First, teachers do not usually have time to review and evaluate on their own a wide array of alternative teaching methods; they need support in selecting, adapting, and implementing different methods and curricula. Second, parents and other citizens do care about what happens in classrooms. Schools trying dramatic departures from normal practices face (sometimes very reasonable) criticism from parents even when the intent is to respond to the accountability system. Separating accountability from the sense of what a "real" school is (Tyack and Cuban 1995) is deceptive in the long run. It gives schools the following message: "Make your choices because we only care about test statistics. But we won't give you enough support to follow up on your choices, and in the end we will condemn your choices if they violate our ideas of what schools should be." One consequence of statistics-driven impatience is increased cynicism among teachers and administrators and their uncertainty about what the public really wants. Discussions isolated from what happens in schools may be politically alluring and attractive to popular news sources, but test scores drive a wedge between schools and the students and public they serve.

Parallels between Practice and Political Legacies

The political legacies of high-stakes statistical accountability systems parallel the practice legacies in two respects. First, narrowed political judgment of schools is the macropolitical equivalent of teaching to the test, a narrowing of the curriculum. Researchers have documented the tendency for teachers to narrow their focus to content and styles which they perceive will result in high test scores (Madaus 1988, 1991; Smith 1991; Smith and Rottenberg 1991; Shepard 1991). Relatively few teachers, faced with the onslaught of standardized testing, are willing to innovate. Meier (1997: 9) writes,

The danger here is that we will cramp the needed innovations [in teaching] with
over-ambitious accountability demands. Practical realism must prevail. Changes in the daily conduct of schooling ... are hard, slow, and above all immensely time-consuming; they require qualities of trust and patience that we are not accustomed to.

High-stakes accountability is not a system that demonstrates trust in teachers' capacities. By signaling massive distrust, high-stakes testing instead provides low expectations for teachers (Sizer 1992: 110-13). Imagine the result of a thought experiment: the plight of John Dewey's University Lab School teachers under a high-stakes system. One might like to spend an extended time exploring history and science through the concrete example of textile manufacturing (Dewey 1899). In a modern accountability system, however, the state will test the children in March or April, with much of the test based on several dozen discrete skills. Whether the children can understand the role of textile mills in 19th-century economic changes, or whether they can explain what principles allow a loom to work, is irrelevant to accountability systems based on standardized tests. Balancing such competing demands is extremely difficult. Teachers and schools who fight the pedagogical consequences of high-stakes testing are relatively unusual. Whether one agrees with the appropriateness of multidisciplinary teaching for some or all children, one cannot confuse the expectations of today's statistical accountability systems with expecting children to understand connections between what they see in life and academic disciplines. The latter is of a higher order of magnitude entirely. Relying on standardized tests and high-stakes production of test statistics is itself a dumbing-down of political debate and expectations for schools.

Similarly, impatience with reform and fad fetishes are the macropolitical equivalent of being impatient with children's progress.
The aggregation of test score data often gives teachers and administrators incentives to exclude students whom they feel will harm test figures. Repeated reports of test scandals, the plea by teachers in Tennessee to exclude students with disabilities from their statistics, and variations in the proportion of students tested provide continuing evidence of the perverse incentives high-stakes testing provides (Glass 1990; Madaus 1988, 1991; McGill-Franzen and Allington 1993; McGrew et al. 1995; Smith 1991; Smith and Rottenberg 1991; Shepard 1991). These incentives perpetuate a dynamic of educational triage, wherein those who have the best chance to survive in life because of other circumstances also have the best opportunities to learn (Fuchs and Fuchs 1995; Sapon-Shavin 1993).

The Political Weaknesses of Professionalism

If accountability based on standardized tests encourages a narrow political discussion about education and impatience with schools, alternatives proposed by critics of standardized testing confront the same history that engendered statistical accountability. Dissenters from the accountability "consensus" exist, from longstanding standardized testing critics at FairTest (http://www.fairtest.org) to the Coalition for Essential Schools (http://www.ces.brown.edu) to Teachers College professor Linda Darling-Hammond and Arthur Wise, current president of the National Council for Accreditation of Teacher Education (NCATE). Each opposes the idea of motivating school reform by standardized testing. The proposed alternative methods of motivating better teaching include performance (sometimes called authentic) assessment of students, peer evaluation of teaching, and either creating a second tier of high-status teachers or restricting entry into a limited number of
high-status positions within teaching. Advocacy of greater professional authority in education has generally focused on teacher education and preparation (e.g., Darling-Hammond, Wise, and Klein 1995; Holmes Group 1986; also see Labaree 1992), but it also includes accountability; for example, Wise has been concerned with the deskilling of teachers since Legislated Learning (1979). In general, the critics of standardized testing seek greater teacher autonomy and respect from the public, and in that way we might call professionalism the central value of the dissenters (e.g., Darling-Hammond 1988, Haefele 1992). Wise and Leibbrand (1993: 135) write that "Hallmarks of a profession include mastery of a body of knowledge and skills that lay people do not possess, autonomy in practice, and autonomy in setting standards for the field." If teachers could successfully professionalize, Wise and others suggest, they would gain more respect from the public and earn the autonomy needed to improve schooling (e.g., Wise 1994). The logic of professionalism is very appealing, with its explicit parallels to the professionalism of medicine (Starr 1982). It links mechanisms within schooling (who controls decision-making) to the public status of teachers and the politics of schools. Professionalism appears to be politically astute.

Professionalism, however, is not likely to be a successful gambit in schooling, for several reasons. Most importantly, professional ideology is politically unpalatable in the late twentieth century. Trying to use professionalism misunderstands the historical context for the ideology of expertise and its widespread (political) success a century ago. Professionalism in the form of high-status, science-based occupations like medicine and engineering was one response to the chaos of industrialization and changing class structure (Wiebe 1967).
Its early proponents argued that the complexities of modern life required technical expertise to solve public policy and practical problems. However, professions include more than high-status jobs, with occupations as diverse as architecture and craft work like plumbing. A profession typically involves three dimensions: a claim to specialized expertise, some informal or formal credentialing to control entry into the occupation, and autonomy on the job (Friedson 1984). Classroom teaching falls partway along all three dimensions. Classroom teaching does involve some skills that few could walk in off the street with, but the general public has far more knowledge of what happens in classrooms (and is more willing to make second judgments of teaching) than it has of fields like surgery. Long-term teaching requires credentials, but many school systems hire uncredentialed personnel on an emergency basis. Finally, public schools operate as loosely coupled organizations (Weick 1976): Most teachers can shut their doors in the face of some supervisory directives, but material conditions (such as the textbooks available) circumscribe their autonomy on the job, and they face other demands they cannot ignore, such as the official curriculum and standardized tests. We should thus see the ideology of professionalism as attempting to emulate a relatively small slice of all occupations with professional traits rather than, as is typically assumed, making teaching a "real" profession. Teaching already is a real profession, though one with less claim to specialized expertise and less autonomy than advocates of teacher professionalism would want.

Professionalism theories today appeal to an outdated ideal of insularity and ascendant authority. The worst excesses of school bureaucracies today stem from successful professionalism, albeit not in the classroom.
Superintendents at the turn of the century argued that schools needed to be kept away from political battles that would harm the integrity of school systems. Creating an autonomous professional unit (a central school office), they argued, would improve administrative efficiency and rid schools of corruption (Tyack 1974; Tyack and Hansot 1982). Their success accelerated the bureaucratization of urban school systems.
Today, however, professionalism is no longer unquestioned. School administration has credentialism and relative autonomy on the job, but not as much claim to specialized expertise as sixty or seventy years ago. Not only are North Americans far more skeptical of professional authority than fifty years ago (as discussed earlier), but capital mobility is impinging on professional authority in a wide range of fields. The parallel drawn between teacher professionalism and medical professionalism is jarring. One cannot today call medicine an autonomous profession when doctors are complaining that clerical workers and financial officers in health maintenance organizations are limiting their clinical decision-making (Bodenheimer 1996).

In addition to ignoring the historical decline of professionalism, arguments for advancing teacher professionalism undermine democratic control of schools. As Strike (1990: 362) noted, "Professionalism is nondemocratic in that it appeals to political values other than those of popular sovereignty to legitimate its authority." Peer review of teaching (e.g., Haefele 1992) is a case in point. Civil rights activists may not want teachers to have virtually unlimited autonomy in the classroom. Bob Peterson (1997: 4) explained, "A potential problem with the strictly professional union approach [to accountability] ... in many urban districts has distinct racial overtones. Is peer evaluation the exclusive province of teachers and administrators or should parents and community members play a role?" Especially as the teaching force's demographics diverge from those of students and parents (Justiz and Kameen 1988), relying on professional-only evaluation may insult parents of a school who expect a role in school governance. Having an expertise-based evaluation system conflicts with U.S. traditions of democratic control, upon which civil rights activists have based advocacy of school governance councils.
Some critics of standardized testing, such as Wilson (1996), point to British school inspections as an alternative to statistical accountability. The heart of the British inspection system, however, was until recently a self-perpetuating corporate body selected by and from experienced teachers. One may (as Wilson did) use school inspection to point out the problems in high-stakes accountability. One may not, however, successfully import the insular assumptions of professionalism to late 20th century United States public schooling.

Professionalism is the dominant alternative to standardized-test-based accountability. Other critics of standardized testing-based accountability may not be as explicit as Wise in their advocacy of professionalism, and they may not agree with his proposals to limit entry into high-status positions in teaching. Still, they argue for more decision-making power in the classroom and school and see the bureaucratization and centralization of authority as one of the reasons why standardized testing is flawed. Thus, Kenneth Peterson (1995: 4) argues that one of the key principles in teacher evaluation should be to "place the teacher at the center of evaluation activity." In that respect, the professionalism label is a useful heuristic device for understanding opposition to standardized testing. Despite its intriguing hypothesis (that status and autonomy are the key to educational reform), professionalism is unlikely to supplant high-stakes accountability because it is politically untenable.

Moreover, professionalism primarily addresses concerns inside schools (autonomy of teachers). Publicly, professionalism only changes the superficial aspect of teacher status, not the public dissatisfaction and disconnection which schools face more broadly. Several historical changes have fragmented what is supposedly a common public commitment to education.
The aging of the population since the height of the baby boom has shrunk the political power of parents. In addition, the civil rights movement and a political coalition of fundamentalist Protestant organizations have stripped school officials of any broad political
consensus. Finally, the fragmentation of urban politics and suburban growth have encouraged continued racial and class segregation (albeit in new forms), making common interests in broad school policies difficult (Katznelson and Weir 1985). While I doubt professionalism's proponents would ever claim that it is a panacea, they have nonetheless pinned their hopes for dramatic school reform on a model that would not solve the major problems of school politics today.

The Ground We Stand on

Like the expansion of Israeli settlements in occupied territories, the continuing spread of standardized testing has created "facts on the ground" which have transformed both schools and the politics of education. To ignore the educational landscape around us, or to wish it would go away, is unproductive. Those who disagree with the assumptions of high-stakes, testing-based accountability must acknowledge that standardized testing is likely to become even more prominent in the short term. This understanding should not prevent advocates from fighting the trend where possible. Local victories against high-stakes testing are important both to the children involved and also as a standing alternative to technocratic accountability. Nevertheless, we should see clearly what is and is not possible in the near-term future.

The Future Growth of Standardized Testing

Standardized testing connected with high-stakes accountability systems is likely to become more prominent in the next five years in the majority of states. The Education Commission of the States (1997) recently reported that almost half of all states have implemented or are planning public accountability systems using statistical measures. Some additional states may use the national tests advocated by President Clinton (if the tests exist). Some, like Tennessee, will design their own accountability mechanisms.
Others, like New Jersey, will create a set of content standards with the promise of new tests and accountability tied to the content standards. The federal government and states will then spend millions of dollars developing tests, field-testing them, and supporting their use. In the meantime, popular news sources will continue to report annually the average scores on the SAT and on tests currently used in local jurisdictions. Within five to ten years, some states will begin the mandated use of exams replacing or supplanting current off-the-shelf commercial tests.

Moreover, the political debate over tests is likely to center around the federal relationship between Washington and the states or (with privatization) public oversight of private schooling. For the duration of President Clinton's term, the administration is likely to support national tests, and governors who dissent (like Virginia's outgoing Governor George Allen) will do so not because they disagree with high-stakes tests but because they wish states to design their own independent standards. If federal courts, using Agostini v. Felton (1997), allow tuition voucher programs to proceed, state legislatures may contemplate mandatory use of high-stakes testing for private schools accepting public funds. The debate would then shift to public control of private educational institutions. A vision of the future debate may be Ohio Association of Independent Schools v. Goff (1996), in which a federal appeals-court panel concluded that Ohio's requirement to test private school students was constitutional. Those who disagree with all high-stakes testing will be at the margins of debate in the near future, except where they make alliances with others (as in the
Congressional fight over national tests).

Limits on High-Stakes Testing

High-stakes testing has some significant weaknesses, despite the near-term growth we can expect. Some of the same dynamics which have limited the accountability use of performance-based, open-form testing will also shape standardized testing. Simply put, developing tests is expensive. The Tennessee legislature recently delayed the implementation of new subject tests for high school students to use in the value-added statistical system because, according to the bill's sponsor, the state could not afford the $10 million development cost (Educational Improvement Act Amendments 1997; Finn 1997). In addition, political adversaries may well use the management and pedagogical problems of new testing and accountability systems as a pawn in broader partisan battles. California's recent educational history is a case in point. Questions about the utility and propriety of performance-based tests combined with the expense of development and testing to kill the California Learning Assessment System. The governor, state superintendent, and legislature at the time were at odds over the purpose of the system, and that political conflict fed a controversy started by conservative critics over the ideological content of the tests, dooming the largest experiment in performance-based accountability to date (Kirst and Mazzeo 1996; McDonnell 1997). Observers of merit pay have noted that political dynamics involving fairness and incentives to cheat typically kill merit pay systems (e.g., Glass 1990). The same may happen to the next generation of high-stakes accountability.

Contraction of the Meaning of "Public"

Despite the weaknesses of high-stakes testing, the short-term consequence of more standardized testing may be intensified criticism of public schooling and cynicism about the purposes of public educational systems.
Schools need to be "public" in the sense of public involvement and political commitment (Fine 1991: Chap. 9; Katz 1992). However, the ranking of schools and teachers is inherently a zero-sum game, and not everyone can be above-average. Seeing school performance in such terms, divorced from classroom practice and public policy, makes both meaningful praise and criticism of schools very difficult. Moreover, the constant reinforcement of the myth of declining school performance will continue the erosion of support for the good schools that exist and make intense discussion of the needs of children more difficult.

Where To Go

Some alternative models of accountability may reverse the destructive tendencies of statistical accountability systems, both in political and practice terms. Reconstructing public education in its best sense (schooling for children, their families, and the public) requires connecting schools in a meaningful and explicitly political way with broader communities. In the same way that the development of the Central Park East elementary and secondary schools under Deborah Meier's leadership required both bureaucratic support and political connections to survive and thrive (Fliegel 1993; Meier 1995), so other schools and school critics dissenting from the current accountability trend must craft an alternative support structure, both within and extending beyond public schooling. Sizer (1992) argues for
opening up schools to external evaluation for pedagogical reasons, to keep teachers in touch with reasonable expectations of what students should do. In addition, allowing friendly critics into schools serves an explicitly political purpose, giving community members a concrete sense of what happens in schools. No statistics can substitute for the type of immediate contact such external evaluation provides.

Permitting external evaluation is difficult today. Allowing strangers into schools is threatening because it erodes, at least on a symbolic level, the commitment to professional autonomy which administrators have maintained for almost one hundred years. In practical terms, it requires balancing the legitimate needs of teachers for enough time to plan and try out ideas against the interests of parents and the public in knowing what is happening in schools. In systems where many teachers may be from ethnic and racial groups different from their students, the tension between teachers and parents may be real, and letting parents into evaluation may be politically tricky (B. Peterson 1997). Yet educators must acknowledge the need to move beyond professionalism as the primary route to support for public schools. Isolating the workings of schools from the public has done teachers and administrators a disservice in the long term as professionalism has declined as a successful route to status and autonomy.

External community evaluation is not the only conceivable way of crafting alternatives to high-stakes standardized test accountability. Others might meet the same needs (e.g., Bernauer and Cress 1997). Common to solving the political problems of accountability are the following three requirements:

Accountability should encourage deeper discussion of educational problems. Student performance should be the starting point of educational politics, not an occasion for political opportunism or crude comparisons.
Statistical accountability, with the centralization of statistical production and dissemination through popular news sources, encourages oversimplification rather than a more extensive public discussion.

Accountability should connect student performance with classroom practice. Statistical accountability, with the abstraction of student performance into numbers without context, removes classroom practices from the discussion of educational reform.

Accountability should make the interests of all children common. This sense of commonality is the best meaning of "public" in public schooling. Statistical accountability systems intensify educational triage, encouraging schools to isolate and devote fewer resources to students whom schools judge as difficult to teach. Politically, statistical accountability systems divide the interests of schools and communities through competition for prestige and resources.

No one should pretend that accountability is without conflict or unproblematic. We should face those conflicts and issues directly, however, instead of hiding behind existing standardized testing. Some parents and others may well see statistical comparisons as a primary way for them to gauge school programs and children's education, or as a way to advance specific interests. For example, parents of students with disabilities and disability advocates face real quandaries over accountability. On the one hand, high-stakes testing has created incentives for segregating students (McGill-Franzen and Allington 1993). On the other hand, the national rhetoric emphasizing achievement for all students has provided a lever to criticize the omission of students with disabilities from assessment systems, to craft
new federal law encouraging inclusion in assessment, and to create guidelines for state officials seeking to change assessment practices (Thurlow, Elliott, Ysseldyke, and Erickson 1996; also visit the National Center on Educational Outcomes site at http://www.coled.umn.edu/NCEO/). This dilemma is rooted in the tension between wanting to protect students with disabilities from the deleterious consequences of high-stakes testing and yet also wanting whatever accountability systems exist to pay attention to their interests.

Those criticizing statistical accountability systems must understand this and similar dilemmas of parents and advocates. Changing attitudes and assumptions, while protecting what many see as important in statistical accountability, requires modeling of worthwhile alternatives and small-scale demonstrations that are explicitly political. Over time, if not immediately, schools need a plausible, fair way to evaluate school improvement. With enough local models of alternative accountability, perhaps the dynamics of educational politics at state and national levels can change to become broader, connect with classroom practices, and require more than sound bites. Without those concrete examples, however, the domination of crude statistical evaluation of schools will continue, to the detriment of schools, children, their families, and the public.

Acknowledgements

I am indebted to Gene Glass and anonymous reviewers at Education Policy Analysis Archives, Brian Balogh, Douglas Fuchs, and the students in my masters history of education class for contributions to my thinking about this subject. I am solely responsible, of course, for the interpretations here and the accuracy of detail.

Notes

1. I mean by standardized tests those administered in whole-group settings with quantifiable results. These include multiple-choice tests and also performance-based tests whose results are reportable in quantifiable terms. Thus, Advanced Placement exams conducted by Educational Testing Service are standardized tests for the purposes of this article because, even though parts of the test are performance-based (such as essays), the essays are scored by a quantifiable rubric system and the whole test reported on the company's 1-5 scale for such tests. Moreover, reporting scores by numbers allows the simplified public discussion which is my focus here. For an introduction to Lauren Resnick's advocacy of measurement-driven reform, see Simmons and Resnick (1993). For issues involved in Kentucky and Arizona (respectively), see Jones and Whitford (1997) and Noble and Smith (1994).

2. An anonymous reviewer noted that the line between practice and political legacies is fuzzy. In many ways, the debates over census undercounting and the consumer price index are also debates about the political rhetoric of the reapportionment process and future support for government entitlement programs. Nonetheless, the distinction between the two legacies is a useful heuristic device for explaining why the literature on perverse incentives of high-stakes testing does not address the critical issue of school politics.

3. According to the Vanderbilt Television News Archives, the following broadcasts discussed standardized test score levels between 1968 and 1987: October 28, 1975 (ABC, CBS); November 17, 1975 (CBS); August 23, 1977 (ABC, CBS); August 24, 1977 (ABC, commentary); September 1, 1977 (CBS, commentary); September 21, 1982 (CBS); September 19, 1984 (ABC); January 9, 1985 (NBC); January 26, 1985 (CBS, NBC); September 22, 1987 (NBC). The search terms included "standardized and test*," "test and scor*," "SAT and scor*," and "SAT and (college or scholastic)." Excluded from this list are stories about the alleged discriminatory nature of tests.

4. I agree with Stedman (1996a, 1996b) that schools are not as good as they should be. Those problems do not mean the myth of declining quality is true: schools have been inconsistent and too often mediocre for many years.

References

Abbott v. Burke. 1997. 149 N.J. 145. Retrieved September 27, 1997 from the World Wide Web: http://www-camlaw.rutgers.edu/decisions/supreme/m-622-96.opn.html

Agostini v. Felton. 1997. 117 U.S. 1997. Retrieved September 27, 1997 from the World Wide Web: http://supct.law.cornell.edu/supct/html/96-552.ZS.html

Axinn, June, and Mark J. Stern. 1988. Dependency and poverty: Old problems in a new world. Lexington, Mass.: Lexington Books.

Balogh, Brian. 1991a. Chain reaction: Expert debate and public participation in American commercial nuclear power, 1945-1975. New York: Cambridge University Press.

Balogh, Brian. 1991b. Reorganizing the organizational synthesis: Reconsidering modern American federal-professional relations. Studies in American Political Development 5 (1): 119-172.

Berliner, David C., and Bruce J. Biddle. 1995. The manufactured crisis: Myths, fraud, and the attack on America's public schools. Reading, Mass.: Addison-Wesley.

Berliner, David C., and Bruce J. Biddle. 1996. Making molehills out of molehills: Reply to Lawrence Stedman's review of The manufactured crisis. Educational Policy Analysis Archives 4 (February 26).
Retrieved September 27, 1997, from the World Wide Web: http://olam.ed.asu.edu/epaa/v4n3.html

Bernauer, James A., and Katherine Cress. 1997. How school communities can help redefine accountability assessment. Phi Delta Kappan 79 (September): 71-75.

Bock, R. Darrell, and Richard Wolfe. 1996. Audit and review of the Tennessee Value-Added Assessment System (TVAAS): Final report. Nashville, Tenn.: Tennessee Comptroller of the Treasury.

Bodenheimer, Thomas. 1996. HMO backlash - righteous or reactionary? New England Journal of Medicine 335: 1601-1604.

Bracey, Gerald W. 1991. Why can't they be like we were? Phi Delta Kappan 73: 104-17.
Bracey, Gerald W. 1992. The second Bracey report on the condition of public education. Phi Delta Kappan 74: 104-8, 110-17.

Bracey, Gerald W. 1993. The third Bracey report on the condition of public education. Phi Delta Kappan 75: 104-12, 114-18.

Bracey, Gerald W. 1994. The fourth Bracey report on the condition of public education. Phi Delta Kappan 76: 115-27.

Bracey, Gerald W. 1995a. The fifth Bracey report on the condition of public education. Phi Delta Kappan 77: 149-60.

Bracey, Gerald W. 1995b. Final exam: A study of the perpetual scrutiny of American education.

Bracey, Gerald W. 1996. The sixth Bracey report on the condition of public education. Phi Delta Kappan 78: 127-38.

Bracey, Gerald W. 1997. The seventh Bracey report on the condition of public education. Phi Delta Kappan 79: 120-36.

Calhoun, Daniel Hovey. 1973. The intelligence of a people. Princeton, N.J.: Princeton University Press.

Cannell, John Jacob. 1989. How public educators cheat on standardized achievement tests: The "Lake Wobegon" report. ERIC Reproduction Document No. ED 314 454.

Cohen, Sol, ed. 1973. Education in the United States: A documentary history. New York: Random House.

Darling-Hammond, Linda. 1988. The futures of teaching. Educational Leadership 46 (November): 4-10.

Darling-Hammond, Linda. 1992. Educational indicators and enlightened policy. Educational Policy 6 (September): 235-65.

Darling-Hammond, Linda, and Carol Ascher. 1991. Creating accountability in the big city school systems. Urban Diversity Series No. 102. New York: National Center for Restructuring Education, Schools, and Teaching. ERIC Reproduction Document No. ED 334 339.

Darling-Hammond, Linda, Arthur E. Wise, and Stephen P. Klein. 1995. A license to teach: Building a profession for 21st-century schools. Boulder, Colo.: Westview Press.

Delpit, Lisa. 1995. Other people's children: Cultural conflict in the classroom. New York: The New Press.

Diegmueller, Karen, and Millicent Lawton. 1996. The road not taken.
Education Week (April 24). Retrieved September 27, 1997, from the World Wide Web: http://www.edweek.com

Douglas, Davison M. 1995. Reading, writing, and race: The desegregation of the Charlotte
schools. Chapel Hill, N.C.: University of North Carolina Press.

Education Commission of the States. 1997. Accountability: State Policies. Denver: Education Commission of the States. Retrieved September 27, 1997 from the World Wide Web: http://www.ecs.org/ecs/24aa.htm

Educational Improvement Act. 1992. Tennessee Acts Chapter 353.

Educational Improvement Acts Amendments. 1997. Tenn Acts Chapter 434. Retrieved September 27, 1997 from the World Wide Web: http://www.legislature.state.tn.us/bills/100gahtm/100_chap/pubc0434.htm

Fine, Michelle. 1991. Framing dropouts: Notes on the politics of an urban public high school. Albany, N.Y.: State University of New York Press.

Finn, Michael. 1997. State school systems bill passes Senate. Chattanooga Free Press (May 31).

Fisher, Thomas H. 1996. A review and analysis of the Tennessee Value-Added Assessment System: Part II. Nashville, Tenn.: Comptroller of the Treasury.

Fliegel, Sy. 1993. Miracle in East Harlem: The fight for choice in public education. New York: Times Books.

Friedson, Eliot. 1984. Are professions necessary? In The authority of experts, edited by Thomas L. Haskell (pp. 3-27). Bloomington, Ind.: Indiana University Press.

Fuchs, Douglas, and Lynn S. Fuchs. 1995. Special education can work. In Issues in educational placement: Students with emotional and behavioral disorders (pp. 363-77), edited by James M. Kauffman, John W. Lloyd, Daniel P. Hallahan, and Terry A. Astuto. Hillsdale, N.J.: Erlbaum.

Furstenberg, Frank. 1991. As the pendulum swings: Teenage childbearing and social concern. Family Relations 40: 127-38.

Galambos, Louis. 1970. The emerging organizational synthesis in modern American history. Business History Review 44: 279-290.

Ginsburg, Alan L., Jay Noell, and Valene White Plisko. 1988. Lessons from the wall chart. Educational Evaluation and Policy Analysis 10 (Spring): 1-12.

Glass, Gene V. 1990. Using student test scores to evaluate teachers. In The new handbook of teacher evaluation (pp.
229-40), edited by Jason Millman and Linda Darling-Hammond. Newbury Park, Calif.: Sage Publications.

Haefele, Donald L. 1992. Evaluating teachers: An alternative model. Journal of Personnel Evaluation in Education 5: 335-45.

Hess, G. Alfred. 1991. School restructuring, Chicago style. Newbury Park, Calif.: Corwin Press.

Hillsborough County Public Schools. 1997. Our students' test scores reflect academic
achievement. Tampa, Fla.: Office of Communications and Governmental Relations.

Hoff, David J. 1997. White House, GOP craft agreement on testing. Education Week (November 12): 1, 23.

Holmes Group. 1986. Tomorrow's teachers. East Lansing, Mich.: Holmes Group.

How midstate schools stack up. 1994a. Nashville Tennessean (October 15): 4A.

How midstate schools stack up. 1994b. Nashville Tennessean (October 16): 6A-7A.

Johnston, Robert C. 1997. Just saying no. Education Week (April 9). Retrieved September 27, 1997, from the World Wide Web: http://www.edweek.com

Jones, Ken, and Betty Lou Whitford. 1997. Kentucky's conflicting reform principles: High-stakes school accountability and student performance assessment. Phi Delta Kappan 79: 276-81.

Jones, Lyle V. 1996. A history of the National Assessment of Educational Progress and some questions about its future. Educational Researcher 25 (October): 15-22.

Justiz, Manuel J., and Marilyn C. Kameen. 1988. Increasing the representation of minorities in the teaching profession. Peabody Journal of Education 66 (Fall): 91-100.

Kaestle, Carl F. 1973. Joseph Lancaster and the monitorial school system: A documentary history. New York: Teachers College Press.

Katz, Michael B. 1992. Chicago school reform as history. Teachers College Record 94: 56-72.

Katznelson, Ira, and Margaret Weir. 1985. Schooling for all: Class, race, and the decline of the democratic ideal. New York: Basic Books.

Kirst, Michael W., and Christopher Mazzeo. 1996. The rise, fall, and rise of state assessment in California, 1993-1996. Paper presented at the Annual Meeting of the American Educational Research Association. ERIC Reproduction Document Number ED 397 133.

Klausnitzer, Dorris. 1996. Pupils need practice on that first "R." Nashville Tennessean (November 15): 1A, 8A-10A. The Tennessean's statistical analysis retrieved September 27, 1997 from the World Wide Web: http://www.tennessean.com/schools/ (defunct link)

Koretz, Daniel. 1992a.
NAEP and national testing: Issues and implications for educators. NASSP Bulletin 76 (September): 30-40.

Koretz, Daniel. 1992b. What happened to test scores and why? Educational Measurement: Issues and Practice 11 (Winter): 7-11.

Koretz, Daniel, and Edward Diebert. 1993. Interpretations of National Assessment of Educational Progress (NAEP) anchor points and achievement levels by the print media in 1991. Santa Monica, Calif.: The RAND Corporation. ERIC Document Reproduction No. ED 367 683.

Kozol, Jonathan. 1991. Savage inequalities: Children in America's schools. New York:
28 of 33Crown Publishers.Kyle, Charles, and Edward Kantowicz. 1991. Bogus st atistics. Latino Studies Journal 2 (May): 34-52.Labaree, David F. 1988. The making of an American high school: The credenti als market and the Central High School of Philadelphia, 1838-1 939 New Haven, Conn.: Yale University Press.Labaree, David F. 1992. Power, knowledge, and the r ationalization of teaching: A genealogy of the movement to professionalize teaching. Harvard Educational Review 62: 123-54. Lawton, Millicent. 1997. Riley delays national test s' development. Education Week (October 1). Retrieved October 2, 1997, from the World Wide Web: http://www.edweek.com Lipsky, Michael. 1980. Street-level bureaucracy: Dilemmas of the individua l in public service New York: Russell Sage Foundation. Madaus, George F. 1988. The distortion of teaching and testing: high-stakes testing and instruction. Peabody Journal of Education 65 (Spring): 29-46. Madaus, George F. 1991. The effects of important te sts on students. Phi Delta Kappan 73: 226-31.Maeroff, Gene. 1976. Aptitude test lag is puzzling experts. New York Times (September 12): 17.McDonnell, Lorraine M. 1997. The politics of state testing: Implementing new stu dent assessments National Center for Research on Evaluation, Stand ards, and Student Testing (CRESST) Technical Report 424. Los Angeles, Calif.: CRESST. Retrieved November 14, 1997, from the World Wide Web: http://cresst96.cse.ucla.edu/Reports/tech424.pdf McGill-Franzen, Anne, and Richard L. Allington. 199 3. Flunk'em or get them classified: The contamination of primary grade accountability d ata. Educational Researcher 22 (January-February): 19-22.McGrew, Kevin S., Mike L. Vanderwood, Martha L. Thu rlow, and James E. Ysseldyke. 1995. Why we can't say much about the status of stu dents with disabilities during education reform. NCEO Synthesis Report 21. Minneapolis, Minn .: National Center on Educational Outcomes. ERIC Reproduction Document Number ED 396 475. Mears, Walter R. 1997. 
Infighting breaks out over c ensus. CNN News (October 2). Retrieved October 2, 1997, from the World Wide Web:http://www.allpolitics.com/1997/10/02/ap/census (de funct link) Meier, Deborah. 1995. The power of their ideas: Lessons for America from a small school in Harlem Boston: Beacon Press. Meier, Deborah. 1997. How our schools could be: Sta ndards, top-down mandates, and grass-roots communities. Rethinking Schools 11 (Summer): 8-9. Merrow, John. 1997. Testing . testing . testing Public Broadcasting System. Script
29 of 33retrieved September 27, 1997 from the World Wide We b: http://www.pbs.org/merrow/tttscript.txt Miller, Matthew. 1997. Surprise! National school st andards exist. U. S. News and World Report (November 17). Retrieved December 23, 1997, from t he World Wide Web: http://www.usnews.com/usnews/issue/971117/17stan.ht m Milliken v. Bradley 1974. 418 U.S. 717. Missouri v. Jenkins 1995. 515 U.S. 70. Retrieved September 27, 1997, from the World Wide Web: http://supct.law.cornell.edu/supct/html/93-1823.ZS. html National Center on Educational Outcomes. 1994. Educational accountability for students with disabilities NCEO Policy Directions Number 3. Minneapolis, Min n.: National Center on Educational Outcomes. ERIC Reproduction Document No. ED 378 775. Retrieved November 19, 1997, from the World Wide Web: http://www.coled.umn.edu/NCEO/OnlinePubs/Policy3.ht ml National Commission on Excellence in Education. 198 3. A nation at risk Washington, D.C.: Government Printing Office. Retrieved Septemb er 27, 1997, from the World Wide Web: http://www.ed.gov/pubs/NatAtRisk/index.html Neill, Monty. 1996. Assessment reform at crossroads Education Week (February 21). Retrieved September 27, 1997, from the World Wide W eb: http://www.edweek.com Noble, Audrey J., and Mary Lee Smith. 1994. Old and new beliefs about measurement-driven reform: "Build it and they will come." Educational Policy 8 (June): 111-36.Ohio Association of Independent Schools v. Goff 1996. 92 F.3d 419 (6th Cir.). Retrieved September 27, 1997 from the World Wide Web: http://www.ljextra.com/cgi-bin/f_cat?prod/ljextra/d ata/external/1996/08/9608012.c06 Orfield, Gary. 1969. The reconstruction of southern education: The schoo ls and the 1964 Civil Rights Act New York: John Wiley & Sons. Orfield, Gary. 1993. The growth of segregation in American schools: Chan ging patterns of separation and poverty since 1968 Alexandria, Va.: National School Boards Assocatio n, Council of Urban Boards of Education.Orfield, Gary, Susan E. 
Eaton, and the Harvard Proj ect on School Desegregation. 1996. Dismantling desegregation: The quiet reversal of Br own v. Board of Education New York: New Press.Peterson, Bob. 1997. We need a new vision of teache r unionism. Rethinking Schools 11 (Summer): 1, 3-5.Peterson, Kenneth D. 1995. Teacher evaluation: A comprehensive guide to new di rections and practices Thousand Oaks, Calif.: Corwin Press, Inc. Ravitch, Diane. 1983. The troubled crusade: American education, 1945-1980 New York: Basic Books.
Ravitch, Diane. 1995. National standards in American education: A citizen's guide. Washington, D.C.: Brookings.
Reese, William J. 1995. The origins of the American high school. New Haven, Conn.: Yale University Press.
Rose, Lowell C., Alec M. Gallup, and Stanley M. Elam. 1997. The 29th annual Phi Delta Kappa/Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan 79 (September): 41-56.
Roush, Wade. 1996. A census in which all Americans count. Science 274: 713-14.
Ruggles, Patricia. 1990. Drawing the line: Alternative poverty measures and their implications for public policy. Washington, D.C.: Urban Institute Press.
Sanders model to measure "value added." 1991. TEA News 22 (May): 5.
Sanders, William L., and Sandra P. Horn. 1994. The Tennessee Value-Added Assessment System (TVAAS): Mixed-model methodology in educational assessment. Journal of Personnel Evaluation in Education 8: 299-311.
Sapon-Shevin, Mara. 1993. Gifted education and the protection of privilege: Breaking the silence, opening the discourse. In Beyond silenced voices: Class, race, and gender in United States Schools (pp. 25-44), edited by Lois Weis and Michelle Fine. Albany, N.Y.: State University of New York Press.
Schell, Jonathan. 1975. The time of illusion. New York: Knopf.
Shepard, Lorrie A. 1991. Will national tests improve student learning? Phi Delta Kappan 73: 232-38.
Simmons, Warren, and Lauren Resnick. 1993. Assessment as the catalyst of school reform. Educational Leadership 50 (February): 11-15.
Sizer, Theodore R. 1992. Horace's school: Redesigning the American high school. Boston: Houghton Mifflin Co.
Sizer, Theodore R. 1995. Will national standards and assessments make a difference? In Debating the future of American education: Do we need national standards and assessments? (pp. 33-39), edited by Diane Ravitch. Washington, D.C.: Brookings Institution.
Skocpol, Theda. 1991. Targeting within universalism: Politically viable policies to combat poverty in the United States. In The urban underclass (pp. 411-436), edited by Christopher Jencks and Paul E. Peterson. Washington, D.C.: Brookings Institution.
Smith, Mary Lee. 1991. Put to the test: The effects of external testing on teachers. Educational Researcher 20 (June-July): 8-11.
Smith, Mary Lee, and Claire Rottenberg. 1991. Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practice 10 (Winter): 7-11.
Spring, Joel. 1989. The sorting machine revisited: National educational policy since 1945 (updated ed.). New York: Longman.
Starr, Paul. 1982. The social transformation of American medicine. New York: Basic Books.
Starr, Paul. 1987. The sociology of official statistics. In The politics of numbers (pp. 7-57), edited by Paul Starr and William Alonso. New York: Russell Sage Foundation.
Stedman, Lawrence A. 1996a. The achievement crisis is real: A review of The manufactured crisis. Educational Policy Analysis Archives 4 (January 23). Retrieved September 27, 1997 from the World Wide Web: http://olam.ed.asu.edu/epaa/v4n1.html
Stedman, Lawrence A. 1996b. Respecting the evidence: The achievement crisis remains real. Educational Policy Analysis Archives 4 (April 4). Retrieved September 27, 1997 from the World Wide Web: http://olam.ed.asu.edu/epaa/v4n7.html
Strike, Kenneth A. 1990. The ethics of educational evaluation. In The new handbook of teacher evaluation (pp. 356-73), edited by Jason Millman and Linda Darling-Hammond. Newbury Park, Calif.: Sage Publications.
Tennessee Department of Education. 1996. 21st Century Report Card. Retrieved September 27, 1997 from the World Wide Web: http://www.state.tn.us/education/rptcrd96/index.html
Thurlow, Martha L., Judy Elliott, James E. Ysseldyke, and Ron Erickson. 1996. Questions and answers: Tough questions about accountability systems and students with disabilities. Minneapolis, Minn.: National Center on Educational Outcomes. ERIC Reproduction Document No. ED 404 802.
Tyack, David B. 1974. The one best system: A history of American urban education. Cambridge, Mass.: Harvard University Press.
Tyack, David B., and Elisabeth Hansot. 1982. Managers of virtue: Public school leadership in America, 1820-1980. New York: Basic Books.
Tyack, David B., and Larry Cuban. 1995. Tinkering toward utopia: A century of public school reform. Cambridge, Mass.: Harvard University Press.
Vinovskis, Maris. 1988. An "epidemic" of adolescent pregnancy? Some historical and policy considerations. New York: Oxford University Press.
Weick, Karl E. 1976. Educational organizations as loosely coupled systems. Administrative Science Quarterly 21: 1-19.
Weiss, Carol H. 1988. Interview study. In Reporting of social science in the national media (pp. 21-171), edited by Carol H. Weiss and Eleanor Singer. New York: Russell Sage Foundation.
Weiss, Janet A., and Judith E. Gruber. 1987. The managed irrelevance of federal education statistics. In The politics of numbers (pp. 363-91), edited by Paul Starr and William Alonso. New York: Russell Sage Foundation.
Werum, Regina. 1997. Sectionalism and racial politics: Federal vocational policies and programs in the predesegregation South. Social Science History 21: 399-453.
Wiebe, Robert H. 1967. The search for order, 1877-1920. New York: Hill and Wang.
Will Washington cut our COLA? 1997. Solidarity (January-February). Retrieved October 1, 1997, from the World Wide Web: http://www.uaw.org/solidarity/9701/03b.html
Wilson, Thomas A. 1996. Reaching for a better standard: English school inspection and the dilemma of accountability for American public schools. New York: Teachers College Press.
Wirth, Arthur G. 1992. Education and work for the year 2000: choices we face. San Francisco: Jossey-Bass.
Wise, Arthur E. 1979. Legislated learning: The bureaucratization of the American classroom. Berkeley, Calif.: University of California Press.
Wise, Arthur E. 1994. Choosing between professionalism and amateurism. Educational Forum 58: 139-46.
Wise, Arthur E., and Jane Leibbrand. 1993. Accreditation and the creation of a profession of teaching. Phi Delta Kappan 75: 133-36, 154-57.

About the Author

Sherman Dorn
firstname.lastname@example.org

Sherman Dorn is Assistant Professor in the Department of Psychological and Social Foundations at the University of South Florida. He received his Ph.D. in history at the University of Pennsylvania in 1992 based on his work on the history of dropout policies. He is currently looking at the history of special education in Nashville, Tennessee, from 1940 to 1990.

Copyright 1998 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is http://olam.ed.asu.edu/epaa

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, email@example.com or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411. (602-965-2692). The Book Review Editor is Walter E. Shepherd: firstname.lastname@example.org The Commentary Editor is Casey D. Cobb: email@example.com
EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Andrew Coulson, firstname.lastname@example.org
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, email@example.com
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Marshall University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Richard M. Jaeger, University of North Carolina--Greensboro
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Rocky Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary P. McKeown, Arizona Board of Regents
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, firstname.lastname@example.org
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, Arizona State University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, University of California at Davis
Jay D. Scribner, University of Texas at Austin
Michael Scriven, email@example.com
Robert E. Stake, University of Illinois--UC
Robert Stonehill, U.S. Department of Education
Robert T. Stout, Arizona State University
Contributed Commentary on Volume 6 Number 1: Dorn
The Political Legacy of School Accountability Systems

13 January 1998

Craig B. Howley
Appalachia Educational Laboratory
firstname.lastname@example.org

Search as one may, one encounters redundant libraries of treatises and guides on test uses and even misuses--including uses in policy. I had searched in vain for a critical, contextualized, discussion of high-stakes testing. It seemed merely to be an accepted fact of life. A critical reading of the history and distortions of accountability-as-performance-testing apparently did not exist, until the appearance of this article. Sherman Dorn has done education a good turn with his analyses in "The Political Legacy of School Accountability Systems." Special virtues in this piece include Dorn's clear-headedness about professionalism, sensitivity toward the history and formation of educational discourse, the importance of community context and diversity, and, generally, a steadfast refusal to see in history an inevitable progress.

Dorn's work will be especially helpful in the preparation of an article about accountability in the rural context, which a colleague and I are just beginning. Accountability, it seems, is needed because schools have become so remote from their publics, and the social construction known as the public is itself losing coherence. Rural schools are allegedly very close to their "communities" (their public). Widespread evidence for this claim is much thinner than one would suspect, but in most rural schools, faculty and staff are nearly all local people who interact continually with one another in social and civic encounters outside the school walls. Perhaps this sort of informal phenomenon is what constitutes the oversight for which accountability schemes are intended (unconsciously, of course, in the minds of the framers of such schemes) to substitute. If this is so, the substitution is particularly unsuited to the terrain of rural existence.

Dorn is especially to be thanked, as well, for not demonizing tests. Standardized, norm-referenced tests are both the products, and the poor innocent victims of the technocratic worldview. They are not going away anytime soon, and they can be theoretically helpful in understanding the pattern of a child's accomplishments. Dorn notes the utility of some of these tests for parents of special needs kids; the truth is that most parents could profitably take a similar interest and discover a similar utility. Most teachers of my acquaintance do not, however, find aggregate classroom or school results particularly helpful. They understand the game and they are cynical, widely.

The one usage for which norm-referenced tests, among the gamut of all "standardized" instruments, exhibit wondrous utility is quantitative research. But, of course, bureaucrats, politicians, and government functionaries (a.k.a. "policy makers") have even less respect for researchers than for teachers. More's the pity; but this is a very useful article for those with the institutional leisure to write and think about schools.
Contributed Commentary on Volume 6 Number 1: Dorn
The Political Legacy of School Accountability Systems

8 January 1998

Lyle V. Jones
University of North Carolina
email@example.com

Sherman Dorn presents compelling reasons why we must attend to the political legacy of educational reform when thinking about the pros and cons of national school achievement tests. He emphasizes the ambiguities that surround the targets for accountability: public schools as institutions, teachers and school administrators, students, public policy, etc., many of which cannot be well served by a single index.

Dorn's case is bolstered substantially by a case study of the (failed) efforts in Great Britain, following the passage of the Education Reform Act of 1988, to establish mandatory national tests (see Black, 1994). Recently, I have argued that we are failing to adequately consider the lessons from Britain and from other sources as the nation continues to move towards a program for national testing (Jones, 1997a, 1997b).

Dorn reminds us that an apparently objective notion of a statistical accountability system serves to divert discussion from the purposes of schools and the means by which those purposes may be fulfilled. He correctly concludes that there is high risk that "the domination of crude statistical evaluation of schools will continue, to the detriment of schools, children, their families, and the public."

References

Black, P. J. (1994). Performance assessment and accountability: The experience in England and Wales. Educational Evaluation & Policy Analysis 16, 191-203.
Jones, L. V. (1997a). National tests and education reform: Are they compatible? [On-line]. Available: HTTP://www.ets.org/research/pic/jones.html
Jones, L. V. (1997b). National standards, Yes; national tests, No. [On-line]. Available: http://ericae2.educ.cua.edu//ft//nattest/oped5.htm