E11-00101
Educational policy analysis archives. Vol. 6, no. 10 (May 22, 1998).
Tempe, Ariz. : Arizona State University ; Tampa, Fla. : University of South Florida. May 22, 1998.
Educational standards and the problem of error / Noel Wilson.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives
Volume 6 Number 10, May 22, 1998. ISSN 1068-2341

A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411.

Copyright 1998, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.

Educational Standards and the Problem of Error
Noel Wilson
School of Education, The Flinders University of South Australia

Abstract

This study is about the categorisation of people in educational settings. It is clearly positioned from the perspective of the person categorised, and is particularly concerned with the violations involved when the error components of such categorisations are made invisible.

Such categorisations are important. The study establishes the centrality of the measurement of educational standards to the production and control of the individual in society, and indicates the destabilising effect of doubts about the accuracy of such categorisations.

Educational measurement is based on the notion of error, yet both the literature and practice of educational assessment trivialise that error. The study examines in detail how this trivialisation and obfuscation is accomplished.

In particular the notion of validity is examined and is seen to be an advocacy for the examiner, for authority. The notion of invalidity has therefore been reconceptualised in a way that enables epistemological and ontological slides, and other contradictions and confusions, to be highlighted, so that more genuine estimates of categorisation error might be specified.

Contents

Part 1: Positioning
Chapter 1: Positioning the study: content and methodology
Chapter 2: Positioning the writer: experience
Chapter 3: Positioning the writer: philosophy and value
Part 2: Context
Chapter 4: Power relations
Chapter 5: Power relations in educational settings
Chapter 6: Standards, myth and ideology
Part 3: Tools of analysis
Chapter 7: Four frames of reference
Chapter 8: Equity, frames and hierarchy
Chapter 9: Instrumentation
Chapter 10: Comparability
Chapter 11: Rank orders and standards
Chapter 12: An inquiry into quality
Part 4: Error analysed
Chapter 13: Four faces of error
Chapter 14: What do tests measure?
Chapter 15: The psychometric fudge
Chapter 16: Validity and reliability
Part 5: Synthesis
Chapter 17: Error and the reconceptualising of invalidity
Part 6: Application
Chapter 18: Competencies, the great pretender
Chapter 19: National tests and university grades
Part 7: Concluding statement
Chapter 20: Out of the fog
References

Acknowledgments

I wish to acknowledge the help of staff and students at the Flinders Institute for the Study of Teaching for their help, support, encouragement, stimulation and companionship over the past three years.

In particular I want to acknowledge the support of my supervisor, John Smyth, for his courage in accepting me as a student in the first place, for his clear and incisive help when I asked for it, and sometimes when I didn't, and most importantly for showing me that there are still persons working in hierarchical systems who have been able to maintain their integrity in their search for truth and justice.

About the Author

Noel Wilson is an ex-teacher, researcher and writer who has now officially retired and lives in the Adelaide Hills in South Australia. He still writes stories and novels which search in vain for publishers. He is delighted that his mind works better now at seventy than it did at forty. Every now and then he has a little foray back into the educational field. He is a long odds optimist because he believes that sooner or later schools will get better. And he'd be pleased to engage in dialogue about this thesis.
For more specifics about the author, read Chapter 2. He can be reached at firstname.lastname@example.org.
Copyright 1998 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is http://olam.ed.asu.edu/epaa

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, email@example.com or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411 (602-965-2692). The Book Review Editor is Walter E. Shepherd: firstname.lastname@example.org. The Commentary Editor is Casey D. Cobb: email@example.com.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Andrew Coulson, firstname.lastname@example.org
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, email@example.com
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Marshall University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Richard M. Jaeger, University of North Carolina--Greensboro
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary P. McKeown, Arizona Board of Regents
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, firstname.lastname@example.org
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, Arizona State University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, University of California at Davis
Jay D. Scribner, University of Texas at Austin
Michael Scriven, email@example.com
Robert E. Stake, University of Illinois--UC
Robert Stonehill, U.S. Department of Education
Robert T. Stout, Arizona State University
Part 1: Positioning

Chapter 1: Positioning the study: content and methodology

The project grew out of a general critique of assessment theory and practices, and in particular of the way in which the notion of error in measurement is obfuscated.

The fundamental research question that informed this study is: How is error in measurement of standards obscured in most practical events involving assessment of persons?

The study that subsequently developed:

Clearly positions the writer in terms of the experience, philosophy and values that he brings to this study.
Develops some tools of analysis of the educational assessment process that enable a more stringent critique of the nature and extent of error in the measurement of standards.
Establishes the centrality of the notion of the educational standard to the categorisation, production and control of the individual in society.
Shows how the professional literature on educational measurement is based on the notion of error, and at the same time trivialises that notion.
Re-examines some of the fundamental assumptions of educational assessment generally and psychometrics in particular. Indicates some of their most blatant self-contradictions and fudges.
Reconceptualises the notion of invalidity, and positions the field of educational categorisation here, from the perspective of the examined, rather than with validity, which is an advocacy for the examiner.
Applies some of this analysis to a study of competency standards in general, and in particular University grades, and national literacy testing as developed in the Australian context during the 1990s.

As can be seen, the initial research question has generated action as well as understanding, a tool to repair the damage resulting from the critique, and a way to reduce some of the violence it implies.
Relevant Literature

The relevant literature is extensive as well as intensive, as the Bibliography shows. The extensiveness was necessary, as many of the misconceptions and fudges and
contradictions that characterise the field of educational assessment have been caused by a myopia regarding knowledge outside the arbitrary boundaries within which the field encloses itself.

Within the field of educational measurement the critical studies which most overlap mine are: in the United Kingdom, Hartog & Rhodes (1936), Cox (1965); in the United States, Hoffman (1964), Nairn (1980), Airasian (1979), and Glass (1978); in Australia, Rechter & Wilson (1968).

The Hartog & Rhodes study clearly showed the enormous instability of the measurement of standards in Public Examinations in England. The sneakiness of some of the research techniques in no way detracts from the dramatic incisiveness of the data. Cox did a similar job and ended up with a similar horror story on measurements of University grades. Hoffman directed his critical attention to the detail of multiple choice testing. Nairn's critique of the work of Educational Testing Service, and in particular the part it plays in College Entrance, is devastating in its implications. Airasian's book is a comprehensive critique of competency testing. Glass attacks the measurement of standards at its most vulnerable point; there are no standards, or at least none that psychometrics can produce. And Rechter & Wilson's study indicates the confusion about how to reduce error that accompanies public examining in Australia.

On the other hand, most of the literature on reliability and validity is pertinent to this study, because, when its discourse is repositioned from examiner to examined, it provides more than enough invalidity information to self destruct.

Most studies of error in the measurement of standards are however much more specific in their focus than is mine.
Their minimal effect on practice has perhaps partially been due to the fact that their critiques were in terms of their own discipline of educational measurement; a discipline that owes its very existence to the claim to accurate judgments. In terms of general style and scope this study is perhaps closer to the work of Pirsig (1975; 1991), who delved, articulately if deviously, much more deeply into the notion of quality.

Within the field of power relations and the construction of the individual the studies most similar are those published in Foucault and Education (Ball, 1990), in particular those that take off from Foucault's placement of the examination as a central apparatus of power/knowledge.

This study is significant in that it brings these two diverse fields of educational assessment, and the power relations that pervade education, into much closer contact, to expose their interrelations, and allow the critique to cross fertilise.

Importance of the study

The initial question addressed is how the whole matter of error in measurement of standards is obscured in most practical events involving assessment and measurement.

This is directly related to the centrality of the notion of the educational standard to the categorisation, production and control of the individual in society. For if the notion of the standard is crucial to the maintenance of power relations, and its empirical realisation is prone to enormous error, then the whole apparatus of power/knowledge that depends on it is in jeopardy.

I argue in Chapters 4 and 5 that the examination normalises and individualises, and is impotent without the notion of the measured standard, the sword that divides, the wedge that produces the gaps; and how important it is that these measures of standards be seen as accurate if current societal structures are to be maintained.

One view of immorality is that it is behaviour that destabilises a social system. So
if playing the game is inevitable, is questioning the rules not so much dangerous as despicable, immoral to the point of being unthinkable? Is this the reason for the great silence about the enormous errors in any measure of standards? Does this account for the erasure from public consciousness and discourse of the obvious fact that educational standards as a thin accurate line have no empirical existence, and attempts to measure in relation to that line no instrumental reality?

In Chapters 6 to 17 thirteen sources of invalidity that contribute to the error and confusion of all categorisations of individual persons are detailed and elucidated, indicating how this silence in professional and public consciousness might be filled with a deafening noise.

In Chapters 18 and 19 of this study I apply some of the analytic tools developed to the contemporary scene in Australia, and demonstrate how the noise may be turned into a coherent critique of practice. In 1997 competency standards, as a form of assessment, have become, and are becoming, the major credentialing instrument for both educational and vocational courses and jobs. In addition, they are now the basis for job descriptions.

In defining what training is required for a job, what prerequisites are required to attempt a job, what the job is, and how performance on the job is to be assessed, the cycle of fantasy created by this controlled semantic reductionism is complete; the material world of education and employment has become textualised in terms of competencies (Collins, 1993; Cairns, 1992). The fragility of this theorising is exposed when examined in terms of the reconstructed notion of invalidity developed in this study.

In Universities students are still categorised in terms of grades loosely defined. What do they mean? How error prone are they? And in the schools all Australian states have agreed to introduce tests of literacy. Certainly they will introduce tests.
But what will they measure? And with what accuracy? Again the reconstructed notion of invalidity is used to critically evaluate such questions.

Methodology and the critique of practice

The study roves beyond the artificial constraints of psychometric theory and test practice; into ontology, epistemology and the metaphysics of quality; into the nature of instrumentation; into the relations between equity and assessment frames of reference; into the fundamental notion of comparability; into the detail of the relation between rank orders, standards and categorisations; and into the minefield of the psychometric fudge.

Is there method in this diverse madness? Where is the methodology that informs this wild profusion? The study aims to expose the madness that underlies much of the current method. So what is a methodology that undermines methodologies?

One such method is critical analysis, the analysis of the educational discourse that comprises the field of assessment. The policies and practices of educational assessment become fused in the discourse in which they are embedded (Ball, 1994):

"Discourses are about what can be said, and thought, but also about who can speak, when, where and with what authority. Discourses embody the meaning and use of propositions and words. Thus, certain possibilities for thought are constructed. We do not speak a discourse, it speaks us. We are the subjectivities, the voices, the knowledge, the power relations that a discourse constructs and allows" (p22).

Analysis of such discourses may not be used to determine the truth. Yet such analyses may be very sensitive to the uncovering of untruths, by determining the extent to which they embody "incoherencies, distortions, structured omissions and negations
which in turn expose the inability of the language of ideology to produce coherent meaning" (Codd, 1988, p245).

How would such untruths be established?

First, by uncovering self contradictions, within the overt discourse, or between the unstated assumptions of the discourse and the facts that the discourse establishes.
Second, by exposing false claims, claims that may be shown with empirical evidence constructed within its own frame of reference to be untrue.
Third, by detailing some of the psychometric fudges on which many assessment claims depend to maintain their established meaning.
Fourth, by indicating how repositioning the discourse may dramatically change its truth value.
Fifth, by establishing four discrete epistemological frames of reference for assessment discourse as currently constructed, and indicating the confusion when one frame is viewed from the perspectives of the others.
Sixth, by noticing frame shifts within a particular discourse, with the resulting confusion of meaning.
Seventh, by exposing the ontological slides and epistemological camouflages necessary to sustain many truth claims.

So in this study I will substantiate the contention that some of the explicit and implicit "truths" embedded in assessment practices are falsifiable; that empirical data constructed from their own assumptions denies the accuracy they assume; that this data is not only adequately detailed in the literature, but further, that the notion of error is the epistemological basis of much of that literature. All of which makes the public silence about the presence of error even more puzzling.

I shall show that the epistemological and ontological grounds for the whole field of assessment of individual persons are enormously shaky.
I shall also explain how the literature about the very notion of validity is founded on a biased position, so that the sources of invalidity are much deeper and wider than is admitted in practice, even though clearly implied in theory and its attendant discourse.

I shall indicate the complexity of the notion of invalidity, with its practical face of error. Error includes all those differences in rank ordering and placement in different assessments at different times by different experts; all the confusions and varieties of meaning attached to the "construct" being assessed; and all those variabilities arising out of logical type errors, issues of context, faulty labelling, and problems associated with prediction. To further complicate the matter, error has a different meaning depending on the assessment frame of reference. And I will show that the extent of the confusion along many of these dimensions may be easily estimated.

This is a critical study. Foucault (1988) says:

"There is always a little thought even in the most stupid institutions; there is always thought even in silent habits. Criticism is a matter of flushing out that thought and trying to change it: to show that things are not as self-evident as one believed, to see that what is accepted as self-evident will be no longer accepted as such. Practising criticism is a matter of making facile gestures difficult" (p155).

Using Foucault's terminology, this is a critical study designed to make facile assessment gestures about standards difficult.
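The claim above, that the extent of disagreement in rank ordering and placement between different experts can be estimated, may be easier to picture with a small worked example. The following is only a minimal sketch in Python: the two sets of marks, the cut-off of 50, and the function names are all invented for illustration and do not come from the study. It counts how many pairs of students two markers place in opposite order, and how many students change category (pass or fail) depending on who marks them.

```python
from itertools import combinations


def ranks(marks):
    """Return 1-based ranks (1 = highest mark); tied marks share the average rank."""
    order = sorted(range(len(marks)), key=lambda i: -marks[i])
    result = [0.0] * len(marks)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next mark ties with the current one.
        while end + 1 < len(order) and marks[order[end + 1]] == marks[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result


def discordant_pairs(marks_a, marks_b):
    """Count pairs of students that the two markers place in opposite order."""
    count = 0
    for i, j in combinations(range(len(marks_a)), 2):
        if (marks_a[i] - marks_a[j]) * (marks_b[i] - marks_b[j]) < 0:
            count += 1
    return count


def category_changes(marks_a, marks_b, cut_off):
    """Count students who pass under one marker and fail under the other."""
    return sum((a >= cut_off) != (b >= cut_off) for a, b in zip(marks_a, marks_b))


# Hypothetical marks for the same six scripts from two markers (invented data).
marker_a = [72, 65, 58, 50, 49, 40]
marker_b = [60, 70, 48, 55, 52, 45]

print("ranks, marker A:", ranks(marker_a))
print("ranks, marker B:", ranks(marker_b))
print("pairs ordered differently:", discordant_pairs(marker_a, marker_b))
print("students changing category:", category_changes(marker_a, marker_b, cut_off=50))
```

With real data the same counts could be taken across occasions or across papers as well as across markers. The sketch addresses only the rank-order and placement component of error named above; it says nothing about construct meaning, labelling, context or prediction.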
Methodology and inquiry systems

After a twenty three page discussion on data and analysis relevant to construct validation, which to Messick (1989) means all validation, he concludes:

"... test validation in essence is scientific inquiry into score meaning, nothing more, but also nothing less. All of the existing techniques of scientific inquiry, as well as those newly emerging are fair game for developing convergent and discriminant arguments to buttress the construct interpretation of test scores" (p56).

I would broaden this to refer to any categorisation produced by transforming a continuity into a dichotomy. And for now I want to leave aside the obvious bias in the word "buttress," and focus here on inquiry systems themselves. For Messick (1989), conservative as he is, accepts that

"because observations and meanings are differentially theory-laden and theories are differentially value-laden, appeals to multiple perspectives on meaning and values are needed to illuminate latent assumptions and action implications in the measurement of constructs" (p32).

Churchman (1971) elucidates five such scientific inquiry systems of differential values and epistemology, roughly related to philosophies espoused by Leibniz, Locke, Kant, Hegel and Singer. Mitroff (1973) has developed and summarised Churchman's systems.

Very briefly, the Leibnizian inquiry mode begins with undefined ideas and rules of operation, ending with models that count as explanations. The Lockean mode begins with undefined experiential elements, and uses consensual agreement to establish facts. The Kantian system shows the interdependence of the Leibnizian and Lockean modes, and uses somewhat complementary Leibnizian models to interrogate the same Lockean data bank, to ultimately arrive at the best model. The Hegelian mode uses antithetical models to explain the same data, leaving it for the decision maker to create the most appropriate synthesis for a particular purpose.
In this mode values of enquirer and decision maker become exposed. Finally, the inquiry system of Singer (1959) is one of multiple epistemological observation, where each inquiring system is observed from the assumptions of the others, and each methodology is processed by those of the others. Churchman (1971) paraphrases Singer clearly and cleanly: "the reality of an observing mind depends on it being observed, just as the reality of any aspect of the world depends upon observation" (p146).

How do these inquiry systems link to the seven ways of demonstrating untruths, or nonsense, detailed in the previous section? It is the Singerian inquiry mode that best characterises this study as a whole. Although particular modes have been utilised for particular critical purposes, this is in itself justified by the Singerian inquiry mode.

So whilst the first three methods listed are clearly in the Leibnizian and Lockean modes, the other four involve the explication of shifting sets of assumptions, and belong to the Singerian mode. In particular the examination of compatibilities between the four frames of reference for assessment on the one hand, and equity definitions, power relations, instrumentation requirements, and notions of comparability and quality on the other, demonstrates clearly that to the Singerian enquirer, "information is no longer merely scientific or technical, but also ethical as well" (Mitroff, 1973, p125).

The "conversation pieces" and "stories" used to demonstrate the absurdity of some
assessment claims belong to the Hegelian mode. Churchman (1971) explains:

"The Hegelian inquirer is a storyteller, and Hegel's thesis is that the best inquiry is the inquiry that produces stories. The underlying life of a story is its drama, not its "accuracy". Drama has the logical characteristics of a flow of events in which each subsequent event partially contradicts what went before; there is nothing duller than a thoroughly consistent story. Drama is the interplay of the tragic and the comic; its blood is conviction, and its blood pressure is antagonism. It prohibits sterile classification. It is above all implicit; it uses the explicit only to emphasise the implicit" (p178).

Strategy of deterrence

The general strategy used to make the case for the invalidity of most current assessment practice is borrowed from military policies of nuclear deterrence. It is a strategy of overkill. Of the thirteen sources of invalidity developed in this study, any one would, if fully applied to current assessment practices, take them out, neutralise them, render them inoperable. To nullify this attack on validity of tests, examinations and categorisations generally, it is necessary to destroy not one missile, but all of them.

Methodology and structure of the study

The study has been presented in seven parts: Positioning, Context, Tools of Analysis, Error Analysed, Synthesis, Application, and a Concluding Statement.

Part 1 Positioning: All descriptions of events, all writing, is positioned; makes certain assumptions, is viewed from a particular perspective. Part one positions the study in terms of focus and method, and the writer in terms of experience and philosophy.

In this opening chapter I position the work in terms of its general content and methodology, and show how it all fits together.
So Chapter 1 briefly summarises what the study is about, what literature is most similar in both content and style, what is the importance of the study and its possible impact, and in this section how it is structured.

In Chapter 2 I show how the study is positioned in terms of some of the learnings accrued from the professional and life experiences of the author.

In Chapter 3 I indicate how the study is positioned in terms of philosophy and value, and how that relates to some contemporary literature.

Part 2 Context: Assessment involves events that occur in, and are given meanings in, a social context. In Part 2 I elucidate some aspects of that context.

In Chapter 4 I focus on the way power relations both violate and produce those who act out their lives within their influence. In particular the centrality of the examination is exposed in the production of the modern individual, defined as an object positioned, classified and articulated along a limited set of linear dimensions.

In Chapter 5 the argument in Chapter 4 is applied and developed in terms of educational assessment. In particular I examine the crucial part that the standard plays in the whole mechanism of defining cut-offs for abnormality and non-acceptance, and how important it is that these standards be seen as accurate if current societal structures are to be maintained.

In Chapter 6 I focus on the cultural meanings that attach themselves to the notion of the standard, and assign the idea of the human standard to the mythological sphere, a place apart from critical thought. I examine the emotional intensity of discourse about the standard, its significance as an article of faith, and how this is related to the maintenance of control and good order.
Part 3 Tools of analysis: In Part 3 some tools for looking at specific assessment events are developed. In Chapters 7 to 12 I examine four different epistemological frames of reference for assessment, and relate these to notions of equity, to hierarchical structures, instrumentation, comparability, rank orders and standards, logical types, and quality.

These chapters introduce some independent, fundamental, and rarely discussed aspects of underlying assumptions involved in events culminating in the assessment of students. Inadequacies in any one of these aspects would, in a rational world, be enough to destroy the credibility of most student assessments. I will contend that all practical assessments of people contain major inadequacies in most of them.

In Chapter 7 four different frames of reference are defined; four different and largely incompatible sets of assumptions that underlie educational assessment processes as currently practised. First is the Judges frame, recognised by its assumption of absolute truth, its hierarchical incorporation of infallibility; second is the General frame, embedded in the notion of error, and dedicated to the pursuit of the true score; third is the Specific frame, which assumes that all educational outcomes can be described in terms of specific overt behaviours with identifiable conditions of adequacy; fourth is the Responsive frame, in which the essential subjectivity of all assessment processes is recognised, as is their relatedness to context. Because of their contradictory assumptions, slides between frames result in confusion and compound invalidity.

Chapter 8 shows how certain assessment frames are inherently contradictory to certain definitions of equity, themselves contradictory to each other and to the power structures in which they are enmeshed.
As such, those assessment frames and notions of equity that contradict the enveloping hierarchical structure will be seen, accurately and probably unconsciously, as potentially destabilising, and will consequently be ignored, nullified, or corrupted into acceptability.

Chapter 9 looks at Instrumentation. In this chapter we look at the conditions and invariances required in events involving measuring instruments if such events are to have credibility; in particular the notion of a Standard that theoretically defines the scale, and its confusion with a standard of acceptability, which is to be measured by the instrument, and which requires a scale in order to be located. The various assessment modes are analysed in terms of their instrumental error. On these grounds alone all are found to be invalid.

Chapter 10 takes up the issue of comparability. What can be compared? Fundamental distinctions between more and less, better and worse are examined, their relations with uni and multi dimensionality shown, and the implications for rank ordering of students in tests and examinations unearthed. This leads to further examination of the differential privileging of sub groups and individuals when marks are added. The essential meaninglessness of such additions becomes apparent.

In Chapter 11 the relationship between rank order and standard is teased out in more detail: in particular the meanings given to the standard in the Judge and General frames of reference; how logical confusions proliferate when discourse jumps from one frame to the other; and how all categorisations involve standards and rank ordering, even though many advocates of "qualitative" assessment methods may want to deny this.

Chapter 12 leads from the implications of the Theory of Logical Types for assessment practices to an examination of the distinction between standard and quality.
When the standard is seen, realistically, as unable to perform its function, quality is the notion with sufficient mythical, ideological, and intellectual status to replace it. This would produce a very different learning milieu.

Part 4 Error analysed: In Part 4 the tools developed in Part 3 are used to
discriminate particular sources of confusion and error within assessment events designed to categorise students.

In Chapter 13 the meaning of error in each frame of reference for interpreting assessments is considered. As the meaning of error changes with assessment mode, so do the methods designed to reduce such error. Procedures to reduce error in one frame are seen to increase it in another. From a perspective of oversight of the whole assessment field, this is another source of confusion and invalidity, particularly as it is rare for any practical assessment event to remain consistently within one frame of reference.

Chapter 14 addresses the question: What does a test measure? In terms of social consequences the answer is clear. It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named. The person who does the test has already accepted the name of the test and the measure that the test makes by the very act of doing the test. So the mark becomes part of that person's story and with sufficient repetitions becomes true.

My own conclusion is that tests have so many independent sources of invalidity that they do not measure anything in particular, nor do they place people in any particular order of anything. But they do place them in an order, along a single line of "merit," and that is all they are required to do.

Chapter 15 shows some of the ways in which psychometricians fudge; by reducing criteria to those that can be tested; by prejudging validity by prior labelling; by appropriating definitions to statistical models; and by hiding error in individual marks and grades by displaced statistical data, and implying that estimates are true scores.
A number of specific examples of fudging are detailed.

In Chapter 16 some of the more recent work on validity is discussed, and its positioning as advocacy demonstrated. I conclude that in practice the very existence of validity is established, validity is indeed made manifest, through the denseness of the arguments about invalidity criteria used to refute such existence, together with the reassurance that the battle continues, and some gains have been made.

Reliability is also discussed as a problematic, rather than as an obvious prerequisite to validity. I conclude that most of the mechanisms designed to increase reliability necessarily decrease validity.

Part 5 Synthesis: In Chapter 17 the notion of invalidity is reconceptualised, having both discursive and measurable components. Thirteen (overlapping) sources of error are examined, all contributing to the essential invalidity of categorisations of persons.

Part 6 Application: In Chapter 18 I apply the philosophical and conceptual positioning, tools of analysis, and the reconceptualised sources of error developed in this thesis to the competency based assessment policies and practices of Australia in the 1990s. I show how the notion of competency standards is overtly central to the whole competency movement, the introduction of which is shown to be overtly politically motivated. Thus the crucial links between political power and educational standards that are argued for in Chapters 3 and 4 become transparent. I then go on to examine the invalidity of competency standards in the light of the thirteen sources of error specified in the previous chapter.

Chapter 19 presents two specific applications of invalidity sources; the first relates to national literacy testing, and the second to University grades.

Impact
Assessment practice is permeated with mythology and ideology; with confusions and contradictions; with epistemological and ontological slides; with misrepresentations of frames of reference for different assessment modes; with logical type errors and psychometric fudging, in which the constructs that determine error--labelling, construction, stability, generality, prediction--are either ignored or severely constrained in the determination and communication of error, in those rare cases where personal error and likely miscategorisation is publicly admitted.

I have no expectations for this study, but some hopes. A whistle-blowing study is like a joke--its impact is a function of timing. And the best timing can only be determined in retrospect. My hope is that it will lead to a reduction of the violence that is attributable to the suppression of error in the categorisation of people.
As I take the epistemological position that all knowledge is based on experience, value and reflection, and all experience is influenced by prior knowledge, it seems important to indicate some of those life experiences that led me to the particular ontological and epistemological positions that inform this study. To do otherwise is to imply either their universal superiority, or their complete arbitrariness.

In this brief autobiographical note I outline some of those significant life experiences and concomitant learnings as they impinge on this study. This is neither arrogance nor self-indulgence (Mykhalovskiy, 1996). For if thirty years working in the field of educational research and assessment is not relevant to this project, then either the work, or the project, or both, must surely be trivial.

Education

This study has had a long gestation. Forty-nine years ago I sat for my matriculation examination in English. I had a choice of four essays, and chose one called "Examinations." I rubbished them, unwisely it seems. I got a B grade, which compared unfavourably with the second highest mark for English at my prestigious public school. That I'm still at it today indicates that non-conformity is not necessarily related to inconsistency or non-perseverance. What I learnt from this experience is that meaning and judgment are affected by context, and that appropriateness is one criterion for the recognition of quality.

Two years of study in the University Engineering faculty convinced me that I did not want to be an engineer, and left me with one invaluable legacy: on every engineering drawing the measurement of each dimension, and the limits of accuracy within which the product must be fabricated, are indicated. In practice, because error was inevitable, the statement of acceptable error was as important as the magnitude of the dimension. Keeping within acceptable error was a major determinant of quality of product.
This practice of indicating errors in measurement continued for calculations in Physics, the subject of one of my majors when I transferred to the Science faculty. I decided to become a teacher.

Moving to Education was a culture shock. I could only write scientific prose: sparse and unadorned, tight and dry, logical and on the surface devoid of any emotional involvement. So writing two thousand word essays was a problem; I generally said all I had to say in two hundred, and regarded the rest as superfluous padding. I could state my case, but had lost my personal voice.

What I learnt about assessment was at the level of "helpful hints to beginning teachers." The massive literature on educational assessment and evaluation was then, as it is now for most teachers, unknown to me. I was trained for survival, not for problematising tradition. I learnt what was implied. The game of testing had produced me, so it couldn't be all that bad.

Teaching

I taught in high schools and tested students more or less the way I'd been tested. Maybe a few less essays and considerably more short answer questions. The process was simple. I sat down, wrote some questions to comprise an examination paper, the students
did it, I marked it, added up the marks, and then gave them a percentage or converted it to a grade. How was it done? Easy! Was it a problem? No! How accurate was it? Nobody, including me, ever asked!

After three years I joined the Royal Australian Air Force as an Education officer, teaching some basic physics to photographers, some nuclear physics to air crew, and some instructional technique to officers. Because I was teaching it, I learnt the technology of lecturing. It was assumed I could accurately assess all this. I averaged about six lectures a week, so they were very well prepared. With so much time, I diverted myself by writing pantomimes and musicals. I was beginning to find my voice.

Two years of work at the RAAF School of Technical Training had me writing syllabuses as well as teaching basic maths and physics. I talked to electrical fitters who had come back for training after two years in the field as electrical mechanics. None of them had used any of the eighty-odd hours of mathematics in the Mechanics course. I suggested to the administration that they save time and money by leaving out the mathematics. It was explained to me that its relevance to work was irrelevant. It was necessary for the high level of trade classification. I was beginning to understand the economic and political character of credentialing.

Assessing

My last year in the RAAF was spent in the trade testing section. Fifty item, two hour, multiple choice tests were used to credential students who had spent from three to twelve months in training programs, with hundreds of hours of practical and theoretical assessment as part of the course. My attempts to point out the absurdity of this were usually met with the response that it didn't matter because they just kept on doing the trade tests till they passed. I was becoming aware that in the world of work, as well as in the world of education, ritual was more important than rationality.
Teaching again

Observing that the influence Education Officers had on training seemed to diminish as they were promoted, I went back to teaching in a private coeducational high school. I found that what had taken twenty hours to teach to highly motivated technicians took five times as long to teach to supposedly more intelligent high school students. In my second year I told the matriculation physics class I did not intend to teach them. Rather I would try to create an environment in which they could learn. I would assume they could read the syllabus and the text book. They worked individually or in groups, developed their own notes, devised their own experiments. They completed the course by the end of June, after which I agreed to give some consolidating lectures, and class time was spent doing past examination papers and improving answers. That, after all, was the task on which they would be judged. Their results in the external examination were extremely high. I had learnt to separate the ritual of teaching from the facts of learning.

Next year I tried the same process. The students refused to cooperate. They collected notes from other schools. They insisted I teach them. After a month I had little choice. We went back to "normal" teaching methods. They got "normal" results at the end of the year. I learnt that dependency has as much attraction as autonomy, for the price of autonomy is personal responsibility.

Two other events were significant over this period. The first was a question asked
by Michael, a student: What exactly is an electron? I had no idea. The question had never occurred to me. I'll let you know, I blustered. A month and many hours of reading later, I responded. Do you remember, Michael, you asked me what an electron is? No, he answered. I'll tell you anyway, I said, unperturbed. I wrote "Properties of an electron" on the blackboard, and under that heading listed some of them. The class looked on in silence. I looked at Michael. Yeah, he said, those are its properties, but what exactly is it? Ah, I said, now that's a question you'll have to ask the Rabbi. I had started to grapple with ontology. I was thirty years old.

Writing

The second involved the writing of A programmed course in Physics (Wilson, 1966). This was a linear program covering year 11 and 12 Physics. In reviewing what I had written I was dissatisfied with the presentation of force field theory. Finally I wrote this part as a dialogue between a physicist and a student. The result was much more satisfying in that the nature of a field in physics could be discussed as a problematic, rather than presented as a scientific conclusion. My first excursion into epistemology required discourse rather than didactic prose to communicate its meaning.

Assessing again

Because of my experience with multiple choice tests in the RAAF, I had been working with the Australian Council for Educational Research on the construction of multiple choice physics tests. When a full-time position came up I applied for it. For the next six years I was to work as a test constructor.

I learnt a lot about the nature and mechanics and rituals of testing, about the truisms and tricks of the trade.
For example, that only "items" between thirty and seventy percent difficulty were chosen because others did not contribute economically to the separation of students; that seemingly almost identical questions often had very different difficulty levels; and that it was almost impossible to tell, without prior testing, how difficult a test item was.

Central to the theme of this study, I also learnt, at the level of practice and praxis, the great secret about error, about the fallibility of the human judge, about the vagueness and arbitrariness of the standard. Not in that language, of course. Psychometrics provides a more prophylactic discourse about marker reliability and predictive validity and generalizability. Even so, it was impossible to miss the point. Or was it? I did a course in educational measurement at a local university to sharpen up my theoretical skills. We learnt the statistical theory and all the little techniques for reducing error, like short answer questions and multiple marking. And at the end of the course--a three hour essay type examination marked by the lecturer and then given a grade. And nobody said a word! Even more amazing, when I raised the matter with a few of the other students, they seemed unaware of the contradiction. I was learning that tertiary studies do not necessarily invoke reflective critical thinking.

There were two other outcomes of this experience of constructing test items that were important. The first related to the discourse, the arguments about the best answer that characterised the panel meetings. The second related to the values and effects of this particular testing program, and how to deal with that (Wilson, 1970).

As we got better at writing "distractors" for multiple choice questions, we found advocates among the "expert" panel for some of the distractors as the best answer, rather than the one chosen by the test writer. Of more potential educational significance was
the argumentation itself, and its effect on our ability to think sharply and clearly within the fields being discussed. Tests themselves can never produce improvement in individual performance; but our experience suggested that argumentative discourse about test items could. A serendipitous piece of research at one school confirmed this. One hundred students thus engaged for about twenty hours raised test scores on each of three multiple choice papers by half a standard deviation, despite the ACER publications that claimed these tests could not be "taught" (Wilson, 1969).

The second experience related to educational values and our attempts as "examiners" to grapple with this. None of the full-time test constructors approved of the Commonwealth Secondary Scholarship tests as an educational intervention. They were a politically inspired election gimmick. We were aware that they would have an influence on what schools taught, and possibly how they taught, even though they were supposed to be "curriculum free" as well as value free. As a result we took "educational value" as a major criterion for test validity, at least at the level of our own personal discourse. The material we chose for tests must face the question "would education be improved if teachers did try to prepare students for this sort of exercise, for answering these sorts of questions on these sorts of information or issues, for engaging in this sort of thinking and problem solving?" I was learning that no test was value free, and that these tests were certainly informed by a (possibly idiosyncratic) view of educational relevance.
Groups

During these years I also had my first experience in unstructured groups, and experienced at first hand the power of such group interactions to produce major changes in social behaviour in the participants; within the microcosmic society of such groups, as they developed, there was opportunity to take risks, revisit social experience, and re-construct social meanings. I learnt how powerful such groups could be in raising awareness, loosening counterproductive behaviours, and reframing experiential meanings (Slater, 1966).

Research

When at age forty I was appointed to head the newly established Research and Planning Branch in the SA Education Department, a position I held (with planning dropped half way through) for the next thirteen years, my major claim to expertise was in the area of testing and assessment. The Directors never allowed this to influence their decisions about committee membership, and during my sojourn with them I was never appointed by them to any departmental committee concerned with assessment. Nor, for that matter, am I aware of any decision made by the Department that was informed by research that the Branch carried out. When research knowledge was consistent with Departmental policy assertions it was utilised; when it didn't or wouldn't serve those interests it was ignored. I was learning that research knowledge was an instrument of power, a weapon for rationalising decisions, rather than a springboard for rational decision making (Cohen & Grant, 1975).

It was partly this insight, as well as a belief that my clients were students and teachers rather than administrators, that determined that most of my own research would be concerned with classroom practice. I also noticed that most educational research dealt with special groups and special problems, leaving the "normal" educational assumptions and practices unsullied by any critical research probes.
So I directed most of my action research to the "average" classroom; that is, I sought out the commonalities of
educational experience rather than the differences.

In the first few years I spent considerable time with teachers looking at improving assessment practices in schools. One thing in particular became apparent during these discussions--that most of what I had learnt as a professional test constructor was irrelevant to the assessment issues that concerned teachers in classrooms; these were not the sort of descriptions that helped children learn better, or helped teachers teach better.

When I wrote Assessment in the primary school in 1972 the then Director of Primary Education wrote a foreword in which the final paragraph stated "some people would question his suggested limitation on testing. Whatever one's views, teachers will find the report thought provoking and valuable". In other words, I disagree with him, but respect his different viewpoint. As Directors became more managers and less educators in the 1980s, this sort of clarity and openness, this up-front honesty, was to become increasingly rare.

Politics

In 1974 a thirteen year old schoolgirl was suspended from her high school and refused to accept the suspension on the grounds that it was unfair. She returned to the school and was subsequently removed forcibly by police. The incident resulted in a Royal Commission, and the Royal Commissioner found that the girl and her parents were a "trinity of trouble makers" (Royal Commission, 1974). It was never suggested that the setting up of the Commission had anything to do with the fact that the girl's father was an endorsed labour candidate and a personal friend of the Minister of Education, and that the Principal of the school was the brother of the shadow Minister of Education. Nor was it ever suggested that the united front of the Education Department officers and secondary principals had anything to do with the highly conflictual situation then existing between the Minister and the high school principals.
I thought that most of the overt conflict at the school was due to communication problems between the girl and certain members of staff, and certainly not due to the severity of the crime, which was trivial. In such cases it seemed to me to be the job of the professional staff, not the student, to resolve the conflict. So I gave evidence on behalf of the student. I was the only member of the Department to do so.

What I learnt from this episode was that the structural violence embedded in institutions is evidenced not by the severity of the punishment when rules are breached, but by the severity of the punishment when the sanction, whatever it is, is not accepted. I could see that accepting any sanction reinstates the power structure; in fact, breaking the rule enables such re-establishment to become visible, enhancing the power relations. But not accepting the sanction is extraordinarily threatening because it destabilises the power structure, challenging its very existence. It also became clear to me that none of the Departmental officers, or the Royal Commissioner, could see this.

Social development research

As the development of social skills was a major objective in the stated curriculum of almost all school subjects, I initiated a major project on social development. It lasted four years, attracted two major grants, and at one stage involved six full time and six part time researchers (The Social Development Group, 1979). As a starter to this I took six months long service leave and a round the world trip. I spent some time visiting people and relevant projects in the United States, Canada, and England. I talked to
teachers at primary, secondary, and tertiary levels about the social development of their students, and how they were able to facilitate that development. They all described the social development of their students during a year, whether six or twenty six years old, in the same terms; tentative, inarticulate, immature to confident, articulate, sensitive. It was obvious that what they were talking about had little to do with developmental skills.

My experience in unstructured groups suggested to me that it had everything to do with developing groups, with the way that power, affect and trust relations change if they are allowed to. I had already spent six months reading the literature on social skill development. It was often interesting, but utterly uninformative in regard to classroom practice. And we had asked teachers to describe mature social skills; they responded with good descriptions of conforming behaviour. I could see that shifting the focus to the social group, to the context of social action, produced an array of possible teacher interventions, informed by group development theory. We started with a project about developing social skills. We ended with a project on developing the classroom group; for only in a developed group would the demonstration of mature social skills be appropriate.

Rebelliousness

One incident that occurred on this journey deserves a mention, as it relates to the question of what constitutes experience. In London I went into a coma for two weeks, during which time I convulsed and hallucinated and was fed by a drip and lost 12 kilograms in weight. I was diagnosed as having viral encephalitis.

My hallucinations had a clear story line. They all involved adventures with semi-humanoid monsters who were trying to kill me. The final scene had me lying on an operating table with ten humanoid gun barrels at my head. The odds were stacked against me, and death was imminent. I had time only for one statement. "You will only kill me," I said, "to prove that I cannot control you. Yet if you kill me for that, then I have completely determined your actions." They left. I came out of coma, and requested some food.

With some trauma, I had learnt that the rebel is as tied to the system as the conformist. If I wanted to change the system, I would have to take a different stance; one of autonomous action, rather than rebellious reaction. I would need to tap the ambivalence of those in power, not their antagonism.

Back in Adelaide, the social development project got under way. I read the literature on (small) group development theory, and realised that most of the models could be reframed in terms of distributions of power and affect relations; and because of my physics background, I conceptualised these in terms of fields; properties of the space between rather than of the agents mediated by the fields. My personal ontology was developing, and ten years later more complex notions of power relations (e.g. Foucault) would find nourishment in my conceptual space.

Politics again

Part of the condition of the research grant was that separate reports be written for the major participants in the study; researchers, administrators and curriculum writers, teachers, students. I wrote the booklet for students. It was entitled How to make your classroom a better place to live in (The Social Development Group, 1980). It described the four stages of development of the classroom group, how students might experience these stages, and how they might respond to that experience. Four different responses to
each situation were constructed, and were overtly categorised as positive and negative; the negative responses, with which students would identify and be familiar, were likely to be not constructive in moving the group onward; the other two responses, one involving individual action and one group action, were ones which might help the group develop. The booklet was designed for classroom discussion.

Before the book was distributed a question was asked in the South Australian parliament about the book. Was it not encouraging students to respond negatively? The Director General responded by ordering that the book be shredded. Flattered if furious with this treatment, I pointed out the conditions of the grant, and requested specific information about exactly what was objectionable in the book, so that it could be amended and reprinted. After some months the answer came back; two words, "fascist" and "fairy," had to be removed; the positive responses must come first; and there must be an overt statement that the positive responses were "better". In addition, only teachers involved in developing their class groups could distribute this book to their students.

I interpreted this to mean that there was nothing specifically at fault with the book. It was the ideology of the book, with its implicit aim of empowering students, that had caused the over-reaction. Yet the rhetoric about schools applauded the empowerment (autonomy) of students. Unwilling to confront the contradiction, the Department had to settle for limitation rather than complete suppression. For of course developing the classroom group meant that the power relations between teachers and students changed. If this happened in enough classrooms not only classroom structures, but school structures, would have to change. The implications of the research were radical rather than progressive.
Inservice training was essential if the findings of the research were to be propagated, if practice were to follow theory. So four researchers, now highly skilled in working with teachers, were retained for a year to produce inservice materials and work in schools with teachers. A year later, despite protestations, all had been returned to classrooms. An invaluable human resource for the dissemination of ways of developing the classroom group was annihilated. Fifteen years later teachers still struggle with rebellious classrooms and search for answers in individual psychology, curriculum statements still highlight the development of social skills rather than the social context for mature social behaviour, and teachers still say "groups don't work" because they don't understand group development theory. In 1980, I was beginning to learn what I knew by 1990; that nothing really changes unless the power structure changes, and hierarchical power structures are immensely stable and resistant to change (Wilson, 1991).

Consciousness

One further event in 1979 is pertinent to this story. At Findhorn, an intentional community in Scotland, I experienced some shifts in consciousness (without drugs or intention, with detachment and interest), that seemed very similar to those experiences described by mystics, and generally described under the rubric of the perennial philosophy (Bucke, 1901; Huxley, 1946; Wilbur, 1977, 1982, 1991; Wilson, 1992). These experiences, and subsequent ones, make it impossible for me to take Freud's easy way out (Freud, 1963), and discount such events because I have not experienced them.

Such experiences have been immensely significant in the history of the past three thousand years, for they have provided the bases for the world's great religions.
The mythologies and structures that are the social manifestations of these initiating mystical events have taken very different cultural forms, but all have retained, within their core practices, considerable congruency with their source as a particular state of
consciousness. This is important because it points to one exit from the maze of confusion created by the acceptance of the relativity and cultural determination of all human values (Wilbur, 1995).

Peace and violence

By 1982, Ronald Reagan's unique combination of monstrous stupidity and apocalyptic hardware had stirred the coals of fear still glimmering under the weight of twenty years of psychic numbing and denial, of human refusal to seriously consider the high probability of a nuclear holocaust that could destroy all life on the planet. Everywhere the peace movement flourished. Learned journals of all sorts, from medicine to engineering, from physics to art, began to feature articles about nuclear war and its effects. Most unlikely bedfellows, Marxists and churchmen, pacifists and retired admirals, feminists and builders' labourers, would all shout out their protests.

Where were the children in all this? I decided to find out. There was some American data from surveys. I decided to tap a richer source; children's fantasies of the future. The data was devastating (Wilson, 1985). For many it was a post-nuclear war world, barren landscapes and destruction everywhere. For nearly all it was dehumanised, people existing either as passive recipients of technology, at the best comfortably mindless in a plastic world, at the worst slaves of the machines or robots that grind mercilessly along their efficient and pre-programmed paths. An unstoppable high-tech, high-destruct world.

Like many who start with a naive view of peace as the absence of war, my reading and reflection soon led to more sophisticated understandings; towards peace as the absence of fear at a psychological level, and as incompatible with injustice and repression at the social level.
And I began to understand how injustice was often not so much a matter of human intention, as a product of historical man-made structures, continually reproduced through the human facility of role-taking, and the moralities and ideologies that are able to transform efficient violations into noble virtues. At fifty I was beginning to articulate a world-view.

During the international year of peace, schools were all expected to get involved. Believing that in dealing with violence we should begin in our own back yards, I prepared a kit for schools entitled Programs to reduce violence in schools (1986). It included ideas for involving students, teachers and parents, for collecting information, and for taking action at a school level. It also included a paper on understanding violence, in which I tried to make overt the links between violence, school structures, social control, and justice. Complete with words of encouragement from the Director General of Education, the kit went off to one hundred high schools in South Australia. One school got the project off the ground and collected data from students and staff. Then they stopped. During the year, many schools planted trees for peace. I was developing a feel for the absurd.

Writing again

Two years before, buttressed by a report by the head of another educational research organisation, the Department disbanded ours. I was sent out to graze in the country at Murray Bridge for two years as an Assistant Director Curriculum, where I managed to get two of the social development advisers back into business, before I retired gracefully. There was nothing further I could do within the system. I was ready to
write, and had two young daughters at home that I wanted to spend more time with. I was learning the difference between jousting with windmills and hitting my head against a brick wall; one is a noble quest, the other just plain masochism.

The writing and the daughters got together into a book called With the best of intentions (Wilson, 1991). The book deals with the structural violence embedded in the hallowed institutions of family and school. I had decided to self-publish the book before I began, and as a result was able to give clear rein to my personal voice(s) and style. The book is egalitarian in that it treats children as fully human persons; it is iconoclastic in that it challenges many of the sacred myths and structures of child-rearing; it is written with passion and humour. It is informed by empirical data and overt in its philosophical world-view. The arguments are dense, but the presentation is, I hope, sufficiently varied and light to make its message accessible. With modifications that are essential to the context, I hoped to use a similar approach in this thesis.

The current study

A large number of significant learnings have emerged for me from the current study. I want to refer to the two that I have found the most significant. The first relates to my extensive reading of Michel Foucault, the second to my grapplings with ontology.

There were two major insights from Foucault; the first was his analysis of how culture produces and expresses rather than reduces and represses; that if the person is one dimensional, this is not because society has taken away the other dimensions, but that society, through its relations with the person, has produced a one dimensional person. The second insight was the centrality given to the examination, in all its forms, to the construction of the individual in the modern world. It was from this springboard that I could leap to observe the standard as the bullet in the examination gun.
An equally important learning from Foucault relates not to insight, but to style; not to his immense data base and sometimes lugubrious argumentation, but to the soaring rhetorical passion that marks his insightful conclusions; his demonstration that "scientific" writing does not need to be dull and portentous, but can legitimately use the full creative resources of the language, helped me to feel much more comfortable in using my own voice for this work.

My own philosophical gropings into what is knowable, what is describable, led to some surprising conclusions. Such delving was necessary, because any assessment is a description. In practice it is a description of a performance of some kind in context, even if in theory it purports to be a description of some attribute or quality of a person; this I had known for a long time. To move from here to the insight that all knowledge is a description of events involving a relationship between at least two elements, and thus to appreciate the slide made when the description is pinned to one particular element, represented a major reframing of much of my earlier thinking.

Summing up

There are at least five levels in all this: the events that I was a part of; the manifest behaviour that constituted my part of those events; my particular recall of that experience; the meanings I verbally constructed from that recalled experience; and the meanings and reactions that you, the reader, construct from all that.

Truth is not an issue here. Awareness and truthfulness are. I can only assert my truthful intentions. Regardless, the reader will make his or her own judgment about the
value of the position from which they interpret me as coming.
In this chapter I spell out in more detail the philosophical stance that I take in this study, so that my assumptions about social life and social relations are up-front. Whilst these assumptions are consistent with the learnings of the autobiographical sketch given in the last chapter, I have not felt it necessary, or advisable, to enter into any sort of justifying dialogue regarding my position. This is not a philosophical study, and I have always regarded justification as a loser's game.

So I have presented my philosophical position as a set of assertions with an internally consistent logic; I have briefly described the epistemological, ontological, and axiological assumptions that have informed this study and described how that position fits into current post-positivist, interpretivist, and post-modern paradigms. The chapter ends with a brief outline of the assessment process constructed from my particular position.

Philosophical assumptions: What is knowledge? What is truth?

I will call an event any interaction where a change or a difference is observed or otherwise sensed (Bateson, 1979). Interactions involve some relation between elements of the event. Differences involve some relation between the elements, or the states of an element over time, that constitute the difference. So all events involve some relation between elements. And because all events involve a perception, so all events involve a perceiver. The perceiver may be automated as an instrument that senses the difference or reacts to or records the change. As Maturana (1987) expresses it, "Everything is said by an observer" (p65).

Any experience is experience (action, feeling, perception) of an event, either directly, or as recalled or as transformed in memory or action. So all experience involves relations. As all knowledge must finally depend on experience, all knowledge involves knowledge of relations; so all knowledge is constructed out of relational events.
To experience an event does not necessitate giving a meaning to that event, but does require a state of awareness or consciousness, from which the event is viewed. For example, an experience may be represented by a pattern or abstract painting which embodies relations without embodying meaning. Giving a meaning to an event requires some theoretical underpinning, some ideas or ideals; some knowledge of relations derived from other events, or possibly, if mathematical relations are construed to constitute meaning, derived from acts of imagination that transcend (are transformations of) known relations. Mathematics can be regarded as a special case of patterning, and whether mathematical propositions or systems have meaning in themselves is moot. I don't think they do. Some post-structuralists want to deny experience that excludes meaning and thus language. My experience denies their denial. Their assumptions refute my denial. Stalemate. But then, I'm writing this thesis.
I use the term meaning to involve more than prediction, which mathematics can sometimes help to accomplish. Meaning involves some reason, some purpose, some intention, some value. Thus meaning is inevitably embedded in language, itself embedded in human discourse. Unless we take a mystical view and define the meaning as the experience itself, or rather as a particular encompassing experience, in which case discourse stops and the world in its oneness pulsates. In this thesis I shall hold to the more mundane view. To do otherwise is not to proceed. In this epistemology, experience precedes pattern, and pattern precedes meaning. "Whether we are talking about unicorns, quarks, infinity, or apples, our cognitive life depends on experience" (Eisner, 1990, p31). Meaning will then usually in its turn, but not necessarily, pre-empt and distort experience, which will then in its turn influence events. Buddhist meditation is designed to limit this distortion; which brings its participants on this issue close to post-positivists like Phillips (1990), who seem ultimately to define objectivity as the reduction of bias of various sorts. Meaning is socially constructed because language is socially constructed. What passes for knowledge in common language is a social concurrence in a particular culture about acceptable meanings embedded in discourse. On the other hand, experience is constructed out of relational events not necessarily linked to any particular culture, and the construction of patterns or relations in response to that experience may also sidestep, or transcend, social patterning or common meanings. In other words, I hold the view that creation is immanent in all events, and in all perception of events, and change is more than the imposition of some random variation. Usually, however, we may assume that patterns are also culturally influenced.

Data is a particular form of knowledge constructed by particular people for particular purposes.
Such purposes always involve the construction or isolation of events in which the observer is directly, or indirectly through associated theory, involved; for example, measuring devices involve the observer at one step removed. Thus all data, being knowledge, is constructed from events, constructed and/or observed for particular purposes. All data, to be used, must have either a predictable pattern, or a meaning, or both. So if data is to be useful, it must have links to other relational events, or have links to (uneventful) abstract relations.

It follows that, in this world, there are as many potential truths about an event as there are experiences of the event. To the extent that all experiences of the event are the same then there is a case for "the" truth. But how would this be known? Any attempt to know this would involve the sharing of meanings, which are certainly socially constructed and can be as varied as the cultures and relations and metaphors that are used to make sense of them and communicate them. So agreement about one meaning, one truth, represents conformity about social construction as much as it does concomitance of experience. Ironically, in a social context the idea of multiple truths is unificatory, whilst the notion of one truth is fundamentally divisive; in practice the notion of one truth contradicts the collaborative ethic and supports interaction characterised by entrenched positions. Search for "the" truth is often productive within a closed space of cultural assumption, but does not lead to open inquiry outside that space; rather it invokes defensiveness, and if necessary violence in order to sustain its inviolability. Inevitably it leads to fragmentation and conformity, as contradictory elements break away to form their own
"truthful" reality, and all else becomes subservient to "truth's" current fashion (Feyerabend, 1988).

One more point about multiple truths; such a claim does not contain the inference of the catastrophic consequence that all "truths," that is socially acceptable beliefs, are equally useful or sustainable, or that some cannot be falsified. At least at the level of physical definition, it is demonstrably false that I am constructed entirely of green cheese. Such a claim is not a valid contender for any claim to a truth beyond that of a very idiosyncratic and metaphorical form. Truth claims about events can never be proved, but some truth claims can be demolished through procedures of contradiction.

If data belongs to an event, it cannot be attributed to a particular agent or aspect of that event. It is common and comforting to attach data to particular objects or participants in an event, and to the extent that all other participants and relations that constitute the event are held constant and made overt, to that extent attributing the data to a particular agent constitutes a valuable shorthand in description and discourse. For example, to attribute a certain tensile strength to a steel beam is convenient, but has meaning only in regard to an event at which, at a certain temperature, the beam is stretched in a machine until it breaks. The time span within which this (hypothetical) event generates the same data is quite long. But over a thousand years, the steel beam no longer has this property; which is shorthand for saying it will behave differently in the event that it is stretched. Not only that, but any engagement in events will affect the tensile strength in an unpredictable way; if an unbroken part of the beam is stretched again it will be found to have a different tensile strength; as it will after multiple vibrations as part of a bridge.
So experiments in the physical and biological sciences do not produce data about the object, or measure properties of the object being investigated. They produce data about the event that is the experiment. Most experiments describe the behaviour of physical or biological objects under particular boundaried, that is, controlled circumstances. The information they give therefore is not so much about the "natural" world in which we and they live, as it is about the "controlled" world that is the experiment, and sometimes becomes habitualised as technology. Most social research has fallen into this trap of misrepresentation of the source and attribution of data.

Social events, or indeed interactional events of any sort involving living things, have time spans of small duration. Indeed, identical events are impossible to create because social relations, and the participants involved in them, continually change. Even if we could hold all the conditions constant as we do for the steel beam, the data still cannot be attached to the person because, even more so than for the steel, the person of tomorrow is a different person; and part of the difference is attributable to the experience involved in obtaining the data.

It follows from this epistemology that most psychological descriptions of people are shorthand and problematic descriptions of social events, from which most elements that constitute the event are camouflaged. The label is attached to the person even though the events which produced the data involved social interactions. This is an example of faulty labelling. In particular it applies to any notions of skill and competency that do not clearly define the context of their application.

So the issue of objectivity is not that things exist independently of the mind; the issue is whether things (elements) have properties independently of the events used to describe
them. To say that a thing is real (has material existence) is very different to claiming that its "properties" are real and belong to it.

Ontology: What is the nature of social reality?

Within the meanings constructed above, ontology precedes epistemology in that social relations are a particular case of an event in which two sentient beings (probably both human) are involved. By implication the event is the "reality." Something is happening "out there" that is producing a difference. Thus social experience is a particular form of experience of an event, and social meaning a particular construction of that experience. On the other hand, epistemology precedes ontology in that all meanings are socially constructed, and are thus ultimately dependent on social relations, and that includes the meanings we ascribe to ontology.

Regardless, the two domains interlink with no inconsistency in terms of the idea of social relations and the idea of knowledge being a function of experience of relational events, and meaning being socially constructed.

Using relations as a primary explanatory factor negates the notion of causality, at least in a simplistic sense. Events are construed as interactive systems where everything affects everything else; patterns of mutual influence replace causality as an explanatory principle. This has been generally accepted in physics since the work of Einstein and Eddington early this century. It has always seemed odd to me that the more complex the system in which the event occurs, from physics through to biology through to social relations, the more frantically the idea of cause is clung to.

Further to that, the idea of "reality" is similar to the idea of "truth"; a redundancy, an unnecessary complexity, an irrelevant diversion. It contributes to conflict rather than to productivity.
It seems more useful to talk about what aspects of social relations intrude most on experience, and are important to the intensity and duration of that experience, and the effects that it generates. In this regard I would make four assertions about social events, conclusions from my own experience and reflection:

1. Knowledge of social relations (that is, data generated within human interactions) is usefully construed in terms of the power and affect relations of the participants in the event; in particular, asymmetrical power relations generate different data than do symmetric power relations, and positive affect different data to negative affect (Foucault, 1988).
2. An event occurs within specific localised power and affect contexts; this is not to suggest that this event might not itself be embedded in power relations (economically, racially, nationally or gender influenced) which push the effects and experience of the event in particular directions, but does put less emphasis on such grand power relations.
3. Events are dynamic, not static situations; they are characterised by movement, by change. They exist in time, which could be considered one measure of their change. So data about social interactions, which may often be characterised by power and affect relations, will change over time as the power and affect relations themselves change. I assume that any new social relationship (any social event characterised by people who have not met before in that configuration) will initially be asymmetric in respect to power, and moot in respect to affect. The relational changes will affect the data generated through interaction, which includes discourse, and vice versa.
4. Fixed societal structures (e.g., hierarchies) crystallise power relations and negate change. To the extent that they are successful they may produce knowledge, consensual interpretations, limited by the very boundary conditions that make its production possible; fixed societal structures also, in time, contradict the flow of interactional life, and produce social pathology.

Axiology: What values are embedded in the processes and product of the research? Whose interests are served through them?

No knowledge is value free. As Lincoln (1990) puts it, "given the criticism from all quarters, ... only the most intransigent or the most naive scientist still clings to the idea that inquiry can, or should, be value free" (p82). Being socially constructed, knowledge produced from inquiry is related to the meanings and purposes and structures within which it was composed; and it will tend to confirm or negate those relations involved in its construction, depending on the interests and attitudes and assumptions and awareness of the researcher. Even if data could be produced that was independent of those elements and relations, that very independence is itself a value position, which could be construed either as objectivity, because it has transcended bias, or as ideology, because it camouflages the power relations from which its bias necessarily derives.

As a researcher my task is to contribute to the meaning system that helps me and other people make sense of their experience in the particular class of events with which this study is concerned.
They will make sense of it if it is a story that links in some way with their experience, and at the same time is not contradictory to their experience; experience that is, of course, already partly interpreted in terms of other stories.

As an educator my task is to change people; education is nothing if it does not result in change. And as change is inevitable, but may be in many directions, there is obviously an obligation on the part of the educator to specify the direction in which change is intended.

As educator-researcher I must interact with the people with whom I wish to do research or educate. I do this through process (how I do the research), and product (what I produce as a result of the research). If I do not produce the data I investigate, but merely interact with data produced by someone else, this simply pushes the value problem one step backwards; their data was not value free. So if I accept their data without criticism, then I am accepting and perpetuating the values that affected its construction and effects. If I question that data, I question the social values embedded in it, as much as the social effects that are manifested through it.

If whatever I do involves interactions with people, and the construction of knowledge, then whatever I do affects both the meanings of people, and the social relations involved in those meanings. This is not to say that describing "what is" implies approval and acceptance of what is. Rather it is to claim that the very description of "what is" implies a way of viewing the world, a relationship with the situation, an involvement in the construction of the data, that pre-empts the meaning of the data by hiding the value assumptions behind the very mechanisms of its construction; becomes, that is, symbolic violence, unless made explicit (Bourdieu, 1977). Most quantitative research and much
qualitative research is in this sense symbolically violent, in that the sources of its power are disguised.

Unless I wish to engage in a value contradiction, it seems necessary to have an awareness of the direction in which I wish to move people's overt and covert experience of social relations and the meaning systems construed within their influence; and to use processes and meanings that are congruent with those purposes.

My autobiographical note indicates that much of my work over the past thirty years has been involved with the nature and practice of violence in its various forms, especially as it affects young people.

My construction of the concept of structural violence (Wilson, 1992) indicates that I regard fixed hierarchical structures, in all their multifarious visible and disguised forms, as inevitably connected to structural violence and hence to social injustice. Due process within legal systems is necessary to alleviate, or control, some of the social fallout, but is not sufficient to ensure social justice at its root manifestation, which requires more equalitarian structures.

Peace and social justice are ideals that have many forms and faces that change over time. On the other hand, physical and structural and emotional and symbolic violence are constructs amenable to more specific definition, and hence more easily recognisable in particular social events. For this reason, I feel more comfortable having as a basic value the reduction of violence, which I could universally advocate, than with the increase of social justice, which is more nebulous because of its many-faceted nature; on this view, increase in social justice that is not associated with reduction in violence would be problematic, involving as it does an internal contradiction.

If beliefs (truths) are multiple, then so must be the values that are implied in those beliefs, or which inform them.
How then can any particular value position be maintained as superior to any other?

In regard to the specific events that involve me and others in this thesis, I would answer that while the value of reducing violence is not necessarily superior to others, in the context of this work it is consistent with:

1. The learnings (culture and gender influenced as they are) that I have constructed out of my life experiences.
2. The ontology and epistemology which I have described, which inform the assumptions on which this study is based.
3. A view of life and living that involves ideas of growth, change, and flow at both individual and social levels. As such it is consistent with many views of personal enlightenment and social justice.
4. Processes likely to favour the survival of human life on the planet at a time when the technology is available, and primed to destroy it (Schnell, 1980).
5. That universal attunement and compassion which is one aspect of the experience described as mystical, as cosmic consciousness, or as the perennial philosophy, which transcends historical and cultural boundaries, and contains a sense of the sanctity of each individual person (Wilber, 1991).

Slotting into the social research field: How does this epistemology, ontology and
axiology fit into the social research field as currently constituted?

Some doyens in the research game still regard qualitative social research as an exotic rather than a native plant, and as such something to be treated with caution because of its possible ecological effects on what had previously seemed to be a very secure and threat-free environment. Specifically, many testing experts still live in a positivist world (Shepard, 1991). As well, most teachers are quite convinced that their tests measure their students' attainments; the correspondence theory of knowledge may well be discredited, and philosophically empiricism may well have been dead for forty years (Smith, 1993), but in schools and colleges and universities and work places it is alive and kicking. However, a rich literature has developed from the debates involving qualitative research over the last ten years (Burgess, 1985; Eisner & Peshkin, 1990; Guba, 1990; Popkewitz, 1984; Smyth, 1994).

So with some reservations qualitative research is now accepted and respectable, even though practice severely lags theory. The reservations are currently crystallising as sets of questions and answers about how to recognise "good" qualitative research. For example, Carr and Kemmis (1985) describe five formal requirements for any adequate and coherent educational science (p158). Criteria and caveats are being constructed that will undoubtedly in time result in a new orthodoxy (Lincoln, 1990). Feyerabend's (1988) assertion that "science is an essentially anarchic enterprise; theoretical anarchism is more humanitarian and more likely to encourage progress than its law-and-order alternatives" (p5), provides as much discomfort in the research world, be it quantitative or qualitative, as in the world of politics or the family. Smith's (1993) work clearly indicates that clarification of the problem of criteria is central to any real progress.
It is also necessary if any substantial change in educational practice, and associated structural relations, is to occur.

At this point in time, however, the limits of the field are blurry, and the demarcations between various camps subject to border skirmishes. So at least one reason for my position not fitting into a specific ontological, epistemological, axiological, or methodological tent is that such tents are not clearly differentiated between the encampments. Having said that, it is possible to nominate some camps to which I do not belong, and some camps to which I partly belong, where I would not feel too uneasy sitting in some of their tents.

It is generally agreed that there are three basic positions; empiricist (post-positivist), interpretivist (constructivist), and criticalist (Smith, 1994; Lincoln, 1990). It is also agreed that this is an oversimplification.

Briefly, empiricists argue that there is a reality out there to be discovered, that it is single and measurable, and that causal laws explain and predict it (Smith, 1994).

Carr and Kemmis (1983) characterise the interpretive approach to social science as aiming "to uncover the meaning and significance of actions" (p92). The interpretive position is that truth is constructed by people, and always involves a social context and social interactions. So truth is relative and multiple. This position has two strands, the ethnographic (Sherman & Webb, 1988), and the ontological strand (Eisner, 1988). The difference is in the way hermeneutics is regarded. In the ethnographic strand, hermeneutics is a method of achieving interpretive explanation; in the ontological strand hermeneutics is more concerned with the idea that all knowledge, all representation is
dependent on the primacy of experience (Schwandt, 1990). Regardless, "hermeneuticists of all measure and variety agree that any interpretation of meaning must take place within a context" (Smith, 1993, p16).

Carr & Kemmis (1983) regard post-positivist and interpretivist accounts to be similar in that "the researcher stands outside the research situation adopting a disinterested stance in which any explicit concern with critically evaluating and changing the educational realities being analysed is rejected" (p98). However, some constructivists (Lincoln, 1990) more recently advocate an abandonment of "the role of the dispassionate observer in favour of the role of the passionate participant" (p86). This is a position with which I concur. Smith (1993) elucidates other similarities and differences in the various positions:

    Interpretivists take antifoundationalism to mean various closely related things such as that there is no particular right or correct path to knowledge, no special method that automatically leads to intellectual progress, no instant rationality, and no certitude of knowledge claims. These are ideas, of course that interpretivists share at one level or another with postempiricists and critical theorists (p120).

He goes on to point out that "differences of consequences are readily apparent as these points are elaborated upon more specifically" (p120) and presents his own view that

    the demise of empiricism means that it is time to move beyond the need for a theory of knowledge and the various dichotomies ... of subject versus object, facts versus values ... this is in marked contrast to attempts by postempiricists and critical theorists to elaborate a successor theory of knowledge by either modifying or recasting, respectively, the empiricist understanding of these dichotomies (p120).

The criticalist position also has two strands.
In the first belong critical social theorists, ranging from traditional Marxists uncovering the "contradictions of economic conditions and relationships", to a variety of other critical perspectives, where "the focus is on the ideological distortions inherent in a broad range of historically formed social and cultural conditions" (Marshall, 1990, p181). Smith (1990) sums up the critical theorists' project: "critical inquiry can reveal our objective historical conditions: tie this knowledge to the expunging of false consciousness, distorted communication, and so on; and thereby promote emancipation and empowerment" (p193). Critical theorists then have a clear agenda of social transformation, based on a particular historical perspective, to which they have appropriated the "objective" label. As Carr and Kemmis (1983) express it, they aim to "reawaken the power of criticism and the power of praxis, criticism and praxis being the critically enlivened forms of what we usually refer to as theory and practice" (p186).

The other strand of the criticalist position is the post-structural, post-modern strand, which includes some feminist perspectives. The concentration here is on the construction of social reality through language and discourse, and the way in which this serves dominant groups and interests. The emphasis in research is on discourse analysis, in order to expose such inequities (Smith, 1994). Foucault's work is sometimes attached to this strand, though he himself did not accept the classification. And I would agree. This is important, because the writings of Foucault considerably influenced this study.
So where does my position fit into all this? I am not a positivist or empiricist. I do believe that empirical data can be collected about events; it's just that I don't believe that in relation to social events such data is very stable, can be replicated without considerable error becoming evident, or can be justifiably attached to a particular participant constituting the event. Any such data views that event from a particular position, with particular boundaries, with particular interests and values influencing the collector.

On the other hand truth claims are sometimes explicit, and often implicit, in theoretical formulations or interpretations involving social events. And some such claims can be directly contradicted by empirical data, by effects or consequences that are directly observable.

In terms of ontology, of the nature of reality, I do not fit neatly into any of the camps; empiricist, interpretivist or critical. I am probably closer to being a sceptical mystic. Rather than enter into that potential bog, in this thesis I have bypassed the question of "reality" and begun with the notion of social events, which involve the participants in social experiences.

I am constructivist or interpretivist in as much as I see all knowledge as multiple and constructed. Eisner (1990) agrees that experiences are the basis for cognition and knowledge: "thinking and knowing are mediated by any kind of experiential content the senses generate...our language refers to referents we are able to experience, recall or imagine" (p91). However, as Schwandt (1990) points out, this ontological basis of experience is not common to all interpretivist methodologies.

Perhaps my main point of departure from the criticalist perspective is at the ontological level; certainly I see relations as fundamental in as much as they constitute the mechanisms through which difference and change occur, thus making events experienceable.
But I do not wish to "objectify" these into some grand historical schema on the one hand, nor overemphasise their dependence on gender relations or particular discourses on the other. Rather, I see power and affect relations as a "heuristic fiction" that has great generality and elegance as an explanatory and generating principle. However, I am clearly allied with them in their wish to reduce the violation of persons through the transformation of social structures and in seeing social research as a legitimate way to help people make sense of the social world in a way that gives them some leverage to change it for the better. By "better" I refer to a decrease in violence.

A model for the assessment process

This thesis is concerned with a particular type of social event called assessment. It is particularly concerned with the assessment of individual persons. I assume that such an assessment results in a categorisation of some kind. Such a categorisation involves a bifurcation of data, itself dependent on judgments about criteria and standards.

Given the ontological position of the above discussion, the assessment process involves (at least) five stages (events) and a context. In actual practice some of these stages may be omitted or fused. Such fusion or omissions may constitute a source of confusion or error.
1. Test production: An event (experiment, test) is devised to produce data. Such an event will involve an interaction between the assessed person and instrumentation of some kind. The instrument may exist in the assessor's head, or may be produced as a physical artifact (a written test). The test production process also involves explication of a theory-practice link of some sort, and some prior judgments about a relevant task.
2. Test experiment: The person being assessed does the test, by performing what is required in the testing situation. This is the first stage of data production, and this event is completed when the test is completed.
3. Data production: The second stage of data construction occurs when the assessor interacts with the testing process directly, or with products from it, e.g. a performance or a completed test paper. This interaction involves an interpretation of the data.
4. Judgment process: This results in a categorisation of some kind; it involves a comparison of the data with the standard, either directly, or by comparing with data about other students. This process assumes the existence of the standard as a stable and replicable element in the event.
5. Labelling process: At least two labels are involved; the name of what has been assessed (described), and the name that describes the level of performance (compared to the standard). The multiple label is constructed from the whole assessment process, and is legitimately attached to those events. In practice it is more likely to be attached to an element of the testing event (the assessed), or to an even more remote theoretical construction related to the assessed (some skill or ability).
6. All of these processes are embedded in relations of power which reproduce and invigorate themselves in the processes.
And all of these processes (events) are potential sources of error and confusion in the individualised material product of this whole process: the documented labelling and categorisation of the assessed person.

Summing up

Negating notions of truth and reality does not necessarily lead to chaos or alienation, but may presage a search for greater clarity of assumption, for greater precision of value, and hence for greater wisdom in action.
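
As a rough illustration only, the claim in the model for the assessment process that every stage is a potential source of error can be sketched as a small simulation. Everything numerical here is an assumption made for the sketch and is not part of the thesis: the idea that a performance can be written as a single number, the Gaussian form of the error, the per-stage error sizes in `STAGE_NOISE`, and the cut-off value. The starting value is simply a device for generating repeat assessments; it is not a claim about any underlying "reality", a question this chapter deliberately bypasses.

```python
import random

# Hypothetical error sizes (standard deviations) for four of the stages.
# These numbers are invented for illustration; the thesis gives none.
STAGE_NOISE = {
    "test_production": 2.0,
    "test_experiment": 3.0,
    "data_production": 2.0,
    "judgment": 1.5,
}


def assess(starting_value, cutoff, rng):
    """Run one assessment: each stage perturbs the value, then a label is attached."""
    value = starting_value
    for sd in STAGE_NOISE.values():
        value += rng.gauss(0.0, sd)  # each stage adds its own independent error
    return "pass" if value >= cutoff else "fail"


def label_disagreement(starting_value, cutoff, trials=10_000, seed=0):
    """Fraction of paired repeat assessments whose labels differ."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    flips = sum(
        assess(starting_value, cutoff, rng) != assess(starting_value, cutoff, rng)
        for _ in range(trials)
    )
    return flips / trials


if __name__ == "__main__":
    for start in (50.0, 55.0, 80.0):
        print(start, label_disagreement(start, cutoff=50.0))
```

Under these assumptions, a starting value that sits on the cut-off receives conflicting labels in about half of the repeat pairs, while one far from the cut-off almost never does. The sketch says nothing about how large the errors are in any real assessment.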
Power is defined in terms of relational fields rather than of personal or role attributes, of power as ruler and ruled. Arendt and Foucault articulate the construct differently in that they differentiate violence from power. I choose a broad definition of violence as any violation of personhood; so both force and physical violence are subsumed as sub-categories of that construct; and violence becomes a necessary aspect of asymmetric power relations, inevitable in hierarchies.

The other side of power relations is now highlighted; the side that produces rather than denies, that constructs rather than destroys. That is, I deal in some depth with Foucault's (1992) assertion that "power produces; it produces reality; it produces domains of objects and rituals of truth. The individual and the knowledge that may be gained of him belong to this production" (p194). In particular, I look in detail at what is produced through two specific mechanisms fabricated within asymmetric power relations: the processes of disciplinary power, regulated through surveillance and penalty; and normalisation, achieved through linear labelling and sustained through the cult of individualism.

I look briefly at some of the "scientific" disciplines, and the micro-cultures that sustained them and helped provide their assumptions, theories and data.

Finally in this section Bourdieu's construct of symbolic violence, and the notion of habitus through which it is humanly experienced, shows how difficult it is, when playing the game our culture dictates, to recognise its limitations.

Defining power

What characterises social life is affect and effect; affect refers to those aspects of relating that are characterised by polarities such as emotional closeness-distance, of like-dislike, of attraction-repulsion, of affiliation-separateness. These affect relations are apprehended viscerally, experienced directly through the body.
In the vernacular, in the field of sense relations you "feel the vibes."
Power refers to those aspects of relating that translate influence, that make a difference, that have an effect. The actions of one affect the thoughts or actions of another. The poles of a power relation could be characterised by such descriptions as dominant-submissive, controlling-rebellious, have-want, strong-weak. So within the field of power relations, what one person does affects a second, which affects a third, and so on. Such effects ripple onwards and outwards from human interactions in patterns that are indeterminate; yet even so the patterns are sometimes decipherable and probabilistically predictable, for the fields that affect the patterns are stable and translatable.

For example, in all cultures there are families, groups of people genetically related whose patterns of interaction are relatively stable, whose ways of behaving towards one another are consistently patterned; the parent influences the child, the parent's demands produce action, the power vector is from parent to child. Yet even so the child's behaviour must influence the parent's behaviour, if only to maintain the parent's controlling function. In this sense power relations involve mutual influence, even though normally asymmetric, and translated into action involve dynamic events.

Such events are acted out in power fields, such as family or school or workplace, where the rules of the game are understood, and the overall direction of action and influence predictable. In this sense the influence is not so much person to person as role to role; the relationship of parent to child overrides the relation of the person Jack to the younger person Julie. For this to occur we must assume some mechanism for the learning of relational roles, for the internalisation of the power injunction. For if we locate the power in a relational vector out there in the space between, we must also explain by what psycho-social means people in the field are moved to act. More of this later.
Affect and power relations are not mutually exclusive; strong affect can generate high intensity in the field of power relations. And doubtless asymmetric power fields are capable of generating considerable affect, both positive and negative. Even so, the two notions are separate, the two fields initiate different experiential effects, and are associated with different states of consciousness. Love and power are not synonymous. And which is stronger is moot. Like Bourdieu (1990a), "We leave it to others to decide whether the relations between power relations and sense relations are, in the last analysis, sense relations or power relations" (p15).

Regardless of their relative strengths, their confusion produces dysfunction in societal relations, and pathology in individual people; love that degenerates into power play destroys itself; and power that masquerades as love is a sickening violation. However, this is too large a contention to debate in this thesis, and is not directly related to our major theme (Laing, 1967).

To summarise, I have defined power relations as the dynamics of mutual influence. In most situations such relations are activated in fields whose pattern is perceived by those who enter the field in terms of role relationships, or less consciously simply as appropriate behaviour, a predisposition to act in a certain way. People engaged in such fields are both activated and constrained, but by no means wholly determined, by the role expectations or predispositions (habitus) which, for individuals at either pole of a power relation, are activated by their entry into the field.
So let's see how this definition fits into the historical meaning of such concepts as power, force, strength, and violence.

Power and rule

Traditionally the essence of power has been rule and command; or alternatively the act of ruling and commanding has been attributed to a faculty called power. This need to dominate was seen as an instinct in man, a psychological necessity. Force and violence in social life were thus inevitable, for they were necessary components in the command strategies of a leader. Combine this psychological instinct with the social requirement that the first learning of civilisation is that of obedience, and the two poles of a largely unidirectional power relation are accounted for. To command and be obeyed is thus the essence of Power. And the basic building block for monarchy, hierarchy, and their complex transformations into the modern state has been constructed (Arendt, 1970, p36).

A look at any parliament in action, or a peep into any political party meeting, leaves little doubt that this paradigm of the fight for dominance is still central to the inner workings of government; certainly jostling for place in the political party pecking order is a major preoccupation of politicians, particularly of those who aspire to top positions.

However, tradition also specifies an alternative power game. This was the idea of representative government, where obedience is to laws that have the people's consent rather than to dominant men, and elected leaders remain dominant only with the support of the people. This second paradigm undoubtedly has a much wider gap between vision and practice than does the first, and a fundamental question of political science has always been about whether this is ideology rather than reality, a fairy story that disguises and soothes the experience of most people of powerlessness, of alienation.
Regardless, in most modern states there is some balance, some checks within limits, of the power of the state and the tyranny of its accompanying bureaucracy, articulated through the opinion of the people.

Arendt (1970) argues that all government, tyrannical, monarchical, oligarchical, democratic, bureaucratic, or whatever, depends finally on the support, the "qualified" obedience, of the people:

[Passage illegible in the source text.]

Bourdieu postulates the existence in the social world of objective structures, in addition to symbolic systems, and independent of the consciousness and desires of agents; structures which guide and constrain their practices and representations, which produce a predisposition to act in certain ways (p123).

Foucault (1988) also moves well beyond the notion of "Power with a capital P dominating and imposing its rationality upon the totality of the social body." In fact, Foucault goes on to say, "there are power relations. They are multiple; they have different forms, they can be in play in family relations, or within an institution, or an administration or between a dominating and a dominated class" (p38).

Foucault (1988), like Bourdieu, uses the relational power structure as a fundamental explanatory principle: "The characteristic of power relations is that, as agents in the structure, some men can more or less determine other men's conduct, but never exhaustively" (p83). So power relations precipitate all "the strategies, the networks, the mechanisms, all those techniques by which a decision is accepted and by which that decision could not but be taken in the way it was" (p103). Or in retrospect, that's the way it seems.

Power and violence

Yet like Arendt, Foucault (1988) wants to remove coercion, brute force, from his notion of power relations. He says:

[Passage illegible in the source text.]

Disciplinary power uses the twin instruments of observation and judgment, and the judgment is by necessity judgmental; is categorised by a satisfactory-unsatisfactory dichotomy. Such normalizing judgments are so pervasive as to override their specific instances. "Humanistic" teachers may protest that they punish the misbehaviour and not the person; this may be true of their intentions, but does not describe the effects. Again Foucault spells it out; the judgments not only diminish the aberrant behaviour; they also produce the person:

[Passage illegible in the source text.]

This translation of act into essence, of misbehaviour into attitude, of error into ignorance, of absence into inability, is one of the political functions of Psychology. This transformation of event into label is an epistemological error, a misrepresentation of the functioning process, but is crucial to the construction of those "individuals" of whom Foucault speaks. For as he indicates so clearly, that individual first constructed in the eighteenth century, that educated individual being continuously recreated in "developed" twentieth century countries, is not characterised by passion, creativity and an independent mind. On the contrary, the individual is a person cleverly moulded by disciplinary power to be utterly reasonable (that is, to deny emotion), completely responsible (that is, to deny spontaneity and creativity), and to be loyal and dependable (that is, to deny independent thought and action).

Illich (1971) reached similar conclusions:

[Passage illegible in the source text.]
Before discussing further the place that the examination plays in disciplinary power, I want to examine in more detail the notion of symbolic violence, and the particular way in which it is concerned in the continuance and intensification of violating structures through the imposition of meanings.

The child who is beaten by her father, and is then told that it is God's command that she must always love and respect her parents as indeed her parents love and respect her, and whatever they do is for her own good, is being subjected to symbolic, as well as physical violence. Her experience of being violated is being contradicted and negated. She is told that she is not being violated, but is being helped and loved. And it is not her parents who wish this, but God. She is unable to see that the perpetrators of the violence, and of the meaning system, are both primarily concerned to maintain their own, and each other's, authority structures; that is, the hierarchical power structures that have become institutionalised as family and church. And it is the institutions themselves, not parental love or god, that legitimise the violence, and the justification for it. So these structures become stronger, and the human victims more confused and powerless.

Let's take another example from schooling. Some young people are denied the right to continue their studies. Schools deny them access to further education and hence exclude them from a number of occupations. This is obviously a violation and unjust, even before we look at the inequalities of exclusion in terms of social class, gender and race. How is this exclusion achieved? Schools impose what specific knowledge and skills will be taught, and in so doing define what is useful and legitimate knowledge, and how it will be taught, learnt and assessed.
And these processes discriminate against certain groups, and certain particular sorts of people.

The exclusions are legitimated supposedly through the professional judgment of the teacher, who is able to distinguish a "pass" from a "failure." In fact, this is not true. It is the institution itself, the school, that legitimises the exclusion, and inclusion. For the teacher outside the institution, no matter how highly qualified professionally, cannot accredit. On the other hand, the institution can accredit with a multiple-choice,
computer-marked assessment system that completely bypasses the professional teacher. So what are in fact rather arbitrary impositions by the school are disguised as professional judgments about skill, ability, and intelligence, and then codified as pass or fail with the appropriate label attached to the student. These judgments are then accepted as legitimate by all parties involved, including the great bulk of excluded students, who know at one level that they have been duped, but don't know how.

In these two examples I have tried to elucidate the particular properties of symbolically violent meanings. Firstly, they are meanings imposed and legitimated by institutions of authority. For example, by institutions that control morals or education or health or information. Secondly, they are designed to convince that what is violent is indeed not so. That what is unjust is indeed just. That what is inequitable is indeed fair. That is, meanings that are symbolically violent negate our experience and feelings. And thirdly, the authority appears to come from a source other than its true one. From God or some moral or professional source, rather than being delegated from less visible power structures of church, caste or class (Wilson, 1991, p26).

These are specific examples of Bourdieu's (1990a) more general proposition that

[Passage illegible in the source text.]

So the rules of the game construct the players, who in turn construct their own particular version of the game. And those who play the game the best are the winners who continually reproduce the game in its infinite variety, and create the illusion of freedom whilst the rules become ever more fixed, for

[Passage illegible in the source text.]
In this chapter, I take the more general ideas about power relations discussed in Chapter 3 and apply them to educational systems and institutions; in particular I unearth the many small social control mechanisms that pervade the school, and what sorts of people are produced by those mechanisms. I then examine the examination; how it normalises and individualises, and how it is impotent without the notion of the standard, the sword that excludes and rewards, the wedge that produces the gaps.

That brings us to the focus of this thesis, the suppression of error. There is a field of educational scholarship devoted to educational evaluation and measurement. Thousands of books. Hundreds of journals. Most of the literature in the field is about errors in measurement. And of course, errors in measurement imply errors in the measurement of standards. Yet in classrooms and universities and public examining boards, on school reports and graduation and proficiency certificates, there is a great silence. It is as though this literature did not exist. Even prestigious testing agencies skim the surface of the error issue. The question is why? Why this suppression of the obvious empirical fact that educational standards as a thin accurate line have no empirical existence? It is to this question that the remainder of the chapter is addressed.

I examine the crucial part that the standard plays in the whole mechanism of defining cut-offs for abnormality and non-acceptance, and how important it is that these standards be seen as accurate if current societal structures are to be maintained.

Restrictions, penalties, productions

In the day to day operation of the school the power relations are activated through an array of petty restrictions and micro penalties, unrelated to the supposed primary function of the school as an institution designed to maximise learning.
In most classrooms the policing of these restrictions takes a considerable amount of teacher time and often consumes more physical and emotional energy than does their teaching function. In many large high schools in Australia, the major activity of the Deputy Principal is to deal with children with whom teachers are having disciplinary problems. We are obviously dealing here with what is a major part of the school curriculum, regardless of whether it appears in the official statement of syllabus.

There are restrictions on appearance and dress; on what may be worn, and how long or short it is; whether this be skirt, shirt, pants, hair, necklace, ear rings: whatever differentiates from the norm; whatever distinguishes an idiosyncratic persona; whatever, by whatever means, makes a public statement about personal autonomy. The restrictions will not be specified in detail, for fashions change too fast for that, and student creativity is limitless. However, the judgment of the school is, in retrospect and by definition, impeccable in these matters, and its verdict will rarely be contradicted, and never successfully challenged, by students (or parents, for that matter). Significantly, school spirit, cooperation, health and safety, economy, equality, fraternity, are all likely to be
part of the supporting ideology. But never conformity, for this would contradict the school's ideological aims of developing individuality and autonomy. Yet surely conformity is what is being produced here; conformity, and the acceptance of the social sanctions that non-conformity brings.

Body, movement, speech and relations must be decorous: body and clothes must be not only clean, but tidy. Movement is both restricted and restrained: students should remain seated and never run in the corridors. Speech should be proper: slow, well-articulated, free of slang, swearing and salacity, respectful in address and tone, and preferably in the dialect of the upper middle class. And social relations should be moderate, free of all excesses; of love or hate, of enthusiasm or alienation, of spontaneity or cliquishness, of autonomy or dependency.

As well as physical and emotional containment, there is temporal curtailment. Work is restricted to what the timetable dictates. Maths must not be done in the history lesson, history must begin at 10 am, and no one may visit the toilet until 12.50 pm, unless they shame themselves by asking permission, and then only maybe.

There are a whole range of penalties utilised to reassert the power structure should any of the multitudinous restrictions of the school be breached: further physical containment during recesses, deprivations of various sorts, petty humiliations such as standing in corridors or outside offices, threats and harassments of various kinds, and finally physical punishment, suspension or expulsion. In 1997 in Australia the most popular fashionable sanction is called "time out", a broad notion that contains various shades of physical isolation, and which schools insist is not a punishment. The penalties are really of no significance. It is the acceptance of the penalty, which reinstates the integrity of the power structure, that is important.
It is important that some students rebel, so that the power relations might be demonstrated (Wilson, 1990).

So what is produced through these restrictions and penalties? What is learnt?

First, temporal regularity. There is a time to start and a time to finish, a time to sit and a time to stand. And these times are planned and arranged and policed by others. What is learnt is that time is determined not by the imperatives of life as they manifest themselves, nor by any plan that might make for some personal production, but by the dictates of people in authority, by the demands of an institution.

Second, physical containment. There is a space to be and a space to sit, and sit, and sit. What is learnt is that the demands of the body are not important, and it is preferable to forget that you have one.

Third, emotional contraction. What is learnt is that the exuberant emotional and psychic field must be reduced to the physical limits of the body, so that feelings and emotions are pacified, and the self reduced to placidity.

And finally, what is learnt is that all this has nothing to do with the maintenance of power relations, or the production of a social being, but is an unfortunate addendum to another far more important purpose; a necessary prerequisite for effective learning of the knowledge specified in the school curriculum. What is learnt is to misrecognise the social function of schooling.

Illich (1971) summarises the situation, calls it for what it is, and sees only one solution:
[Passage illegible in the source text.]

Before accepting or rejecting Illich's ultimate solution, let's look more closely at some of the specific mechanisms that produce this "alienated institutionalization of life."

First we look more closely at the examination, and at the particulars of its function. Foucault (1992) certainly affords it pride of place among the mechanisms of disciplinary power which he elucidates:

[Passage illegible in the source text.]

It is at these crucial points that define exclusion that any error becomes unacceptable. These are the points that define, not so much the norm, but the gaps that define abnormality, unacceptability, dangerous deviance. The normal is indeed defined by a broad grey band, but it is essential that the abnormal be determined by the thin red line that separates. And that line, that thin red line where the blood flows, is the standard.

Standards and swords

Foucault does clearly show how the battle lines are drawn up. He displays the deployment of troops and the strategy of the battle. With unerring accuracy he pinpoints the diversions and ambushes and the misinformation and propaganda that camouflage the major thrusts.

Even so, he pays almost no attention to the major weapon which ensures success, to the one notion without which the whole structure is unstable; he downplays the construction that turns a house of straw into a house of bricks, and allows that momentous separation between the three good little pigs and the big bad independent wolf. Could it be that his academic self wished to retain this last bastion of its own identity?

Regardless, without the steel edged standard to cut off the tail with a carving knife, and without the standard chippy chippy chopper on the big black block to lop off the heads that are too way out, disciplinary power is reduced to a shadow. The notion of the norm is dependent for its existence on the notion of the not-norm, on the notion of the abnormal. And the abnormal owes its existence to the act of separation.

Regardless of how disciplinary power is deployed, whether through the micro-penalties of day to day detail, or the graduation rituals of national examinations, or definitions of insanity, the thin line between the acceptable and unacceptable must be drawn.
And it can only be drawn by evoking the idea of a standard, of a cut-off point that can be accurately determined and applied. All this regardless of whether we want to evoke democratic values, or scientific values, or aesthetic values, or other "expert" values in determining the standard, and then measuring it.

For without the notion of the standard there can be no classifications, no qualifications, no exclusions. There can be no norm, because there is no abnorm. There can be order, but without the standard there can be no disorder. Without the standard, we can still construct an order of merit, but cannot differentiate excellence, or determine exclusion; we can still individuate by placing on a line, but we cannot delineate winners because we cannot define losers. A race where everyone gets a prize is like a race where no one gets a prize; it loses its purpose as a race, and soon becomes a game that no one wants to play. Gilbert was right: "When everybody's somebody then no one's anybody."

The blade must be sharp. There is no room for error. There is some aesthetic beauty, some notion of swift justice, black and violent as it might be, in a blade that cleanly and swiftly decapitates. Yet a mangled hatchet job will inevitably evoke horror. And so it is with any application of the standard. The acceptance of classifications and exclusions, both by those who apply them and those who are their recipients, is dependent on the precision and truth of the standard. Without these qualities the whole examination exercise becomes exposed as a political ploy to order and control, to reward and exclude, to hold in place vast structures of inequity. In short, it becomes exposed as a hatchet job.

A place to hide

If it is indeed true that the notion of standard is central to the maintenance of cultural identity as we live it, as central perhaps as was the notion of God to the cultural identity of life lived in the twelfth century, then we must not be surprised that the notion is highly resistant to empirical contradiction.
Nor should we be surprised that those who are aware of any such contradiction have some realisation of its traumatic nature, and of the necessity to keep it secret.

The human mind is remarkably efficient. Socially inclined as it is, it realises the only way to keep a secret is to hide it away. So the secret becomes a secret from one's own consciousness, locked away down there where angels fear to tread. The unconscious is nothing more than this; the space where we hide what we know from our conscious selves because the knowledge contains a truth that is too hot to handle, an awareness too destructive to life as we know it.

Would the social world we know really collapse if the notion of the standard had to go? Would we dissolve in chaos, or move gently onward to build a better world? Or would we simply find another subtly socially reconstructed lie to replace the one we'd lost?

Summing up

We have seen how central the notion of standard is to the maintenance of the social structures of power in which we are enmeshed, and to education's crucial social function of categorisation.
There are affect components involved here; the bearer of the standard is clothed in fancy emotional underwear, wears a colourful mythical costume, and carries a sceptre that denotes moral high ground. In the next chapter we examine some of these other dimensions of the assessment fairy tale.
After a brief look at myths and rituals, and the special place they hold in our thinking, a place apart from critical thought, I assign the idea of the human standard as currently understood to this mythological sphere.

I look at the emotional intensity of discourse about the standard, its significance as an article of faith, a basic assumption, an ideological king-pin, and at who gains from the non-recognition of its problematic classification. Specifically, I show how the notion of a standard of behaviour in families helps to maintain the family structure; then I examine in some detail the mechanisms the school uses to maintain "emotional" standards by denying the reality of human feelings, and how this is related to the maintenance of control, of good order.

Flags

When the army begins to march, or the Governor returns to his residence, the event is heralded by the raising of the Standard. The flag is the symbol of their power. When we salute the flag, we do obeisance to that power, in which glory resides. And, when power is embedded in the relationships of human structures, we salute the standard, we pay homage to the strength of those structures, simply by our willingness to play our designated part within them; in short, by our subservience to structural dictates, and our acceptance of relational obligations.

This language is hard to live with, this description too intense for comfort. We need a softer cushion on which to fall, a more prophylactic myth to justify our allegiances and comfort our losses. As we shall see, we will find such justification in the world of moral values.

These relational structures often have no visual symbol to represent them, though particular versions of them proliferate in the form of corporation logos, school and family crests. These are usually of limited emotional impact.
More successful have been brand names for clothes, where the image behind the symbol has been so successfully assimilated that not only are consumers willing to pay much more for the product, but are proud to become walking advertisements. Some Japanese corporations and some sports teams have managed to construct songs that fit the bill. But in general the "flag saluting" within families, schools and workplaces has been accomplished more through particular discourses with words and body language than through responses to visual symbols.

Discourse and value myths

I use discourse here to describe not only "what can be said and thought, but also about who can speak, when, and with what authority. Discourses embody meaning and social
relationships, they constitute both subjectivity and power relations" (Ball, 1990, p2). Discourses thus constrain the possibilities of thought, and are defined by what is absent from them as much as by what is produced through them.

So what are the key elements of discourse around standards? What are the words and phrases that trigger a "flag"-like response? For whilst it is true that most social structures can, if necessary, muster some physical force in the form of army, police, courts, psychiatric hospitals, masculine muscle to deal with minor perturbations of the structure, the inherent strength of the structure is vastly greater than such disciplinary mechanisms that may be utilised. Just as in a crystal it is the individual molecular bonds which bind the crystal in its hard, rigid and determinable form, so it is the acceptance and actioning by each person of the appropriate relational roles between people that account for the maintenance and solidity of the social structure. So what constitutes a symbolic reminder, a conditioning stimulus, a ritualistic nudge and wink, that stimulates and fortifies the memories of our proper relationships to those who lead us or are led by us, to those who love us or whom we should love, to those to whom dues are owed, or to whom we owe our dues?

The gross but honest dictates of parent-child relations are not effective with adults, or for most children for that matter, raising as they do so much overt rebellious reaction. "Do what you're bloody well told" does not trigger the appropriate response. The linguistic flag carries much more powerful symbols in its armoury. Looking upward, we see Duty, Loyalty, Respect, Discipline and Strong Leadership all emblazoned on the High Standard in gold letters. And looking downward, the cold sharp chisel of Efficiency nestles neatly in the caring hand of Institutional Love.
It is important to understand that once these abstractions are incorporated into a personal value system, so that they become part of a way of being, a way of institutional living, the ground of faith on which hierarchical life is premised, then dependence and obedience all become responses that inhabit moral high ground, for they are necessary to maintain, not the hierarchy, but the values in which it is now delicately clothed. And the violations they entail work efficiently underground in this hallowed space.

Further to this, the more intense and horrible the violations involved, the more pervasive and enduring the myths and values that provide the cover up and justify the carnage. The Freudian myth embodied in psychoanalysis regarding the sexual fantasies of children is a good example. The myth enabled child sexual abuse and incest to be disguised and trivialised for a hundred years, as we are only now beginning to realise; sexual abuse of the child became translated through therapeutic discourse to sexual fantasies of the child aimed at the adult (Masson, 1991; Miller, 1984). The myth of the glory of war has required the joint barrage of visual human slaughter on television, together with an appreciation of the probability of global nuclear extinction, to diminish its insidious hold on our thinking. And even now the monster will not lie down and die.

And there is another aspect of enduring myths that we must not forget. Such myths do truthfully represent a part of the human condition. Many children do sometimes act seductively towards their parents. There is a form of transcendence in the self-sacrifice and comradeship that is a part of some men's experience of war. Yet when these myths are used to disguise the carnage, rape and pillage that are their major manifestations, then such myths become not the harbingers of truth, but their disguises.
What I am asserting in this thesis is that the myth of the human "standard" is just such a myth in the more "civilised" wars of structural violation in which our lives are embedded, wars no less destructive of human life and potential because their weapons are so insidious and subtle; wars to which at this time in our history it is now appropriate to turn our attention, so that we may, in a non-violent way, bring about their cessation.

Standards and discipline

Talk about raising educational standards evokes intimations of glory and solidarity, of battles won and lost, of remembrance of our dependence on elite leaders and arcane specialists. Who talks of the shocking implications of lowered standards and the necessity to keep them high, and to whom do they talk? Who are the flag-bearers to defend us from the horrors of mediocrity, and the hellish consequences of the (inevitable) average? What do such utterances herald, and to what do they respond? (Wood, 1987, p214).

In the public arena, whether that be the political castle of public affairs, the media circus of public relations, the disciplinary field of the public service, or the common ground of the public house, talk of raising standards is invariably linked with the idea of better discipline. Contrarily, the cause of lowering standards is clearly tied in public discourse to soft leaders and the inevitable anarchy which that is fantasised to produce.

So "standards are also values to which people aspire or lament the decline in or lack thereof." (Norris, 1991, p335). People talk about raising standards when they perceive a slackness in the ropes of control, when they see a sloppiness infiltrating the verities of life, when they begin to be fearful about life's diminishing certainties. Talk of standards is talk about conservation, about protecting the past in its imagined superiority and security, and defending the future through strong leadership. "Discipline," "Respect," "Standards," "Leadership" are almost interchangeable words in a discourse that lauds the good old days and decries the soft underbellied freedom and license of the present. It is the language of the old talking about the young, of the powerful talking about the rest of the world, of the mind talking about the body, of men talking about women. And these days, let us be fair, of some women talking about men. By implication, it is discourse that defends appropriation and privilege, and the structures of inequity in which they flourish.

Suffering together

Heraldic and educational standards both also share a deep emotional component, digging deeply into the well of group identity that tribes and political parties, multinationals and nation states, know so well how to bring bubbling and boiling to the surface. We all know the clarion cries that activate the emotional unity that is evoked and manipulated by demagogues: the Fatherland, the Motherland, Our Land, Our Nation, Our Church, Our Family, Our Team, Our God, whatever its particular form. Words that recall our common heritage and our common destiny, and the myths and ideologies that surround that communality; we lose our individual and insignificant identity in the power and
communion of the group, and are seduced into forgetting our fear even as we lose our freedom.

Through such languaging the notion of standards and their conservation becomes emotionally tied to our deep sense of wanting to belong, wanting to have our place in the social world. And of course, our place in the social world is dependent on the survival of that social world in which we have our place.

At the very least, discourse about standards will be emotionally charged. Talk of changing educational standards is like talk of changing the flag. It triggers all the fears of change in the social realities, be they ever so violating, for which the standard, and the flag, are symbols.

By insisting in this thesis that educational or ability standards have no empirical reality, I cut much more deeply into the social fabric. For such a claim not only undermines the standard, but also by association denigrates the social reality that it represents. The metaphor is not changing the flag, but destroying it, on the grounds that the social order that it pretends to represent is a delusion, very different to the one that it does indeed refer to. A delusion whose continuance, furthermore, is largely sustained through the emotional effects of the inviolability of its recurring symbol, the flag.

The person who destroys the flag is inviting extreme social response, for such is its emotional content that many people will identify this map with its territory. For them, to destroy the flag is to destroy the social order it represents, and thus to destroy their identity within that order. Emotionally, social symbol and social reality are contiguous. For many people, this contiguity overlaps and symbol and referent become identical. In this state of mind, cognitive arguments and empirical data have as much impact as falling animals crashing into rocks. As much impact on the rocks, that is.
In an analogous way, to criticise the notion of educational or job standards on the grounds that they cannot in practice be measured or logically sustained is to destabilise the symbol of the meritocratic society, the competitive capitalist order that it supports, and the cult of individualism that, almost alone, it defines and constructs. Emotionally, these four constructs, standard, competition, meritocracy, and individualism, are deeply intertwined. To threaten one of them is to threaten all. And to threaten all is to threaten each one of us, you and I and him and her. For it is to threaten that social order in which we all, in our own way, or more likely in a way that the structure has imposed on us, have found our place.

Fact or faith: the sociological imperative

So the standard is a social construct whose meaning is not dependent on any empirical evidence to support it. The flag is not a bit of cloth attached to a pole; it is an idea, a social construct, with which most of us, individually and in a group, interact in fairly well-defined ways. In a similar way, money is not a piece of paper with pictures and writing on it. It is again a social construct which most people are willing to agree has a certain meaning which includes an intense emotional component. But again, a social construct dependent on faith for its continuance. Lose that faith, and the value of the money evaporates.
Likewise the notion of a standard: it is a notion, an idea, a social construct that helps bind together the social structure that brings order to our lives. If, as I have suggested, it is a very fundamental construct, one which is central and crucial to other social constructs which in this time and place are thought to have particular value in constructing (and thus validating and justifying) the social relations in which our lives seem inextricably enmeshed, then even more reason for letting it alone, for not subjecting it to too critical inspection, for not undermining a fundamental article of faith.

Articles of faith do not need empirical evidence to support them, and are extremely resistant to empirical evidence that casts doubt on their logical consistency or their stability or their contradictions to other articles of faith. For articles of faith tend to develop around themselves other ideas and ways of relating that are reasonably consistent with them. These coordinations then constitute a way of living in the world, a set of habits that helps give a sense of stability and thus timelessness in a world in which change is inevitable on every street, and chaos is just around the corner. They constitute, in other words, what we call social reality. They might more accurately be called the social fantasies we construct and live that help make the conditions of our lives, and the lives of selected others, more bearable.

And if this cuddly teddy bear turns out to be a real dragon, destroying the lives of many more than it supports, then all the harder to slay it.

The psychological imperative

When we are dealing with the educational assessment of students we must add the teacher's psychological necessity for accuracy. At some level teachers all know how important their assessments are to the futures of their students.
They all are aware of its use in social stratification, and its more negative function of the excluder, and the destroyer of personal dreams. And this mechanism operates through self-exclusion as much as exclusion by any external force.

This is the load the assessor carries: for the students themselves usually accept the judgments made of them, and compose their lives accordingly. This is self-imposed as much as it is dictated by any external agency. So through their assessments, teachers have monstrous effects on the future lives of their students. This is an acceptable load if the assessments are very accurate, and do in fact measure the capability of the student. But if they are enormously in error, what then? What is the psychological price of instigating massive inequity, enormous misplacement?

Instrumental value

The notion of "standard" has a particular function in the value conglomerate of respect-discipline-efficiency that is a major part of the ideological glue that helps hold hierarchical systems firm. For the standard is the value that mediates between ideology and structure, between the moral values, and the relational power systems that they support. The standard defines the point of action at which any disjunction between value and experience is challengeable.
Let's see this in action in two hierarchies; first in relation to respect in the home; then in relation to emotion in the school.

The family

In a family, duty, obedience, respect, discipline are continuous, rather than binary, constructs. That is, children are more or less dutiful, or obedient, or respectful. One child is more disciplined than another. So how do we know when we reach the point where acceptability is breached, where unacceptability is reached? We know because what has occurred is below the standard. As parents we "know" there are standards of behaviour that must be observed. And the disciplined child is one who knows, accepts, and behaves within the limits of these acceptable standards. And these standards are not of my making as a parent, but something that "society" demands. I may have very high standards, in which case I may be tougher (and hence more moral) than most others. Or I may be softer (and hence more humane or emotional) than most others. But the myth of a "standard", that point of demarcation between acceptable and unacceptable, is implicit in both these positions. And my duty, as a parent, is to maintain this standard.

That this standard has no empirical stability (certainly not for the group and generally not for the individual) is insignificant in the light of its logical necessity to maintain the structural stability of the family. After all, how can a parent ever demonstrate the extent of power difference if that difference is never confronted with an explicit, implicit, or fantasised challenge?

Sexuality and school

The hierarchy that is the school is much bigger and less personalised, so is harder to hold firm. So there are many standards of behaviour to hold emotion in check, and many standards of cognition with which to gain leverage on the mental processes. This is equally true for both teacher and student.
We like to make an ideological separation between school discipline and the school disciplines, yet the processes by which each are engendered are similar if not identical.

So how are emotions in a school controlled through the imposition (or better still the personal incorporation) of standards? Firstly there is the professional standard of distance, of objectivity, of detachment. Emotional involvement, whether positive or negative, is taboo. Professionally the emotions are controlled by pretending that they do not exist. On the positive side the standard is that low level of affect described as "friendly interest." For young children this may be expanded to "fondness" unless you are male and the student is female. On the negative side the standard, the limit of negativity, is a low key sternness that accompanies correction. Essentially these low level affects are seen as acceptable nuances of cognitive behaviour.

Neither anger nor love have any place within the professional role of the teacher. To indulge either is seen as a breach of professional ethics. Such standards are justified by claiming that any relationship with students involving emotion would be dangerous to the students involved and unfair to the others. Dangerous because escalation could lead
either to violent or sexual outcomes. An example of the catastrophic consequence justification. This disguises the stronger and more immediate danger, of course, which is to the stability of the power relations. Legitimate anger at the inequities hidden in that structure, or of love that transcends it, both pose fundamental threats to its continuance.

For the student in school emotions are also ignored. They have no place and so do not exist. Any acting out of emotions however is given high priority and the school disciplinary structures are immediately brought into play. The emotions are ignored, but the behaviour is punished. This is equally true regardless of whether positive or negative emotions have inspired the behaviour. Indeed, the school authority is much more comfortable with handling the acting out of negative feelings of fear or anger or revenge or envy than it is with any overt expressions of love or sharing or student cohesion, so easily interpreted as solidarity and hence politically suspect as potentially destabilising.

Emotional intimacy between students, or between a student and teacher, is rightly seen to be incompatible with the power relations that define the school structure. Two students who actively demonstrate their passion are likely to be dealt with more harshly (probably by expulsion) than are those who actively act out their hostility. Hostile students allow the school to demonstrate its own power. Loving students can only highlight the emotional vacuum of the school's structure; and incidentally expose the obsession with sexuality that underlies its prohibition. That the taboo is so seldom breached is evidence of the school's enormous power, especially so during adolescence, where for many students it is their major preoccupation.
Demonstrated or inadequately disguised love between a student and teacher, even if completely non-sexual in its overt manifestation, evokes a response amongst teachers almost as powerful as the response to incest. Outside the context of the school, love between people of different ages is an accepted norm, so long as the differential is not too great. Within the school context, it is condemned on the grounds that it is an abuse of power. The assumption is that the teacher has abused his or her power over the student and manipulated the student's affection. Now whilst this may be true in some circumstances, and whilst the roles in the school have doubtless influenced the relationship, intense emotional relationships that develop between the two people (rather than between their partial selves in role) are much more than this. They are as common and as intense and as potentially fulfilling as are such relations occurring in any other social context.

To understand the strength of the taboo we must understand that it is not so much the abuse of power that is involved here, but its elimination, its disintegration, its transcendence. Love and power are incompatible relations (Laing, 1967). Love is a state of openness and mutuality in which the other is accepted in his or her wholeness, where there is trust in the flow of positive affect, of cohesiveness. Control is the denial of such trust, and structures defined by hierarchical power relations are thus structures permeated by mistrust (Maturana, 1980). Hence the necessity to control and punish.

So love relations between a student and teacher are not taboo because they might lead to sexual relations, or because they are unfair to other students, or because they represent an abuse of teacher power, or even because they might represent a malicious manipulation of the teacher by the student. Or because of the many additional justifications for the taboo that we could construct and fantasise.
All would possibly at times contain some grain of truth, and all would miss the target by rendering it invisible.
The fundamental immorality of such relations is that they are contradictory to the structure of the school, to its defining power relations, and are thus a fundamental threat to its continued existence.

It is equally important to understand that this fundamental reason for the taboo will be disguised in any particular case by evoking the concept of standards. The teacher is at fault because she has breached a professional standard of conduct which involves the abuse of power. The student will be at fault because he has not realised his vulnerability and has not allowed himself to be sufficiently protected by the benevolent authority which has defined the standards of student behaviour. Like so many rules in a school, this one, about loving teachers, does not appear in the rule book. Even so, no student would truthfully claim they did not know that it breached the standard of acceptable behaviour. And few would be able to rationally justify its abolition.

As described earlier, the appearance of the standard invokes an emotional response rather than a cognitive one. It bypasses notions of equity or justice that might grow out of a rational debate on the power-control issue, on the limitation of personal freedoms. It sidesteps any possibility of an ethical discourse by asserting that a standard has been breached, and thus by implication some act at the best unsatisfactory, and at the worst grossly immoral, has occurred. As the interpreter of standards, the school authority no longer seems to punish in order to defend its unequable structure. It now punishes in order to defend a high moral principle encased within "society's" standards. A violation of human rights has become a defence of all those things that "society" holds sacred, which become classified under the general rubric of "responsibility." And the use of the "standard" is the primary mechanism through which this mystifying ideological scam is accomplished.
Mind games

So far I have been concerned with discipline, with the way the school deals with unacceptable behaviour. Yet in educational discourse this is considered an unfortunate by-product of the school's function. School discipline is defended not so much in its own right, but merely as a prerequisite to the maintenance of the disciplines. After all, the "real" reason children are at school is to gain knowledge, to become adepts of the various disciplines. Such learning, it is claimed, is dependent on the production of order, so that any control function that the school has is there to maintain the order that makes learning possible. Children are punished in school not so much for their own sake, though "god knows they must learn to be responsible for their actions", but rather for the protection of others. All must accept the discipline so that all may learn the disciplines.

Taken as an assertion about the nature of human learning, this is ridiculous. To assert that the best way for children to learn is to sit them down at desks in a teacher dominated classroom containing thirty or forty other children and change to a different topic every forty minutes is to deny most of what we know about the variety of learning styles and efficient learning environments. It denies a hundred years of research about how people learn.

Yet still the statements about good order, which in practice means being obedient and conforming, are central to the school philosophy. The reason is that such claims are not
amenable to educational discourse. They are political statements, not educational ones. They are ideological statements designed to preserve the structure, and not therefore touched by empirical data. As articles of faith, as fundamental assumptions, they are flag waving slogans, amenable perhaps to emotional manipulation, but not to rational discourse.

All of which is not to deny that in an authoritarian-dependency structure, good order is necessary for effective "syllabus" learning to take place. It is, of course. But beyond that, and more pervasively, it is that structure itself that is inimical to learning. And it is largely in reaction to that structure that disorder occurs.

The ideology of order is necessary to protect those power relations from the dangers of rational debate, and the destabilising effect of empirical information that such debate might make visible.

Teacher stress

This ability of the system to protect itself from destabilising influences is nowhere better demonstrated than in the matter of teacher stress.

While teachers "stress out" in droves trying to maintain order, this is considered a second-order phenomenon. Their "real" function is to teach knowledge and skill, and school authorities consider it unfortunate that personal deficiencies on the part of the teacher might cause them stress.

In South Australia, "Stress Leave" is only available to teachers who are classified as "sick". Stress is a deficiency label attached to the teacher, a medical condition divorced from relational life. It may not be claimed by describing either the overt or covert violations within the structure of schooling, or by explaining it as attributable to professional or personal conflict with managers or students. The price of obtaining stress leave is the absolving of the institution for any part in its causation.
(Section 30: (2A), Workers Rehabilitation and Compensation Act, 1986, South Australia)

Standards and destabilisation

We have seen how the notion of standard is a crucial ideological and mythical element in the hallowed structure of society. And an essential characteristic of the standard for that purpose is that it can be accurately defined and measured. In fact, standards can sometimes be defined and measured, but the errors contained in such measures are very large. I will show that they are in fact much larger than the massive literature on educational measurement and evaluation suggests.

Regardless, the notion of error is intrinsic and fundamental to any notion of measurement, and hence to any notion of measuring a standard as it is understood in the academic literature. Singer (1959) goes so far as to claim that "while experimental science accepts no witnesses to matters of fact save measurements and enumerations, yet it will pronounce no verdict on their testimony unless the witnesses disagree" (p101). So experimental science requires differences in measurements before it can decide what the
"best" estimate of the measurement is, and the very notion of measurement is predicated on the notion of error.

On the other hand any error in measurement is unacceptable if the notion of standard is to fulfil its societal function in the categorisation of people. Who would accept failure or exclusion on the basis of a mark of 49 percent plus or minus 15? Or even plus or minus one?

The simple professional and ethical solution is to attach an estimate of error to every application of a measurement of the standard, a habit deeply ingrained into practice in the physical sciences. However, this is so contradictory to structural stability in the social world that to my knowledge the issue has never been seriously raised in professional debate about examinations, and when on rare occasions "ability" scores are presented as bands rather than lines they are based on reliability rather than validity considerations, so are gross under-representations of error; they are fudged instrumental errors, rather than errors in assessment.

Summing up

The standard is a crucial part of the assessment myth that is central to the stabilisation of power structures in modern societies. As such, attacks on its integrity, the naming of the gross errors attendant on its measurement, and explications of the violations to individuals that accompany its use, will be resisted.

Notions of standard have a very high emotional charge, and those who defend standards inhabit the high moral ground, as they defend the faith.

So challenges will be rare, and will be seen by most people as immoral, because they threaten the social fabric.

In the remainder of this thesis, one such challenge will be mounted.
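The question raised above about a mark of 49 percent plus or minus 15 can be made concrete with a small sketch. It is an illustration only, and rests on assumptions that are not in the text: a pass mark of 50 percent, an error estimate read as a simple band either side of the reported mark, and the hypothetical names Mark and classify. Under those assumptions, any mark whose band reaches the cut-off cannot be placed on either side of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A reported mark with an attached error estimate (both in percent)."""
    value: float
    error: float  # half-width of the error band; an assumed, simple reading

    @property
    def band(self):
        # Lowest and highest marks consistent with the stated error.
        return (self.value - self.error, self.value + self.error)


def classify(mark, cutoff=50.0):
    """Place a mark relative to an assumed cut-off (50 is hypothetical)."""
    low, high = mark.band
    if low >= cutoff:
        return "pass"           # whole band is at or above the cut-off
    if high < cutoff:
        return "fail"           # whole band is below the cut-off
    return "indeterminate"      # band reaches the cut-off: no clean verdict


for m in (Mark(49, 15), Mark(49, 1), Mark(49, 0)):
    print(m.value, "+/-", m.error, "->", classify(m))
```

Only the mark reported with no error at all yields a clean "fail" here; with an error of 15, or even of one, the band reaches the assumed cut-off and the verdict is "indeterminate".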
nrnr r rrr !"n #r$rr %&rrrn nr

In this chapter four different frames of reference are defined; four different and largely incompatible sets of assumptions that underlie educational assessment processes as currently practised.

First is the Judge's frame, recognised by its assumption of absolute truth, its hierarchical incorporation of infallibility; second is the General frame, embedded in the notion of error, and dedicated to the pursuit of the impossible, that holy grail of educational measurement, the true or universe score; third is the Specific frame, which assumes that all educational outcomes can be described in terms of specific overt behaviours with identifiable conditions of adequacy, and what can't be so described doesn't exist; fourth is the Responsive frame, in which the essential subjectivity of all assessment processes is recognised, as is their relatedness to context. Here assessment is a discourse dedicated to clarification, rather than the imposition of a judgment, or the affixation of a label.

Mythology

In the myth of meritocracy the examination is both a major ritual and a significant determinant of success. At the heart of this ritual, between the practice and the judgment, between the stress and the catharsis, is the great silence, the space where the judgment is processed.

The myth gives hints of what moves in this silence, for the myth makes three claims: the race is to the swiftest; the judgment is utterly accurate; and success is a certification of competency.

These hints tap the bases of the three frames of reference for assessment that assume objectivity. However, other assumptions of these frames make them mutually contradictory. This in itself would be good reason for keeping the process implicit. For the assumption that inside the black box hidden in the silence is a mechanism, an instrument of great precision, may be difficult to sustain, if it contains major contradictions within its workings.
Four assessment systems, with four different frames of reference, have staked their claim to exclusive use of the black box, their claim to be the best foundation for the precision instrument to measure human what? Bit hard to say what exactly. To measure, perhaps, human anything. It may be sufficient just to measure. Or even just to pretend to measure, to assert that a measurement has been made, so that a mark may be assigned to a person.

Frames, myths, and current practice

The Judge's frame is far more often evoked than talked about. The focus is on the assessor's judgment of the product. The major activity is in the mind of the assessor. Such terms as expert and connoisseur are essential to the construction of the accompanying myth. Faith is the requirement of all participants. It is explicit in discourses about teacher tests, public examinations and tertiary assessment, and implicit in all human activities that involve the categorisation of people by assessors.

The General frame is the basis for educational measurement, for psychometrics. The focus is on the test itself, its content and the measurement it makes. Such terms as reliability and ability are essential to its mythological credibility. It purports to be objective science, and hence independent of faith. As such the world it relates to is static, so there is no essential activity. It is explicit in discourses about educational measurement, standardised tests, grades, norms; it is implicit in most discourses about standards and their definitions.

The Specific frame is about the whole assessment event, and is the basis for the literature that derived from the notion of specific behavioural objectives. The focus is on the student behaviour described within controlled events; in these events the context, task, and criteria for adequate performance are unambiguously pre-determined. Reality is observable in the phenomenological world; the essential activity is what the student does.
This frame is explicit in discourses about objectives and outcomes; it is implicit, though rarely empirically present, in discourses about criteria, performance, competence and absolute standards.

The Responsive frame focuses on the assessor's response to the assessment product. Unlike the other frames it makes no claims to objectivity; as such its mythical tone is ephemeral, its status low. This frame is explicit in discourses about formative assessment, teacher feedback, qualitative assessment; it is implicit though hidden in the discourses within other frames, recognised by absences in logic and stressful silences in reflexive thought. Within the confines of communal safety such discourses are alluded to, skirted around, or at times discussed; on rare occasions such discourses emerge triumphantly as ideologies within discourse communities.

The Judge

Most assessment in education is carried out within the Judge's frame of reference. The chief characteristic is that one person assesses the quality of another person's performance, and this assessment is final. By definition the Judge's assessment is free of error, and therefore any check of the Judge's accuracy would represent a contradiction of his function. So such a check is not only unnecessary, it is immoral, in that it is an act likely to destabilise the whole assessment structure by calling into question its most hallowed assumption.

The Judge's assessment may be verbal and on-site, eschewing numeration and a special testing context. However, performance is usually assessed with tests and examinations, with merit graded in some way. It is assumed that adequacy or excellence in performance is described accurately by the Judge. For this to be true, it must also be assumed that the test measures what it purports to measure, and that the marking, whether by the Judge or his assistants, is reliable. Again, therefore, checks of validity, that the test measures what it purports to measure, or of reliability, that the test will give the same result if repeated, are not only unnecessary, but are unacceptable and demeaning.

Judges must stand firm on the absoluteness and infallibility of their judgments, for this is the essence of their power, the linchpin of their role, the irreducible minimum of their function.

Thus they are duty bound to recognise standards, to perceive with unerring eye that thinnest of lines that separates the good from the bad, the guilty from the innocent, the excellent from the mediocre, the pass from the fail. Talk to them of normative curves or rank orders or percentiles, all of which imply relative standards, and they will hear you out, wish you well, and with scarcely disguised disdain send you on your way. In their absolute world such matters are irrelevant. They know what the standard is, and therefore their job is simple. Simply to allocate students, or their work, to various positions above or below that standard.

Set hard in a rationalist world view, this is a black and white world, a fundamentalist cognitive universe.
The assumptions deny the possibility of reality checks, so the collective fantasy easily becomes the perceived truth, as human minds and bodies contort themselves to deny their more immediate experience.

So let us see what that more immediate experience might tell us if another frame of reference is chosen.

The General

The second frame of reference is called the General frame. I used to call it the generalizability frame, but that word has been hijacked by psychometricians. The general has been privatised and corporatised by mathematicians. The bird has been tamed and lost its wings. The general has become severely contained in mathematical armour.

What I am calling the General frame of reference is blatantly egalitarian and inherently relativistic in its conception, but has become constricting, reductionist and inequitable in its mathematical application. In one form or another it has dominated the academic literature in educational assessment for over sixty years. Within this frame is contained most of the received wisdom from thousands of studies in educational measurement and evaluation.
Its two initial assumptions are shattering. One Judge is as good as another. And all Judges are inaccurate. God is dead!

Now as Little Jack Horner understood quite well, you can't just stick in your thumb and leave it there. If you stick in a thumb you've got to pull out a plum or no one will say you're a good boy. And the plum was the third assumption: There is a stable rank order of merit. So there is a true score.

And there is a stable standard. It's just that, sorry old chap, it's just that the jury does it better than the judge. Or perhaps it would be more accurate to say that we measurement experts, we psychometricians, can do it, with the jury's help, much more accurately than you can.

nnrnnnrnnnnnnnrnrnnnrn n!nnrrn &"r'()rrnn'*rn+nnnr$nr,rn'rrnr-nn"r'-rr)-rr"n-rr'-nnnr)./n0r-nrn'(-r+-nnn-"r'r--nr$r-n*r""nnn"nn-'1*r.nr-0n"n-'nn"*-rr$"'rnrr
5 of 12*rrrr""n$n"rn"$nrn")rr/r'&r"*nrrrn-"r-)rnrr""'($rrrnn"nr*rnr)22' "'3rn-nnrn-r"'4)rnr*n*rr)/r"n$n-)n"nr$rr'5r+6r-*7r$r-rn'4$n*"n'&rr/n'nr)rr'4-nn"nrrr$r-rr*rr'4"-$-rnr2rn2r'1nrr)-/2rn2-$r'rrnn'4/n*"--r-7"n-*nn8nnrr'9rr)r"r)rr'4-rrrrr)--nn)nnr$nn'&rr')-nn"nr"n7nnrnnrr+'4r.$)r$0rnnr--)r+rr"n' nnrnnnn"nn"nrnnrnn n#n$r#!nrnnnnnnnn
6 of 12%%rrn&nnn$r'n!nnn$$nrnnnn(n&nrn$$'!nn$nrrrn!nnn&$n!$nnnnn$nnrnnrnr!nnn$rn!nnn&n$n&n"nnn$r$r$rn$rn$nn&nnrn"n$rnn)nn*$n$r+nrrn$nnrnrr&n$rr&nnnrn,!nrn-nr&nnnrn$n,n
7 of 12nr&rn&n$nn$.nnnnnnrn$$rn$rrnrnnnnrrnnnn$r$nnnr!nn!n$!n$nn&n/nr0n&r!nn$(nrnr'nnnnrn$n+n(nnnnnrn$$0$nn$n!$n"n&$rn$n$nnn$rrn$nrnn&nnnn$rr"r$nnn$r$nnnnnnn$nn$nnr nnnnnn!n$rnnnnrnrnn$nnn$nnnn$rnnnnrnnnnnnrnnnnnr$nnnn n'rnnnrn$nnnrn/r0rn'nnnnn$nnn$n&nn$
8 of 12n&nnnnn!n!nnrnn(n(n$n!nr&$nrnnn$,1 *)*n*,rnr'rrr)rrr)r+'&rrrr)r+)*rr"/n-$rrrrn'*nr)"'/nr"n7"n*)r-"'&r$'("nrr"n7n"nrrn"n7*nrr*n'9n'*n/rrnr-nn"rn/rrn'4n)n/rrrnnn)rnnnr)-*)r)rrnr-nn-' rrr-n"*n"6*nrr)r"nrrr"nr"rrrr':--n)-n6)r8r-n)rn"n':rn)nr-rr&n+n)r)r-'1-n'r$r"-r"rr-rrr-r'4rnrrrrr")nrr)rr*r'nrr)r"rnnr"-rr"*'
9 of 12r"nr)"6*nrnr)'"6*nnrr)'&':")nrn-nrn)n/'#nr"*"n"*'&-n-"rrrr'&nn)nn*rr"nrr")'9r'nnn*nr'-/nrnr)rrrr6)rn*r'nn-r-rn8rrrnn)r"rr$$r"rnn'r-*nrrr$nrrnr)r-n-nr"*$rnn-)rr"nn'r-)nnr*n-r)rnrr)rr"nr*)nn"r-*--n)"r--")r)rr)nrr)'--rrn-nnr'&r-"nrrnrr)rr6"rrrnr"nrnr-n8-nnrrnr)n*n$nnr$r-nnr-nr*-rrrr)'r-*"*n--rr"$n"nr-nr':-*--rrrnrrnn-)"*n"6*rr'r-8"nnnrrrrrrn)r*rrnr*n*r)r"nn*r)rnr'&r*r*n*r))rrn/rn-8n*nrr;<-*$r'rnn$"n'&rnnr8-r"6*;5n$;5nr/;1-)rrn8;:*-n*"nn"nnr"rrn"*n$8r"r--6)"nnrnrn;&)r$r*'5r
10 of 12rrrrnn*r' #r*rrrnr*n"6*rnr)rr6)r"rr'5rnr)n"n"6"6"6'5nrn"-r8rnr7*"nrnrrrrr7rrrr)rn"-n*-rr)""n"*n$nn"6r"r)'5rrr)6)rnr'r)"r*rr*r)rrrr'*"nrr)"rrn*nnr/n-r"'nn"nr)rr"nr/rr-r6)r"*r'n-r)rn")rnnr'"6rrnr)r"6"nr)rr)nr""/r"nnnn"$*rnr)r/r*r'&-rrnnr'r*nrn)rrrr"**nr-r)n-r**nrrn6)r-n"n)rrr'*rrnnrnrrnnr2r)2r2r2r"$r)rnrr6)r.5nn =0'4r)rr*rrr*rrn7rr))rn)r-"rnr')n)nr'"*rr4r-rnrrrrrr*r'n"*"rrnrrrnrrr)rr>)r,rnrnnr)nrrrrr*nr'
11 of 12rr"nrr#r*'rrr)"$rrr*rnr)$nn-nn"r)r22-nnn"r)r*rrr)r$r)6)rnr'rn)>),rnr4rnrr)$n-n"r*nrn"r-'1rr-r#r*/r*"$r-nnr-rr*nn-$nrn'r#r*r"n7r-r)7r#r*rrrr*r*rr'rr*rrrrr8rn' :->)*;>),rnnnn*n$r)rrrr*r'4rr*nr"n-)r*n/-nrnr'&r#r*-"6*r"r"6*rn-n->)/n':-*-,rn*;>)nrrn*r)r':rn)rrrrnnr'4rrrr"nr)r)nrrr'&rrrrnrr,rn-nrrn'#r*r*rr"nrr'4>)"r)r)"rrrrn$r--'>-,rn)nr-nrr"r)r*"r'&r#r*"nr)-nrrr)rrr'#r*>)r*n"r"6*r',rnrnr6rr"nnrr*nn'&r
12 of 124rr"r/rrr-""nr"n"*' r4r"n"-rnrr':-*rn"nnr)'rn)rr-r"nrrn*')0r.,rn0rn-rn).40rn.#r*0rn$r$9/(*'rr/-nn-nrr'
1 of 11nr nnnrnnnrnnrnnrnrnnrnrrnnr !nnrrr"nr#n$nnr#nnr$nr#n#%nnnnnnr&nnr#nrnnnnr#nnn rr#n'#nrr#r&nr(#(n#&nnnr##r&nn)nnnrrn*rnnnnn rnnnn*rrrnnnrrnnnnn&nnr+nn*rrnr#nr'nnrnnnnr)nrnn$nrnnnrn&nnrnnnn&n&nn#r,rnrr
2 of 11nrrnnrnnr&rrnnnnrnrn -n#./01.20!3rnr#rrnn#nrnnn4nnnnnr&nrnnnrn,nnn#n#nrnrn3nnnnnnr5nnn&&n&rnrn*rnn&n#&nnnrnnnnnnrnnr+nrnnnrnnnnrr'nnr#nnnnr#nnnrnr&nr#nnnnn#&n&nrn(rnnnrn#n,6&nnr,n#rnnnnrnrr'#rrrr+n#nrrnnr#n&nrrnr'nrnrnn#n&nnrnnn#nnnn#nrn"nnrnnr#nrn5 r&nnn&nn$nnrn7nn+#n+n&n&(n#nnrnnrnn
3 of 11nnrnnnrnnrrnnrnnnnnnrn(r57n*rnnnrn58rrnr#nnrnn&n#r# nnrnnnnrnnnrnnnrnrnnr57nnrrnrn#nrnnrnnrnrnr#rnrr(nrnnnrnrrnnn"#rrrnn#nnrr$n#rrnnrnn#rnnrnnrnrrrn#nnn(nr%nrnnnrnnnnnnrr%nrn###nnnrnrrnnrrr(rnnnr%nrnrrrrnrn#nn#nrnnrnnrnnnrr3n#:r:nnnnr#nnr$nrn,nn#nnrrnnnnrr'nrnnnnr#nnnrnr'rnnrnnn#nnrn r'nn;n#nnr%nrnrnrnnrnnr$n+#nrnnn#nr#nnn+
4 of 11nn#nrnn#rnnnnnrnn#nnnrn::r%nrrnnrnrrn(rnnnn#rnnrnn#rnr#nn&nn#nrnn ;nnrr=nnrnnnn#nnrn#&!3nnr53nnrrnn#nn#nrr#5(rnrnnr5nnnnnnrn;nrn::r:r:r#nnrnrr#rnnrnnnrnn#nn#nn'nn&r&nnnnnrr>&nnnr?#,r+n,r#nrnrn##nn#nnnnnrn&nnrnrnnnn&7nnrnnrr#n#nn nn%nnr!nnn&nrnnnrrnnrnn'nnnrrnn:n:nrnrnnnnrn#nnnrn::nnrn#nrnn'nrrn'#n6rnr*rnnn#n#nnnrn#nnnnr'
5 of 11rnnnrnnnn#nrnnrnnnrn5(r+nnnnn#nrnnr5n#nrr%rrnr+n3n#nnn+n(r%rr+nnnnn::nnnnnnnnnrnnnnnrnnn:r:#n:nn#:n:n:nn#nnr#n#nn#nnn#+nnnrnnn#n#nnrrnrnrnnr(nnnnnrrnnnnrnn nnn!#nnrn n!#nr rn!)nnrn#nn6nrnn#rrnrr#nrn#rn(rnnnnnnnrn nn!nnnrrn'nnrnn#nrnnn:#nnr#:nnnnn'rrnn(rn+n%::nrnnn#nrnnrrnnnrnrnnn#&+rnnrr7nnr#rnnrn#nn+ rnrnn$nrnn
6 of 11nrnn#nrnnrrn&'#nnr#nnrrn7nnnnnnnrrrnnn$n,rnrnr#rnn#nnnnnrn,rnnnrnnnnrrnrnnnrr#rnnrrnnnrrnnnnnrrr'&nnn#nrrn##+*rr#nn#nrn7nnrrnn;n&nnnn '#./0/$@#./A2!)nn+nnnnn#n#nrr rr'
7 of 11rnnrn./10!(nrnnr#nrr#nnn#n
8 of 11rn#nnrnnnr#nn#nn#nrnrn#nrnrn&nnrnn&nn-nr+nnnnnnrn::nn#rnnnnnn&n&nnn#n&nn'nn#rrn#n#n&nnr Cn#!'nn#r+nnnrnrn'nrnr#n::nnrnnnr#nnrr*rrnnnnnn#nrrnnnnrnnnn(nrnrnrnrnn&nnnn:r:nrrnnr$nnnrr$;nr#rnrnrnrnrnnnrnnnr#rnrnr%nnrrrnrn&n&nnr#nnn-nnnnnr+#nnrnr&nnnnrnrnnnnr*rn=nnrrn,rnnn
9 of 11nr#rnnrnn#rrnr::#nnnrn#-nrnrnrrnnrnnnnrnnn3n#nr&nr%+>nnnrrnn'#nn
10 of 11nnnnnr3n#nrnrnn#nnrnnrnrrrn#rnnnrnr#rnnrn#nnn#rrnn#n+nrnr#rnr#nnnrnrnnnnnrn#nrnnnnrrr*r#n#nrnnnnnnnrnrrnnn#nnn, r!">nn-rnnnnn;nrn#nnnnnn#nnrn%rnnnnrrnn#rnnrnnBrr>nnnnnnrrnnr>nnrnnrnnrn#nnrnn#nrrn#nnrr,n#nnnnnnnnnn%n,nnrrrnrrn#n&n)nnnn,nnnrn;n>nnrnnnr##nrnnrnnr#rrnrnn,r r
11 of 11nnrnnn4nnr#n#nnnnnnrrn$r##rnrnnr#n#nrn#r#nnrn+nn&nrr#nnrn
1 of 12nrnrrrrrrrrrr !"#$#%r &rr!'()'*% rrrr+,,+++++,+)-r!%.,,,+'*&/$+#"#0"01"0#+2'#****2
2 of 12****#*,-,,,-rr2 r322-r-2-r),+, r+.&,,rr)rr !%) 4r5 & !r%+rrrrr
3 of 12!)%+rrr&r+r4r! "#$#0/%rrr-5&6rr&rr+,,!%6r& "10(1(7 7.&8,9rrr&r+r, +
4 of 12r+r.))8+:r;r,4+r)rrrr+)r+rr 6&8rr& 9999r r)!%&)&;rr+
5 of 12!%r&+rrrr+rr +r+&!"#1*%; r.!"#(% r:r
6 of 125-9-9 ?4r-.?,r!@"##("1'% -@r++rrr6 4,!%!%
7 of 12rrr5&BC nr++,+ + r r!% r)r& r &! "#$#0/%+r-6 +) rr) rr-,Dr+rrr&rr<+r&r+r
8 of 12/ !r-BB8-r+B&r+Brr&BB )
9 of 12E4r!."#1(%"+C)+!r%!%!%!%rr8rr!)% rF"2n,!%!%@r,,r&+r&r+!%+rr!%rr)+r+<&5F?.+2!r%5&0(:.2"(r @95.
10 of 125r&Fr&F.&r&r6661((.r&,;99999r9999-rrr+@r,,r)rr!"#$#*04B"#1(/*4C"#$14:r"#$1'"*45"#$10/'4%&,&r! F"0%+rr+r+ ++ -rrr r+&rrr+ !%r,)-F""
11 of 12rrrr.rr&rr.4r&-nr,rrB,rr& r+rrrrr+&)&rr#rr$nr.+r&r+,rrr ;
12 of 12r,,9r,-rBrrr,rrrrr ,-r)+rr,
nr

In this Chapter I examine the notion of comparability as it applies to the assessment process. Any rank ordering of students, any adding of marks on examinations, any addition across subjects, assumes that comparisons can indeed be made.

The fundamental distinction between more and less, and better and worse, is first elucidated, and this is linked with ideas of uni- and multidimensionality and notions of doing or having. This analysis is then applied to ideas of traits, abilities, and skills, and their supposed measurement in tests and examinations. Some fundamental confusions are exposed.

The discussion then moves to what meaning if any can be given to the result when marks or grades are added, how loadings on final rank orders are affected by spread of marks, and how differential privileging of sub-groups occurs with different intercorrelations. Finally, it is contended that for individual students the privileging is non-predicable, and the total score thus meaningless.

Goal kicking skills

nnrrrnn nn!n!n n"nnnr#n$!n!n%nnnr&%nn&nn r'(#!%r)nn!%
"nn#%n%n!*+&,n%%%n'#-&!%&nr"nn&!n!n-)&%%&)&%n!.n/0)nr,#$nn12r$!"&nn#%n$r r

Fundamental to the process of arranging orders of merit is the notion of comparability. As we have seen, the notion of standard implies the notion of order of merit, which implies the notion of more or less, better or worse. For such notions to have a meaning, they must refer to some aspect, some property that is being compared, that is presumably being measured.

Regardless, the first paragraph slid past a fundamental distinction: "more or less" is not the same as "better or worse". More or less are terms related to counting, to mathematics, to scales and measurements. They are loaded with notions of objectivity, and solicit entry to the quantitative world; better or worse are terms related to value, to goodness. They are permeated with the aura of subjectivity, and are related to the qualitative world, the world of valuing. The concepts are in different domains of discourse. If the criterion is size, then two people may be compared as being more or less heavy; or their weights may be compared in terms of better or worse in regard to health. But the two ratings are unrelated. Or if the criterion is emotionality, we may rate people in terms of whether they are more or less emotional; or we may rate them in terms of the appropriateness or productiveness or empathic clarity of their emotionality. Again the two ratings are conceptually unrelated. Or so it would seem.

What is the essence of this difference? For when we tried to explain what we meant by better, we used words like healthy, productive, empathic, clarity: and the interesting thing is that we may use more or less with any of these words, even though we started off in the better or worse category. And we may also ask of each of these new criteria whether they are better or worse; in this case questions preempted in the predominant paradigm because value judgments of better are already built into the words chosen to describe the criteria.

So what is the essence of the difference? In relation to aspects like size or emotion or clarity, when we ask the question more or less we are asking about intensity, about how much or how many. We are referring to the aspect in isolation from its environment. The event that produces the judgment about more or less involves our sensory relation to that aspect independent of other aspects. More or less questions are answered by focussing on the aspect and on no others. More or less questions are directly answerable. The answer may be incorrect, but such a statement in itself implies that there is a correct answer.
More or less has only one meaning in relation to a particular aspect. Something can't be both more and less at the same time, so the question is convergent, and presupposes a world in which there is a true answer to the question. So logically more or less implies a uni-dimensional aspect, a world of transitive and asymmetric relations (Lorge, 1951, p548).

On the other hand, when we ask the question better or worse, we have to ask another question: In what way better or worse? Because something may be better in some ways and worse in others. Better or worse in what aspects? Or better according to whom? Or better under what conditions? And when we nominate those aspects we can ask of them two questions about any comparison; more or less, or better or worse. And so on. Essentially better or worse implies multi-dimensionality in the aspect under consideration.

What does all this mean? Very simply, when we ask the question more or less there are no further questions to ask. We move straight on to the answer. In other words, more or less questions define the end of discourse; they are a direct invitation to a judgment; they are the signal to stop thinking, and act; and incidentally and significantly, to accept the judgment, which comes after the thinking has stopped.

But the question better or worse logically invites more questions about the first criterion. In what way better or worse? Which introduces more aspects, particular aspects selected in most cases from a much larger set of possibilities. For there are as many aspects as our conceptual imagination may produce (Lorge, 1951, p536). Yet the original aspect is reduced, even as more precision is generated by defining aspects; and as more aspects are conceived, the potential disparities of the judgments concerning them increase. And then for each of those aspects: More or less? Better or worse? And again, the additional questions about positioning and context are generated. So better or worse questions encourage further discourse, and further thought.

All this is not to deny that the power relations in which such discourse is embedded may dictate that the answer to the question better or worse be given at any time and be accepted without further thought. But that in no way invalidates the additional logical questions that the aspect implicitly generates.

Having and doing and being

It is obvious, but important, to make the point that whole entities (holons) cannot be directly compared in terms of more or less, only aspects of them (Jones, 1971, p335). One dog cannot be more than another dog. Nor can a stone be more than another stone, nor a stone be more than a dog.

In like manner dogs and stones cannot logically be compared in terms of better or worse, for such a claim is meaningless without a response to the question "in what way better?" A dog cannot be better than another dog. In terms of dogginess, dogs are equally doggy; they are equal by definition, as being classified as dogs. Likewise with stones. And dogs and stones cannot be compared as entities because they are in different classes. It follows that the very act of classifying whole entities (into classes) logically invalidates any comparisons within or between the entities that comprise them. Classes of course can be compared in terms of the numbers of elements they contain, but this is a different matter.

Two people are being compared in terms of the relative merit of some task. In terms of doing, we may say that one person does it better than the other. This is a statement about relative merit. Or we may say that one person does it more than the other. This is a statement about relative frequency, and not of relative merit.
You may drive a car badly many times.

In terms of having, we may say that one person has more of something than the other. This may claim to account for the greater merit. It is essentially a statement about the comparative number of elements in a class. But we would not account for a difference in merit by saying that one person had that something better than the other. Such a statement refers to the whole class, and whole classes cannot be compared except by numbers of elements.

So in terms of relative merit, the question of more implies a different mode of description, a different ontology, than does the question of better: Better or worse is a comparison of what people do under certain conditions, made by some person; more or less is a comparison of what people have, or are alleged to have. As such it is logically independent of any contextual or positioning variables. One begins to see the simplistic delusion generated by mathematical modelling.

Logically then better or worse questions cannot be answered definitively until they are reduced to a criterion which comprises a class in which the question better or worse is reduced to the question more or less. Logical here means relations that are transitive and asymmetric.
Pragmatically, better or worse questions can be answered whenever the criteria are sufficiently understood (implicitly or explicitly) to allow consensual subjectivities of judges to give similar answers. However, as we have indicated earlier, such criteria are multi-dimensional. And as is evident from the conversation that began this chapter, little if any meaning can be given to a uni-dimensional description of this multi-dimensional entity in terms of its uni-dimensional elements. As we shall see later, one meaning of such a comparison is dependent on the relative loadings of the different dimensions.

Politically, of course, better or worse questions are answered whenever someone with sufficient status or power gives a decision.

Comparing people

It follows that to compare people, whole people, we may compare either some parts that comprise them, or some wholes of which they are parts. If we look at the parts that comprise them, we may look at the person's elements or internal processes; if we look at the wholes of which they are parts, we may examine the person's functions and relations in the wider environment or community, or the cultural meanings in which their thoughts and actions are embedded (Wilbur, 1996).

Let us compare two people in terms of their relative merit in Physics. We are particularly interested in their relative achievement in a particular course of study at year 12 level. Such a course has a range of content and objectives and involves practical and cognitive operations of varying complexities.

We are obviously in a multi-dimensional world, in which at this stage more or less questions are meaningless. Further, any logical answer to the better or worse question is going to depend on the details of the answer to the prior question: In what way better? What particular aspects? Under what particular conditions? In whose opinion?
And if we intend to give a meaning as well as an answer to a multi-dimensional comparison, what are the relative loadings of each aspect in the final judgment?

Of course, we could simply ask the teacher who taught them, who is better? And the teacher might give a judgment. But in making sense of that judgment in terms of the original question, the implicit questions still hang there; in what way better? So after the judgment, the teacher must logically justify the decision on the basis of criteria; and if one is not better on all possible criteria, then the question of how the criteria are loaded to obtain the final criterion is relevant.

So, either prior to or after the judgment, how might the discourse progress?

#r)%n#r"!r
6 of 15#r)n!!r3!!r"!!r#r&n% nrrrnnrnrnnrnrrnrnnrnrrnrrnrrnrrrrrrrrrrnrrnrnnrrnrnnrrrnrnrrrnrnrrrnrn nnrnnnrnrr!rrn"nrnn#nnrnrrrrrrnrnrrnnrrrnrnnnnnrnnnrrrnnnnn"nrnnn#nrrrnnnrrrn"nrnrr$nrrnrrnrnrrrrrrnnnnrnrnrnrnrrrrnrnnnrrrrrrrrrrnrnnrnnnrn%nrr&nnn'rrnrnnn(rrrnnrr&rrnnnrnnnrrnrnnnrr&rrrrnnnrrrrnnnnn nnnnnrnnrnrnrnrrnrr&nnrrrnrrnrrnrrnnrnrrrrnrrrrrrrnrrrrnrnnrrr)nnrrrrr nnrnrnrrnrnrnrnnnnrrnrrnrrrrrnrrrrnnrrnrrnn&nnnnrnnrrnrnnnrrnnnrnnrnnrnrnrrrnn*rrnn
7 of 15r&nnrrnnrnnnrrnrrnrrrn%nrrnr'*rrrrnrrnrrrnrnrrnrnnnrrrrnrrrnrrrrrnnrrnnrrnr& rnnrrrrnrnnr"rrr"rn)nrrrnrr#nrnrnnrrrnnrnrnrnnrrnnnnrn+n%,,'rnnnnrnrrrnrrnnnrrrrrnrnrrnrr.nrrrrrnrnnrrrrrrrnrrrnnrnrrnnnnrrnrrnnnr%/' *0rnrrnrnrrrrrrnrrr!rn rrrnnnrrr!rnrrrr1rrrrnrnrrrnr nnrnrrrnrnrrrnnnnrnnnnnrrnrrrr1nnrrnrrrnrrnrrnnrrnr!rnrrrrrrrnrrrr!rnrr2r0rnrrnrrrrrnrrnrnnrn3rrnrrnrrnrrn0rnrrnrrnnr!rn !r4&nnnrnrrrn rrr&nnrrnrrnrrnnnrrnr rnrnrrnrrnnr"rnnrnnrr2rrrnrnrrnrrrrrrrrrnrnrrrrnrrnr3nnrnrnrnr
8 of 15nrnrrrrnnrr&nrnrnnnr2nrnrrrnrnnrrrr!rn5r65nnnnnnr!rn5n7rr65rrnrr!rnrrrrnnrr7n2nrrrrr!rn5rrr5nrnrrnrn!rnnrrrnrrrr!rn5rr5nnrnr+rrrnrrnnrnrrrrnrrrnrrnnrr1nnrrrnnrnnrrrrrrnrrnrnrnrnnnrrrrnrrnrr+nnrnrrrrrnrnrrrrnrrrnnrrnrrrnrrr&nn#rnrrrrnnrr&rr&nn8nnrnnnrrrnrnrrrrnrnrr&nnrnrrn"nrnrrrn"nrnr#nnrnrn#nnrnrnnrnnnrrnnnrnnrnrnrrr#rrrrrrnrrr3rrnrrrrnnnrrr!n!nrrrrrnnrrrrrnnnnnnnrrnnrrnrr)rnrrrrrrrrnrrrrn-rrnrrnrrrrnnrnrnrnr!rnrnrrrrnrrnr rrnrnrrrrn rrnrnnrrrnrnr "r"2nr5rr5rrrrrrrnrnrnnrrnrnrrrnrnnrnrrn"nrn !rnrnrnrrrrnrnr&nnrrnn rn!rnnrrrrrrrrrr6 rr!rnn
9 of 15rrrrrrrnrrr69r7rnnnrrrrnrrrrrrnnnnrrrrnrrrrnr)rrrnrrnrr#rrrrrr%nnrnnrr'nrr+rnrnrrrrrnrr r!rnnrrrrnrnnnrrrnrnrrr6 nnnnrnn)rrrrrnrnrrrrrrrr2rrrrrrrrrnr63rnnnn"nrnnrrrnrrnr#rrnnnrrrrnrr%rnn'rrrnrrn)nnnnnrrrrr!rrrrrrrrrrrrnnr*nnrnrrrrnnnrnrrrnrrr rrnrnrrrrn)nrrrrnrrrnrrnrrrrrnrrr9r7rrrrn#rrrnrrrrrnrrnr:rr::rrnrrnnrrnrrnrrrrrrrnrnrrrrrrrr/rr;rrrrnrrr/rrrnrrnrrrr(rrnrrrrnrrrrrnrrrrrrrnnrr2nrnrrrrrrrrnrrrrrr%nnnrrnrrnrrnrrrnn'rnrrnnrnrrrrnrrrnnrr%
10 of 15rrr%rrnn'n/nrrrrr*rrnrnrrnrnrrr#rrrnnnrnnrrrrrnnrrnrrrnrnrrrnrnnrrrnrr%*rnnrrnnrrrrrn)nrrrrnrnrrnnnrnnrrnrnrnrnr)rrnnrnrnnrrnrrrrrnnrnrrrnrr&rnr'9rrrrnrnrnrnrr) rrnnrn>nnnnrrnrn >nnrrrrnnrrnrn?rnrnrnrrnrrrrr >nnrrrrrrnnrnnrrnrrrrrr)rnnrnrrnnrrrnrnnrrr = rrnrnnnnrrnrrnr!rnnnnrnr #9r7rr&nnr3rrrrrr%9.+'rrrr rrrrrrnn3rnrrr*rrrnrr!nrnrr9.rrr?r rrrr9rr.rrrrnrnrr!nrrnnrnrr.+rrrr+rrr+rr.rrrr.rnr3rrr+rrnrnrnrrnrr.rr+rrrr.rnrnrrnrrrrr&rnrnnrrnnrrnrnrnr
11 of 15#nrrrnrrrnrrnrr?rrnnrn>rrrn%nnrrnrrrnn'nrrrrrrnnrrnrnnrnr#rrr%@:'nr.rnrnnnrrr"nnrrrnn(rrnrrrrr%@'rrrrn.rrnnrrrrrrrrrrrrrrrr nnnnnrrrnnrnrrn=nrrrnrrrrrrrnnn0r&nnrrrrr%
12 of 15nr"r&rr r&rr&rrr1nrnrnrrrnrnnrr%rrrrrr&rnr'nrrrnrrr2rrrnnr&nnrrrrrnrrrrrnrr&rrrrnnrrnrrnrrrrr)rnrrnrrnnrrrnn5rr5#nnnnn%nrn'n&nrnnrr&rr&nnnrnnrrrnrrnnnrr%nrrr'(rrnrnrnnrrnrr&nn)r&nnnrrnrrnrrnrr&rrrrnnrrrnrnnrr#rnrrnnnn r %%%#n%#.!!n4%%!%5 67$%%!!!%!!!#!!n!)#r7'%!!%'r'nr3r)#!n%n%! %r$%%n#nn!nn!n#n!nn%n "$ # $
I have shown how equal loadings for a group may take on different shapes according to the correlations. Equal loadings for a group do not in practice mean equal loadings for all subgroups of that group. And in terms of individual students it doesn't have any particular meaning.

The question then arises, does equal loading for the whole group of students mean equal loadings for each separate school? Surely some school groups are really better than other school groups so should be differentially loaded? Some school groups might have higher means, and some may have larger or smaller standard deviations in the sets of marks that indicate their comparative attainments. And these might mirror differences in intrinsic ability, whatever that means, or might be a function of very good, or very bad, teaching, whatever that means. But if such students are tested internally, how would we know about their differential potential, or their differential attainment, as distinct from differential testing effects? And especially how would we know if they study and emphasise different things, and value different criteria, so that their results are essentially non-comparable? Or if they study different subjects, with utterly different realms of discourse, such as chemistry and Japanese?

Now there are a number of ways of trying to solve this problem, all of them more or less inadequate. McGaw (1996) summarises them well: use some external examination (either the specific one related to the subject, a single "scholastic ability" test, or some grand total score on all external examinations) to statistically adjust the internal school results; this is statistical moderation of the school-based assessments. Or alternatively "use some external review and checking of schools' assessment results by teachers from other schools or authorised assessment experts to control the level and distribution of school-based results (ie consensus moderation)" (p82).
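One simple form of statistical moderation can be sketched as follows. This is an illustrative assumption, not McGaw's procedure or that of any examining authority: the function name `moderate_linear` and the sample marks are hypothetical, and the linear model (match the school group's mean and standard deviation to those of the same students' external marks) is only one of several possible adjustments.

```python
from statistics import mean, pstdev


def moderate_linear(internal_marks, external_marks):
    """Rescale one school's internal marks so that their mean and spread
    match the same students' external marks (hypothetical linear model)."""
    m_int, s_int = mean(internal_marks), pstdev(internal_marks)
    m_ext, s_ext = mean(external_marks), pstdev(external_marks)
    if s_int == 0:
        # No spread to rescale: every student receives the external mean.
        return [float(m_ext)] * len(internal_marks)
    scale = s_ext / s_int
    return [m_ext + (x - m_int) * scale for x in internal_marks]


# Hypothetical marks for four students in one school.
internal = [55, 65, 75, 85]   # school-based assessment
external = [40, 45, 60, 55]   # external examination, same students
moderated = moderate_linear(internal, external)
print([round(m, 1) for m in moderated])
```

In this sketch the group's level and distribution are taken from the external marks, while the order of students within the school is still the order of the internal marks. The adjustment therefore changes how the school is loaded relative to other schools without resolving any disagreement between the two assessments about individual students.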
Such moderation systems provide different processes for modifying the means and standard deviations of school scores on the basis of comparison with other scores or other schools or other students. To the extent that the correlations with the criteria (whether the criteria are scores or actual criteria in the minds of the moderators) are high, to that extent is the moderation reasonable, and possibly invalid. And to the extent that correlations with the criteria are low, or differential, to that extent is error compounded, as we have indicated in the previous discussion.

I do not intend to enter into the debate as to which of these is the "best" way to go, or indeed whether they all do not produce solutions which are more inequitable than the problem they were devised to solve. My project here is not to indicate how such problems may be best solved, but rather to detail what implications such solutions have for the empirical determination of error.

Comparability error

What is clear is that different solutions, including no solution, produce different results. The notion of "true score" is dependent on the notion of some uni-dimensional trait that is obviously non-admissible when the additions involve not only components which have low correlations and do not claim to be about the same thing, but the different additions contain different components. (That is, different additions contain marks from different subjects.) But the notion of difference in estimates requires no such theoretical underpinning. It is empirical data demonstrated by differences in empirical rankings or scores under different experimental conditions.

Estimates of comparability errors are easily computed. Various forms of inequity are inherent in all measures of both school-based and external examinations; the meaning of the final rank order is based on relative loadings; and all means of trying to create equal loadings involve the creation of arbitrary assumptions and the subsequent construction of additional inequities. Given these facts it is relatively simple to construct a number of different aggregates according to the various models available (including the original raw data), and thus determine the range of ratings (or scores) that these produce. These empirical differences are an estimate of the comparability error. Such a set of scores has the added advantage that it relates to estimates for each individual, and does not confuse such individual differences with group statistics (such as standard error of the estimate).

Note that this is not the assessment error. The comparability error is the additional error added through the procedures of summating or summarising scores, which are independent of other sources of error described elsewhere.

The ontological remainder

My description of comparability error here begs the question as to whether the whole process isn't a nonsense, because of the meaninglessness of the total score. In order to examine that notion briefly I will examine the construct, not of academic merit, which might be a name that we could give to the sum of marks on test or examination performance in various academic subjects, but rather the idea of athletic merit, a similar construct we might conceive in the field of more physico-social endeavour.
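The comparability error estimate described above (build several aggregates from the same marks, then take the range of ranks each student receives) can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: only two aggregation models are compared (a raw sum and a sum of standardised scores), and the function names, student names and marks are all hypothetical.

```python
from statistics import mean, pstdev

def ranks(totals):
    """Rank students by total, 1 = highest; tied totals share the best rank."""
    return {s: 1 + sum(other > t for other in totals.values())
            for s, t in totals.items()}

def raw_sum(marks):
    """Aggregate by adding the raw marks."""
    return {s: sum(row) for s, row in marks.items()}

def standardised_sum(marks):
    """Aggregate by converting each subject to z-scores before adding."""
    n_subjects = len(next(iter(marks.values())))
    columns = [[row[j] for row in marks.values()] for j in range(n_subjects)]
    # A subject with no spread is left unscaled to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return {s: sum((x - m) / sd for x, (m, sd) in zip(row, stats))
            for s, row in marks.items()}

def comparability_error(marks, models):
    """Per-student range of ranks across the aggregation models."""
    all_ranks = [ranks(model(marks)) for model in models]
    return {s: max(r[s] for r in all_ranks) - min(r[s] for r in all_ranks)
            for s in marks}

# Hypothetical marks: subject 1 has a wide spread, subject 2 a narrow one.
marks = {
    "Ann": (90, 50),
    "Ben": (60, 60),
    "Cy": (75, 52),
    "Dee": (40, 62),
}
error = comparability_error(marks, [raw_sum, standardised_sum])
```

With these marks every student changes rank between the two models, and the result is a figure for each individual rather than a group statistic.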
In this chapter the relationship between rank order and standard is teased out in more detail: in particular, the meanings given to the standard in the Judge and General frames of reference; how logical confusions proliferate when discourse jumps from one frame to the other; and how the differences in meaning are connected logically. At the end of the chapter a post-modern myth of the situation is presented.

Personal day-dream

I was about fourteen when I first pondered the sticky issue of the elusive standard. The context was heavenly, rather than earthly, theological rather than educational. It concerned St Peter. It seemed to me he had a problem. Here he is at the pearly gates as the newly dead file by and do their thing, state their case. And Peter, judge extraordinaire, gives his verdict; pass, fail, pass, fail, fail, fail, etc, etc for millions and millions of people.

And somewhere, among all of those millions were two people, so very close together in the merit of their lives. Oh, so very close! Yet their destiny so very different. For one, just scraping through, the joys of heaven for ever. And for the other, eternal damnation. But it didn't end there. For as thousands and thousands of years pass, and more and more millions queue at the gate, even between these two he must make finer and finer discriminations.

I didn't doubt he could do it, mind you. Well, it'd be more accurate to say that I considered that if anyone could, he could.

But I wondered why he'd want to!

Fifty years on, these are still the two fundamental questions I have about the notion of a standard: the people who define a standard do in fact have St Peter's god-like omnipotence, but do they have his infallibility? And why do they want to engage in a process that is so manifestly unjust?

Order and standard

Let's go back a bit and tease out this relation between standard and rank order of merit. A relation that I intuited at fourteen, but only recently have systematically thought through.
The relationship is not immediately apparent. There are some judges who are adamant that they can recognise standards and this has nothing to do with relative merit. In fact, to them the word relative is anathema. For them, standards are absolute. They are as solid as a winning post, they are a fact established, a sign as recognisable (to them) as a green light at an intersection. Recognising that some people play games, run races, create rank orders and random distributions and normal curves, they see themselves doing work of a higher order; as maintaining absolute quality in a world trivialised by concepts of the average, the normal, the relative.

So let's push them with a bit of Socratic dialogue. Or is it Hegelian dialectic?
From the last two chapters it becomes evident that a fundamental purpose of relating assessment descriptions to standards is to transform notions of quality to notions of quantity. So in this chapter the notion of quality is discussed, and some of the differences with the notion of standard are elucidated.

The theory of logical types is briefly explained in terms of its implications for complex constructs with multidimensional aspects, and the special properties of the class "safety standards" are discussed.

The construction of a bridge with various criteria for quality is discussed to illustrate the different languages that must be used to justify the quality characteristics for each criterion. The subsequent history of the bridge is then used to illustrate how the notion of quality is related to boundary conditions and events, and how this affects notions of permanency and attribution.

Some reflections on the nature of quality follow. These are then applied to some of Eisner's ideas about connoisseurship.

Pirsig's ideas about the metaphysics of quality are briefly discussed, and the relationship between morality and quality on the one hand, and between static and dynamic morality on the other, introduced.

All standards are arbitrary

When I was younger and groping for a profession that might suit me, I studied Physics and Engineering. I don't remember much of the detail of those studies, but I did learn two things that are pertinent to this chapter: one is that all measurements contain an error; the other is that all standards are arbitrary.

I remember very clearly struggling with some calculations to determine the cross-sectional area of a steel beam for a bridge. Estimations of maximum loading on the bridge, moments of force and tensile stress resulted in a value of the cross-sectional area of the beam accurate to three figures. However, before choosing the appropriate steel T section there was one more step. A safety factor of three must be applied. Or was it four? No matter, the calculated cross-sectional area must be multiplied by this arbitrary number in consideration of possible tornadoes, earthquakes, rock concerts on the bridge, or whatever other natural disasters might inadvertently occur. This undoubtedly would make the bridge safer for traffic and incidentally more profitable for the steel manufacturers. And it made the accuracy of the initial calculation absurd.
Safety and quality

At this point I want to try and untangle another confusion that has bedevilled the notion of standard, especially as applied in the human sciences. This is the confusion between safety standards and quality standards.

In the manufacturing area there is less confusion. Standards that apply to car seat belts, bumper bars, brakes, lights, are clearly basic safety requirements. General design of car, colour, control panel layout, type of upholstery, fuel economy, are aspects of quality. And of course, one aspect of quality is that all safety standards are met.

Safety is about prevention. Safety is about what is not, about events that are always immanent, yet, if safety is successful, never materialise. Safety is about the future that is frustrated, about unrealised potential. Because each safety measure blocks a road to disaster, each safety measure is essential in its own right. To meet a safety standard is to claim that one such roadblock is in place. To know that all such safety standards are met is to be reassured and insured against disaster. However, to know that eighty percent of safety standards are met is to know nothing about which particular safety standards are not met. For a gambling man this may be a situation of high desirability, and hence provide an experience of high quality. But in the world of safety standards, this is a recipe for disaster.

Quality on the other hand is about manifestation, about potential realised. Quality is not so much about specific aspects as about their interrelations; about interpretation rather than measurement; about the whole gestalt rather than summaries. Further, notions of quality are intimately and necessarily connected with the observer, and hence are constructed from the observer-object interaction, rather than claiming to be a measurable component, or sometimes a presence or absence, of the object or specific attribute being observed.
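The point about eighty percent of safety standards can be made concrete with a small sketch. The two inspection records below are hypothetical (the fifth item, "tyres", is added only to make the arithmetic come to eighty percent), as are the function names. Both cars report the same percentage, yet the percentage cannot say which roadblock is missing, and neither car is safe.

```python
def fraction_met(checks):
    """Share of safety standards met: a statement about the class."""
    return sum(checks.values()) / len(checks)

def is_safe(checks):
    """Safe only if every individual safety standard is met."""
    return all(checks.values())

def unmet(checks):
    """The information that the percentage throws away."""
    return sorted(name for name, ok in checks.items() if not ok)

# Hypothetical inspection records for two cars.
car_a = {"seat belts": True, "brakes": False, "lights": True,
         "bumper bars": True, "tyres": True}
car_b = {"seat belts": True, "brakes": True, "lights": False,
         "bumper bars": True, "tyres": True}

same_percentage = fraction_met(car_a) == fraction_met(car_b)
```

Only `unmet` distinguishes a car with failed brakes from one with failed lights; the summary figure treats them as identical.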
Theory of logical types

The theory of logical types is about levels of abstraction in human discourse. One of its axioms is that whatever involves all of a collection must not be one of the collection; that is, that there is a fundamental distinction between a class, and the members of that class. This might seem obvious. Obviously a single man is not all men, and a married woman is not all women.

Trivial as this might seem, the conclusion from the theory is far from trivial: that when this clear separation between class and members is not made, messages become confused. As Bateson (1972) describes it, "the theory asserts that if these simple rules of formal discourse are contravened, paradox will be generated and the discourse vitiated" (p280).

Human discourse is decidedly more complex than simple logical syllogisms. We do not usually talk like logic machines. We talk very often in and about abstractions, and these abstractions may be at different levels of logical type. We present information (first level), and give an interpretation of that information (second level), in a particular context which affects its meaning (third level). A story that makes fun of a rich Jew has a very different meaning if told by a speaker at an anti-semitic rally than it does when told by a Jewish comedian on a New York stage.

Of particular interest here is that errors that lead to confusion occur when the properties of a class are ascribed to members of that class, or vice versa; or more subtly, whenever the discontinuity between class and member is neglected, and they are treated as if they were at the same level of abstraction:

The theory of Logical Types makes it clear that we must not talk about the class in the language appropriate for its members. This would be an error in logical typing and would lead to the very perplexing impasses of logical paradox. Such errors of typing can occur in two ways: either by incorrectly ascribing a particular property to the class instead of to its member (or vice versa), or by neglecting the paramount distinction between class and member and by treating the two as if they were of the same level of abstraction (Watzlawick, 1974, p27).

Safety and logical type

Safety is not quality. It is one criterion we might use in describing quality. It is a member of the class of such criteria. But it is a very particular member, because it is atomic in its construction. It is comprised of a number of specific safety requirements each of which must be individually met. Not only is the class of events or information called "safety" of a different logical type to the class called "quality," but the essential information about safety is lost when the class "safety" is described, rather than the individual items that describe it. Unless, as we mentioned earlier, the statement about the class is that "all safety measures have been satisfied."

Safety and people

In many aspects of our life safety measures are important for its continuance.
In home, leisure activities and job, safety requirements contribute to our health and that of others. So matters of safety are a part of various educational programs. As such, it would seem important that evidence be obtained that students have incorporated such safety items into their behaviour. Or, at the very least, that they understand and can implement all of the safety requirements. Talk of safety (like talk of sexuality) produces points of high density in the field of power relations.

It should be apparent, however, that test or examination information involving rank orders or grades or marks regarding safety represents information about the class of safety items, and as such is inappropriate and confusing. If safety requirements are essential requirements, then marks of 70 per cent or grades of C for safety, or for tests which include questions about safety, present information that is inherently contradictory. By definition, if you have not met all safety requirements you are unsafe.

Test-makers and others argue that in the context of a test people make errors and it is not reasonable, because it rarely happens, to expect one hundred percent correct response. This is surely an indication that the test context is inappropriate for obtaining information about a person's acquisition of safety measures. It certainly does not justify accepting that if they can provide evidence that they "know" seventy percent of the safety requirements that their "standard" of safety is adequate.

Further, talking about safety measures, or choosing the correct safety requirement from a number of choices, is an activity of different logical type than implementing that information in the context of a job. Talking about something you do is of a different logical type than doing it. So any measure on a test, even at one hundred percent, cannot be a measure of safety behaviour. It is a measure of test behaviour. At the very best it is an indicator, about which empirical evidence could be obtained about the probability of its correspondence with overt safety behaviour under specified conditions. In this respect, probabilities less than one would necessarily indicate test invalidity.

Safety and minimal outcomes

The idea of minimal outcomes is analogous to that of safety. Minimal, or minimum, means the least amount, the lowest possible. If a course of study has a set of minimal outcomes that define its successful completion, then by definition all such outcomes must be demonstrated if the course is to be satisfactorily completed. To set a test incorporating questions related to such outcomes and then use a test score (a statement about the class) to describe the "standard" that has already been described by each of the members of the class, is again to confuse logical types. Such tests are sometimes referred to as mastery tests.

There are three additional confusions, two of them the same as for "safety." The first is that only a perfect score is consistent with the definition of minimal. So to attempt to find an appropriate "cut-off" score to use as a standard is to engage in a paradox, is to indulge a contradiction, is to professionalise an absurdity. Berk (1986) was able to identify 38 methods for setting standards and produced a consumer's guide (to choose the most appropriate absurdity).

The second confusion involves the fact that context affects meaning. For many educational outcomes the context of a test situation is inappropriate anyway and represents another logical type confusion. For example, any outcomes involving verbal discourse, such as listening skills, group problem solving, giving instructions, cannot be demonstrated in a written or multiple-choice test without logical type confusion occurring. Writing about verbal interaction is not verbal interaction. Choosing the most appropriate response from a multiple-choice selection is not responding oneself in an interpersonal context. Talking about a painting is not painting. The whole test and examination industry is permeated with this sort of confusion.

The third confusion is one of ends and means, and is well described by Burton (1978): "no measure of a single skill can ever be mapped on a non-trivial vision of real success because any problem can be solved in more than one way. One can determine whether the respondent has the skills necessary to solve the problem this way, but one lacks the justification for imposing successful performance, this way, as a standard" (p273). Burton believes that "this argument is fatal to any method of setting performance standards." Burton is perhaps mistaken in believing the issue is amenable to rational argument, and does not consider that it may be entrenched in mythical discourse.
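The first of these confusions, the cut-off score, can be sketched in a few lines. The course, its ten outcomes and the function names are hypothetical; the sketch simply lists every cut-off that would certify a student who, by the definition of minimal outcomes, has not completed the course.

```python
def certified_by_cutoff(outcomes, cutoff):
    """Certify when the number of outcomes demonstrated reaches the cut-off."""
    return sum(outcomes.values()) >= cutoff

def certified_by_definition(outcomes):
    """Certify only when every minimal outcome has been demonstrated."""
    return all(outcomes.values())

# Hypothetical course with ten minimal outcomes; this student has
# demonstrated all but two of them.
student = {f"outcome {i}": True for i in range(1, 11)}
student["outcome 3"] = False
student["outcome 7"] = False

# Cut-off scores that certify the student even though, by definition,
# the minimal outcomes have not all been demonstrated.
contradictory_cutoffs = [
    cutoff for cutoff in range(1, 11)
    if certified_by_cutoff(student, cutoff)
    and not certified_by_definition(student)
]
```

In this sketch every cut-off from one to eight produces the contradiction; only a cut-off equal to the full number of outcomes, that is a perfect score, never does.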
Mastery tests and frames

Mastery tests result in scores produced by the summation into a numerical score of specific objectives attained. In relation to error, they contain all of the errors of specific objectives plus a large labelling error. In adding the results most of the important information is lost, in that we no longer know which specific objectives have been attained and which have not.

In this situation, whilst the generation of the test has used the Specific frame of reference, the summation has resulted in a normative test score. We no longer have information about what a student has achieved. We have information only about how many of the objectives have been achieved. This is exactly equivalent to information about how many addition sums are correct, or how many words are correctly spelt, or how many formulas in dynamics we can remember. The description is now clearly normative, and may only be interpreted in terms of whether one student got more or less "right" than another, or in terms of some arbitrary "standard" of how many "correct" answers will be considered "adequate"; how many correct answers constitute a "pass."

In this situation, because information about the particularity of objectives attained is lost, the whole detailed descriptions tend to be similarly "lost," or unavailable to those interpreting the test information. Labelling errors thus become large, as the meaning of the score, and the label attached to it, are differentially interpreted.

Mastery tests and internal logic

In most courses there are some facts, some understandings, some activities or skills, which are central to what the course is about, so that we could say if they don't know at least those things, or if they can't do at least these things, then there is no way we could say they have adequately completed the course. In old-fashioned terms, they are the "must knows" or "must dos" of the course, as distinct from the "should know" or "could know" categories.

Now there may be some areas of study where curriculum writers or teachers are unable, or unwilling, to specify such a category of "must know" performance. However, when it is so specified, it comprises a description of a finite number of procedures or products that will demonstrate the "knowing" of these crucial things. In other words, within this limited "must know" area, it is possible to specify what must be done, the conditions under which it must be done, and the procedure by which its adequacy will be known.

These then could be used to describe the essential requirements of the course of study. They are limited in number and extent, and are specifiable in the Specific frame of reference. As they are accomplished, as evidence is obtained that each outcome has been achieved, this can be certified by the teacher or student. If there are ten such outcomes, then successful completion of the course would require that all ten outcomes be so certified. Otherwise they cannot, obviously, be essential. To certify that eight out of the ten essential requirements have been completed is to certify that two of the essential requirements of the course have not been completed, and thus to certify that the student is uncertifiable. More than this, it is to lose the information about which two essential requirements have not been demonstrated.

So to obtain a "total score" on a mastery "test" is to contradict the whole concept of essential requirements, and to lose all the relevant information. Unless the total score is a "perfect" score.

In many situations the very notion of a "test," of some particular situation constructed to check all of the essential requirements at one time, would itself be contrary to this frame of reference. In the artificial and often pressured "test" situation it might be expected that success in some "essential" activities might not be demonstrated. It is this very argument which has been used to justify the acceptance of less than a "perfect" score in a mastery test. Rather it should be seen for what it is: an argument that invalidates the use of the test.

The problem of time-binding is not solved by success in test situations any more than it is by success in the ongoing teaching-learning context. We can never certify that any fact will be recalled at a later date, that any understanding will be retained in the future, that any skill will be demonstrated again successfully next year. We can sensibly certify that a behaviour has occurred once, or twice, or if necessary one hundred times. Regardless, we can never be certain it will be adequately demonstrated on the next occasion.

Test givers imply, with their insistence on testing, that demonstrations outside the testing situation are in some way of limited value, credibility and validity. It has always seemed to me that "tests" have all the inadequacies of "on site" or ongoing certification, with quite a few bonus inadequacies added on for good measure. Or more accurately, for worse measure.

A bridge of quality

Let's assume that we want to describe a particular person's performance in a certain area. Building bridges is as good an area as any.
And we are interested in the quality of that performance. That is, we are in the area of discourse often called assessment.

We might decide that there are four aspects of performance which we want information about; four members of the class we will call quality; four criteria on the basis of which we will assess quality of the bridge produced. Is the bridge safe? Is it economical in cost of materials, construction, and maintenance? What is its environmental impact in its rural context? And how is its aesthetic design judged in a competitive order of merit in relation to other submitted designs?

We note in passing that this decision about these particular four aspects of quality is itself a value judgment subject to enormous error in the General frame of reference.

It is clear that the language of discourse for each of these four criteria will be different, and to attempt to simplify by means of some language that is appropriate to some and not others, or that is appropriate to the notion of "quality" as a class but not to some or all of the members of that class, is to compound confusion by oversimplification (Eisner, 1991, p182).

For example, the first question, about safety, may only be addressed by showing that all safety measures are in place; the language that designates individual safety standards is appropriate. The question about being economical involves careful costing; the language of accounting is appropriate, and the language of economics will be necessary to delineate boundaries. The question about environmental impact will draw information from a number of disciplines: geology, biology, ecology, geography, ethics, economics, and so on. Ultimately, the discourse must deal with the balances and trade-offs among conflicting values and pressures; the language of politics and the language of environmental ethics will fight it out. Finally, the order of merit based on the aesthetics of the design will draw on the language of art and architecture, and be involved with issues of the assessors' personal tastes and the profession's current fashions. In the end, however, such complexities will be reduced to a single dimension where better-worse becomes more-less and a rank order is produced.

As this competitive order of merit is one aspect of the quality of the design, it is not that quality. By the same token, no measure of the order of merit can be the measure of quality, any more than a cut-off point on the order of merit can represent a cut-off point of quality. All this regardless of how consistent, stable, generalisable that order of merit may, or may not, be.

Permanence of quality
The meaning of error in each frame of reference for interpreting assessments is now considered: in the Judge's frame the phrase "error in the Judge's frame" is recognised as an oxymoron; in the General frame error is conventionally defined in statistical terms that ignore or underestimate some of the considerations, and the unattainable true score is seen to be a theoretical construct that need not relate to any external reality; errors are hidden in the Specific frame, and some of the Pretenders to this frame, namely mastery tests, criterion referenced tests, and competency standards, are briefly examined; finally in this chapter the meaning of error in the Responsive frame is considered. As this frame involves human interaction and discourse, error is what disrupts or disturbs movement towards clarification of meaning.

Assessment discourse is necessarily confused and confusing when the frame of reference within which the discourse is occurring is not specified, or when it involves definitions and methods where the actual frame being used is misrepresented.

The meaning of error in different frames

As soon as assessment data are committed to paper, their material permanency is dramatically increased. Likewise, the span of their associations is spread and emphasised. No longer just a description of a particular performance, the assessment becomes interpreted as a measure of knowledge and ability, an indicator of achievement on a course of study, and a predictor of future success or failure. Participation in an event has been transformed into an attribute of a person.

To estimate error is to imply what is without error; and what is without error is determined by what we define as true, by the assumptions of the frame of reference that forms our epistemological base.

There are four, at least, frames of reference for assessment. Four different sets of assumptions about the nature of the exercise. So within each of these frames the meaning of error, as defined by the assumptions of that frame, is different. Just as the meaning of error within each frame will be different again if judged by the assumptions of another frame. It is these differences that will be examined in this Chapter.

Error and the Judge

The Judge assumes omnipotence and infallibility within limits. The limits are defined by the particular performances with which the Judge is presented. These are the facts of the case. The task of the Judge is simple. He examines the performance of the accused, in whatever form it may be presented, he relates this performance to the standard, and then describes it accordingly.

He does this without error.

So problems that relate to error such as labelling, construction, stability, generality, prediction, categorisation, values and distortion of learning are, to the Judge, irrelevancies. For Judges are practical people, concerned with the realities, with what is, rather than what might be. And for them reality is the answers written on paper, is the art poster presented, is the motor repaired; in short, is the performance or artefact with which they are presented.

Questions of ability and stability, of looking to the past or to the future, are both irrelevant and unsettling. Irrelevant because they are outside the limits of their scrutiny. Unsettling because they trigger notions of a subject.

In this chapter I discuss in more detail the question of what it is that a test measures. In what sense can it be said to measure knowledge or ability? To what extent does it perform a ritual task and measure nothing? Or is it the wrong question? Should we rather ask, what do tests produce?

Tests and scales

A measure, or scale, assumes of course that equal intervals anywhere on the measure are in some sense of equal value. That the difference between sixty and seventy percent is in some way equivalent to the difference between twenty and thirty percent. So if a test is a measure then it must be a measure of something, and we would expect equal differences to represent equal differences in that something.

We know that a ruler measures length and the unit is a metre. We know that a clock measures time and the unit is a second. We know that a balance measures mass and the unit is a kilogram. And relative humidity measures what fraction of the water vapour the air could contain at a given temperature that in fact it does contain. So this is a pure number. Nevertheless, it is a ratio of two quantities that do have units.

So what does a test measure? And what is the unit of measurement? Let's look at the unit issue first.

It is clear that there are no units. The measure is a pure number. Unlike relative humidity, however, it is not a ratio of two measures of absolute humidity which do include units. Again, this supports the idea that the numbers are not measures, but ordinal numbers: numbers that represent an ordering of some kind. Numbers that describe a position in a series. Numbers in this case that assert that some performances, or people, have more of "something" than do others.

At this point it is worth mentioning that the whole paraphernalia of normalising scores and otherwise fiddling with them has two purposes: one is to try to magick a linear scale out of an ordinal one by making various sorts of assumptions about the distribution of the "something" that is being "measured"; the second is to produce "measures" that are mathematically pliable, that are accessible to the manipulations and pleasures of mathematicians; that will, in short, turn a horse race into a profession (see Chapter 11).

Cultural differences

Back to the problem of the "something" that is measured by the test. For the most part, Europeans and their colonial converts on the one hand, and the United States and their
spheres of influence on the other, have different approaches.

To the Europeans it has never been a problem. Inured by tradition to a religious belief in the Judge, they have generally accepted the proposition that the test or examination measures whatever the Judge says it measures. The acceptance of this "fact" denies the existence of a problem. The Judge says that tests measure student achievement. Pressed further, he or she might say that student achievement is a measure of what has been learnt on the course of study being tested. The test is simply that part of the course where learning is demonstrated. And the Judge, who holds the mystical secret and truth of standards, is able to convert this demonstration into a mark which is the true measure of what is achieved.

As I wrote that last paragraph I was aware of how "right" it sounded. Like all religions, there is a plausibility in its logical circularity that is terribly enticing, a simplicity in its self-evident truth that gives a deep sense of security. Articles of faith are characteristically immune to both the challenges of logic, and the intrusion of empirical data. To paraphrase Horkheimer and Adorno (1972), faith needs knowledge to sustain it, and thus pollutes knowledge in the act of attaining it (p20).

The Americans, whose religious tradition is democratic and competitive rather than monarchic, have little faith in particular Judges or, for that matter, Presidents. Which is not to say that they do not revere even more in compensatory manner the institutions of power in which these fallible humans are niched. Regardless, their tests must be free of the Judge's subjective idiosyncrasies, and pay due homage to the competitive individualism that is central to the American dream.

The problem of subjectivity was (mythologically) solved through the medium of the "objective" test.

By creating a story, we create a reality. And we have as many realities as we create separate stories about ourselves in the world. It is in the creations of such stories that we define ourselves to ourselves. Out of our past we select and choose the experiences, with appropriate perceptions, that sketch the outline, and then fill the substance, of our stories. The firmer the story line becomes, the more selective our experience, and the more distorted our interpretations are likely to be to maintain the story line. All this is fine, so long as we keep reminding ourselves that we are much more than our stories, that our experience is much richer than our perception and interpretation of it, and that the world is much more than our experience of it.

Yet there is another trap more subtle still. For not only do we get caught up in our own stories, we also get caught up in the stories of other people, particularly those we admire, or love, or are controlled by. For we do not live alone. We are social animals, and our life stories require other people to bring them into being.

Thus our stories about ourselves in the world are constructed out of our experience in the world. And this experience may come to us by direct involvement in the world, or involvement through the incorporated stories told us by others. And once these stories become accepted by us, they become part of our reality, part of our way of living in the world. Then we tend to construct our experience out of our stories. This is not a cause-effect relation, but an ecology of effects; our consciousness of the world, our way of being, involves an intimate interconnection of our experience, and the stories we use to make sense of that experience.

Our knowledge of ourself is just that interconnection.

Knowledge of a field

In just the same way do we construct knowledge in a particular field of study. We create events around the object of study, observe what happens, and then make up a story about what is happening. Or more likely accept someone else's story about what is happening.

For any field of study is just such a consensus story, comprising what Foucault calls a "regime of truth." Then we use the story to help us make sense of other events involving the object, or other objects in that field.

This is equally true whether the field of attention is immense, as in mysticism or physics or history or engineering, or is small, as in building a table or washing dishes or driving a car.

So our knowledge of the field consists of descriptions of events involving a selected set of data constructed out of the relation between story and experience, between hypothesis and interpretation (possibly involving measurement), between conception and perception. As Wolf (1991) expresses it, "sophisticated thought follows a 'zig-zag'
course between craft and vision" (p41).

But again, let us be clear on this fundamental point. The data, the knowledge, does not belong to the object of study. It is not a property of the object. Nor is it the name or a measure of a property of the object. It is rather information about the relationship of the object to its environment during a particular event, a particular interaction, suggested by the story in which it has a part to play.

Messick (1989a) comes close to this but does not follow it up. In claiming that tests "do not have reliabilities and validities, only test responses do," he goes on to say "that test responses are a function not only of items, tasks, or stimulus conditions but of the persons responding and the context of measurement" (p14). In my terminology, they are functions of events.

We could generalise. All knowledge is knowledge of the relations that identify events. And as we are observers at some point in the interaction, either at the level of direct observation, or at the level of constructing and interpreting the story that is the basis for the data collection, then we ourselves are involved in the interaction, and are thus part of the knowledge. And for the very reason that we are part of the knowledge, we are not that knowledge, and the knowledge is not part of us.

Human ability

In the light of the above, how are we to make sense of the notion of human ability, of capacity, of intelligence, of cognitive achievement, of some factor of the mind, of a latent trait?

These are normally considered properties of the person, attributes of an isolated mind, functions of an individual human consciousness. Yet our analysis of how we collect information about the other, or even how we obtain knowledge about our self, denies the possibility of such separation, and acknowledges the possibility only of information about relations.

I described knowledge of the field as a selected set of data constructed out of the relation between story and experience. Such selection is always in a context of some action, even if the most recent action is talking to oneself. Ability is a redundancy concept that acknowledges the action and then claims responsibility for it. It is an example of the common epistemological error of attributing a cause to the relational balance of an ecological system.

Semantically, this is achieved through the simple trick of nominalisation; of changing a verb into a noun, and thus of converting a process into an object. It is very simple: I do something, I am part of an event. Therefore, the causal logic goes, I am able to do the things I do (before I do them), otherwise I wouldn't have been able to do them. Therefore I must have (here comes the nominalisation) an ability, some property located somewhere within me, that allows me to do this thing that I do.

This is an example of the dormative principle. Keeney (1983) explains how it works.

It is not by accident that whenever universal education claims to equalise opportunity to cultural immersion and hence occupational choice, at the same time examinations and psychological labelling provide upper limits previously applied through the mechanisms of class and caste. The basis of the highest morality of any society has always been the maintenance of stability.

Conclusion

So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.

The person who does the test has already accepted the name of the test and the measure that the test makes by the very act of doing the test; when you enter the raffle you agree to abide by the conditions of the raffle.

So the mark becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.

The first part of the chapter details some of the ways in which psychometricians fudge: by reducing criteria to those that can be tested; by prejudging validity by prior labelling; by appropriating definitions to statistical models; and by hiding error in individual marks and grades by displaced statistical data, and implying that estimates are true scores.

In the second part of the chapter a number of specific examples of fudging are detailed; in particular, the item response theory fudge, selection and prediction fudges and the great Queensland reliability fudge.

Constraining the definition

Reliability and validity are two concepts dear to the heart of test constructors and others involved in the field of psychological and educational measurement. I'll begin my analysis of the fudge that characterises the field by looking at reliability, or the lesser fudge.

Reliability in classical test theory is (indirectly) an estimate of the error you'd expect if the student did a hypothetical parallel test. And in generalizability theory it's an estimate of the difference between the "universe" score and the score on any particular test. In both cases it's about the reliability of the test, or more accurately of the test-testee interaction, and not of the assessment; of the extent to which two tests give the same score, not the extent to which this particular description of student performance, based on a test, confirms or contradicts other such descriptions, which may or may not include a test (Behar, 1983, p19).

Note the way the mathematical model simplifies and constrains the world. It would be easy to believe the reliability of the test was about the extent to which the test describes course outcomes or student performance or work successfully completed. It isn't. It confines itself to the closed world of the test. It's about its ability to reproduce itself.
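
The classical definition above can be made concrete with a small sketch. It uses the standard classical test theory relation, SEM = SD x sqrt(1 - reliability), to turn a reliability coefficient into an error band around a single student's mark. The function names, the standard deviation of 10 marks, the reliability of 0.91 and the 1.96 multiplier (which assumes normally distributed errors) are illustrative assumptions, not figures taken from this study.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory relation: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate band around an observed mark (normal errors assumed)."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return observed - half_width, observed + half_width


# Hypothetical test: standard deviation of 10 marks, reliability of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
low, high = score_band(65.0, 10.0, 0.91)
print(f"SEM = {sem:.1f} marks; band for a mark of 65: {low:.1f} to {high:.1f}")
```

Under these assumed figures the standard error is about 3 marks, so a mark of 65 carries a band of roughly 59 to 71. The band speaks only to agreement between parallel tests; it says nothing about agreement with any other description of the student's performance.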

Mathematical models and true scores

The concept of the true score or universe score is central to the derivation of the theory. That is, it is a theoretical assumption. That does not mean that it necessarily has any place in the interpretation of the theory, that it corresponds to some measurable property of real people. And even if it does, the theory indicates that we can never know the true or universe score, only an estimate of it. And that estimate is always associated with error.

So in practice, in the world out there, there is no true score that can be attached to a person or an event. There is no thin line beside which a number is placed. Even before the empirical evidence starts to come in, there is only a wide fuzzy band, and all we can say mathematically is that the true score is probably in there somewhere. And if it is only probably in there somewhere, then for all practical purposes, for an individual person it isn't in there at all. In practice there is no true score. There is no stable rank order. And if in practice there is no stable rank order, then there can be no stable practical standard.

The history of achievement testing represents an enormous confusion of theory with practice. A model is not true or false. It is useful in as much as its predictions accord with empirical data at some points. It is not necessary that the assumptions of the theory correspond to actual situations in the world in which its predictions are applied. The assumptions of quantum mechanics from which the theory derives cannot be validated empirically. That is why they are assumptions. The metaphor in which the assumptions may be enclosed is useful in as much as deductions from the theory are experimentally verifiable. But such assumptions are not considered "true." Nor are they considered as having some "real" existence out there in the "atom."

Psychometricians on the other hand assert that their assumptions about a true score or universe score imply that such a score refers to some attribute, some measurable property, of a person. The person can be then classified, because the number is a measure of something called achievement, or ability or whatever. In criterion-referenced tests it is achievement in a specified "domain" of knowledge, and is called a "trait."

Regardless, this achievement is assumed to be some psycho-cognitive state which can be accurately described by finding a corresponding point along a one-dimensional scale.

Why are these very intelligent people wanting to insist that their theoretical assumptions are consistent with empirical reality, when theories in general require no such correspondence?
And when the fundamental assumption, the primary axiom of this particular theory, is that such correspondence can never be achieved? Why this enormous urge to represent unidimensionally a variety of human performances which are obviously multidimensional? Why this obsession with numbers, this illusion of numerical accuracy, this delusion of descriptive adequacy?

At this time, let us merely note that all of these activities are related to a psychological ideological assumption about human ability, or skill, or achievement. Some particular quantifiable quality of people that belongs specifically to them, and is thus independent of gender, race and class; that is unsullied by environmental factors; that is a permanent fixture of the person independent of the conditions of its production. That is, indeed, the clinging legacy of the nineteenth century belief that "intelligence was a unitary and immutable trait. It had no kinds or varieties, only ranks." (Wolf, 1991, p36).

As well, these assessment activities are related to an ideological social assumption that this quality may be quantified and be represented along a unidimensional line of almost infinite length, along which each person may now be accurately placed and categorised, their place permanently fixed, and their relative position in the order of things firmly established. And this conception of "ranking, fixedness, and predictability provided the 'scientific' basis for two enduring institutional responses to the diversity of styles, cultures and academic backgrounds of students: universal testing and the systems of tracking students." (Wolf, 1991, p38).

And, further to Chapter 4, note that

This portable cumulative record of individual worth and achievement is central to bureaucracy and psychology alike. ... the inscriptions in individuality ... make the individual knowable, calculable, and administrable, to the extent that he or she may be differentiated from others and evaluated in relation to them. ... individuality has been made amenable to scientific judgement. ... With psychometrics the previously ungraspable domain of mental capacities has opened up for government. What can now be judged is not what one does but what one is (Rose, 1990, p140).

The General frame and the true score

The logic of the General frame does not require any notion of a true score. The true score is a statistical artefact, a mathematical artifice, devised to defend a quite fantastic and monstrous proposition about ordering and classifying with great accuracy large numbers of people. Here is that monstrous proposition spelt out in more detail.

The political proposition that is being rationalised, justified, mystified, constructed and implemented in the notion of a true score is this: that it is possible in any area of human achievement to produce an accurate order of merit of "ability" in that area, and to attach to each person a number, a score, that fixes them firmly in position within that hierarchical order.

What do we actually know empirically? That under certain conditions it is possible to increase the stability of the rank order of merit of people on "test" results, in "test" situations. And that the more we can eliminate personal idiosyncrasies of setters and markers by averaging, and the shorter the time span of repeating the testing, the more the rank order is generalizable to other setters and markers of similar tests constructed by similar people.

We do not know empirically whether there is an asymptotic limit to this stabilisation; theoretically, and practically, there is always an error of measurement.
We do know that this fits empirical data quite closely in regard to sampling assessors for marking. That is, when students do very similar tasks and the idiosyncrasies of assessors are "averaged" out.

We do not know empirically whether a similar stabilisation occurs when results are averaged over different occasions. There is no a priori reason to believe that they should be, especially for achievement tests with a high memory component. Indeed, there is every reason to believe that the actual performances of particular students would vary considerably, and differentially, when assessed over time, given that their forgetting curves are non-linear and of different shapes. Thus sampling across these dimensions could produce an increase in error in the General frame, not a decrease. It would be very dangerous to collect such information, however, for it would contradict the assumption of stability that the notion of skill or ability implies.

Empirically the true score is not known, and can never be known. Empirically estimates of the true score can be obtained, and these are always different, because all of the measurements we make contain an error. In practice then, error is indicated by the difference between estimates, not between estimates and some hypothetical "true score."

That is why the notion of true score is not necessary for simple and specific and individualised estimates of error, though theoreticians and ideologues may well require the idea for their own particular purposes.

The notion of the true score, then, despite its enormous ideological importance, is practically unattainable, irrelevant, and misleading. It is a theoretical input to the mathematical theory of testing, not a practical output. The statement that there is a true score is a statement about a theoretical statistical assumption, not about an attainable empirical reality. Further, such assumptions of mathematical models need have no direct links to any properties or aspects or qualities of phenomena "out there" in the real world.

Note that we do not define true score as the limit of some (operationally impossible) process. The true score is a mathematical abstraction. A statistician doing an analysis of variance does not try to define the model parameters as if they existed in the real world. A statistical model is chosen, expressed in mathematical terms undefined in the real world. The question of whether the real world corresponds to the model is a separate question to be answered as best we can (Lord, 1980, p6).

Lord then agrees with me, at least on page 6. More of Lord later. For now, having seen how the fudge about the true score works, we'll examine some of the others. One really big one relates to test items.

Models and items

There is no doubt that one way to get information about achievement (what a person has done), or skill (what a person can do), or ability (what a person could do given the opportunity), is to get them to answer some questions about what it is they are supposed to have achieved or have the ability in. And one rather contrived way of doing this is to use pencil and paper tests.
Further, a particular m ethod of this technique is to use test items of a multiple choice or short answer form.It requires an enormous suspension of rational thin king to believe that the best way to describe the complexity of any human achievement, a ny person's skill in a complex field of human endeavour, is with a number that is determ ined by the number of test items they got correct. Yet so conditioned are we that it takes a few moments of strict logical reflection to appreciate the absurdity of this.Test items not only determine the form and media of testing as paper and pencil tests, but also specify the type of question as short answ er or multiple choice. In other words, talk of test items tends to narrow dramatically the sort of performance situation in which the person being assessed is to be put, and also se verely limits the sort of description that might be given.Why is this important? Because psychometricians hav e defined reliability and generalizability in terms of test variance, which i s in turn determined by the characteristics of test items. Likewise, estimates of construct validity, on the rare occasions they are estimated empirically, are deter mined by statistical manipulations of item characteristics.By appropriating terms like reliability and general izability and validity, and defining
them in terms of the mathematical properties of particular tests, professional test agencies and examining institutions perpetrate another grand fudge. These concepts become narrowly construed as properties of tests, or relations between numbers, rather than as useful criteria on the basis of which concerned people may judge the whole assessment exercise.

Item response theory and the absolute scale

Item response theory allows us to construct a scale in the same way that classical test theory and generalizability theory enable us to construct a true or universe score. The magic is in the word "construct." It is theoretically constructible, not empirically constructible. In fact, the theory determines that the scale is absolute but improbable; the actual scale produced measures the probability (or if you prefer, the improbability) that any person to whom the scale is applied actually has that reading on the (theoretically) invariant scale that the theory constructs.

Just as objective tests are highly subjective instruments in which the marking can be done objectively, but it is implied that the assessment is objective; and just as the true score can never be measured but it is implied that the estimated score is that score; so the invariant scale of the criterion referenced test can never be physically produced, but it is implied that the test produced contains that scale, rather than its very error-prone physical manifestation.

Criterion referenced tests

Criterion referencing, as applied by professional test agencies, is not directly referring to course objectives or to student learning. Criterion referencing refers directly to test items. A criterion referenced test is one that is prescribed by tight delineations of the structure of particular tasks to be included in the test. Advocates of criterion referenced tests often claim that the performance on such a test is judged in relation to an absolute rather than a relative standard.
That is, that scores on criterion referenced tests are measures of achievement in a particular domain and do not depend on relative merit, but are informative in their own right. This claim is another psychometric fudge. Criterion referenced scores are in no way absolute scores. They are norm-referenced. The norm-referencing is done prior to the test construction process at the item level, and not at the total test level during a specific application of the test (Behar, 1983; Glass, 1978).

Criterion referenced tests contain all of the errors of Mastery tests plus one additional labelling error of great ideological significance. A sub-group of tests in this area, called sometimes Domain referenced tests, have developed a whole theory based on test item characteristics, which is very efficient. Efficient in the sense that students can be tested with fewer items than in the random sampling model for the same error (an error which, as usual, is never attached to individual scores). This is achieved by using known levels of difficulty of the items (based on random or other specified population estimates), in
computing the student's score.

Nothing wrong with this of course. Except the labelling claim that these scores are absolute measures of a "latent trait." What is a latent trait? It is some "hidden characteristic" which some students have more of than others, and which is measured by the test. And those who have more of it are more likely to be able to answer correctly the more difficult items.

As all of the items in a Domain referenced test relate to some particular area of learning, such as reading comprehension, or computer skills, or simple calculus, or newspaper editing, or social skill, or whatever, then it doesn't really matter what "latent trait" means. The assertion that "it" can be measured absolutely is what constitutes its ideological power. Here is the ultimate rationalisation for intellectual and social stratification. Here is the number that describes each person's place on the continuum of ability or skill or whatever for any label that testing agencies wish to attach to the domain of items.

On the surface, of course, it is the specific label that assumes social importance. The claim being made, or at least strongly implied, is that such a test is an absolute measure of reading comprehension, or computer skill etc. But in focussing on the label, we are likely to miss the frightening significance and ideological sleight of hand that produced the "latent trait" as some substantive property or quality permanently attached to the person tested, somehow magically unrelated to the highly subjective, contrived, interrelational world where a student sits at a desk, reads some questions, and places ticks in computer marked boxes.

Such tests construct current fashionable truths.
They are being presented as the latest panacea for testing human ability, or "skills" or "competencies" as they are now called; they are being presented as the theoretical support for an invasion of competency based assessments in all areas of human measurement (in schools, businesses, bureaucracies, or wherever else hierarchies operate). So we should be clear about three things:

The first is that constructing a domain referenced test and naming it produces no evidence that the test measures any sort of trait or ability that can be attached to an individual person (Lord, 1980).

The second is that they are not absolute, or error free measures; the scores are related to relative merit, and there is no "standard" performance or score that relates to any minimum or other grade of "competency" that can be theoretically attributed to any score (Glass, 1978).

Which takes us to the third point, which is a logical conclusion from the previous two. Domain referenced tests can make little contribution to a field of "competency" assessment which purports to describe (or more significantly measure) some "standards" of competency in various "skill" areas of human performance.

Limiting constructs, limiting error

Let's examine briefly how some of the more general criteria of assessment (labelling, construction, stability, generality, prediction) tend to be limited to what can be controlled
by test makers.

Labelling is achieved by the simple act of giving a name to the true, or universe, or latent trait, score. Which means, in practice, to the estimated score. The errors implicit in the communication of what that label means, between those who define the course, those who teach it, those who produce the test, those who do it, and those who consume its product, are thus not considered. All of these people will give their various meanings to the label, and make their judgments accordingly. We may be certain that these meanings vary considerably. How much they vary will probably never be known, because it is not in the interests of any institution to uncover yet another source of error. Labelling errors are not currently considered in any estimate of test error. I believe they are immense. If communication is its effect, then such confusions are, to the student, irrelevant. To the student the meaning of the label is the grade or the mark attached to it. Within the structure that contains the assessment system, the meaning of the label, as distinct from the meaning of the mark, amounts to little more than ideological gossip.

At least some students recognise the meaninglessness of the label. I remember vividly a television program which followed the fortunes of four students through the final months of their preparation for the University Selection Examination in New South Wales. One student in particular, a science student, a paragon, studied hard and reaped the ultimate reward. Straight A's.

Just after he received his results he was interviewed for the last time. He was obviously pleased with his success.

"I suppose," the interviewer said, "this will be very useful to you in the future."

"The marks?"

"The understanding. The knowledge."

"Oh that. No, I don't expect that to be of any use to me at all. I'm going to be a lawyer."
Likewise, construction errors are not estimated; they do enter the theoretical psychometric definitions of validity, but are in practice neither measured nor estimated. The major task of matching objectives to assessment to performance is assumed entirely by the test maker, and most of the errors within this activity are also disregarded, as easily as the errors caused by differing forms of assessment, and use of media other than reading/writing, which don't fit the format of test items on paper, are disregarded. It is assumed that the test as constructed, and the performance required of the student, do indeed match the objectives of the course, or the criterion definitions of the test. Sampling processes that are used, even in professional testing agencies, are at the best primitive, and at the worst nonexistent. This part of test construction is nicely described as an "art" rather than a science (Nairn, 1980).

One thing is certain though; no course has stated as its major, or even minor objective, the ability to answer a pencil and paper test in a given time under stress conditions. And why not? Surely this is the essential behavioural objective.

Stability becomes narrowed to test reliability, more accurately called internal
consistency, an internal test measure that cannot take account of variation over time and place and assessors. Theoretically test-retest reliability is one form of reliability, but in practice such estimates are rarely obtained.

Generality becomes narrowly construed as related to the extent to which the test samples the universe of possible test items, or how well the item specifications cover the domain. Generality becomes a function of test items and is called generalizability. Generalizability ignores previous performance in different contexts, forms and media. It ignores all performance other than the purely cognitive response to simulated experience of a multiple choice or written form. It thus ignores all cooperative and all production modes of expression. It reduces human response to the act of recognising a "best" answer, to conforming adequately to some authority's view of importance, relevance and reality, or to answering someone else's question in a particular way.

And prediction becomes tied to numbers and test scores. In this psychometric world we are no longer concerned with the extent to which actual people are helped to function in differential social situations of great complexity. Prediction does not attempt to describe the relationship between a particular set of learning experiences for some person, and how helpful that is in some future situation for that person. Rather it ranks a group of people on their "success" in the "learning" situation, then ranks them again in some criterion situation. The correlation between the two rank orders represents the predictive value of the test. Not of the course, of the test. And not of its relevance to the quality of their performance, but to its correlation with some person's or group's ranking of their relative performance.
And note that even if this correlation is high, which is unusual unless a similar test has been used to measure the criterion, this tells us nothing about whether the relation is in any way causal.

How the fudge works

The psychometric fudge occurs through the following processes:

Firstly, the criteria by which assessment is determined are chosen so that they are easily adaptable to the construction of tests and to the statistical manipulation of test data. Criterion-referenced tests are just that: only those criteria that are appropriate for referencing test items are chosen.

Secondly, the validity of the test is prejudged by labelling it to describe what it is supposed to measure. Such is the power of labelling that this exercise in wishful thinking, this untenable assertion, is interpreted by most people, including the test constructors who become entranced with their own propaganda, as being an accurate description. At a deeper level still the mathematical theory itself contains such terms as true score, ability, and trait before any empirical information at all is available; that is, before any connection (let alone correspondence) with the world outside mathematics is established.

Thirdly, definitions are appropriated and defined to fit specific statistical models; in particular, by narrowing the universe of possible test situations to a universe of possible test items (random sampling model), or by narrowing the universe of possible test items further to the universe of suitable test items (domain referenced testing). In both cases
the performance of students outside of such test situations is disregarded, or downgraded, and the right to appropriate the personalising labels (ability, trait, true score) is assumed.

Fourthly, the data is presented in a way that is misleading at best and deceitful at worst, by hiding error of individual marks and grades with obscure and displaced statistical data, thus implying, to all but the statistically sophisticated, that estimates are "true" scores. Further, the implication is made that such tests are accurate as predictors, claims that in most cases cannot be substantiated (Reilly, 1982). Finally, estimates of confusions and errors related to construct validity are ignored, usually theoretically, and almost always practically.

We could look at these fudges as things done by individuals, and thus attributable specifically to them. From this psychological frame how could we make sense of this fudging behaviour? At best the fudges can be interpreted as logical or psychological slips propped up by delusions of grandeur. At worst they represent academic chicanery and political manipulation in high degree (Nairn, 1981, p58).

If we regard this in a sociological context, however, a different picture emerges; psychometricians may well be regarded as the moral guardians of the age of competency, the high priests who hold society stable by propagating, preaching, and propping up the gospel of the Standard, and the cult of the linearly determined individual that it constructs and supports.

In the beginning

"What's in a name?" Bill Shakespeare said, "that which we call a rose by any other name would smell as sweet." Maybe so, yet that which we call a trait when it is just a mathematical function takes on a different odour indeed. Names have a magic of their own, and the stickiness of the name is very dependent on the power of the namer.
Lord (1980) produced the seminal work on item response theory, in his book Applications of item response theory to practical testing problems. It is possible here to trace in detail the birth of a fudge.

Early on there are some laudably honest statements:

    True score theory shows that a person may receive a very low test score either because his true score is low or because his error score is low (he was unlucky) or both (p5).

    The true score is a mathematical abstraction. A statistician ... does not try to define the model parameters as if they actually existed in the real world. A statistical model is chosen, expressed in mathematical terms undefined in the real world. The question of whether the real world corresponds to the model is a separate question to be answered as best we can. It is neither necessary nor appropriate to define a person's true score or other statistical parameter by real world operational procedures (p6).

    In item response theory ... the expected value of the observed score is still
    called the true score (p7).

Admittedly, our laudability quotient diminishes as we reflect on the use of the word "true." In what sense can it be true if it doesn't exist in the real world? Why call it true if it can't be measured? But perhaps it is true in a mathematical sense because it is a necessary conclusion for the premises of the theory? Not so, it is merely the name of a variable assumed in the theory.

Undeterred we press onwards. Five pages later Lord commences the serious work in developing the theory:

    Let us denote by θ the trait (ability, skill, etc) to be measured. For a dichotomous item, the item response function is simply the probability P of a correct response to the item. ... it is very reasonably assumed that P increases as θ increases (p12).

Now this is a truly remarkable paragraph. The word "trait" has not appeared before. Where did this "trait", this "ability", this "skill" come from that is being measured? What does it mean? Lord "very reasonably assumes" that as this thing increases, the probability of answering a particular test item increases. But why do we need this thing at all? And why is it named a trait or a skill or an ability, which are hardly "mathematical parameters"?

We wait expectantly till page 45 to find out what θ means mathematically. "A person's number right score ... on a test is defined ... as the expectation of his observed score x. It follows immediately ... that every person at ability level θ has the same number right true score." Then on page 46 the crucial point finally emerges: "true score ... and ability ... are the same thing expressed on different scales of measurement." And just in case you missed it, the best estimate of this true score, this ability, is the number of items answered correctly on the test.

Thus on his own admission Lord has done exactly what he claims statisticians do not do. He defines the parameter as having "real world" status when he calls it ability.
(Just as he implies it has some objective or propositional reality when he calls it true.) Its mathematical status is simply the number of items answered correctly under the idealised conditions specified in the theory. Its empirical status is the actual number of items answered correctly, or some statistical manipulation of that number.

There is one more aspect of this fudge that we need to look into. It is the fascinating use of the adjective "latent" in front of trait. Hambleton & Swaminathan (1982) elucidate:

    Any theory of item responses supposes that, in testing situations, examinee performance on a test can be predicted (or explained) by defining examinee characteristics, referred to as traits, or abilities: estimating scores for examinees on these traits (called 'ability scores'); and using the scores to predict or explain item and test performance. ... Since traits are not directly measurable, they are referred to as latent traits or abilities. Any item response theory specifies a relation between the observable test performance and the unobservable traits or abilities assumed to underlie performance on the test (p9).
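For readers who want to see the machinery under discussion, the following is a minimal sketch of an item response function of the kind Lord and Hambleton & Swaminathan describe. The logistic form, the discrimination parameter, the five item difficulties and all function names are assumptions made for illustration; the passages quoted require only that P increases as θ increases. The sketch also computes the expected number-right score as the sum of the item probabilities, and sets beside it one simulated sitting in which each observed response is simply 1 or 0.

```python
import math
import random


def item_response(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under a two-parameter logistic model.

    Assumption: the logistic form and both item parameters are illustrative;
    the quoted passages only require that P increases as theta increases.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def expected_number_right(theta, difficulties):
    """Expected number-right score: the sum of the item probabilities."""
    return sum(item_response(theta, b) for b in difficulties)


def simulate_responses(theta, difficulties, rng):
    """One simulated sitting: every observed response is 1 or 0."""
    return [1 if rng.random() < item_response(theta, b) else 0 for b in difficulties]


# Hypothetical five-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
rng = random.Random(0)  # arbitrary seed so the run is repeatable

for theta in (-1.0, 0.0, 1.0):
    expected = expected_number_right(theta, difficulties)
    observed = simulate_responses(theta, difficulties, rng)
    print(f"theta={theta:+.1f} expected={expected:.2f} observed={observed} total={sum(observed)}")
```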
Of course, this description is not quite true. Item response theory does nothing of the kind. It assumes certain characteristics of test items, and then generates a total score which is an estimate of the true score. Under certain conditions, "we can think of θ as the common factor of the items" (Lord, 1980, p19). The true score can only be guessed. The mathematical theory tells us the probability that it lies somewhere within a certain range of scores. Latent means hidden or concealed or potential. What is hidden, what is latent, is not any characteristic of the person, but a characteristic of the measurement itself. The examinee has performed, has participated in the event of answering test items. Nothing hidden or latent about that. So why the displacement? How did a latent measure become a latent trait?

Item response theory doesn't need any assumption about traits at all. The talk of traits and abilities is redundant and gratuitous. After all the terribly refined and elegant statistical manipulations, item response theory simply produces a total score which (given knowledge of the structural characteristics of individual items) allows a prediction of the probability with which any particular item will be answered correctly by a person with that total score. It does require a certain consistency of correct (or incorrect) response for specific items on the part of the examinee. All else, as far as item response theory is concerned, is fantasy.

Incidentally, such prediction is in no way an explanation; to assume that is to invoke the dormative principle; the total score is just a summary of information about a particular person answering the individual items. Such a score cannot now be used to explain why the items were answered correctly.

On page 55 Hambleton and Swaminathan (1982) come clean; rather by accident than design, I fear. "Ability", we read, "is the label that is used to describe what it is that the set of test questions measures." Precisely.
And what it measures is an estimate of probabilities of answering certain test items correctly. To what extent that measure relates to any "characteristic" or "trait" or "ability" of the examinee may only be known after "construct validation studies ... (which) validate the desired interpretations of the ability scores" (p55). Shouldn't that read "validate or invalidate"?

Mistakes: probability, correctness, and checking

Item response theory cannot predict whether a particular person (whose true score we don't know but whose estimated score we do know), will get a particular item (whose characteristics we know), correct or incorrect. The theory will predict the probability of getting it correct. In practice it will either be correct or incorrect (probabilities are only 1 or 0).

So item response theory never even pretends to estimate what people know or can do. It only claims to estimate the probability that they can do certain things. Then the assumption (and that's exactly what it is) is made that this indicates an ability of the person in that area of cognition. It might mean something else. Or it might not.

When I worked as a test constructor I noticed one aspect of answering tests that was interesting. When groups of year 10 students did the 100 item tests most would finish in about ninety minutes. When groups of year 8 students did the tests most would finish in
about 60 minutes. The year 10 students got slightly better results (about 0.3 S.D. better). Conventionally this would be interpreted as meaning that they had more ability, or simply more maturation. But given my perceptual data, perhaps it just means that they did more checking!

Psychometric selection myths and fudges

Hulin, Drasgow & Parsons (1982) complain that the controversy and rhetoric about standardised educational admission tests seem to have developed independently of the psychometric evidence about the usefulness of admission tests in reducing errors in prediction. They claim that Cleary, Humphreys, Kendrick, & Wesman (1975), Rubin (1980), Linn, Harnisch, & Dunbar (1981) among others, have produced summaries of large numbers of studies relating college and professional school admission test scores to performance in post secondary and postgraduate educational institutions:

    The evidence is clear and consistent. Well-constructed tests of cognitive abilities are significantly and consistently related to performance in school. When appropriate corrections are made for restriction of range and other statistical artefacts, the validities of tests are appreciably large (p281).

Claims such as this are very common. So on this occasion I thought I'd check out the references.

Cleary's (1975) data involved correlations between verbal and mathematical SAT scores on the one hand and High School grade averages and College grade averages on the other. The correlations ranged from 0.35 to 0.50. But the correlations between the High School and College grades were higher at 0.64. So two points about Cleary's study: firstly the correlations are at best only 25% better than pure chance. Is this "appreciably large"?
Secondly, they were considerably lower than the correlations from grade averages, so why were they necessary at all?

Rubin's (1980) study involved the use of the Law School Admissions test to predict first year grades in 82 law schools. The correlations ranged from 0.03 to 0.5; after corrections for range (Linn, 1981), the correlations range from 0.2 to 0.7. In 14 of the schools they were below 0.35, which is 12% better than chance. Is this "appreciably large"?

When it is known that issues of construct validity introduce far more sources of error than are involved in simple predictive correlations of this sort, it is difficult to understand how this sort of justification, which is quite common in the literature, goes on for decades virtually unchallenged within the psychometric community; on the other hand, compared to the abysmally low correlations often obtained in such predictive correlational studies, perhaps they are appreciably large.

However, these studies raise another issue and another fudge; the correction (always upwards) of predictive correlations.

Fudging the predictive correlations
Correlations between a selection instrument and later performance are often corrected for range restrictions and for criterion unreliability. Correcting for range restriction is reasonable; generally some of the people tested were not selected, so had no opportunity to be in the final sample. It is considered appropriate by statisticians then to estimate what the correlation would have been had all of those tested actually been appointed. After the correction, of course, it is a correlation about something different; it becomes the estimated correlation between test performance and later performance of all those who sat for the test. Prior to the correction it was the correlation between test performance and later performance of all those who performed later. Different sample, different correlation. Which to use depends on what question you ask. Automatically raising the correlations is a fudge.

Correcting for criterion unreliability is a different matter. Most job tasks are multi-dimensional; that is, they involve many very lowly correlated tasks. And college grades are likewise composites based on lowly correlated components. If a single correlation is to be obtained with a multi-dimensional job performance the various ranks or gradings have to be collapsed into one single rank or grading; and that requires some arbitrary and explicit loading to be applied to each dimension (see Chapter 10 on Comparability).

Even when this is done (and it often isn't), there is still the assumption that there is indeed a meaningful rank order to be obtained. If most people in most jobs or in most courses do their work adequately (just as most people drive cars adequately), then we would expect correlations to be low, and ultimately where training schemes are very adequate, to be zero.
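The arithmetic of the two corrections can be sketched briefly. The text does not name the formulas involved, so what follows is an assumption: it uses the usual textbook forms, Thorndike's Case 2 for range restriction and Spearman's correction for attenuation applied to the criterion. The input values are hypothetical, chosen only to show the direction and rough size of the effect, and the function names are invented for the example.

```python
import math


def correct_for_range_restriction(r, sd_ratio):
    """Thorndike's Case 2 correction.

    sd_ratio is the test-score standard deviation among everyone tested
    divided by the standard deviation among those selected (so >= 1).
    """
    return (r * sd_ratio) / math.sqrt(1.0 + r * r * (sd_ratio * sd_ratio - 1.0))


def correct_for_criterion_unreliability(r, criterion_reliability):
    """Spearman's correction for attenuation, applied to the criterion only."""
    return r / math.sqrt(criterion_reliability)


def variance_explained(r):
    """Share of criterion variance accounted for by the predictor (r squared)."""
    return r * r


# Hypothetical inputs; none of these values comes from the studies cited.
observed_r = 0.20
sd_ratio = 1.5
criterion_reliability = 0.60

after_range = correct_for_range_restriction(observed_r, sd_ratio)
after_both = correct_for_criterion_unreliability(after_range, criterion_reliability)

for label, r in (
    ("observed", observed_r),
    ("range-corrected", after_range),
    ("both corrections", after_both),
):
    print(f"{label:>16}: r = {r:.2f}, variance explained = {variance_explained(r):.0%}")
```

With a standard deviation ratio above 1 and a criterion reliability below 1, both adjustments can only move the coefficient upwards. Squaring is also what the "better than chance" percentages quoted earlier appear to rest on: 0.5 squared is 0.25, and 0.35 squared is about 0.12.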
Where most people do their work adequately, the reliabilities would be low not because of rater inadequacy that can be corrected for, but because raters are attempting to separate performances that cannot be separated, and/or are trying to pretend that a multi-dimensional performance is in fact uni-dimensional. In such cases it is obviously not appropriate to artificially inflate the correlations because of rater unreliability.

The changes are more than trivial. A study by Schmidt, Hunter & Pearlman (1981) involved 150 000 people and 2000 predictive correlations. Before correction the average correlations between eight aptitude tests and job performances in clerical job categories ranged between 0.15 and 0.25. After the statistical corrections, however, they magically rise to between 0.3 and 0.5. Still not good. In fact, still quite awful. But they certainly look better than before, and aptitude tests survive again to live another day.

The great Queensland reliability fudge

I was talking to the Principal of a secondary school in Queensland. Students in year 12 are assessed internally, with the help of some external monitoring. I suggested that there might be some problem with reliability.

"It's 0.95," he replied with confidence.

"Excellent," I responded with some scepticism. Then I decided to check the data.

The study is titled Random sampling of student folios: a pilot study (Travers, 1994).

    In this study ... 1189 exit review folders of Year 12 student work were collected randomly from school subject groups across Queensland in December 1993
    and assigned to two hundred and forty review panellists in other districts. These exit review folders show the work of students who have received a result for that subject on their Senior Certificate. The role of the review panellists was to examine packages from schools containing ten folios, and for each folio decide a Level of Achievement and relative position in that achievement band (p1).

The review panellists were given access to other markers' assessments and comments, as well as the school's assessment of the Level of Achievement. What they didn't have was information about the rung placements within each level of achievement. (There are ten rung placements within each level of achievement.) So this is not a blind reliability study:

    because it was not possible to reproduce all the conditions under which judgments about students were made by schools which supplied folios. In particular, panellists did not have the opportunity to observe student performance over an extended period of time as teachers do (Travers, 1994, p12).

The astute reader will already have noticed a contradiction here. The study was not constructed as a blind reliability study, with no previous marks or grades attached, because the panellists would not have had sufficient data to make valid judgments about levels of achievement. On the other hand they are being asked to make much finer discriminations regarding rung placements.

The astute reader will also doubtless have expected a very large halo effect, and would not be surprised if reliability coefficients, at least in relation to levels of achievement, were very high. As indeed they were. Eighty per cent of achievement levels remained unchanged, most of the aberrant cases being one level lower, indicating, no doubt, the "high standards" of the review panellists.

The overall correlation figure obtained for agreement between school exit and review level rung placements, on a fifty point scale, was 0.95.
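How much of a coefficient like this can come from agreement on level alone is easy to check by simulation. The sketch below is an illustration, not part of the Travers study: it assumes five levels of achievement with ten rungs each (the fifty point scale), leaves every hypothetical folio in its original level, and assigns the reviewed rung within that level at random. The function names and the seed are arbitrary choices made for the example.

```python
import random


def rank_correlation(xs, ys):
    """Spearman rank correlation for two lists of distinct ranks 1..n."""
    n = len(xs)
    d_squared = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1.0 - (6.0 * d_squared) / (n * (n * n - 1))


def reshuffle_within_levels(rng, levels=5, rungs=10):
    """Keep each folio in its level but give it a random rung within that level."""
    school, review = [], []
    for level in range(levels):
        block = [level * rungs + rung for rung in range(1, rungs + 1)]
        shuffled = block[:]
        rng.shuffle(shuffled)  # random rung placement inside the level
        school.extend(block)
        review.extend(shuffled)
    return school, review


rng = random.Random(1994)  # arbitrary seed so the run is repeatable
results = []
for _ in range(1000):
    school, review = reshuffle_within_levels(rng)
    results.append(rank_correlation(school, review))

mean = sum(results) / len(results)
print(f"mean={mean:.3f} min={min(results):.3f} max={max(results):.3f}")
```

Under these assumptions the coefficient averages about 0.96, and no shuffle can push it below about 0.92.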
The authors were particularly pleased with the rung placement data:

    a rung difference of plus or minus one or two is not so much a significant difference as a demonstration of precision and accuracy ... half the decisions about rung placement involved either assigning the same rung or one or two rungs lower. ... (this) suggests that not only do these panels read the folios very closely, but that they are able to arrive at decisions about standards that are both highly reliable and very precise (Travers, 1994, p17).

I did a little experiment. I listed fifty (hypothetical) folios in rank order of one to fifty, with ten papers at each level of achievement. Then, keeping them at the same level of achievement, randomly allocated new (reviewed) rung placements within each level. The rank order correlation was 0.95.

It follows that acceptance of given levels of achievement (halo effect), combined with random allocation of rung placements, is sufficient to account for the 0.95 correlation that was used to justify the whole procedure, not only of the pilot study, but indeed for
the whole examination system, as evidenced by the Principal's comments.

Rather than evidence of precision in rung placements, which determine tertiary entrance scores, the data generates evidence of randomness, and another psychometric fudge is perpetrated by well meaning psychometricians on a gullible public.

The General frame and the true score

The General frame of reference as hijacked by psychometricians contains as an essential element of its assumptions the notion of a true score; a further element of those assumptions contains the notion that it is possible in some way or another to approach that true score; to get measures empirically closer to the true score by various procedures implied by the particular model. For example, in classical test theory by increasing the number of items on the test; in generalisability theory by sampling more tasks more randomly from a bigger collection of possibilities; in item response theory by having more items of appropriate characteristics which are uni-dimensional; in domain referenced tests by having the domain of items criterion referenced to a high degree.

Allied to this frame but not tied to it so tightly are the various notions of reliability and validity that have not been developed as part of the mathematical models mentioned in the previous paragraph, but have emerged from more general considerations of the notions of assessment, rather than of tests. In my terminology, these considerations have challenged the artificial constriction of the general frame by psychometricians, and have restored, through notions of construct validity and consequential validity, at least some of the error components previously bypassed.

However, this has produced a contradiction with the notion of the true score that has not been made overt.
For example, as described in Chapter 16, most achievement tests are not made more valid by increasing their reliability; on the contrary high reliability is seen to be, in most circumstances, an indicator of low validity. For most achievement areas involve a large number of disparate activities, and there is no a-priori, or even post empirical reason to believe that these activities are uni-dimensional, or otherwise closely inter-correlated.

I argue in Chapter 15 that generalising the assessment events across contexts, or time, or media, or even value assumptions or frames of reference, does not (as does generalising across selection of test items or markers) reduce the standard error of the estimate; on the contrary, we have every reason to believe that it will increase such error, to a point where the whole notion of true score becomes unsustainable. After all it is not by chance that so much space is given in test manuals to ensuring the conditions under which the test is given are kept constant. Obviously this indicates the fragility of the test to contextual shifts. (On second thoughts, it could be as much a ritual designed to imply scientific accuracy, and sustain the notion of fairness.) Regardless, it is clear that contextual shifts increase the error term, whilst contextual control artificially reduces it; artificially because no argument is ever given, nor could it be sustained, that this particular test context is superior to any other for the measurement of this "ability." So once again the price of higher reliability is lower validity.
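The rung placement experiment described earlier in this chapter (fifty folios, ten at each level of achievement, new placements allocated at random within each level) can be repeated as a small simulation. What follows is a minimal sketch, not the procedure actually used: the function names, the number of trials and the random seed are assumptions introduced here, and the rank order correlation is taken to be Spearman's coefficient for untied ranks.

```python
import random


def rank_correlation(xs, ys):
    """Spearman rank correlation for two rankings that contain no ties."""
    n = len(xs)
    d_squared = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def simulate(levels=5, per_level=10, trials=2000, seed=0):
    """Mean rank correlation when ranks are reshuffled within each level.

    levels * per_level folios are ranked 1..N. In each trial the ranks
    inside every level are shuffled at random, so the level is kept but
    the placement within the level carries no information.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    original = list(range(1, levels * per_level + 1))
    total = 0.0
    for _ in range(trials):
        reviewed = []
        for start in range(0, len(original), per_level):
            block = original[start:start + per_level]
            rng.shuffle(block)  # random new placement, same level
            reviewed.extend(block)
        total += rank_correlation(original, reviewed)
    return total / trials


mean_rho = simulate()
print(round(mean_rho, 2))
```

Under these assumptions the average correlation sits at about 0.96, with single allocations scattering a little either side of it, which is in line with the 0.95 reported above for one allocation. Agreement on level alone is enough to produce a correlation of that size.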
Preview

We could go on dealing with the specifics, but it is time to present the greatest fudge of all. Validity. For as will become clear, the very definition of validity creates a discourse around it where every test may be assumed valid until proved otherwise, and as there are no specific descriptions as to how such a proof might be constructed, and no specific standards of acceptability to which such descriptions might be compared, all assessments may claim to be valid.
The professional theoretical face of assessment discourse asks the question, is the test reliable? More ethically orientated assessors ask the additional question, is the assessment valid?

The public wants to know, is it fair? And the more critical of them might add, are people being violated?

In this chapter some of the more recent work on validity is discussed, and its positioning as advocacy demonstrated.

Reliability is also discussed as a problematic, rather than as an obvious prerequisite to validity.

Validity

"Validity," states the first sentence of the APA Standards of educational and psychological testing (American Educational Research Association, 1985), "is the most important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (p9). It goes on immediately to explain that: "Test validation is the process of accumulating evidence to support such inferences."

Which all sounds very scientific and objective and devoid of bias. But is it so? Let me, from my own particular concern with the test taker, rewrite the first sentence to dovetail more accurately with my concerns.

"Invalidity," states the first sentence of the alternative tract, "is the most important consideration in test evaluation. The concept refers to the inappropriateness, meaninglessness, and uselessness of the specific inferences made from test scores. Invalidity or error estimation is the process of accumulating evidence to problematise and ultimately reject such inferences."

It should be clear even from this small rewrite that a text that began with the second conceptualisation would be a very different text from one that began with the first.

Positioning

The main participants in the testing process, we are told, are the test developer, the test user, and the test taker. Also often involved are the test sponsor, the test administrator and the test reviewer. Sometimes, many of these participants may be parts of the same organisation, with the notable exception, of course, of the test taker.

As clearly stated in Chapter 1, my position of value, my backdrop when I seek information about events, concerns the violations perpetrated on the participants in those events. So in the matter of testing, my focus is on the test taker, and in what ways the taking of tests and the inferences and consequences flowing from such events constitute a violation: a diminishing of personhood, a misrepresentation of potential or action, a claim to unwarranted accuracy of description, and thus unwarranted control and construction of the living human person who is taking the test.

The 1985 Standards acknowledge, with fine understatement, that "the interests of the various parties in the testing process are usually, but not always, congruent" (p1). This trivialisation of the traumatic effects, dislocations, and exclusions of millions of students based on test and examination results is quite remarkable. Perhaps it is just another example of the way social positioning can overwhelm interpersonal sensitivity and intellectual honesty.

The concern of the test makers and users is, after all, with hundreds, thousands, or hundreds of thousands of test takers (not to mention their concern with their Board of Directors and shareholders). But their concern is with them, viewed as a group. Their interest is with groups, not individuals; in summaries, not raw data; with simplifying complexities, not with complexifying individuals; with objectifying human subjects, not with subjectifying human events.

For the test constructor, sponsor and user there are so many difficult questions; so many criteria to consider; so many factors to consider if the overt and covert claims of the test makers are to be defended. We shall deal with these in due course. Yet to the test taker there is only one question, a normative question which emerges from his or her very construction as an individual.
Have I passed or have I failed? Am I satisfactory or unsatisfactory? Am I normal or a nut case?

Additionally and ironically, it is precisely because they see the testing event from this individualised perspective, rather than from a group perspective, that they do not ask the more crucial, the more fundamental question: How much error, ambiguity, uncertainty, does this attribution contain? Or is it their powerlessness, and unheard voice, that makes these questions at the best unspeakable, at the worst unthinkable?

Sources of evidence

The 1985 Guidelines describes an ideal validation as including
From the analysis so far, it is possible to produce a general definition of error as it applies to the field of educational measurement and/or categorisation. This is the flip side of validity which exposes that general nastiness called invalidity.

In this chapter the notion of invalidity is reconceptualized, having both discursive and measurable components. Thirteen (overlapping) sources of error are examined, all contributing to the essential invalidity of categorisations of persons. For easy reference I have indicated the summary theoretical and practical definitions of these error sources in bold print.

Definition of error

Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field.

All assessment statements about a person are statements about that person engaged in an event, or a potential event. They are descriptions or indicators or inferences about the person's performance in that event. As such they involve at the very least an event in which the person being assessed is an element, and an event in which the assessor engages directly in the first event, or with a product (element) of it.

Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.

I want to list some of those sources of error, some of the conditions that change the measurement of a standard from a thin red line into a broad blue band: In doing so I will reject the notion of construct validity as a unitary concept, and dismember its dark side into disparate if sometimes overlapping categories.

Sources of error

I have named these sources of error:
1. Temporal errors
2. Contextual errors
3. Construction errors
4. Labelling errors
5. Attachment errors
6. Frame of reference errors
7. Instrument errors
8. Categorisation errors
9. Comparability errors
10. Prediction errors
11. Logical type errors
12. Value errors
13. Consequential errors

1. Temporal errors

We would hope our description of performance would have some substance; would be a stable quantity, invariant over time and space, rather than some ephemeral numerical butterfly attaching itself momentarily to the person assessed. If the person's performance is described differently if done at another time, in another place, with another group of people, then such difference as there is represents a source of error.

Or is it? Should we rather discount stability as being counterproductive in an educational situation? If stability is seen as the very antithesis of the educational enterprise, which we could define as being dedicated to change, then we would not wish any description to remain stable, as this would represent a nullification of the educational process.

Contrarily, if we wish to maintain stability as a criterion for assessment accuracy, we must be certain that all learning pertaining to the performance ceases at the time of assessment. And that none occurs during the assessment process. As well as all forgetting for that matter. Otherwise the error of the description increases rapidly, as the permanency of the description becomes increasingly dismembered by the ravages of time.

Regardless of which side of the fence we want to sit, or whether we want to sit on the fence, pretend it isn't there, and attribute the concomitant pain to other variables, stability must logically remain as a pertinent, or in conventional circles an impertinent criterion, to be considered in any estimate of error in assessment. My conclusion is that the logic of its contradictions makes most of the academic and psychometric definitions of reliability trivial.

So temporal errors have their genesis in changes that occur over time; persons change over time; tests change over time; the "same" event has different meanings over time. People are not computers, they react differently at different times; and they forget. So temporal errors increase over time. (Not to mention that different people make different meanings out of the same event; which makes it, of course, a different event.)

Temporal errors thus include all those confusions that constitute the dark side of stability, one aspect of reliability.

Practically, temporal errors are indicated by the differences in assessment description when the assessment occurs at different times.

2. Contextual errors

Contextual errors constitute the underside of claims to generality and generalisability. Any performance is relatively specific and defined: It is a single instance of possible instances; it is an event chosen from a multitude of possible events; it is a particular designed to illustrate a generality. Yet the performance will invariably be described (labelled) in terms of the generality it aspires to rather than the specifics that define it. This is true of almost any evaluation, any test that goes beyond the description of a single behavioural objective, and even that, one step back, will often be found to be illustrative of a class of objectives, rather than of particular significance in its own right.

In the old days (good or bad depending on our values), this would constitute an example of "transfer of training."
The claim was that if you could think clearly in Latin, then this should transfer to dealing adequately with the complexities of life in the social world; or if you could think logically in mathematics, then you could do so in international affairs; not to mention playing Rugby being a necessary prerequisite to running an Empire.

When empirical data showed that such transfer was tenuous, the notion was kept, but the name changed. Taxonomic terms such as application and analysis, or the more up-market process called problem solving, have latterly laid claim to this temporarily non-habitable area. As well, the notion of a "skill" has latterly become fashionable, and generalisable social, cognitive, emotional, spiritual, and psychomotor skills proliferate, securely untrammelled by prophylactic empirical data of any kind.

As soon as assessment descriptions are committed to paper, their material permanency is dramatically increased. Likewise, the span of their associations is spread and emphasised. No longer just a description of a particular performance, the assessment becomes interpreted as a measure of knowledge and ability, an indicator of achievement on a course of study, and a predictor of future success or failure.

One source of error then is the magic transformation that occurs between numbers and categorisations, between specific acts and generalised descriptions. Unless the assessment statement purports to be no more than a statement about a particular assessment event, then the differences between this statement, and those obtained from all other possible contexts, is error; these are the generality differences attributable to other equally relevant contexts, eg written, oral, cooperative, on-the-job; all those boundaries that possibly could contain the assessment event that are different to the boundaries of the particular assessment event. Context also includes those power relations that pervade it and the judgment processes embedded in it that affect the performance of the person assessed, and the judgment of the person assessing; and this includes those that the boundary localises, as well as those that invade its permeable surface.

Contextual errors contain all the ambiguities inherent in those relations and elements and discourses that impinge on the event, but get excluded from the label.

Practically, contextual errors include all those differences in performance and its assessment that occur when the context of the assessment event changes.

3. Construction errors

The performance that is described in an assessment is generally built up of a number of parts; a science test is built up from a number of questions; an electrical automotive practical test requires the identification and repair of a selection of common electrical faults; a social skills assessment requires gradings on a number of interactional criteria, or more likely a game constructed about such criteria in multiple choice form. Such constructions are designed to represent the course of study, or the skill requirements, or the criterion referenced framework, that the assessment is supposed to describe. Further back still, the course has itself been constructed to improve performance in some areas of living, in some role as citizen, home maker, academic, engineer, baker, or whatever.
Somewhere, sometime, someone must make a choice about how far back along the chain of constructions we go in order to estimate the error, the difference between the "perfect" description of performance and the actual one that our assessment produces.

Let's take the electrical automotive test as an example. We could begin with a requirement to describe how well a student could identify and repair any electrical fault on any car brought into any garage (A). From this we construct a thirty hour course of study called Automotive Electrical Mechanics 2M, complete with course aims and objectives and assessment criteria (B). From this we construct a one hour pencil and paper test (C) and a two hour practical assessment (D).

Now how are we to describe the construction error in assessing a particular person? Is it the difference between the descriptions given in C and D? Or the difference between the matches of B and C on the one hand, and B and D on the other? Or should we look at the matching between C and D and A? Or is it all of these?
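The practical definitions of temporal and contextual error given earlier in this list (differences in the assessment description when the time, or the context, of the assessment changes) can be made concrete with a small sketch. Everything in it is hypothetical: the marks are invented for illustration, the three contexts are borrowed from the written, oral and on-the-job examples above, and taking the width of the band as highest mark minus lowest is only one assumed way of expressing such differences.

```python
from statistics import mean

# Hypothetical marks for one person on the "same" assessment.
# Keys are (occasion, context); every value is invented for illustration.
marks = {
    ("week 1", "written"): 62,
    ("week 1", "oral"): 71,
    ("week 1", "on-the-job"): 55,
    ("week 8", "written"): 58,
    ("week 8", "oral"): 66,
    ("week 8", "on-the-job"): 64,
}


def spread(values):
    """Width of the band covered by a set of marks."""
    return max(values) - min(values)


def band_by(position):
    """Mean spread when one factor varies and the other is held fixed.

    position=0 lets the occasion vary (temporal differences);
    position=1 lets the context vary (contextual differences).
    """
    groups = {}
    for key, mark in marks.items():
        fixed = key[1 - position]  # the factor that is held constant
        groups.setdefault(fixed, []).append(mark)
    return mean(spread(group) for group in groups.values())


temporal_band = band_by(0)    # same context, different times
contextual_band = band_by(1)  # same time, different contexts
overall_band = spread(list(marks.values()))
print(temporal_band, contextual_band, overall_band)
```

With these invented marks the single reported number would hide a band several marks wide in each direction. The sketch says nothing about which mark, if any, is the "true" one; it only shows how the thin line widens once other equally relevant times and contexts are admitted.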
Melton (1994) accurately describes the sort of processes that are actually involved in competency assessment:
In this chapter I apply the reconceptualised notion of invalidity to national literacy testing, and to the definitions of grades within my own university. These are presented as specific examples of the potency of the invalidity conceptualisation.

National Literacy Testing

Context

In its edition of 15-16 March, 1997, the newspaper Weekend Australian announced on the front page under the heading "All pupils face tests of literacy" that:
References
(APA) American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington: American Psychological Association.
Apple, M. (1982). Education and power. Boston: Routledge and Kegan Paul.
Arendt, H. (1969). On violence. London: Penguin.
Australian National Training Authority. (1994). Towards a skilled Australia: A national strategy for vocational education and training. Australian National Training Authority.
Ayers, W. (1993). To teach: The journey of a teacher. New York: Teachers College Press.
Ball, S. (1994). Education reform. Buckingham: Open University Press.
Barone, T. (1992). Beyond theory and method: A case of critical storytelling. Theory into Practice, 31(2), 143-146.
Barton, L., Whitty, G., Miles, S., & Forlong, J. (1994). Teacher education and teacher professionalism in England: some emerging issues. British Journal of Sociology in Education, 15(4), 529-543.
Bateson, G. (1972). Steps to an ecology of mind. New York: Ballantine Books.
Bateson, G. (1979). Mind and nature. London: Wildwood House.
Becker, H. (1990). Generalising from case studies. In E. Eisner & A. Peshkin (Eds.), Qualitative enquiry in education: the continuing debate. New York: Teachers College, Columbia University.
Beevers, B. (1993). Competency-based training in TAFE: Rhetoric and reality. In C. Collins (Ed.), Competencies: the competencies debate in Australian education and training. Canberra: Australian College of Education.
Behar, I. (1983). Achievement Testing. Beverly Hills: Sage Publications.
Beittel, K. (1984). Great swamp fires I have known: Competence and the hermeneutics of qualitative experiencing. In E. Short (Ed.), Competence: Inquiries into its meaning and acquisition in educational settings (pp. 105-122). Lanham, MD: University Press of America.
Benett, Y. (1993). The validity and reliability of assessments and self-assessments of work-based learning. Assessment and Evaluation in Higher Education, 18(2), 83-93.
Berk, R. (1986). A consumers guide to setting performance standards on criterion reference tests. Review of Educational Research, 56(1), 137-172.
Biesta, G. (1994). Education as practical intersubjectivity: towards a critical-pragmatic understanding of education. Educational Theory, 44(3), 299-317.
Bloom, B. (Ed.). (1956). Taxonomy of educational objectives; Handbook 1, cognitive domain. New York: David McKay.
Bloom, B. (1976). Human characteristics and school learning. New York: McGraw-Hill.
Bloom, B., Hastings, J., & Madaus, G. (1964). Handbook on formative and summative evaluation of student learning. New York: McGraw-Hill.
Borthwick, A. (1993). Key competencies: Uncovering the bridge between the general and vocational. In C. Collins (Ed.), Competencies: the competencies debate in Australian education and training. Canberra: Australian College of Education.
Bourdieu, P., & Passeron, J. (1977). Reproduction in education, society and culture. London: SAGE Publications.
Bowden, J., & Masters, G. (1993). Implications for higher education of a competency-based approach to education and training. Canberra: Australian Government Publishing Service.
Bracht, G., & Glass, G. (1968). The external validity of experiments. American Educational Research Journal, 5(4), 437-474.
Broadfoot, P. (Ed.). (1984). Selection, certification and control. London: The Falmer Press.
Brown, R. (1973). Religion and violence. Philadelphia: The Westminster Press.
Bruner, J. (1986). Actual minds, possible worlds. Cambridge: Harvard University Press.
Bucke, R. (1969). Cosmic consciousness. New York: E.P. Dutton Co.
Burchell, G., Gordon, C., & Miller, P. (Eds.). (1991). The Foucault effect. London: Harvester Wheatsheaf.
Burgess, R. (Ed.). (1985). Issues in educational research: Qualitative methods. London: The Falmer Press.
Burton, N. (1978). Societal standards. Journal of Educational Measurement, 15(4), 263-273.
Cairns, L. (1992). Competency-based education: Nostradamus's nostrum. The Journal of Teaching Practice, 12(1), 1-32.
Camera, H. (1971). Spiral of violence. London: Sheed and Ward.
Campbell, J. (1956). Hero with a thousand faces. New York: Meridian Books.
Carr, W., & Kemmis, S. (1983). Becoming critical. Geelong: Deakin University Press.
Cherryholmes, C. (1988). Power and criticism. New York: Teachers College Press.
Cherryholmes, C. H. (1988). Construct validity and the discourses of research. American Journal of Education, 96(3), 421-457.
Clough, E. E., Davis, P., & Sumner, R. (1984). Assessing pupils: a study of policy and practice. Windsor: NFER-Nelson.
Codd, J. (1985). Curriculum discourse: text and context. Paper presented at the National Conference of the Australian Curriculum Studies Association, La Trobe University, Melbourne.
Codd, J. (1988). The construction and deconstruction of educational policy documents. Journal of Education Policy, 3(3), 235-247.
Collins, C. (Ed.). (1993). Competencies: the competencies debate in Australian education and training. Canberra: Australian College of Education.
Collins, R. (1979). The credential society. Orlando: Academic Press Inc.
Cox, R. (1965). Examinations and higher education: A survey of the literature. London: Society for Research into Higher Education.
Cresswell, M. (1995). Technical and educational implications of using public examinations for selection to higher education. In T. Kellaghan (Ed.), Admission to higher education. Dublin: Educational Research Centre.
Cronbach, L. (1969). Validation of educational measures. Paper presented at the 1969 invitational conference on testing problems: Towards a theory of achievement measurement.
Cronbach, L. (1988). Five perspectives on validation argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L., Rajaratman, N., & Gleser, G. (1963). Theory of generalizability: a liberalization of reliability theory. British Journal of Statistical Psychology, XVI(2).
Cronbach, L. J. (1990). Essentials of psychological testing (Fifth ed.). New York: Harper and Row.
Delandshere, G., & Petrosky, A. (1994). Capturing teachers' knowledge: performance assessment. Educational Researcher, 23(5), 11-18.
Docking, R. (1995, January). Competency: What it means and how you know it has been achieved. NTB Network Special Conference Edition, 18.
Donmeyer, R. (1990). Generalizability and the general case study. In E. Eisner & A. Peshkin (Eds.), Qualitative enquiry in education. New York: Teachers College, Columbia University.
Downs, C. (1995). Key competencies: A useful agent for change? Richmond: National Centre for Competency Based Assessment and Training.
Eisner, E. (1988). The primacy of experience and the politics of method. Educational Researcher, 17(3), 15-20.
Eisner, E. (1990). The meaning of alternative paradigms. In E. Guba (Ed.), The paradigm dialog. Newbury Park: Sage Publications.
Eisner, E. (1991b). Taking a second look: Educational connoisseurship revisited. In M. McLaughlin & D. Phillips (Eds.), Evaluation and education at quarter century (pp. 169-187). Chicago: The National Society for the Study of Education.
Eisner, E., & Peshkin, A. (Eds.). (1990). Qualitative enquiry in education.
Eisner, E. W. (1985). The educational imagination (second ed.). New York: Macmillan.
Eisner, E. W. (1991). The enlightened eye. New York: Macmillan.
Fay, B. (1987). Critical social science. New York: Cornell University Press.
Feyerabend, P. (1988). Against method. London: Verso.
Finn, B. C. (1991). Young people's participation in post-compulsory education and training. Canberra: Australian Educational Council Review Committee.
Fish, S. (1980). Is there a text in the class? The authority of interpretive communities. Cambridge, Ma.: Harvard University Press.
Foucault, M. (1972). The archaeology of knowledge. London: Tavistock Publications.
Foucault, M. (1982a). Questions of method: an interview with Michel Foucault. Ideology and Consciousness, 8(6), 3-14.
Foucault, M. (1982b). The subject and the power. In H. Dreyfus & P. Rabinow (Eds.), Michel Foucault: Beyond structuralism and hermeneutics. Brighton: Harvester.
Foucault, M. (1988). Politics, philosophy, culture: interviews and other writing. New York: Routledge.
Foucault, M. (1992). Discipline and punish. London: Penguin.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.
Freud, S. (1963). Civilisation and its discontents. London: The Hogarth Press.
Friedenberg, E. (1969). Proceedings of the 1969 invitational conference on testing problems. Princeton: Educational Testing Service.
Garcia, G. E., & Pearson, P. D. (1994). Assessment and diversity. Review of research in education (Vol. 20, pp. 337-391).
Garman, N. (1994). Qualitative enquiry: meaning and menace for educational researchers. In J. Smyth (Ed.), Qualitative approaches in educational research (pp. 3-14). Adelaide: Flinders University of South Australia.
Garman, N., & Holland, P. (1995). The rhetoric of school reform reports: sacred, sceptical and cynical interpretations. In R. Ginsberg & D. Plank (Eds.), Commissions, reports, reforms and educational policy. Westport: Praeger.
Gillis, S., & Macpherson, C. (1995). Examination of the links between pre-employment qualifications and on the job competency based assessment. Paper presented at the Australian Association for Research in Education, 25th Annual Conference, Hobart.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519-521.
Glass, G. (1978). Standards and criteria. Journal of Educational Measurement, 15(4), 237-261.
Golstein, H. (1979). Changing educational standards: A fruitless search. Journal of the NAIEA, 11(3), 18-19.
Gonzalez, E. J., & Beaton, A. E. (1994). The determination of cut scores for standards. In A. C. Tuijnman & T. N. Postlethwaite (Eds.), Monitoring the standards of education.
Good, F., & M, C. (1988). Grade awarding judgements in differential examinations. British Educational Research Journal, 14(3), 263-281.
Green, M. (1994). Epistemology and Educational Research: the Influence of Recent approaches to Knowledge. Review of Research in Education, 20, 423-464.
Green, P. (1981). The pursuit of inequality. Oxford: Martin Robertson.
Guba, E. (1990). The paradigm dialog. Newbury Park: SAGE Publications.
Guilford, J. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6, 427-439.
Hacking, I. (1991). How should we do the history of statistics? In G. Burchell, C. Gordon, & P. Miller (Eds.), The Foucault effect. London: Harvester Wheatsheaf.
Haertel, E. H. (1991). New forms of teacher assessment. Review of Research in Education (Vol. 17, pp. 3-29).
Hambleton, R., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer Nijhoff Publishing.
Hambleton, R., & Zaal, J. (Eds.). (1991). Advances in educational and psychological testing. Boston: Kluwer Academic Publishers.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement, Third edition. New York: American Council on Education, Macmillan Publishing Company.
Hartog, P., & Rhodes, E. (1936). The marks of examiners. London: Macmillan and Co.
Harvey, L., & Greed, D. (1993). Defining quality. Assessment and Evaluation in Higher Education, 18(1), 9-34.
Horkheimer, M., & Adorno, T. (1972). Dialectic of enlightenment. New York: Herder and Herder.
House, E. (1991). Evaluation and social justice. In M. McLaughlin & D. Phillips (Eds.), Evaluation and education at quarter century (pp. 233-247). Chicago: University of Chicago Press.
Howe, K. R. (1994). Standards, assessment, and equality of educational opportunity. Educational Researcher, 23(8), 27-33.
Hulin, C., Drasgow, F., & Parsons, C. (1983). Item response theory: Application to psychological measurement. Homewood, Illinois: Dow Jones-Irwin.
Huxley, A. (1950). The perennial philosophy. London: Chatto and Windus.
Illich, I. (1971). Deschooling society. Calder and Boyers Ltd.
Jackson, N. (1993). Competence: A game of smoke and mirrors? In C. Collins (Ed.), Competencies: The competencies debate in Australian education and training. Canberra: Australian College of Education.
Jaeger, R., & Tittle, C. (Eds.). (1980). Minimum competency achievement testing. Berkeley: McCutchen Publishing Corporation.
Jaeger, R. M. (1989). Certification of student competence. In R. L. Linn (Ed.), Educational Measurement, Third edition. New York: American Council on Education, Macmillan Publishing Company.
Johnston, B., & Dowdy, S. (1988). Teaching and assessing in a negotiated curriculum. Melbourne: Robert Anderson and Ass.
Johnston, B., & Pope, A. (1988). Principles and practice of student assessment. Adelaide: South Australian Education Department.
Jones, L. (1971). The nature of measurement. In R. Thorndike (Ed.), Educational measurement: second edition (pp. 335-355). Washington: American Council on Education.
Kavan, R. (1985). Love and freedom. London: Grafton Books.
Keeney, B. (1983). Aesthetics of change. New York: The Guilford Press.
Kennedy, K., Marland, P., & Sturman, A. (1995). Implementing national curriculum statements and profiles: corporate federalism in retreat. Paper presented at the Annual Conference of the Australian Association for Research in Education, Hobart, 26-30 November.
Knight, B. (1992). Theoretical and practical approaches to evaluating the reliability and dependability of national curriculum test outcomes. Unpublished article.
Korzybski, A. (1933). Science and sanity. Lakeville, Con: International non-Aristotelian Pub. Co.
Laing, R. (1967). The politics of experience. Harmondsworth: Penguin.
Lather, P. (1991). Getting Smart: Feminist research and pedagogy with/in the postmodern. New York: Routledge.
Lazarus, M. (1981). Goodbye to excellence: A critical look at minimum competency testing. Boulder: Westview Press.
LeCompte, M., Millroy, W., & Preissle, J. (1992). The handbook of qualitative research in education. San Diego: Academic Press Inc.
Levin, H. (1978). Educational performance standards: Image or substance. Journal of Educational Measurement, 15(4), 309-319.
Lincoln, Y. (1990). The making of a constructivist. In E. Guba (Ed.), The paradigm dialog. Newbury Park: Sage Publications.
Lincoln, Y. (1995). Emerging criteria for quality in qualitative and interpretative research. Qualitative Inquiry, 1(3), 275-289.
Linn, R. L. (1994). Performance assessment: Policy promises and technical measurement standards. Educational Researcher, 23(9), 4-14.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: expectations and validation criteria. Educational Researcher, 20(8), 15-21.
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Lorge, I. (1951). The fundamental nature of measurement. In E. Lindquist (Ed.), Educational measurement (pp. 533-559). Washington: American Council on Education.
Madaus, G. F. (1986). Measurement specialists: Testing the faith. A reply to Mehrens. Educational Measurement: Issues and Practice, 5(4), 11-14.
Mager, R. (1962). Preparing instructional objectives. Palo Alto, CA: Feardon Publishers.
Marshall, C. (1990). Goodness criteria. In E. Guba (Ed.), The paradigm dialog. Newbury Park: SAGE Publications.
Masson, J. (1991). Final analysis. London: Harper Collins.
Masters, G. (1994, 17 March). Setting and measuring performance standards for student achievement. Paper presented at the Public Investment in School Education: Costs and Outcomes, Canberra.
Maturana, H., & Guiloff, G. (1980). The quest for the intelligence of intelligence. Journal of Social Biological Structures, 3.
Mayer, C. C. (1992). Putting general education to work: the key competencies report. Melbourne: Australian Educational Council and Ministers of Vocational Education, Employment and Training.
McDonald, R. (1994, October). Led astray by competence? Paper presented at the Australian National Training Authority, Brisbane.
McGovern, K. (1992). National competency standards: the role of the National Office of Overseas Skills Recognition. The Journal of Teaching Practice, 12(1), 33-46.
Meadmore, D. (1993). The production of individuality through examination. British Journal of Sociology in Education, 14(1), 59-73.
Meadmore, D. (1995). Linking goals of governmentality with policies of assessment. Assessment in Education, 2(1), 9-22.
Melton, R. (1994). Competencies in perspective. Educational Research, 36(3), 285-294.
Messick, S. (1989a). Validity. In R. L. Linn (Ed.), Educational Measurement, Third edition. New York: American Council on Education, Macmillan Publishing Company.
Messick, S. (1989b). Meaning and values in test validation. Educational Researcher, 18(2), 5-11.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.
Miller, A. (1983). For your own good. New York: Farrar, Straus, Giroux.
Miller, A. (1984). Thou shalt not be aware. London: Pluto Press.
Miller, C., & Parlett, M. (1974). Up to the mark: a study of the examination game. London: Society for Research into Higher Education.
Millman, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational Measurement, Third edition. New York: American Council on Education, Macmillan Publishing Company.
Mishler, E. (1986). Research interviewing. Cambridge: Harvard University Press.
Mitroff, I., & Sagasti, F. (1973). Epistemology as general systems theory: An approach to the design of complex decision-making experiments. Philosophy of the social sciences (3), 117-134.
Moss, P. A. (1992). Shifting concepts of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229-258.
Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23(2), 5-12.
Mykhalovskiy, E. (1996). Reconsidering Table Talk: Critical thoughts on the relationship between sociology, autobiography and self-indulgence. Qualitative Sociology, 19(1), 131-151.
Nairn, A. (1980). The reign of ETS. Washington.
National Training Board. (1992). National Competency Standards: Policy and Guidelines (Second Edition). Canberra: National Training Board.
National Training Board. (1995, January). Who's doing what? Assessment in Australia today. NTB Network Special Conference Edition, 19-20.
Norris, N. (1991). The trouble with competence. Cambridge Journal of Education, 21(3), 331-341.
Nuttall, D. (1979). The myth of comparability. Journal of the NAIEA, 11(3), 16-18.
Nuttall, D., Backhouse, J., & Willmott, A. (1974). Comparability of standards between subjects (Vol. 29). Oxford: Evans/Methuen Educational.
Oakley, A. (1991). Interviewing women. In H. Roberts (Ed.), Doing feminist research. London: Routledge and Kegan Paul.
Orrell, J. (1996). Assessment in higher education: an examination of everyday academic's thinking-in-assessment, beliefs-about-assessment, and a comparison of assessment behaviours and beliefs. Unpublished PhD, Flinders University of South Australia, Adelaide.
Partington, J. (1994). Double-marking students' work. Assessment and Evaluation in Higher Education, 19(1), 57-60.
Pawson, R. (1989). A measure of measures. London: Routledge.
Pearson, A. (1984). Competence: a normative analysis. In E. Short (Ed.), Competence; Inquiries into its meaning and acquisition in educational settings (pp. 31-40). Lanham, MD: University Press of America.
Pennycuick, D., & Murphy, R. (1988). The impact of graded tests. London: The Falmer Press.
Perkins, D., & Salomon, G. (1988). Teaching for transfer. Educational Leadership (September), 22-32.
Persig, R. (1975). Zen and the art of motorcycle maintenance: An enquiry into values. New York: Bantam Press.
Persig, R. (1991). Lila: An enquiry into morals. London: Bantam Press.
Peters, M. (1996). Poststructuralism, politics and education. Westport: Bergin & Garvey.
Phillips, D. (1990). Subjectivity and objectivity: an objective enquiry. In E. Eisner & A. Peshkin (Eds.), Qualitative enquiry in education. New York: Teachers College, Columbia University.
Popkewitz, T. (1984). Paradigm and ideology in educational research. London: The Falmer Press.
Porter, P., Rizvi, F., Knight, J., & Lingard, R. (1992). Competencies for a clever country: Building a house of cards? Unicorn, 18(3), 50-58.
Prigogine, I., & Stengers, I. (1985). Order out of chaos. London: Fontana.
Quine, W. (1953). From a logical point of view. New York: Harper and Row.
Rechter, B., & Wilson, N. (1968). Examining for university entrance in Australia: Current practices. Quarterly Review of Australian Education, 2(2).
Reilly, R., & Chao, G. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 33(1), 1-55.
Resnick, D. P., & Resnick, L. B. (1985). Standards, curriculum and performance: A historical and comparative perspective. Educational Researcher, 14(4), 5-20.
Rorty, R. (1991). Objectivity, relativism, and truth. Cambridge: Cambridge University Press.
Rose, N. (1990). Governing the soul: The shaping of the private self. London: Routledge.
Rosenberg. (1967). On quality in art: criteria of excellence, past and present. Princeton: Princeton University Press.
Royal Commission. (1974). Report on the suspension of a high school student. Adelaide: South Australian Government.
Sadler, D. R. (1987). Specifying and Promulgating Achievement Standards. Oxford Review of Education, 13(2), 191-209.
Sadler, R. (1995). Comparability of assessments, grades and qualifications. Paper presented at the AARE Conference, Hobart, 24 November.
Schmidt, F., Hunter, J., & Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: a red herring. Journal of Applied Psychology, 66(2), 166-185.
Schnell, J. (1980). The fate of the earth. London: Picador.
Schwandt, T. (1990). Paths to enquiry in the social disciplines. In E. Guba (Ed.), The paradigm dialog. Newbury Park: SAGE Publications.
Scriven, M. (1991). Evaluation thesaurus; fourth edition. Newbury Park, Cal: SAGE Publications.
Shepard, L. (1991). Psychometricians' beliefs about learning. Educational Researcher, 20(7), 2-16.
Shepard, L. A. (1993). Evaluating test validity. Review of research in education, 19.
Sherman, R., & Webb, R. (1988). Qualitative research in education. New York: Falmer.
Slater, P. (1966). Microcosm. New York: John Wiley.
Smith, B. (1994). Addressing the delusion of relevance: Struggles in connecting educational research and social justice. In J. Smyth (Ed.), Qualitative approaches in educational research (pp. 43-56). Adelaide: Flinders University of South Australia.
Smith, J. (1990). Alternative research paradigms and the problem of criteria. In E. Guba (Ed.), The paradigm dialog. Newbury Park: SAGE Publications.
Smith, J. (1993). After the demise of empiricism: the problem of judging social and education inquiry. Norwood, N.J.: Ablex Publishing Corporation.
Smyth, J. (Ed.). (1994). Qualitative approaches in educational research. Adelaide: Flinders University of South Australia.
Soucek, V. (1993). Is there a need to redress the balance between systems goals and lifeworld-oriented goals in public education in Australia? In C. Collins (Ed.), Competencies: The competencies debate in Australian education and training. Canberra: Australian College of Education.
Spearritt, D. (Ed.). (1980). The improvement of measurement in education and psychology. Hawthorne, Victoria: Australian Council for Educational Research.
Stake, R. (1991). The countenance of educational evaluation. In M. McLaughlin & D. Phillips (Eds.), Evaluation and education: at quarter century (pp. 67-88). Chicago: The University of Chicago Press.
Stanley, G. (1993). The psychology of competency-based education. In C. Collins (Ed.), Competencies: the competencies debate in Australian education and training. Canberra: Australian College of Education.
Stern, D. (1991). Diary of a baby. London: Fontana.
Sternberg, R. (1990). T & T is an explosive combination: technology and testing. Educational Psychologist, 25(3&4), 201-222.
Sydenham, P. (1979). Measuring instruments: tools of knowledge and control. London: Peter Peregrinus Ltd.
Taylor, C. (1994). Assessment for measurement of standards: The peril and promise of large-scale assessment reform. American Educational Research Journal, 31(2), 231-262.
Taylor, P. (1961). Normative discourse. Englewood Cliffs: Prentice-Hall, Inc.
The Flinders University of South Australia. (1997). Calendar. Adelaide: Flinders University of South Australia.
The Social Development Group. (1979). Developing the classroom group: How to make your class a better place to live in. Adelaide: South Australian Education Department.
The Social Development Group. (1980). How to make your classroom a better place to live in. Adelaide: South Australian Education Department.
Thompson, P., & Pearce, P. (1990). Testing times. Adelaide: TAFE National Centre for Research and Development.
Thompson, W. (Ed.). (1987). Gaia, a way of knowing. Hudson: Lindisfarne Press.
Travers, E., & Allen, R. (1994). Random sampling of student folios: a pilot study (10). Brisbane: Board of Senior Secondary School Studies, Queensland.
Watzlewich, P. (1974). Change. New York: W Norton & Co.
Weiss, C. (1991). Evaluation research in the political context. In M. McLaughlin & D. Phillips (Eds.), Evaluation and education; at quarter century (pp. 211-231). Chicago: The University of Chicago Press.
Wheeler, L. (1993). Reform of Australian vocational education and training: A competency-based system. In C. Collins (Ed.), Competencies: the competencies debate in Australian education and training. Canberra: Australian College of Education.
Wiggins, G. (1988). Teaching to the (authentic) test. Educational Leadership (September), 41-47.
Wilbur, K. (1977). The spectrum of consciousness. Wheaton: Quest.
Wilbur, K. (1982). Up from Eden: A transpersonal view of human evolution. Boston: Shambhala.
Wilbur, K. (1991). Grace and grit. North Blackburn: Collins Dove.
Wilbur, K. (1995). Sex, ecology, spirituality. Boston: Shambhala.
Wiliam, D. (1995). Technical issues in criterion-referenced assessment: evidential and consequential bases. In T. Kellaghan (Ed.), Admission to higher education. Dublin: Educational Research Centre.
Williams, F. (Ed.). (1967). Educational evaluation as feedback and guide. Chicago: The National Society for the Study of Education.
Willmott, A. S., & Nuttall, D. L. (1975). The reliability of examinations at 16+. London: Macmillan Education Ltd.
Wilson, N. (1966). A programmed course in physics, Form V. Sydney: Angus and Robertson.
Wilson, N. (1969). Group discourse and test improvement. Unpublished data.
Wilson, N. (1969). A study of test-retest and of marker reliabilities of the 1966 commonwealth secondary scholarship examination. ACER Information Bulletin, 50(1).
Wilson, N. (1970). Objective tests and mathematical learning. Melbourne: Australian Council for Educational Research.
Wilson, N. (1972). Assessment in the primary school. Adelaide: South Australian Education Department.
Wilson, N. (1974). A framework for assessment in the secondary school. Adelaide: South Australian Education Department.
Wilson, N. (1985). Young people's views of our world (Peace Dossier 13). Melbourne: Victorian Association of Peace Studies.
Wilson, N. (1986). Programmes to reduce violence in schools. Adelaide: South Australian Education Department.
Wilson, N. (1992). With the best of intentions. Nairne: Noel Wilson.
Withers, G. (1995). Achieving comparability of school-based assessments in admissions procedures to higher education. In T. Kellaghan (Ed.), Admission to higher education. Dublin: Educational Research Centre.
Wolf, D., Bixby, J., Glenn, J., & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. Review of research in education (Vol. 17, pp. 31-71).
Wolf, R. M. (1994). The validity and reliability of outcome measures. In A. C. Tuijnman & T. Neville Postlethwaite (Eds.), Monitoring the standards of education.
Wood, R. (1987). Aspects of the competence-performance distinction: Educational, psychological and measurement issues. Curriculum Studies, 19(5), 409-424.
Wood, R. (1987). Measurement and assessment in education and psychology. London: The Falmer Press.