xml version 1.0 encoding UTF8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchemainstance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam a22 u 4500
controlfield tag 008 c20059999azu 000 0 eng d
datafield ind1 8 ind2 024
subfield code a E1100452
0 245
Educational policy analysis archives.
n Vol. 13, no. 30 (May 06, 2005).
260
Tempe, Ariz. :
b Arizona State University ;
Tampa, Fla. :
University of South Florida.
c May 06, 2005
505
Toward an objective evaluation of teacher performance : the use of variance partitioning analysis, VPA / Eduardo R. Alicias, Jr..
650
Education
x Research
v Periodicals.
2 710
Arizona State University.
University of South Florida.
1 773
t Education Policy Analysis Archives (EPAA)
4 856
u http://digital.lib.usf.edu/?e11.452
xml version 1.0 encoding UTF8 standalone no
mods:mods xmlns:mods http:www.loc.govmodsv3 xmlns:xsi http:www.w3.org2001XMLSchemainstance xsi:schemaLocation http:www.loc.govmodsv3mods31.xsd
mods:relatedItem type host
mods:identifier issn 10682341mods:part
mods:detail volume mods:number 13issue 30series Year mods:caption 20052005Month May5Day 66mods:originInfo mods:dateIssued iso8601 20050506
EDUCATION POLICY ANALYSIS ARCHIVES
A peer-reviewed scholarly journal
Editor: Sherman Dorn, College of Education, University of South Florida

Copyright is retained by the first or sole author, who grants right of first publication to the Education Policy Analysis Archives. EPAA is published jointly by the Colleges of Education at Arizona State University and the University of South Florida. Articles are indexed in the Directory of Open Access Journals (www.doaj.org).

Volume 13 Number 30 May 6, 2005 ISSN 1068-2341

Toward an Objective Evaluation of Teacher Performance: The Use of Variance Partitioning Analysis, VPA [1]

Eduardo R. Alicias, Jr.
New England College
The Philippines

Citation: Alicias, E. R. Jr. (2005, May 6). Toward an objective evaluation of teacher performance: The use of variance partitioning analysis, VPA. Education Policy Analysis Archives, 13(30). Retrieved [date] from http://epaa.asu.edu/epaa/v13n30/.

Abstract

Evaluation of teacher performance is usually done with ratings made by students, peers, and principals or supervisors, and at times with self-ratings made by the teachers themselves. The trouble with this practice is that it is obviously subjective, and vulnerable to what Glass and Martinez call the "politics of teacher evaluation," as well as to the professional incapacities of the raters. The value-added analysis (VAA) model is one attempt to make evaluation objective and evidence-based. However, the VAA model, especially that of the Tennessee Value Added Assessment System (TVAAS) developed by Dr. William Sanders, appears flawed, essentially because it posits the untenable assumption that the gain score of students (value added) is attributable solely to the teacher(s), ignoring other significant explanators of student achievement like IQ and socioeconomic status. Further, the use of the gain score (value added) as a dependent variable appears hobbled by the validity threat called "statistical regression," as well as by the problem of isolating the conflated effects of two or more teachers. The proposed variance partitioning analysis (VPA) model seeks to partition the total variance of the dependent variable (post-test student achievement) into portions representing: first, the effects attributable to the set of teacher factors; second, the effects attributable to the set of control variables, the most important of which are the IQ of the student, his pretest score on that particular dependent variable, and some measures of his socioeconomic status; and third, the unexplained effects or variance. It is not difficult to see that when the second and third quanta of variance are partitioned out of the total variance of the dependent variable, what remains is that attributable to the teacher. Two measures of teacher effect are hereby proposed: $\alpha_1$ for proportional teacher effect and $\alpha_2$ for direct teacher effect.

[1] Copyright Registration No. O 2004037, February 26, 2004, The National Library, Manila, The Philippines.

The Need for an Objective Teacher Evaluation

There is an obvious need for objective teacher evaluation. First, on equity considerations, there is a need to establish a direct link between teacher productivity and teacher compensation. Clearly, the more productive teachers should be paid more and/or given priority for promotion. By the same token, the teaching laggards will have to be considered first for compulsory retraining or even dismissal, if the law allows. Second, on optimality grounds, the school administrator needs to deploy his teachers in the teaching of courses where they can demonstrate their utmost competencies or, in the language of the economist, where their largest respective marginal productivities lie.
For example, if it is shown that a mathematics teacher is more productive in the teaching of, say, algebra than of geometry, then the Pareto principle of optimality would dictate that said teacher, ceteris paribus, would have to teach algebra instead of geometry. Consequently, the school as a whole will tend to move to a higher level of productive optimality. And, third, it makes significant political sense that the school administration be perceived as fair and impartial in the assignment and compensation of teachers. Obviously, this will minimize the occurrence of distractive uncooperativeness, or at times even destructive resistance, on the part of the teachers.

What's Wrong With the Traditional Practice of Teacher Evaluation?

Teacher evaluation is essentially and almost always done with ratings made by students, peers, and principals or supervisors, and at times with self-ratings made by the teachers themselves. The trouble with this evaluation scheme is that it is obviously subjective and vulnerable to the quirks and frailties of the raters, not to mention their professional incapacities. For example, what sense can one make of a principal whose professional specialization is English observing the mathematics teacher and suggesting that he handle, say, quadratic equations in this or that way? Or what if the principal, for one reason or another, simply dislikes the teacher? Indeed, these occurrences, as well as those described in Glass & Martinez (1993).
The traditional practice is teacher-centered in that it uses ratings about traits and behavioral patterns of the teacher, rather than those about the students. Of course, students are the reason for the existence of schools and teachers; therefore, whatever happens or does not happen to them in the name and process of teaching should be the basis for measuring the effectiveness of said process.

Implicit in the traditional scheme are some global and commonly accepted but essentially unvalidated assumptions about teacher traits and behavior, such as: (1) teacher performance is a monotonically increasing function of educational attainment and/or professional seminars and in-service trainings undertaken, and (2) there is a standard teacher classroom behavior against which individual teacher behavior is measured.

In regard to the first, there is no convergence of evidence showing that the more highly educationally qualified teachers are more effective in the classroom. The second is simply heroic. What is that standard behavior? Does it make empirical sense and, if so, in all or in what subject areas? Who should define this behavioral pattern? The principal or a professional body? In what light would they do that? That of revelation (dogma) or that of science?

By the way, what is the nature of teaching? Is it art or science? If it is art, then why can't we just leave the individual teacher alone to his own artistic devices? On the other hand, if it is science, where then is that unambiguous corpus of scientific knowledge that predicts with great probability that, say, this particular teacher behavior will produce this much of this type of student achievement within this length of time? In this age of post-Einsteinian relativity (e.g., supersymmetry and superstrings are on the horizon), does it make sense to consider the nature of teaching as deterministic as that?
In any event, the traditional practice also moves against the flow of professional autonomy of the teacher, who is licensed by the state to practice his profession. This autonomy is anchored on the academic freedom of the teacher, which apparently is now a well-settled and universally accepted principle.

The Value-Added Analysis: Is This a Valid Attempt at Objectivity?

The value-added model of teacher evaluation seeks to isolate the additional learning (the value added) that is presumed to have occurred at the end of a teaching-learning cycle, say, at the end of a term or school year, and by some mode of reasoning to attribute this increment to the teacher.

Ernest Pascarella (1986) described this model as attempting to separate the net effects of instruction from previous ability or simple maturation. He suggested ways to improve value-added assessment, namely: cross-sectional research design, methods of estimating the effect of a particular learning experience independent of students' pre-learning differences, multiple regression analysis, analysis of joint or redundant effects not directly attributable to instruction, and the development of causal models (ERIC CD-ROM, 1985-1998). Likewise, he noted the possibility that not all students may benefit equally from the same experience. This is a reality that always happens in the classroom, and apparently it is something that the value-added model is unable to capture.

In 1998, Jill Berlin Slack and Edward P. St. John used a variant of the value-added model, the sequential analysis, to examine the association of specific factors with test score improvement. Among others, their findings showed the significant impact of age, gender, school environment, and curriculum and instruction on improvement.
However, their most consistent and significant finding is that "higher ability students were less likely to improve than lower ability students." They argued that this finding is consistent with the Accelerated Schools philosophy that "disadvantaged" students stand the most to gain from innovative teaching approaches (ERIC CD-ROM, 1985-1998).

The Tennessee Value Added Assessment System (TVAAS) is arguably the most famous of such assessment systems. It was designed and operationalized by Dr. William Sanders, acknowledged by many as the value-added analysis guru. He claims that by carefully tracking student progress over time with his "mixed-model" statistical methodology, he can gauge student academic performance, and the teacher effect on that performance, in a way that is more accurate and fair than earlier measures.

Apparently, however, Dr. Sanders has not cared to publish a complete and detailed description of his model (other researchers are encouraging him to do so in a refereed journal), so we rely on how others describe his model and/or on his own general description of it. For example, Jeff Archer (1999) describes Sanders' approach as follows:

While other researchers have spent years struggling to control for differences in students' backgrounds, such as family income and parents' educational levels, Mr. Sanders lets each student act as his or her own control. To do that he focuses on gains, instead of raw scores, so that each student's performance is compared not with that of similar students, but against his or her own past performance. The tool he uses is called mixed-model methodology. Though written into the Tennessee school code, its exact operation is nearly incomprehensible to a layperson. (1999 Editorial Projects in Education, Vol.
18, Number 34, pp. 26-28)

It appears that Sanders attempted an improvement on traditional statistics, that is, on conventional trend or time-series analysis. Is it a multivariate analogue of a simple or even an interrupted time-series analysis? If so, how many time intervals are included in his model? Did he say more or less three years? Would that be adequate to yield a valid analysis?

Anyway, consider the following example as mentioned by Archer. If a teacher taught just one student for one year and that student made poor progress, then traditional statistics would predict that the teacher's next student would falter as well. Sanders' mixed model, on the other hand, would take that single result and predict that the next student would make gains only slightly worse than the average for all the teachers' students.

Archer described the mixed model as involving a weighting of results based on how much information is available. He further described the statistical algorithm as a "magic" called "shrinkage estimation," and what it yields is called a Best Linear Unbiased Predictor (BLUP).

Sanders' BLUP: Magical, Mystical.

We may grant that Sanders' BLUP is a magical algorithm but, at least for now, its magic appears far too mystical to be clearly understood and appreciated by those directly concerned: the ordinary classroom teachers and/or school principals. Indeed, he owes it to the interested readers, the
scientific community at large, to publish a clear description of what is reported to be his claim as the "best" estimate of the teacher effect on student achievement. In the absence of a clear description of how he mixed the ingredients, as it were, of his mixed model, we now speculate on and interpret Sanders' model as follows.

Let us ask some basic questions, but first let us lay down some basic premises consistent with what are reported to be such ingredients. First, he tracks student progress over time, presumably to assemble a set of time-series data consisting of a finite temporal chain of discrete incremental values (value-added quantities). Second, the student acts as his own control, allegedly controlling for vital background factors such as family income and parents' educational levels (as reported by Archer). With the data thus assembled, let, say, delta 10th be the observed value added of student X for the last (10th) stage in, say, a 10-stage learning cycle. Then the BLUP algorithm is applied, weighting into account as much relevant available information as possible. And bingo, the BLUP of the teacher effect on student X's achievement, presumably a portion of delta 10th, comes to the fore.

Now, what nonzero weights would Sanders assign to such relevant information as family income and parents' educational levels? Would the weights partake of the nature of Bayesian probabilities? Anyway, there appears to be the necessary implicit assumption that the BLUP must always be less than delta 10th; otherwise, if he should assume equality, then the weights of all the other background factors would each be reduced to a nullity, contrary to an implied premise of his algorithm and, of course, contrary to the weight of empirical evidence.
At any rate, it would be much too unrealistic and utterly counterintuitive for him to posit that the whole of delta 10th is determined by one and only one factor: the teacher.

Further, in light of available literature, why is there no explicit mention of two other factors which are probably more important, namely the intelligence quotient (IQ) and the learning or cognitive state of student X at the beginning of the 10-stage cycle (pretest score)? If these are included in the available "relevant information," then again, what would be their subjective weights? And by how much would delta 10th be further adjusted downward because of said weights?

Does he also assume that family income, parents' educational levels, and the other available relevant information are invariant over time? If the unit of analysis is a student or a cohort of students undiminished by attrition, it may be granted that IQ is invariant over time (although the basic question remains: what is the effect of that invariant IQ?), but then what about the other relevant factors? For example, what happens if the parents' educational levels increase, say, at the end of the 8th stage of the cycle? Is the BLUP algorithm designed to handle such intertemporal variations of some relevant covariates? Also, what about the effect of maturation on the part of the student? In short, why did he purposely exclude factors like IQ, family income, and parents' educational attainment (or, equivalently, assume the same to be constant), factors which time and again appear in the literature as significantly impinging upon student achievement?

In a related vein, Sherman Dorn (see Glass, 1995) mentions a number of problems afflicting the value-added assessment system, the most important of which apparently vitiates the gain score (value added) as a basis for statistical analysis. Dorn pursues the point as follows:
"A gain score is a questionable basis for statistical analysis. Gain scores conflate the effects of two different teachers. VAA may seriously underestimate the effects of prior knowledge, social background, etc. Would the effects be different if you put in sex, race, economic class, perhaps a square of last year's scores, in the equation? I bet no one knows."

Glass, on the same occasion, further drives home the point as follows:

"Now imagine (and it should be no strain on one's imagination to do so) that we have Teacher A and Teacher B and each has had the pretest (Sept) achievement status of their students impeccably measured. But A has a class with average IQ of 115 and B has a class average IQ 90. Let's suppose that A and B teach to the very limit of their abilities all year long and that, in the eyes of God, they are equally talented teachers. We would surely expect that A's students will achieve much more on the posttest (June) than B's. Anyone would assume so; indeed we would be shocked if it were not so."

Indeed, there are many questions in need of answers, and there may be more that need to be asked, not to mention a number of other possible assumptions and a slew of other subjective weights (probabilities). Small wonder, then, that according to Dorn "VAA is not an evaluation system accessible to teacher understanding." Indeed, the crucible where Sanders mixed the ingredients of his mixed-model recipe appears much too mystical and cryptic.

Are There Flaws and/or Weaknesses of a Value-Added Model?

We may grant that Sanders had the foresight and wisdom to anticipate and adequately factor into his model all such aforementioned questions, assumptions, and subjective weights; but it is not clear from the available literature how he handled what apparently are some possible flaws and/or weaknesses of such a value-added model.
In fact, woefully, instead of meeting head-on the straightforward and pointed remarks of Dorn and Glass, he dished out arcane technical jargon and asserted tangential generalities as follows:

We do not even calculate simple gains. For example, we use the whole observation vector for each child over all subjects and grades. This approach is superior to traditional multivariate approaches. As we apply these approaches in the context of the estimation of the teacher and school effects on the academic growth of populations of students, we take advantage of the prior knowledge of the distribution of the variance-covariance structure among populations of teachers, as well as the variance-covariance structure among students.

Regardless, first, the value-added model (using gain scores) appears flawed by the methodological threat to internal validity called "statistical regression." This is the phenomenon wherein larger incremental values (value-added quantities) are observed on administration of the posttest among students with lower pretest scores than among those with
higher pretest scores. Apparently, this was what Pascarella referred to when he noted, as stated earlier, the possibility that not all students may benefit equally from the same experience. Ironically, Sanders himself noticed this phenomenon, which he called "shed patterns." Unfortunately, wittingly or unwittingly, he apparently chose to ignore its significance. In this respect, Archer reported Sanders' observation thus:

In many urban schools, he has noticed a pattern in which students with the lowest past performance make the greatest gains, but those who start with high scores make little headway. A graph of such gains against past performance creates a downward sloping line from left to right. He calls these "shed patterns." (1999 Editorial Projects in Education, Vol. 18, Number 34, pp. 26-28)

This phenomenon probably underpinned the apprehension articulated by Tom Mooney, president of the Ohio Federation of Teachers, as reported by Willard and Oplinger (2003), about the use of a value-added model in the evaluation of teachers. It was reported that Mooney did not want to use a value-added approach to the evaluation of Ohio teachers because the same "could divide teachers as they try to avoid hard-to-educate children." If in fact Sanders failed to account for this phenomenon in his algorithm, then indeed that weakness would eventually be translated into the behavior of teachers trying hard to avoid low-IQ or slow-learning students. (Willard, Dennis J. and Doug Oplinger. "'Value-added analysis' credits school districts for progress a student makes," Columbus, Ohio: Beacon Journal; May 26, 2003).

This problem is essentially related to that posed by the likelihood that this phenomenon may yield the spurious conclusion that an observed incremental value is significant, when in fact it may just be a statistical artifact.
The fundamental question is then asked: which is indicative of greater teacher productivity, the larger gains that may arise from the lower pretest scores or the smaller gains that may arise from the higher pretest scores? Or, by the same token, which is indicative of greater productivity: smaller increments arising from low-IQ students or larger increments arising from high-IQ students? There are two possible answers to this question, depending on the underlying assumption.

If the unrealistic assumption is made that the entire value added (gain score) is caused by the teacher and only by the teacher (apparently this is Sanders' basic assumption, which we deem much too heroic), then the larger gains arising from the lower pretest scores or those arising from high-IQ students would certainly be indicative of greater teacher productivity.

On the other hand, if the realistic assumption is made that only a portion of that value added (gain score) is causally attributable to the teacher, then it is possible that, after isolating the effects of the relevant non-teacher factors (including but not limited to IQ and the pretest scores), the smaller gross gains arising from the higher pretest scores or those arising from low-IQ students would yield proportional net gains, possibly even larger net gains, thereby truthfully indicating greater teacher productivity.

To belabor the point, an analogous though exaggerated question is asked: which is more productive of energy, splitting a massive log into smaller pieces of firewood or splitting just one far smaller uranium atom?
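The "statistical regression" artifact discussed above can be illustrated with a small simulation. The sketch below is not Sanders' model or any part of TVAAS. It assumes, purely for illustration, that each student has a fixed true ability, that the pretest and the posttest each equal that ability plus independent measurement noise, and that there is no teaching effect at all. The function names, the score scale (mean 100, standard deviation 15), and the noise level are hypothetical choices.

```python
import random
import statistics


def simulate_scores(n_students=2000, noise_sd=10.0, seed=0):
    """Return (pretest, gain) lists for students who receive no true gain."""
    rng = random.Random(seed)
    pretest, gain = [], []
    for _ in range(n_students):
        ability = rng.gauss(100.0, 15.0)           # assumed fixed true ability
        pre = ability + rng.gauss(0.0, noise_sd)   # pretest = ability + noise
        post = ability + rng.gauss(0.0, noise_sd)  # posttest, zero teaching effect
        pretest.append(pre)
        gain.append(post - pre)
    return pretest, gain


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


pretest, gain = simulate_scores()
r_pre_gain = pearson_r(pretest, gain)

# Compare mean gains of the lowest and highest pretest quarters.
order = sorted(range(len(pretest)), key=lambda i: pretest[i])
quarter = len(order) // 4
low_quarter_gain = statistics.fmean([gain[i] for i in order[:quarter]])
high_quarter_gain = statistics.fmean([gain[i] for i in order[-quarter:]])

print(f"correlation(pretest, gain): {r_pre_gain:.2f}")
print(f"mean gain, lowest pretest quarter:  {low_quarter_gain:.2f}")
print(f"mean gain, highest pretest quarter: {high_quarter_gain:.2f}")
```

Under these assumptions the gain is pure noise, yet it correlates negatively with the pretest score, and the lowest-scoring quarter shows a positive average gain while the highest-scoring quarter shows a negative one. A downward-sloping "shed pattern" can therefore appear with no teacher effect whatsoever.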
The second weakness of a value-added approach, particularly that of Sanders, appears to be that it cannot get away from the constraints imposed by current theory and practice of teaching, especially the concomitant difficulty of measurement. This difficulty arises because, as described above by Archer, it involves a weighting procedure depending on "how much information is available." This means that the BLUP should be more valid the more information is made available. This consequently begets the fundamental question: what are the bits and pieces of information that are available and, perhaps more importantly, what are those that should have been made available but were not factored into the estimation of Sanders' BLUP? For instance, what are the other teacher factors that are embodied in the teacher, and therefore necessarily inputted into the teaching-learning process that produces a desired incremental value, but were not captured by the BLUP?

Since there is a great and bewildering multiplicity and indeterminacy of both observable and unobservable teacher factors, the task of identifying which to conceptualize and measure, aside from the few that are traditionally used in research, is attended with mind-boggling difficulties. The intangible and unobservable teacher traits like desires, motivations, attitudes, values, philosophies, fears, anxieties, emotions, affections, dreams, manias, native abilities, etc. are particularly difficult to handle. What about their behavioral patterns in the classroom? Their teaching styles and techniques? Their physical appearance and dressing style? What about their patterns of social interaction: professional, family, marital, or even extramarital relationships? What about their religious convictions? Indeed, what about an almost infinite number of other teacher-related factors?
The third apparent weakness has to do with practical considerations. In addition to what has been mentioned earlier, Lynn Olson (1998) reported that many worry about the complexity of the statistical techniques used, which makes the value-added approach vulnerable to misunderstanding by the public at large, particularly parents and taxpayers. Olson elaborated that "public support is considered a crucial element of accountability efforts, and states and districts have long been criticized for using language and statistics that confuse, rather than enlighten." Likewise, she cited the ambivalent description by Carol Ascher, senior research scientist at the Institute for Education and Social Policy, New York University, of the features of a value-added approach to teacher evaluation, thus: "As an idea, it's very appealing. It feels very progressive. It feels fair. But the execution of it is so problematic." (Olson, Lynn. "A Question of Value," Education Week on the Web; May 13, 1998).

The Variance-Partitioning Analysis (VPA) Model: Conceptual and Methodological Advantages

It appears that the attempt at crafting an objective teacher evaluation scheme in the context of an input-oriented and/or teacher-centered framework is fraught with conceptual, methodological, and measurement difficulties. More importantly, the purported measure of teacher effect constructed therefrom appears untenable on closer scrutiny.

We instead propose a variance-partitioning analysis (VPA) model in the context of an output-oriented and student-centered evaluation model. The concept appears elegant in its simplicity. There is a pie, as it were, that represents the total variance of a set of achievement scores on a particular dependent variable or criterion. This pie is
then partitioned into various angular portions representing: first, the effects attributable to the set of teacher factors; second, the effects attributable to the set of control variables, the most important of which are the IQ of the student, his pretest score on that particular dependent variable (criterion), and some measures of his socioeconomic status like family income and/or the occupations and educational attainments of his parents; and third, the unexplained effects or unexplained variance.

This proposed model (VPA) takes into account the discomfort, at least, of Glass and Dorn about the non-inclusion of relevant and significant covariates like student IQ, pretest scores, and some socioeconomic variables. On the occasion of the aforementioned Internet discussion of the Tennessee Value-Added Assessment System (Glass, 1995), Dorn instructively asserts: "...entering the prior years' scores as covariates solves the problem. Solving Glass' conundrum means that one assumes a linear relationship between first set of scores and second set of scores, but that's much more tenable than assuming an expected gain that's constant across the distribution of first sets of scores."

Further, it takes into account the suggestions made earlier by Pascarella that multivariate analysis of covariance be used on cross-sectional data. Likewise, it is not difficult to see that this VPA model is not afflicted with the major flaws and/or weaknesses of the value-added approach described earlier, including that which induces teachers to avoid hard-to-teach or low-IQ students.

So, let evaluation be student-centered. Shift the focus of evaluation from teacher traits and behavior to what fundamentally matters, that is, student achievement. Pending the appearance and wide acceptance of a better alternative, let traditional cognitive achievement tests measure student achievement.
And by making the class the unit of analysis, crosssectional data on the component students are thereby available and to be used in a manner consiste nt with PascarellaÂ’s suggestion. The beneficial effect of an outputori ented (achievementoriented) evaluation model stems from the fact that the teacher is at liberty to us e his individual creativity and artistry Â— on top of the indications of scientific knowledge Â— to do and to act in a manner consistent with what he thinks is best in the classroom. The teacher is thus empowered in the classroom as he must be. In this connection, Sanders and Horn (1995)Â—quite acceptably this time Â—state as follows: By focusing on outcomes rather th an the process by which they are achieved, teachers and schools are fr ee to use whatever methods prove practical in achieving student academic progress. The VPA may be done in a way outlined hereunder as follows: Consider the basic variance equation: Y2 = X 2 + U 2; (Equation 1) where Y2 = total variance of stud ent achievement (posttest criterion scores);
PAGE 10
Education Policy Analysis Archives Vol. 13 No. 30 10 X 2 = explained variance attributab le to the set of independent variables measured, specified, and included in the analysis; and U 2 = unexplained variance attribut able to all the other variables not included in the analysis. In light of experience and practical reality, X 2 (explained variance) can be broken down into: T 2 = the variance attributable to a ll the known and unkn own, as well as the observed and unobservable factors (traits) that are embodied in the teacher (teacher variance); and C 2 = the variance attributable to the control variables the most important of which are the studen tÂ’s intelligence quo tient (IQ), the studentÂ’s socioeconomic status (mainly indicated by family income or alternatively by the proxy va riables: parentsÂ’ occupation and parentsÂ’ educational attainment), and prelearning achievement level (pretest score). By definition, it is presented that 1 = T 2 + U 2 ; (Equation 2) where 1 stands for Â“proportional teacher effect.Â” Thus, the basic variance equation (Equa tion 1) may now be rewritten as: 1 + C 2 = Y 2. (Equation 3) By transposition, this reduces to: 1 = Y 2 Â– C 2, (Equation 3.1) By definition, it is also presented that: 2= Y 2 Â– C 2 Â– U 2, (Equation 3.2) where 2 stands for the estimated teacher variance or Â“direct teacher effect.Â” The basic properties of 1 or 2, which can be inferred from the aforementioned equations, wh ich are listed as follows: 1. 1 appears to be inversely proportiona l to the magnitude of the variance attributable to the control variables ( C 2).
2. Holding constant the variance attributable to the control variables, $\tau_1$ is always greater than $\tau_2$, except in the limiting case where the unexplained variance is zero, in which case they are equal.

3. If the magnitude of the unexplained variance ($\sigma_U^2$) is held constant and if the variance explained by a given set of control variables ($\sigma_C^2$) is taken into account, $\tau_1$ and/or $\tau_2$ appears directly proportional to the portion of student achievement variance ($\sigma_Y^2$) that is causally attributable to the teacher.

$\tau_1$ or $\tau_2$ is a catch-all indicator showing the combined proportional or direct effect of any and all observable and unobservable traits of the teacher that have something to do with the teaching-learning process. This is something that an input-oriented model or a value-added model like that of Sanders apparently cannot do.

Now, the fundamental question is asked: can we validly use the $\tau$ values to compare the teaching performance of one teacher with that of any other teacher? The answer is a cautious yes, depending upon the magnitudes of the unexplained variance across the units of analysis (classes). If the magnitude fluctuates wildly across the units of analysis (across teachers or classes), then the validity of the comparison is probably impaired. Thus, the validity of the VPA and that of the calculated $\tau$ values appear to depend upon the realism of the assumption that the magnitude of the unexplained variance remains essentially homogeneous across the units of analysis.

Now, is this assumption realistic? Apparently the answer is yes. Let us examine the characteristics of a typical school setting. Teachers or classes are usually grouped into various departments along disciplinal lines. Departmental examinations are usually administered, thus to a certain extent homogenizing the magnitude of the unexplained variance.
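As a rough illustration of Equations 3.1 and 3.2 and of the homogeneity condition just described, the following Python sketch computes the proportional and direct teacher effects for two classes and reports how far apart their unexplained variances are. It is a minimal sketch under stated assumptions, not the author's algorithm: the class labels, the variance figures, and the names `ClassVariance` and `unexplained_spread` are hypothetical, the variance components are taken as given rather than estimated, and the spread measure is only one simple way of looking at homogeneity.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClassVariance:
    """Variance components for one class (the unit of analysis).

    All values are hypothetical inputs; how they would be estimated
    in practice is not covered by this sketch.
    """
    total: float        # total variance of posttest criterion scores
    control: float      # variance attributed to the control variables
    unexplained: float  # variance left unexplained


def proportional_teacher_effect(v: ClassVariance) -> float:
    """Equation 3.1: total variance minus control variance."""
    return v.total - v.control


def direct_teacher_effect(v: ClassVariance) -> float:
    """Equation 3.2: total minus control minus unexplained variance."""
    return v.total - v.control - v.unexplained


def unexplained_spread(classes: Mapping[str, ClassVariance]) -> float:
    """Largest minus smallest unexplained variance (illustrative check only)."""
    values = [v.unexplained for v in classes.values()]
    return max(values) - min(values)


# Made-up figures, expressed as shares of a standardized total variance of 1.0.
classes = {
    "Class A": ClassVariance(total=1.0, control=0.55, unexplained=0.20),
    "Class B": ClassVariance(total=1.0, control=0.60, unexplained=0.22),
}

for name, v in classes.items():
    proportional = proportional_teacher_effect(v)
    direct = direct_teacher_effect(v)
    print(f"{name}: proportional = {proportional:.2f}, direct = {direct:.2f}")

print(f"Spread of unexplained variance: {unexplained_spread(classes):.2f}")
```

In this made-up case the two effects differ, within each class, by exactly the unexplained variance, which is the relationship stated in the second property above. A small spread between classes is what the homogeneity assumption requires before the two teachers' values are compared.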
Teachers and/or students are subject to the same administrator, the same set of departmental curricula, the same administrative policies and procedures, and the same departmental learning resources and facilities, thus further homogenizing said magnitude. Therefore, within a disciplinal department, it is quite realistic to assume that the magnitude of the unexplained variance remains constant across teachers and/or classes.

Quite understandably, realism is diminished if the analysis is extended beyond the boundaries of a disciplinal department and/or across course categories. Two difficulties emerge. First, there is the difficulty posed by the different criterion scores arising from different course categories. Second, there is the problem posed by the perceived differential difficulties in the teaching and learning of the various course categories. In regard to the first, standardizing the scores appears to be the only curative procedure. In regard to the second, subjective weights reflecting the perceived or felt differential difficulties of teaching or learning the various courses can be factored into the algorithm. Perhaps they can be used to make adjustments to the calculated $\tau$ values. However, careful thought is needed, since doing so risks contaminating the inherent objectivity of the algorithm. At any rate, the standardization of the criterion scores is also probably useful in this regard.

The specification of the algorithm according to the specific conditions of a particular institutional setting, the calculation of the $\tau$ values, the guidance of some underpinning principles, and the caveats that must be borne in mind are explained in detail in a separate technical document. The interested reader, particularly the school administrator who wishes to use the VPA, may contact this writer at his email address: ealicias@mla.nsclub.net.

References

Archer, J. (1999). 1999 Editorial Projects in Education, Vol. 18, Number 34, pp. 26-28.

Dorn, S. (1994 and 1995). "Discussion of the Tennessee Value Added Assessment System." EDPOLYAN, Internet LISTSERVE, Arizona State University College of Education.

Glass, G. V (1995). Discussion of the Tennessee Value Added Assessment System. Retrieved May 1, 2005 from http://epaa.asu.edu/epaa/v13n30/tvaas.html.

Glass, G. V & Martinez, B. A. (1993). Politics of Teacher Evaluation. Retrieved May 1, 2005 from http://glass.ed.asu.edu/gene/papers/pte.pdf.

Hebbeler, K. M. (1985). "Approach of the Preschool Evaluation Project to Measuring the Effectiveness of Early Intervention." ERIC Accession No. ED259485.

King, F. J. and Roblyer, M. D. (1984). "Alternative Designs for Evaluating Computer-Based Instruction." Journal of Instructional Development, v7 n3 p23-29. ERIC Accession No. EJ318720.

Olson, L. (1998). "A Question of Value." Education Week on the Web, May 13, 1998.

Pascarella, E. (1986). "Are Value-Added Analyses Valuable?" ERIC CD-ROM, 1985-1998.

Pennsylvania Department of Education and Lancaster-Lebanon Intermediate Unit 13. Value Added Assessment System (VAAS).

Sanders, W. L. and Horn, S. P. (1995). "Educational Assessment Reassessed: The Usefulness of Standardized and Alternative Measures of Student Achievement as Indicators for the Assessment of Educational Outcomes." Education Policy Analysis Archives, Vol. 3, No. 6, March 3, 1995; ISSN 1068-2341.

Slack, J. B. and St. John, E. P. (1998). "A Model for Measuring Math Achievement Test Performance: A Longitudinal Analysis of Non-Transient Learners Engaged in a Restructuring Effort." ERIC CD-ROM, 1985-1998.

Willard, D. J. and Oplinger, D. (2003). "'Value-added analysis' credits school districts for progress a student makes." Columbus, Ohio: Beacon Journal, May 26, 2003.
About the Author

Eduardo R. Alicias, Jr.
Vice President for Academic Affairs, New England College, Quezon City, The Philippines
Professorial Lecturer (Graduate Program), Bulacan State University, Malolos City & Philippine State College of Aeronautics, Villamor Airbase, Pasay City
Email: ealicias@mla.nsclub.net

Doctor of Education, Educational Administration, 1981, University of the Philippines, Diliman, Quezon City, The Philippines. Diploma in Education (economics of education and education in developing countries), 1985-1986, University of London, UK (British Council Fellow). Former Editor and Founder, The SEARCH Journal, ISSN 0117-6439, 1989-1998.
Education Policy Analysis Archives
http://epaa.asu.edu

Editor: Sherman Dorn, University of South Florida
Production Assistant: Chris Murrell, Arizona State University

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Sherman Dorn, epaaeditor@shermandorn.com.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
David C. Berliner, Arizona State University
Greg Camilli, Rutgers University
Linda Darling-Hammond, Stanford University
Mark E. Fetler, California Commission on Teacher Credentialing
Gustavo E. Fischman, Arizona State University
Richard Garlikov, Birmingham, Alabama
Gene V Glass, Arizona State University
Thomas F. Green, Syracuse University
Aimee Howley, Ohio University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Ontario Institute of Technology
Patricia Fey Jarvis, Seattle, Washington
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Les McLean, University of Toronto
Heinrich Mintrop, University of California, Berkeley
Michele Moses, Arizona State University
Gary Orfield, Harvard University
Anthony G. Rud Jr., Purdue University
Jay Paredes Scribner, University of Missouri
Michael Scriven, Western Michigan University
Lorrie A. Shepard, University of Colorado, Boulder
Robert E. Stake, University of Illinois-UC
Kevin Welner, University of Colorado, Boulder
Terrence G. Wiley, Arizona State University
John Willinsky, University of British Columbia
Archivos Analíticos de Políticas Educativas

Associate Editors: Gustavo E. Fischman & Pablo Gentili, Arizona State University & Universidade do Estado do Rio de Janeiro
Founding Associate Editor for Spanish Language (1998-2003): Roberto Rodríguez Gómez

Editorial Board

Hugo Aboites, Universidad Autónoma Metropolitana-Xochimilco
Adrián Acosta, Universidad de Guadalajara, México
Claudio Almonacid Avila, Universidad Metropolitana de Ciencias de la Educación, Chile
Dalila Andrade de Oliveira, Universidade Federal de Minas Gerais, Belo Horizonte, Brasil
Alejandra Birgin, Ministerio de Educación, Argentina
Teresa Bracho, Centro de Investigación y Docencia Económica-CIDE
Alejandro Canales, Universidad Nacional Autónoma de México
Ursula Casanova, Arizona State University, Tempe, Arizona
Sigfredo Chiroque, Instituto de Pedagogía Popular, Perú
Erwin Epstein, Loyola University, Chicago, Illinois
Mariano Fernández Enguita, Universidad de Salamanca, España
Gaudêncio Frigotto, Universidade Estadual do Rio de Janeiro, Brasil
Rollin Kent, Universidad Autónoma de Puebla, Puebla, México
Walter Kohan, Universidade Estadual do Rio de Janeiro, Brasil
Roberto Leher, Universidade Estadual do Rio de Janeiro, Brasil
Daniel C. Levy, University at Albany, SUNY, Albany, New York
Nilma Limo Gomes, Universidade Federal de Minas Gerais, Belo Horizonte
Pia Lindquist Wong, California State University, Sacramento, California
María Loreto Egaña, Programa Interdisciplinario de Investigación en Educación
Mariano Narodowski, Universidad Torcuato Di Tella, Argentina
Iolanda de Oliveira, Universidade Federal Fluminense, Brasil
Grover Pango, Foro Latinoamericano de Políticas Educativas, Perú
Vanilda Paiva, Universidade Estadual Do Rio De Janeiro, Brasil
Miguel Pereira, Catedratico, Universidad de Granada, España
Angel Ignacio Pérez Gómez, Universidad de Málaga
Mónica Pini, Universidad Nacional de San Martin, Argentina
Romualdo Portella do Oliveira, Universidade de São Paulo
Diana Rhoten, Social Science Research Council, New York, New York
José Gimeno Sacristán, Universidad de Valencia, España
Daniel Schugurensky, Ontario Institute for Studies in Education, Canada
Susan Street, Centro de Investigaciones y Estudios Superiores en Antropologia Social Occidente, Guadalajara, México
Nelly P. Stromquist, University of Southern California, Los Angeles, California
Daniel Suarez, Laboratorio de Politicas Publicas-Universidad de Buenos Aires, Argentina
Antonio Teodoro, Universidade Lusófona, Lisboa
Carlos A. Torres, UCLA
Jurjo Torres Santomé, Universidad de la Coruña, España
