Education Policy Analysis Archives
Volume 5 Number 18, August 25, 1997, ISSN 1068-2341

A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411.

Copyright 1997, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.

Academic Freedom, Promotion, Reappointment, Tenure And The Administrative Use of Student Evaluation of Faculty (SEF): (Part III) Analysis And Implications of Views From The Court in Relation to Accuracy and Psychometric Validity

Robert E. Haskell 1
University of New England

This is the third of four articles by Haskell on this subject. The other articles can be found at Volume 5 Number 6, Volume 5 Number 17, and Volume 5 Number 21.

Abstract: In two previous papers, it was noted that while a controversial history of research on the reliability and validity of student evaluation of faculty (SEF) exists, it has not been typically viewed as an infringement on academic freedom, promotion, reappointment, and tenure rights. As a consequence, legal aspects of SEF are neither readily apparent, nor available. Legal rulings, their implications and assumptions in relation to their accuracy and psychometric validity where SEF are integral to the denial of academic freedom, tenure, promotion, and reappointment are reviewed along with the legal principles of Disparate Treatment and Disparate Impact, and the scientific Precautionary Principle in policy decisions.

Table of Contents

Brief Overview of the Validity of SEF
The Courts' Approach to the General Accuracy and Psychometric Validity of SEF
Historical Overview of the Courts' Approach to the Validity of Faculty Evaluation Data
Acceptance of Administrative Subjective and Untrained Evaluator Judgements Of SEF Data
SEF as Social Judgement and Diagnosis
Variables Affecting Validity Not Taken Into Account When Assessing SEF
Instructional Variables
Student Biases Variables
Popularity Variables and Effectiveness
The Courts' Reliance on Both Quantitative Data and Qualitative Comments in SEF
Reliance on SEF v. Peer Evaluation
Numerical Ranking of Faculty
Use of Qualitative Written Student Comments
Mixed Student Comments
Transcendent Value of a Professor Over Teaching Quality
Procedural, Burden of Proof, and Policy-Decision Criteria in Assessment of SEF Validity
Assessment of SEF as Procedural or Process Issue
Decision Criteria and the Scientific Precautionary Principle
The Court's Approach to Validity of SEF in Relation to the Principles of Disparate Treatment and Disparate Impact
Disparate Impact
The Disparate Treatment and Impact Principles Generalized Beyond Statistical Significance of SEF Research
Assumption # 1: Statistical Significance of Indicators of Teaching Effectiveness
Assumption # 2: Statistical Significance of SEF of Teaching Effectiveness Measures Appropriate Learning
Conclusion
References
Appendix: A Non Litigated Case of SEF Used in the Denial of Tenure and Reappointment

As I indicated in previous papers on SEF (Haskell, 1997a, 1997b), the history of legal rights demonstrates that issues not considered to have legal standing only come to have legal standing after a long process of advocacy. The evolution of a policy or legal principle requires the accumulation of data, coalescing judgements and arguments.
To this end, this paper will continue to examine court reasoning and rulings on SEF in cases involving the denial of academic freedom, tenure, promotion, and reappointment (AFTPR) decisions in relation to its implications and assumptions regarding accuracy and psychometric validity. 2

In a second paper on SEF (Haskell, 1997b), I abstracted from the text of located legal cases views from the court pertinent to SEF. The appendix of the second paper provided a verbatim abstracting of the text of each case relative to its SEF content. As a consequence, in summarizing the pertinent findings of that paper for the present one, for convenient referencing the specific case textual material for each section will be placed in a footnote indicated at the beginning of each section heading, and the indented "Summaries" are carried over from Part II.
For convenience, I will use these abstracted legal views and rulings to examine their implications for the courts' use of SEF in relation to their accuracy and psychometric validity. A final paper will address the implications of court rulings for academic freedom and instruction. As I noted in my second paper, not only are legal cases prima facie complex, but when specific legal definitions (e.g., disparate treatment and impact) and other special Congressional Acts (e.g., EEOC) are superimposed on them, they become logically unwieldy, not just to the non legal scholar, but apparently to the Courts as well. 3

Finally, I would like to point out that the issues examined in this series of papers are not primarily concerned with individual faculty rights but with the implications of SEF when used for administrative purposes on academic freedom, educational quality, standards, and ultimately on the competence of graduates. 4

Brief Overview of the Validity of SEF

As shown in Haskell (1997b) and reiterated below, views from the court on the appropriate use of SEF vary so greatly that the concept of variation might more descriptively be replaced by the concept of "randomness" were it not for the fact that there has been a consistent trend by the courts to accept SEF data as it is presented to them by institutions. In presenting an analysis and implications of these views from the court in relation to their validity, the detailed research literature on the validity of SEF will largely have to be bracketed. To do otherwise would take this article too far afield.
Nevertheless, because the issue of validity is so central to this paper, an overview of the SEF validity literature is a necessary foundation for the following analyses.

There is a long and controversial research history on SEF, with most early reviews and extant opinion (though certainly not all) suggesting their general validity, with validity referring to the accuracy of SEF in measuring teaching effectiveness. More recently, however, sophisticated statistical reviews of this past literature strongly suggest that earlier reviews of SEF literature were not rigorously analyzed and controlled methodologically, thus casting serious doubt on their validity. As Barnett (1996), Greenwald (1997), and Greenwald and Gillmore (1996) demonstrate, past reviews have tended not to be sophisticated critiques. Positions suggesting cautious support for validity of SEF, while at the same time expressing concerns about the adequacy of their support, include Abrami, Dickens, Perry, & Leventhal (1980). Reviews and empirical critiques that are critical of the validity of SEF include Chacko (1983), Dowell and Neal (1982), Powell (1977), Snyder and Clair (1976), Vasta and Sarmiento (1979), and Worthington and Wong (1979). Some of the past reviews that have categorized the significant research that has found SEF to be essentially a valid measure of quality of instruction are: Cashin (1995), Cohen (1981), Franklin & Theall (1990), Holmes (1972), Howard, Conway, and Maxwell (1985), Howard and Maxwell (1980, 1982), Marsh (1980, 1982, 1984), Marsh and Dunkin (1992), and McKeachie (1979).

Cahn (1987) suggests that student ratings do not measure the instructional effectiveness or the intellectual achievement of students. SEF measure student satisfaction, attitudes toward the instructor's course, student personality, and their psychosocial needs.
Cahn further suggests that students know if instructors are likeable, not if they are knowledgeable; they know if lectures are enjoyable, not if they are reliable. In a meta-analysis Cohen (1983), who basically accepts the validity of SEF, concludes from his study, "While the magnitude of the average rating/achievement correlation for the thirty-three multisection courses is not overwhelming [14.4% of shared variance between ratings and the criteria], the relationship is certainly stronger and more consistent than we were led to believe..." (p. 455). And Dowell & Neal (1982) conclude that

"The research literature can be seen as yielding unimpressive estimates of the validity of student ratings. At their most valid, then, validity of SEF refers to only 14% of the total variance. The literature does not, therefore, support claims that the validity of student ratings is a consistent quantity across situations. Rather, the evidence suggests that the validity of student ratings is modest at best and quite variable...The variability in obtained validity coefficients even in studies with reasonable methodological requirement lead us to suspect that the validity of student ratings is influenced by situational factors to such an extent that a meaningful, generalizable estimate of their validity does not exist. In general...no meaningful estimate of the validity of student ratings can be provided with confidence that is generalizable enough to be useful..." (59-61).

For example, studies demonstrate the following confounding variables: (1) age, (2) gender, (3) class size, (4) year of student, (5) level of student, (6) instructor style, (7) subject matter, (8) major or elective course, (9) student interest in subject matter, (10) instructor grading difficulty, (11) anonymous vs. signed ratings, (12) whether students are informed of their use, (13) instructor present vs. instructor absent while completing the evaluation (see for example, Divoky and Rothermel, 1988), (14) length of class period, and a host of other variables.

Finally, the philosopher of science Michael Scriven has conducted rigorous work on evaluation procedures (1995, 1993, 1991, 1988), particularly on the justification of inferring from ratings to conclusions about the merit of teaching on the basis of statistical correlations between ratings and student learning gains.
He suggests that such inferences are invalid unless a number of stringent conditions are met on the design, administration, and use of such ratings. He further suggests of faculty evaluation in general that, "All are face-invalid and certainly provide a worse basis for adverse personnel action than the polygraph in criminal cases. Based on examination of some hundreds of forms that are or have been used for personnel decisions (as well as professional development), the previous considerations entail that not more than one or two could stand up in a serious hearing." Given this highly questionable state of affairs on the validity of SEF, the question is how do courts view validity in relation to its use for administrative purposes?

The Courts' Approach to the General Accuracy and Psychometric Validity of SEF 5

An issue directly related to the reliance on SEF for administrative purposes is its validity. Presumably the more valid SEF data in a given case, the more justifiable is the reliance on it for administrative purposes.
From the legal cases reviewed (in Haskell, 1997b), it is clear that the courts tend to accept SEF data as presented to them by institutions.

Summary: With regard to requiring the general and statistical accuracy of SEF, legal reasoning and rulings can be summarized (see Haskell, 1997b) as ranging from: (1) accepting statistical analyses as a part of a plaintiff's effort to establish discriminatory treatment if it reaches proportions comparable to those in cases establishing a prima facie racial discrimination, (2) cautioning that statistics are not irrefutable, with their usefulness depending on surrounding facts and circumstances of a case, (3) maintaining that the court need not consider validity and is under no obligation to establish the accuracy of administrative interpretations of SEF, (4) that tenure criteria are not drawn with "mathematical nicety," (5) an administrator's failure to perform statistical comparisons is not arbitrary and is reasonable, (6) especially if such is not required by a Faculty Association Contract, (7) nearly any use made of SEF, regardless of its validity, is acceptable if it followed the standard practice of the university, (8) that creativity, rapport with students and colleagues, teaching ability, and other qualities are intangibles which cannot be measured by objective standards.

Some courts (e.g., Fields v. Clark University, 1987) have noted that even when SEF are not gathered and evaluated according to accepted standards of scientific polling procedures, it is nevertheless acceptable if the process followed standard practice involved in other tenure decisions at the university (p.671).

While there does exist a "substantial evidence" standard which gauges whether an institution's decision-making body carefully considered the evidence and had a substantial body of evidence on which to base its decision, and an "arbitrary and capricious" standard which gauges whether a deciding body acted without reason or irrationally (see Kaplin, 1995, section 220.127.116.11. Standards of Judicial Review and Burdens of Proof 35), it appears these standards are frequently ignored in relation to decisions based on SEF.

In general, the exception to the courts' almost total disregard for the validity of SEF has been in cases involving EEOC issues. In such cases, the courts require precise accuracy. I will address this issue in more detail in the section on disparate treatment and impact below.

Historical Overview of the Courts' Approach to the Validity of Faculty Evaluation Data

As noted previously, unlike general performance evaluations of faculty, SEF does not have a categorical legal history. Since SEF is but a subset of faculty performance evaluation in general, it is appropriate to briefly review the history of this more general area.
Given SEF as a subset of faculty performance in general, it is accordingly not surprising to see that the view from the courts on the validity of SEF parallels the courts' view of faculty performance evaluation.

Historically, in terms of faculty evaluation instruments in general (on both secondary and postsecondary levels), it is widely agreed by legal scholars (Baez, Benjamin, and Centra, 1995) that "Despite the subjectivity of measuring the quality of a faculty member's scholarship, service and teaching accomplishments, courts will rarely, if ever, question the appropriateness of an institution's criteria (or how they measure them) for granting reappointment, promotion, or tenure....they will rarely substitute their judgments for those of peer review committees....Although juries may have less deference" (p.139). It might also be added that courts will seldom question administrative judgements of evaluations. It seems that faculty who challenge institutional evaluation tools very rarely succeed. Although the legal "competent and substantial evidence" standard places a significant burden of proof on the educational organization, it has not generally required that faculty assessment instruments be professionally validated (Rebell, 1990; Kaplin and Lee, 1995). Such rulings do, however, appear to vary by state or federal jurisdiction.

Psychometric standards of validity, reliability, and specific evaluation techniques are rarely incorporated in state laws, regulations, or common-law standards. Accordingly, cases that involve evaluation have tended to focus on adherence to specific procedural requirements as set forth in state law or on general common-law notions of fairness and due process, not on expert psychometric standards.
Although state courts will require strict adherence to the procedural aspects of these requirements and will strike down an arbitrary failure to use any apparent evaluative criteria, the state courts tend not to probe the substance of evaluation criteria or methods (Rebell, 1990; Kaplin and Lee, 1995). As Copeland and Murry (1996) have put it, "the judiciary has generally behaved as though it believed that evaluations were made only after careful deliberation and with procedural due process protections. In short, the judiciary has tended to act as if colleges and universities could be trusted to act in good faith" (p.246).

Rebell (1990) outlines what he describes as a "striking example of the courts' traditional deferential attitude toward teacher evaluation" data (p.337). The decision of the United States Court of Appeals for the Eighth Circuit in Scheelhaase v. Woodbury Central Community School District (1973) involved the dismissal of a teacher whose contract had previously been renewed over a ten-year period. The reason for her termination was that she was incompetent as indicated by the low scores of her students on the Iowa Test of Basic Skills (ITBS) and the Iowa Test of Educational Development (ITED). Despite a number of expert witnesses testifying that it was inappropriate to use such test scores as a basis for evaluating a teacher's performance, the court dismissed Scheelhaase's claim. The claims were considered basically irrelevant by the court because "such matters as the competence of teachers and the standards of its measurement" are not matters of constitutional dimension.

This early case involving a public school teacher is significant because of both (a) the Court's apparent lack of concern with the serious psychometric issues raised by a reliance on student achievement scores as a sole stated basis for termination and (b) the Court's almost total reliance on a school administrator's psychometrically unsubstantiated, and quite possibly equally erroneous, evaluation.
One of the concurring Scheelhaase case judges bluntly stated:

The Board was entitled to rely upon the recommendation of conclusions of its superintendent, notwithstanding the existence of strong opinions contrary to his regarding the use of the ITBS or ITED tests as a tool of teacher evaluation...Thus, its decision, even though premised upon an apparently erroneous 'expert opinion' cannot be faulted as arbitrary and capricious. The Board's mere mistake in judgment or in weighing the evidence does not demonstrate any violation of substantive due process. (Emphasis added).

Thus, even when states use student achievement scores as an index of faculty proficiency, 6 courts have had an "apparent lack of concern with serious psychometric issues raised by reliance on student achievement scores as a sole stated basis for termination," again relying on administrators' unsubstantiated evaluations (Rebell, 1990). Thus, courts have historically adopted the position that they are not qualified to second guess peer-review committees, at least as long as committees do not act arbitrarily and instruments are consistently and fairly applied (Baez, Benjamin, and Centra, 1995; Kaplin and Lee, 1995; Rebell, 1990). Traditionally, notes Rebell, most other courts have tended to take a similar deferential stance in teacher evaluation cases.

There seem to be two exceptions to this. The first is in discrimination cases. In general, courts have tended to only require precise accuracy in cases where EEOC issues are involved (see below).
The second is in claims of unfair treatment because of exercise of First Amendment free speech rights, including union-organizing activities, or allegations of denial of Fourteenth Amendment rights to due process by tenured teachers or others with a reasonable expectation of continued employment, which will trigger federal court jurisdiction with greater scrutiny of data (Rebell, 1990).

Acceptance of Administrative Subjective and Untrained Evaluator Judgements Of SEF Data 7
An issue directly related to both the reliance on and statistical accuracy of SEF is the view of the court regarding accepting or not accepting subjective administrative judgements of faculty teaching effectiveness.

Summary: With regard to accepting the subjective judgements of administrators' evaluation of SEF, the legal reasoning and rulings can be summarized as ranging from: (1) accepting administrative subjective judgements if (2) they are deemed sincere (3) grounded on some evidentiary basis (4) if made on the "vigor and variety of student criticisms" (5) "not arbitrary or capricious and were exercised honestly upon due consideration," (6) based upon "much experience in reviewing student evaluations," (31) reasonably draw on that experience (7) and have ruled that Presidents are not bound by factual findings made by majority members of a faculty.

Not only have the courts not traditionally examined faculty evaluations rigorously, they have tended not to require that evaluators be trained in the use, analysis, and interpretation of evaluation instruments. In general, state courts reviewing teacher evaluation practices will not analyze directly the substantive criteria used to evaluate teachers, nor the qualifications of the raters (Rebell, 1990). There are exceptions, however.

Some states, like Florida and Pennsylvania, now mandate such training. Florida specifically mandates school boards to provide training programs to "ensure that all individuals with evaluation responsibilities understand the proper use of the assessment criteria and procedures" (Fla. Educ. Code, § 231.29(2)). In Pennsylvania (Rebell, 1990), employees must be evaluated "by an approved rating system which shall give due consideration to personality, preparation, technique and pupil reaction in accordance with standards and regulations for such scoring as determined by rating cards to be prepared by the Department of Public Education...." (p.345-6).
SEF as Social Judgement and Diagnosis

Given the courts' assumptions regarding validity and the untrained judgement of those making decisions based on SEF, a part of influencing the courts is demonstrating relevant research. In the research on social judgement and clinical diagnosis, it is clear that the manner in which nearly all SEF data are analyzed is but a subset of the social judgement and clinical diagnosis literature, involving the same logical and cognitive biases and distortions that result in the pervasive inaccuracy of social judgement in general and clinical diagnosis in specific. The findings of the judgement research literature apply to students making such judgements in evaluating faculty and to those interpreting the results; they are in fact making diagnoses.

Psychological research has recognized the severe cognitive problems and limitations of "intuitive" and "experience-informed" everyday judgements for over thirty years (Dawes, Faust, and Meehl, 1989; Faust, Guilmette, Hart, Arkes, Fishburne and Davey, 1988; Garb, 1989; Hayes, 1991; Larkin, McDermott, Simon, and Simon, 1980; Rabinowitz, 1993), yet the mistakes continue in everyday practice situations. Interpretations of SEF are no different. As two authors who consider SEF literature valid (Franklin & Theall, 1990) point out:

Even given the inherently less than perfect nature of ratings data and the analytical inclinations of academics, the problem of unskilled users, making decisions based on invalid interpretations of ambiguous or frankly bad data, deserves attention. According to Thompson (1988, p. 217) "Bayes Theorem shows that anything close to an accurate interpretation of the results of imperfect predictors is very elusive at the intuitive level. Indeed, empirical studies have shown that persons unfamiliar with conditional probability are quite poor at doing so (that is, interpreting ratings results) unless the situation is quite simple." It seems likely that the combination of less than perfect data with less than perfect users could quickly yield completely unacceptable practices, unless safeguards were in place to insure that users knew how to recognize problems of validity and reliability, understood the inherent limitations of rating data and knew valid procedures for using ratings data in the contexts of summative and formative evaluation. (79-80).

The authors conclude by noting, "It is hard to ignore the mounting anecdotal evidence of abuse. Our findings, and the evidence that ratings use is on the increase, taken together, suggest that ratings malpractice, causing harm to individual careers and undermining institutional goals, deserves our attention." (p.79-80). Recognizing such problems is not methodological nit-picking; they are pragmatic, paradigmatic, and scientifically fundamental.

Variables Affecting Validity Not Taken Into Account When Assessing SEF 8

In conducting any research, it is a given that there are a host of variables that affect outcomes. Put in experimental terms, there are a host of independent variables that affect the dependent variable (here teaching effectiveness). The question is, how have courts addressed this crucial issue that impacts so centrally on validity of SEF data?
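
As an illustration only, the following Python sketch shows how one uncontrolled independent variable, here class size, could separate two instructors' overall averages even when students rate them identically within each type of class. Every figure is made up for the purpose and comes from no study or legal case; the names `RATING_BY_CLASS_TYPE`, `RESPONSES`, and `pooled_mean` are likewise hypothetical.

```python
# Illustrative sketch only: every number below is hypothetical.

# Assumed mean rating (5-point scale) by class type, identical for both instructors.
RATING_BY_CLASS_TYPE = {"small_seminar": 4.4, "large_lecture": 3.6}

# Assumed number of student responses each instructor collected per class type.
RESPONSES = {
    "Instructor A": {"small_seminar": 60, "large_lecture": 40},
    "Instructor B": {"small_seminar": 10, "large_lecture": 190},
}


def pooled_mean(counts):
    """Mean rating over all responses, ignoring class type (the uncontrolled comparison)."""
    total = sum(counts.values())
    weighted = sum(RATING_BY_CLASS_TYPE[kind] * n for kind, n in counts.items())
    return weighted / total


if __name__ == "__main__":
    for name, counts in RESPONSES.items():
        print(f"{name}: pooled mean = {pooled_mean(counts):.2f}")
    gap = pooled_mean(RESPONSES["Instructor A"]) - pooled_mean(RESPONSES["Instructor B"])
    # Within each class type the two instructors are rated identically by construction,
    # so the whole gap comes from who taught which kind of class.
    print(f"Gap in pooled means: {gap:.2f}")
```

Under these assumed figures the pooled means come out at 4.08 and 3.64, a gap of 0.44 that reflects only the mix of classes taught, not any difference in how students rated the two instructors in comparable settings.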
Instructional Variables

Legal cases concerned with the validity of SEF occasionally note various instructional factors that were not controlled in the faculty evaluation process.

Summary: The variables noted in the legal cases reviewed include (55) not controlling for class size, i.e., those obtained in small seminars from those obtained in large lecture classes, (56) those obtained from tenured faculty from those obtained from non tenured junior faculty, (57) not performing appropriate comparisons of SEF with other faculty, (58) noting SEF in all courses, not just problem courses, (59) not mistaking student 'response' figures for actual student enrolment figures when using them to determine student attraction to a course, (60) using all courses taught, (61) taking into consideration faculty teaching a wide range of courses, versus those with lighter teaching loads, (62) number of new courses taught in a year, (63) whether graduate courses were taught at the same time as teaching undergraduate courses, (64) selectively mentioning only negative student comments, or (65) overly weighting negative comments, and (66) different procedures for gathering student opinion.

Courts sometimes weigh these variables heavily; in most cases, however, the courts either ignore them or do not weigh them very heavily in the total context of a particular case. 9

Student Biases Variables 10

A significant issue is how courts view student biases in assessing the reliability and validity of SEF.
Summary: Student bias variables include reactions to (48) academically demanding faculty, that (49) thus thwart student expectations (50) difficult examinations (51) tough grading policy, (52) heavy workload in a course. (53) While most courts ignore these student biases in SEF, (54) occasionally a court will recognize that difficult courses have to be given to the students and that such material is difficult for even the best teacher to get across.

In general, however, it is overwhelmingly clear that courts seldom take these variables into account, despite the fact that such reactions often function as generalized affective overlays on SEF (see below).

Popularity Variables and Effectiveness 11

A related student variable issue is the extent to which SEF measures popularity, not teaching effectiveness. Accordingly, it is instructive to see how courts view this issue.

Summary: Court rulings range from saying that (9) in cases of exceptional research faculty popularity should not play a role in termination due to teaching, to (10) in normal cases a measure of popularity is related to teaching effectiveness.

While not noted frequently, popularity appears to be generally assumed to be involved in teaching effectiveness. But again, the courts are mixed on this issue as well. In terms of the research literature there is little to no support for popularity being a measure of teaching effectiveness in higher education. 12

The Courts' Reliance on Both Quantitative Data and Qualitative Comments in SEF 13

Reliance on SEF v. Peer Evaluation

Is it considered acceptable, for example, to rely heavily or even solely on SEF, or must they be used in conjunction with other evaluative methods?

Summary: From the cases analyzed, it can be seen that court rulings range from saying that (1) relying primarily or solely on student evaluations is acceptable, to (2) placing little exclusive reliance on SEF, (3) in rare cases SEF cannot be permitted to stand in the way of promoting or retaining professors who are excellent in non teaching areas, (4) tenure decisions cannot be based solely on SEF by students who have not been made aware of the ramifications of their evaluations, (5) anonymous documents or those "based on hearsay" should not be included in a faculty member's file, (6) students should be made aware of the purpose and ramifications of their evaluations of faculty, (7) anonymous student evaluations should not be used, (8) peer evaluations must also be a part of evaluating teaching.

Again, courts range widely on the exclusiveness or non exclusiveness of SEF, even though books on how to conduct faculty evaluation (by authors who basically accept the validity of SEF, e.g., Seldin, 1984; Theall and Franklin, 1990) for some time now have consistently emphasized that SEF should not be used as the only and/or primary method for assessing teaching effectiveness.

Numerical Ranking of Faculty 14

An important issue is how the courts view the relative weighting of SEF in administrative decisions of teaching competence. It seems to be common practice to ordinally rank and compare faculty to each other according to average SEF numerical scores.

Summary: From the cases reviewed, numerical scores from SEF often result in faculty (22) being compared relative to other faculty, (23) being ranked relative to other faculty, (24) with distinctions often being made on the basis of tenths of a decimal, (25) with most courts accepting these fine decimal distinctions.

Despite the above overview of the research on the highly questionable validity of SEF, institution administrators and the courts continue to make and accept fine numerical distinctions in faculty scores from student evaluation questionnaires to ordinally rank faculty. Even given that SEF is valid to a level accounting for 14% of the variance, it is not psychometrically appropriate to accept such ordinal rankings.

It should be noted that SEF rate the majority of faculty as above average, whatever this means.

Ordinal scales do not tell us if a faculty member half way down the scale is only half as good as the top ranked member. Thus without a criterion referenced standard, we have no way of knowing if everyone on the scale is an effective teacher, or conversely an ineffective teacher. Moreover, should all faculty who fall below the statistical "average" be eliminated? And if so, using the same logic, should we rank order and thereby eliminate all Olympic team members who fall below the team average?
If the answer is 'yes,' then (a) we eliminate highly functioning athletes, and (b) it leads to an infinite regress where we end with only one or two on any given team. Currently, we have no idea if "statistical average" means good, bad, or indifferent teaching in terms of instructional effectiveness.

Use of Qualitative Written Student Comments 15

Over and above quantitative data, the use of written comments, often single instances, by students on their SEF forms seems widespread among educational administrators, faculty evaluation committees, and the courts.

Summary: For the use of student comments, court views range from (33) placing importance on a single comment (34) to several comments as significant information, (35) maintaining that statistical analyses of SEF need to be bolstered by individual comments, (36) maintaining that while some very negative---e.g., racist, sexist---comments may be found, the court may find that they do not render SEF unreliable, (18) that such instances or impressions may be validated after the fact, (37) negative comments often seem to outweigh positive ones, and (38) may often outweigh numerical data to the contrary, (39) negative comments need not be verified before acting on them, to (40) that negative
comments cannot be used to undermine otherwise generally favorable comments received in an annual performance review.

Clearly the views from the court suggest the legitimacy of not only using what is in fact anecdotal data, but often of raising it above more systematic (averaged) data.

Mixed Student Comments 16

Just as quantitative SEF data may be bimodal, so too written student comments may be bimodal or mixed. How do courts (indeed, educational administrators and faculty evaluation committees) view and pronounce on such data?

Summary: With regard to non-numerically assessed written student comments, they are often qualitatively characterized as (41) a few were ambivalent, (42) a considerable number, (43) of mixed result, and selectively recognized: (44) it would only be fair to add that there were a number of comments in favor, (45) there were also some negative comments, (46) sometimes placing the greater weight on past evaluations of teaching over current comments, (47) sometimes placing greater weight on current comments over past positive evaluation of teaching.

Again, with regard to single and mixed comments on SEF, the courts (administration, and faculty evaluation committees, see Appendix) tend to weigh them far above their non-representative and anecdotal-data value.

It seems to be generally assumed by most faculty and administrators that SEF are used by virtually all schools in the U.S. It is further assumed by many that SEF is necessary both for faculty evaluation of teaching effectiveness and thus for quality control of student learning. While its use is clearly widespread in the U.S. (see Seldin, 1984; Crumbley and Fliedner, 1995), and is increasing in Europe (Husbands and Fosh, 1993), what is not generally recognized is that there are schools that preclude its use in salary, promotion, and tenure decisions either totally, or in part by precluding the use of qualitative student comments.
17

Transcendent Value of a Professor Over Teaching Quality 18

Despite the importance placed on teaching, there is precedent for both school policy and the courts---under certain conditions---to ignore poor teaching as indicated by SEF.

Summary: (11) The courts and educational administrations cannot allow low SEF to stand in the way of promoting or retaining professors who may be world-renowned scientists, (12) where a professor is deemed nationally or internationally exceptional as a researcher, courts may disregard SEF, (13) at least in these two cases the courts did not find the faculty exceptional. It would be interesting to see if what the court seems to accept in principle exists in fact.

The above collective categories abstracted from court cases are illustrated by a denial of tenure case described in the Appendix below, a (non-litigated) case that contains an interesting difference from most of the cases reviewed here.
Procedural, Burden of Proof, and Policy-Decision Criteria in Assessment of SEF

Another overlooked issue involving SEF and its validity is the problem of content versus process, i.e., whether the assessment of SEF data (a) constitutes a process or procedural issue or (b) is simply a content issue.

Validity Assessment of SEF as Procedural or Process Issue

An exemplar of the content and the procedural/process distinction is often exhibited between trial and appellate courts. The latter often judge only whether correct procedural/due process was followed by a lower court. The content versus procedural/due process distinction is typically used by college campus grievance committees. When a tenure committee, for example, renders an unacceptable decision, a faculty member may challenge the decision. A grievance or appeal committee may then review the decision only in terms of whether the correct process or procedures were followed in making it. The point here is that many such appeals committees do not define looking at the procedures by which SEF data were gathered and analyzed by a tenure committee or administrative evaluator as procedure/due process (e.g., whether the tenure committee just 'eyeballed' the data and student comments, whether they compared the data to other similar faculty SEF, etc.), but as content and therefore not within their purview. Grievance committees therefore often will not review the substantive content of SEF data on grounds that it is not a procedural or process issue.

In general, given the courts' tendency to accept the validity of SEF data, at least by default, how SEF data are assessed and used is often considered not to be a process/procedural issue. At least one court has, however, considered how SEF data are assessed and used as procedural. This is evidenced in Christopher Turner v.
The President of the University of British Columbia (1993), where it was stated that

the Dean said, "there were few students in undergraduate literature courses since 1986/7---(3, 8, and 6 respectively)," thus mistaking student 'response' figures for actual student enrolment. The Board concluded that (5) "This misunderstanding is in our opinion sufficient in itself for a reconsideration, since teaching was the focus..." (p.3), and (7) "we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6). [italics added]

As noted above, however, it appears that most courts, and indeed perhaps most faculty grievance committees (see Appendix below), have not considered how SEF data is analyzed as a procedure/due process issue. The issue of the validity of SEF, then, would appear to have legal "due process" implications.

Decision Criteria and the Scientific Precautionary Principle

Since SEF has haphazardly evolved along with a general acceptance of its validity as an appropriate measure of faculty teaching effectiveness, the burden of proof somehow has been placed on faculty-as-challengers of such data to scientifically prove that SEF data is not
valid---a strange state of affairs, at least in science. And the standard of proof required has typically been high. In effect, faculty are guilty until proven innocent. So the process that exists is:

1. Either (a) a legal abdication of the assessment of SEF by the court, relying on the good faith evaluation of SEF data by the institution, or (b) the court simply assuming its validity.
2. Placing the burden of proof on faculty who challenge the data to demonstrate with scientific levels of certainty (statistical significance or confidence level) that the data is not valid.

Given---at the very least---the controversial assessment of the validity level of SEF in measuring teaching effectiveness, in terms of decisions and policy perhaps we should err on the side of caution in applying such data for administrative purposes. In the field of environmental science, Lemons (1996) and Lemons, Shrader-Frechette and Cranor (in press) have suggested a Precautionary Principle when making policy decisions. In essence, this principle says that when making policy decisions about environmental harm, given (a) a certain level of possible harm, (b) the complexity/uncertainty of data, and (c) the high level of proof (typically a 95 per cent confidence level) required for a scientific finding to be accepted by scientists, setting policy should not be based on this level of scientific proof. The reason is this: to wait for such a confidence level may be too risky given the level of harm that may be indicated (as the existence of data with a lesser confidence level suggests). In short, scientific criteria that have been adopted for doing science may often not be appropriate criteria for making policy decisions.

The reasoning surrounding the Precautionary Principle is too complex to fully delineate here. The reader is referred to the citations.
In the meantime, consider the following analogy that in broad outline exemplifies the spirit of the Precautionary Principle: A dangerous tiger has escaped from a local zoo a few miles from your house. In the back of your house is a wooded area. Your child wants to go out and play in the woods. No one has actually seen the tiger in the woods or anywhere else around the neighborhood. In other words, there is no scientific level of evidence that the tiger is anywhere around, or that your child would be in immediate danger by playing in the woods. Do you let your child out to play in the woods?

In most areas of science, the rule is to avoid type-I error---asserting there is an effect when there is none---and therefore to place the burden on those who postulate an effect rather than on those who postulate no effect, and to be less concerned with avoiding type-II error---asserting no effect when there is one. In adopting SEF data as indicating teaching effectiveness, administrators, faculty evaluation committees, and the courts have engaged in type-I error---given both the level and burden of proof.

Now there are two implications of the Precautionary Principle as applied to SEF in relation to faculty and instructional quality. First, given (a) the haphazard way SEF have been introduced and accepted by the courts, (b) the level of possible harm to faculty of accepting SEF for administrative purposes of salary, promotion, denial of tenure, or non-reappointment, and more importantly (c) the effects that SEF used for such purposes have on the quality standards of higher education (see Haskell, 1997a), should such a burden of proof be demanded by the court of faculty challenging SEF data? Certainly, as shown below in disparate treatment and disparate impact cases, a kind of Precautionary Principle is already in effect.
Second, given the at least clearly conflicting evidence on whether SEF demonstrate the teaching effectiveness of a faculty member, should not administrators and faculty evaluation committees apply, for the same reasons, a similar Precautionary Principle stance?
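
The asymmetry between the two kinds of error can be made concrete with a small expected-cost comparison. The sketch below is an illustration only: the probability and cost figures are hypothetical placeholders, not estimates drawn from the SEF literature or from Lemons (1996), and the function name `expected_cost` is an assumption introduced for this example.

```python
def expected_cost(p_harm, cost_if_harm, cost_of_precaution):
    """Compare relying on unproven data with taking a precaution.

    p_harm: assumed probability that relying on the data causes harm
    cost_if_harm: assumed cost incurred if that harm occurs
    cost_of_precaution: assumed cost of declining to rely on the data
    (All three inputs are hypothetical; none comes from the article.)
    """
    rely = p_harm * cost_if_harm      # expected cost of relying on the data
    precaution = cost_of_precaution   # cost paid whether or not harm exists
    return {
        "rely": rely,
        "precaution": precaution,
        "prefer": "precaution" if precaution < rely else "rely",
    }


# Hypothetical case: harm is far from "proven" (p = 0.20, well short of a
# 95 per cent confidence level), but it is costly if it does occur.
result = expected_cost(p_harm=0.20, cost_if_harm=100.0, cost_of_precaution=5.0)
print(result["prefer"])
```

With these invented figures the precaution is the cheaper policy even though the harm has not been demonstrated at a scientific level of confidence; with a sufficiently small probability or cost of harm, the comparison reverses.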
The Court's Approach to Validity of SEF in Relation to the Principles of Disparate Treatment and Disparate Impact 19

Given the above findings on how the courts have tended to treat SEF validity issues, I would now like to look further at the implications. Federal courts---and to a lesser degree state courts---have adopted a more stringent approach to testing teacher evaluation cases, at least regarding primary and secondary teachers. According to Rebell, the four main reasons for this change are (1) the wider use of more stringent evaluation techniques by institutions, which largely stem from legislative reform initiatives that have led to an increased number of denials of teacher certification and terminations, (2) a disproportionate number of these certification denials and terminations involve members of minority groups, (3) legal developments have broadened the jurisdiction of the federal courts to consider issues of social reform, and (4) judges' own increased experience in assessing psychometric techniques in employment discrimination cases. It is perhaps (2) and (3), however, that have had the most impact on the courts (Kaplin and Lee, 1995; Rebell, 1990).

Educational reform issues arising from desegregation, special education, and other school-based litigation have made the courts more experienced and more inclined to scrutinize educational testing requirements. As a consequence of federal Equal Employment Opportunity Commission (EEOC) criteria, in today's civil rights climate, courts are more likely to scrutinize the validity of the faculty evaluation instrument, especially in terms of racial, gender, and age discrimination.
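
As a rough illustration of the kind of statistical screening associated with EEOC criteria, the sketch below applies the "four-fifths" rule of thumb from the EEOC's Uniform Guidelines on Employee Selection Procedures, under which a group's selection rate below 80 per cent of the highest group's rate is generally treated as an initial indication of adverse impact. The tenure counts and group labels are invented for illustration, the function names are assumptions, and the rule is a screening heuristic rather than the more formal statistical analysis a court may require; it is not drawn from the cases reviewed here.

```python
def selection_rate(selected, candidates):
    """Fraction of candidates in a group who received the favorable outcome."""
    return selected / candidates


def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}


# Hypothetical tenure outcomes (invented numbers, not from any case).
rates = {
    "group_a": selection_rate(30, 50),  # 60% granted tenure
    "group_b": selection_rate(12, 30),  # 40% granted tenure
}

ratios = impact_ratios(rates)
# Groups whose ratio falls below four-fifths (0.8) are flagged for scrutiny.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
print(ratios, flagged)
```

Here the second group's rate is about two-thirds of the first group's, so it falls below the four-fifths threshold and would be flagged for closer examination.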
Disparate Impact

In regard to teacher evaluation in general, in cases involving claims of discrimination under the equal protection clause of the Fourteenth Amendment, or under the anti-discrimination statutes enacted to protect members of racial and ethnic minorities, women, and other groups protected on grounds such as handicap condition and age, scrutiny of the case tends to be more probing and stringent. Such cases are of two basic kinds: (1) those involving discriminatory intent, called disparate treatment claims, and (2) those involving no intent, called disparate impact claims (see Kaplin and Lee, 1995, section 18.104.22.168.).

Disparate impact claims in personnel evaluation concern the use of assessment procedures that are facially (on their surface, or methodologically) neutral in their treatment of different groups, but which produce evaluation outcomes that inadvertently fall more harshly on one group than on another. Thus, proof of a discriminatory motive is not necessary to establish a disparate impact claim. To establish a prima facie case of such adverse impact, a minority plaintiff need only show a causal connection between the facially neutral employment practice and the disproportionate negative or adverse effects on him or herself as a member of a protected group. For example, a university tenure process may be found to discriminate against females because the evaluation process or evaluation criteria favor male faculty more than female. In such cases, rigorous statistical analysis is typically used to establish disparate impact.

Discriminatory treatment and disparate impact claims have made courts more inclined to specifically analyze educational testing instruments for validity, and this involvement by the courts is predicted to increase.
As of 1990, 41 states had mandated some form of standardized testing requirements as part of their teacher certification process. Because many of these exams are claimed to have a disproportionate negative impact on minority candidates, competency tests have triggered a number of large-scale federal class
suits. Again judge Rebell (1990) notes,

In June 1988, the United States Supreme Court issued a ruling which is likely to accelerate the trend toward increased judicial involvement in teacher evaluation matters. That case, Watson v. Fort Worth Bank and Trust (1988), extended to judgmental employment practices the Court's 1971 holding in Griggs v. Duke Power Company (1971) that standardized employment tests having a disparate impact on minorities must be shown to be job-related. Although the Court's ruling in Watson was unanimous, there was substantial disagreement among the Justices as to how closely courts should scrutinize particular practices and validation techniques. Whatever the precise standard of review ultimately implemented, there is little doubt that the federal courts will be more likely to scrutinize nonobjective evaluation procedures as a result of Watson (p.339).

Thus any instrument or evaluation criteria that in effect places an unfair burden on those being evaluated has been judged to exhibit what is legally termed disparate impact.

The present point is that while the courts have not rigorously scrutinized SEF, and continue not to, they have for some time now applied fairly rigorous standards to evaluations both in the workplace and in academia in cases involving discrimination against protected groups, whether the discrimination is purposeful or by disparate impact. Not every indication of racism, however, may be considered by a court to be proof of discrimination.

For example, in Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), with respect to comments in which some students had written that she was a "slant eyed bitch" and that she should "go back to China," the Labor Relations Board said, "We also are not persuaded that the racism evident in the student evaluations of Grievant made student evaluation results unreliable.
The percentage of evaluations in which racism by students was evident was approximately one percent of the total evaluations" (p.306). Assuming some level of covert racism, how does one disentangle the generalized affective racist and sexist overlay of student evaluations on a total questionnaire? 20

The Disparate Treatment and Impact Principles Generalized

In the evolution of any legal policy or principle, its extension often occurs by generalization or analogical transfer, extending a principle thought to apply to only one area to other areas (see, for example, Anderson and Schadewald, 1991; Golding, 1984; Levi, 1949; Marchant, Robinson, Sunstein, 1993). Currently both corporate and academic cases of straightforward discrimination and the more inadvertent discrimination cases based on disparate impact often trigger the courts to scrutinize the methodology and statistical data of such evaluations with a rigor not typically accorded to non-discrimination cases. Presumably, in discrimination cases the court's interest is in establishing validity and using rigorous statistical methods in ascertaining the "truth." If this is the case, then by clear logical implication and inference---as we have seen---in generic cases of evaluation the court could be said not to be in the truth business. As documented above, in non-discriminatory cases, courts have assumed the "truth" lay in the appropriateness of an institution's criteria and rarely would substitute their judgments for those of peer review committees, adopting the position that they are not qualified to second-guess peer-review committees, at least as long as committees do not act arbitrarily and instruments are consistently and fairly applied. The burden of proof is on the faculty challenging an institutional decision. Some courts have only been concerned with consistency and fairness of application, even if the methods of evaluation are clearly defective.
21 Generally, however, the courts have acted as though they believed that institutional evaluations were made only after careful deliberation and with
procedural due process protections.

The question is: why not make the same assumptions regarding discrimination? The answer is that, understandably, the courts have accepted that there has existed a widespread conscious and non-conscious ethnic, gender, age, religious belief, sexual orientation, and handicap bias in society, such that they cannot simply rely on the "truth" or good faith behavior of an institution or its data. Given this, the argument is made that herein lies the distinction, and reason, for treating non-discriminatory cases differently from cases where either discrimination has been charged (disparate treatment claim) or where discrimination has been inadvertent (disparate impact claim).

Thus the courts have tended to accept the judgement and "good faith" motivations of organizations. Unlike in the past, however, just as the data are now sufficiently in to cast serious doubt on the courts' assumption of "truth" residing in corporate and academic data on discrimination against protected groups, so too are the data in on (a) the questionable validity of SEF, (b) the internal politics of administration and faculty relations, which can revolve around student retention and unpopular ideas, and (c) the economic pressures on institutions not to tenure faculty and sometimes to terminate tenured faculty, all of which can have seriously contaminating consequences for institutional decisions.

The importance of this is that while courts have scrutinized SEF for evidence when civil rights discrimination has been questioned or suspected, they have not applied the same rigor to the validity of evaluation instruments, nor have they held as suspect other institutional biasing variables. The courts continue to assume a kind of pre-1960s academic Camelot.
Whether such a round table of academic knights ever did historically exist or was merely mythical, it certainly now exists only in myth.

One compendium of legal findings in higher education specifically notes SEF and recognizes the accepted application of the principles of disparate treatment and impact along racial and gender lines in SEF. It should be noted that the disparate treatment and impact issues, when applied to SEF, are of course no different from any other disparate impact case, except that the student evaluation data in such cases will be scrutinized by the courts. The authors (Baez and Centra, 1995) suggest that the SEF research in the area of race and gender discrimination has been inconsistent, and that while the area deserves more attention, the inconsistency of the research makes it unlikely that the courts will sustain such a claim. Some courts, however, have found in favor of faculty in such cases. For example,

In Cynthia J. Fisher v. Vassar College (1995), after a bench trial, the district court found that, in denying Fisher tenure, Vassar had discriminated against her by reason of (a) her sex in violation of Title VII of the Civil Rights Act of 1964, and (b) her age in violation of the Age Discrimination in Employment Act. The court found that the termination of Fisher's employment resulted not from any inadequacy of her performance, qualifications, or service, but rather from pretextual and bad faith evaluation of her qualifications. The court scrutinized Vassar's report on Fisher's teaching ability, which included reviews of her student evaluations that were said to reflect "consistent problems with clarity and her ability to illuminate difficult material" but which were otherwise generally positive.
The district court found that Vassar's biology department had distorted her teaching recommendations by "selectively exclud[ing] favorable ratings," by selectively "focus[ing] on the two courses in which she had difficulties," and by "applying different standards to her than were applied to other tenure candidates" (Id. at 1209). The court further observed that "the males tenured while Dr. Fisher was on the faculty were praised for their fine teaching while Dr. Fisher was criticized, although the facts on which the Committee's determinations were based (student evaluations, Biology Majors Reports and [Student Advisory Committee] reports) revealed that Dr. Fisher's evaluations were superior to theirs" (Id. at 1211). The court noted that statistical analysis may be a part of a plaintiff's
effort to establish discrimination under a theory of disparate treatment.

The point here is that if this had not been a disparate treatment discrimination case, the biases and distortions of data from student evaluations of her teaching would likely have gone unexamined. 22

It would seem, then, that this discrepancy in the discrimination-based search for "truth" should be used as---and provide justification for---a kind of generalized disparate impact principle, legally invoking a fairness principle that the same rigor be applied to non-civil-rights cases such as those involving SEF. As Kaplin points out, however, current law generally prohibits courts from such generalization. 23 So the issue of change apparently becomes not so much one for the court as a policy issue, both for higher educational administration to use as a guideline and for legislatures to legislate change. 24

Age discrimination in SEF is another possible bridge in this potential extension of disparate impact. The Age Discrimination in Employment Act of 1967 (ADEA) requires employers to evaluate persons on their qualifications or ability to competently perform their job, and not on the basis of age. Like any other employer, colleges and universities are prohibited from considering a faculty member's age in making decisions about employment, salary increases, promotion, tenure, and retention. Yet there is evidence that SEF do discriminate on the basis of age, with older faculty receiving lower student ratings (Feldman, 1983).
There are a host of other variables, like class size, teaching a course within a student's major as opposed to an elective course, or teaching freshmen vs. upper-level students, that also create a kind of default "disparate impact" if such variables are not controlled in the analysis of SEF data.

What is being suggested here is that in the interest of justice, equity, truth, and "fact finding," the courts and institutions should scrutinize all SEF data as rigorously as they do disparate treatment and disparate impact cases. Currently data and conclusions from SEF are seldom scrutinized as they are in discrimination and disparate impact cases (as indeed other issues in the denial of tenure or promotion are not equally scrutinized). Justice, however, is not only blind to ethnic, gender, age, sexual orientation, religious belief, and handicap status; it is also blind to institutional economic pressures and other biasing variables within academic institutions. Thus, biases and distortions of the SEF data are not revealed in non-discrimination cases as they are in disparate treatment and disparate impact cases. As a consequence, in terms of revealing unfair attributions based on SEF data, those covered under EEOC guidelines have a "truth finding" advantage over those who are not covered.

Beyond Statistical Significance of SEF Research

Having reviewed SEF cases and examined the significance of validity, I would now like to turn the issue of validity on its head. Underlying statistical research on SEF that attempts to establish its validity is a complex of contextual variables and assumptions seldom addressed. 25 In this section, I will address some of these contextual variables and assumptions that I suggest cut through and render the best of statistical research on SEF showing teaching effectiveness nearly irrelevant. Understanding is not acquired by statistical significance alone.
Certainly showing statistical validity of SEF is a necessary condition, but it is not a sufficient condition for understanding their meaning and for their use in administrative decisions. It is with an understanding of the contexts and assumptions that underlie statistical validity research on SEF that educational policy-makers and the courts need to think long and hard about accepting SEF for assessing instructional competence and using them for promotion, tenure, and reappointment decisions. 26
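
The distance between statistical validity and usefulness for decisions about individuals can be put in numbers using the 14% of variance figure cited earlier. The short calculation below is a sketch that assumes a simple linear relation between SEF scores and a standardized measure of teaching effectiveness; the variable names are illustrative, and the standard error of estimate is the usual linear-regression quantity rather than anything reported in the cases or studies discussed here.

```python
import math

variance_explained = 0.14               # figure cited earlier in the text
r = math.sqrt(variance_explained)       # correlation implied by that figure
unexplained = 1.0 - variance_explained  # share of variance left unexplained

# Standard error of estimate in standard-deviation units of the criterion
# (teaching effectiveness), assuming simple linear regression.
see = math.sqrt(unexplained)

print(f"implied correlation r   = {r:.3f}")
print(f"unexplained variance    = {unexplained:.2f}")
print(f"std. error of estimate  = {see:.3f} SD")
```

On these assumptions the implied correlation is roughly 0.37, and knowing a faculty member's SEF score reduces the uncertainty about his or her effectiveness by only about 7 per cent relative to knowing nothing at all, which is difficult to reconcile with rankings that turn on tenths of a point.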
Assumption # 1: Statistical Significance of Indicators of Teaching Effectiveness

An assumption underlying statistical analyses of SEF is that we know what the indicators of effective teaching are. To my knowledge, the research does not support this assumption. What makes us so sure that many of the questions we ask on SEF questionnaires are all that related to effective student learning? Consider, for example, the typical question "Was your instructor organized?" This question in turn entails a myriad of assumptions and conditions about effective teaching. Would Socrates, for example, be perceived as organized by most students---being peripatetic and just asking a lot of questions? And what makes us think---at least for some students and some kinds of subject matter---that just going into class, being Socratic, asking a lot of provocative questions, and confronting students by challenging their belief systems may not be the most effective instructional and learning method in the long run to get students engaged and to think critically? 27 What evidence is there that either being perceived as organized or actually being organized is a necessary condition for effective instruction? I know of no rigorous supporting evidence. Indeed, many of my friends in the humanities, much to the dismay of my behaviorist and cognitive colleagues---and sometimes myself---would suggest that systematic and sequentially structured teaching methods are simply structural analogues of our technological society (see, for example, the classic by Jacques Ellul, 1964).

Consider, too, a question that, while not directly asked on SEF questionnaires, is implied in other questions in various forms: "Does your instructor mainly lecture?" Though there is precious little rigorous evidence showing that lecturing is inherently an ineffective teaching method, it is clearly persona non grata among many educational theorists.
Lecturing is "out" while collaborative learning is "in"---but apparently not so considered by many faculty (for both valid and invalid reasons).

While I happen to agree that being organized is generally good, and that collaborative learning is perhaps good for certain student populations, subject matters, and desired outcomes, the question is: are they appropriate indicators of effective teaching applied to individual faculty, as claimed? The answer is that they are not appropriate indicators of effective teaching applied to individual faculty---and this applies even if the statistical research strongly supported the claim. This is an important point that, as I recall, is addressed in the faculty evaluation literature only by Scriven (1988). I will quote Scriven at some length.

Scriven observes that in the attempt to render teacher evaluation more scientific, the field rushed into focusing on research-based indicators, teaching indicators which sound research supposedly demonstrated are positively correlated with successful student learning. These indicators or

Popular envies are structured presentations, active involvement, emphasis on positive reinforcement, high eye contact, high frequency of question asking, provision of learning objectives, frequent feedback, use of multi-media (p.4)....the provision of a brief outline of topics to be covered in a day's lesson can be justified on administrative grounds, since substitute teachers must get some guidance; but the requirement that anything like that be provided to students, for pedagogical reasons--a claim often said to be supported by research---cannot be justified. The use of instructional objectives or any other kind of advance organizer is simply a characteristic of one's style of teaching, not a duty of the teacher.
Nor can such an outline be required as evidence of preparation (arguably a duty), since a teacher using a textbook--or for that matter, memory--may do as well or better than one with lengthy lesson plans listing activities and testing procedures (p.7-8).
The presence or absence of these factors, says Scriven, defines a style of teaching. He maintains that any reference to a 'teaching style' in teacher evaluation is not valid, regardless of whether there exists a research basis for thinking the style is correlated with teaching effectiveness. 28 He goes on to explain:

A major source of confusion in discussing the use of indicators is that the research is often presented as showing that 'the best way to teach' is by using high eye contact (or whatever), whereas all it really shows is that there is a slight tendency for better teachers to exhibit this characteristic, for reasons which might include the fact that they were taught to use it, although in fact it's not a help at all. The reader is seduced by the relative plausibility of the style recommendations, whereas you'd never buy the idea of using eye color or skin color. But plausibility isn't necessity, and absent necessity, you're just a stylist. Our kids don't need stylists, they need good teachers; and if you can't distinguish the two, you're in the wrong business (p.7). 29

Scriven is not denying the validity of statistical inference. Useful information is contained in a statistical correlation, and there are circumstances in which that information can be put to good use. It can even be put to good use in making decisions about people---but only when no better data is available because of limitations on time or resources. 30 Scriven maintains that such teaching effectiveness indicators are invalid

for essentially the same kind of reason that the evaluation of personnel by the color of their skin or their church affiliation is necessarily invalid. While it is true that much racial prejudice, sexism, etc., is based on false beliefs about the groups discriminated against, the essential flaw in it goes deeper than that. The essential flaw is that even if women in general are less strong than men, you shouldn't use gender to discriminate against a particular candidate for a position as a luggage-handler, but only a job-related strength test or series of observations in a trial period on the job. And this is not just for ethical/legal reasons, but also for scientific reasons and reasons of efficiency (p.4)....Which means you can't discriminate against a teacher on the grounds that s/he exhibits some approach to teaching that research has shown is less likely to be successful. Whites are statistically less likely to be good basketball players than blacks, but you can't kick the whites off the squad the day you discover that the statistics are worse than you thought. Nor would you be any good as a coach if you used skin color as a criterion for selection. You have to look at the individual's success, not at the success of groups to which the individual belongs (p.4).

Finally, Scriven suggests a reason for the almost total disregard of the validity of SEF by the courts documented in this paper (and my previous paper, Haskell, 1997b). He understands the implications of courts recognizing the fallacy of such indicators. He says the current fallacy of using such statistical indicators is "as certain to crash in the courts---eventually---as the most blatantly racist hiring practices. We may have only a short breathing space before the courts and defense attorneys begin to see the underlying similarity of these two approaches....The consequences for states and districts will be chaotic; old decisions may be reversed on appeal, huge damages may be awarded, those hearings will clog the system, and there will be no legitimate process to take the place of the illicit one (it is because of this potentiality for disaster that we are giving a longer-than-usual treatment of the issue here)" (p.5).

Assumption # 2: Statistical Significance of SEF of Teaching Effectiveness Measures Appropriate Learning
An assumption that is virtually unnoted in the literature is that, given SEF is eventually found to measure teaching effectiveness---and this "given" is only for the sake of the current argument---it is assumed that what is thereby being measured is appropriate learning. This assumption is arguably incorrect for at least two reasons. I say it is arguably incorrect, as whether the assumption is correct or not depends on other differing assumptions about higher education.

First, let us not fool ourselves into thinking that we know what effective teaching is for all populations of students and subject matters. There is no shortage of possible indicators of effective instruction and learning, but most are not articulated within an adequate theory of effective instruction or learning. At the very least, "effective" is relative to a given student population. And when referring to teaching effectiveness are we referring to measuring short term or long term learning? 31

In addition, as Abrami (1989) and others (see Cohen, 1983) have suggested, most studies on the relationships between student ratings and instructor-generated student learning have been done with learning outcomes collected largely from freshman classes, and---more importantly---learning at the lowest level of Bloom's taxonomy. Similarly, the literature on transfer of learning shows that when student transfer of learning is found, it reflects the lowest level of concrete transfer. So even if we are effective in achieving this level of effectiveness, what have we achieved? This brings me to my main point.

I suggest that teaching effectiveness and appropriate learning in higher education are two different logical and empirical entities. I shall now address these two differing assumptions together.
If one accepts the data showing that (a) student level of unpreparedness, (b) student ability level as measured by most national tests, (c) unrealistic student expectations about learning, (d) grading, (e) feeling of entitlement, (f) motivation level, (g) good faith motivation for evaluating faculty, (h) maturity level, and (i) hours spent studying have either been in decline for years or have become increasingly inappropriate, then effectiveness in teaching most of these students does not necessarily---and most likely does not---mean appropriate learning. For purposes of clarity (and at some risk of seeming not only insensitive, but a right wing radical, which I assure the reader I am not), let me demonstrate why teaching effectiveness is separate from appropriate learning by using what may be considered an extreme scenario as an example: Suppose that the Americans with Disabilities Act as applied to higher education is amended to include having to admit the mentally retarded, thus requiring whatever instructional adjustments are needed to accommodate their disability.

Now assume that such adjustments are made, e.g., speaking slower, simplifying and otherwise decreasing the amount of content to be mastered, along with the depth of understanding and critical thinking. In addition, assume that if such adjustments are not made, and classroom behaviors that were once appropriate for a previous level of student persist, this is reflected in low SEF scores. Now assume that because of pressures such adjustments have been made and that SEF findings for those teaching the disabled students unequivocally show teaching effectiveness. The question then becomes: is this appropriate learning for a higher education course? 32

Most will likely respond to this question with a resounding "no." Some, on the first assumption noted above, may say "yes." Some will respond by maintaining that the above scenario is extreme and inappropriate.
The fact is, however, that this scenario is simply a quantitative extension, not qualitatively different from the lowering of admission and course requirement standards that has been occurring for some time. So the question now becomes not simply one of teaching effectiveness, but of teaching effectiveness at what level of learning and, by implication, academic standards. This is an issue that needs to be
addressed nationally by faculty. Being well versed in logic as well as statistics, Scriven (1988), of course, understands this. In a similar context he notes,

It's not even true that 'it all boils down to how much the students learn from the teacher': if it did, the teachers of mentally-retarded students would automatically be the worst teachers. In fact, they are often much better teachers than those teaching smart students, because smart students survive bad teaching better. (How many of the research studies naively treated "amount learned" as the criterion against which they "validated" the indicators?) (p.7)

And herein lies the ghost in the machine of most statistical validation studies of SEF---at their very best: There is nothing wrong with the statistics, only with the meaning of what they are purportedly measuring. Thus the problem is not a flaw in the data or the measurement instrument, but a flaw in the measurer.

Finally, to conclude this section, the implications for SEF in general and for the issue of validity seem clear. The issue of validity of SEF, then, is not the primary issue it appears to be, and serves inadvertently to hide the significant issue of academic standards. 33 I will address this issue in relation to academic freedom and academic standards in more detail in my final paper. 34

Conclusion

From most of the above cases---even given that, as challengers, the burden of proof has been on faculty---it seems clear that the courts have not been kind to faculty with regard to student evaluations. 35 Some clearly see the courts' various involvements in academic matters as detrimental to academic freedom. Arguably, rulings do often seem to shape it in inappropriate---and not so arguably---inconsistent and contradictory ways. "It is not clear, however," suggests Rebell (1990b), "that increased judicial involvement will have such a detrimental impact.
In some measurement situations, courts have exhibited a sophisticated understanding of the complex judgmental factors at stake, and their insistence on thorough-going implementation of improved, fairer assessment devices has enhanced, rather than impeded, the development of professional standards" (p.340). He goes on to point out that, "because the state of the art concerning teacher-evaluation practices is at a sensitive developmental stage, extensive court intervention at this point can substantially influence---for better or worse---the future direction of basic practice in the field" (Rebell, 1990b, p.344). Thus whether increased judicial intervention in faculty matters will have a positive or a negative impact on professional evaluation practice depends on providing the courts with appropriate psychometric data and other scientific procedures.

Given the above rulings and the courts' propensity to accept faculty/institutional agreements, it would seem that Kaplin and Lee's advice regarding academic freedom, that "it is especially crucial for institutions to develop their own guidelines on academic freedom and to have internal systems for protecting academic freedom in accordance with institutional policy" (p. 192), would be especially true for a detailed SEF policy, especially including how the data is to be assessed.

The fourth and final paper will address the implications of court reasoning and rulings for academic freedom, standards, and instructional decisions.
Notes

1. Address correspondence to: Robert E. Haskell, Ph.D., Professor of Psychology, Department of Social and Behavioral Sciences, University of New England, Biddeford, Me. 04005. Email: firstname.lastname@example.org I would like to thank Professor John Damron of Douglas College for continually providing me with sources, support, and advice, and especially Professor William A. Kaplin, School of Law, Catholic University of America, for his invaluable legal counsel and for reading a draft of this paper. Interpretive liberties with the legal material and any other problems and omitted legal nuances are my responsibility.

2. As with my second paper (Haskell, 1997b), the focus here will be delimited to how the courts reviewed have addressed SEF issues within various legal challenges to the denial of academic freedom, tenure, promotion, and reappointment by institutions of higher education. There are multiple legal variables that define an action or influence an outcome in a particular case. Among them are the statutes or other sources of law being applied, the cause of action being asserted, the prescribed prima facie case, the allocation of burdens of proof, and the standards of judicial review (see, e.g., Kaplin and Lee, 1995, section 1.3 & section 22.214.171.124). For my purposes here, I will not be concerned with these variables. Accordingly, this paper will neither be concerned with the outcome of the legal rulings, nor with the complex legal reasoning on which the rulings were based. My purpose is to review the general reasoning of the courts on SEF from a "reasonable man" standard and from a policy point of view.

3. To the layman, legal rulings regarding SEF are a veritable thicket; it often seems that the use of context to differentiate one apparently similar case from another functions as a kind of ad hoc carte blanche to justify preconceptions and positions.

4.
A largely neglected---or ignored---important function of education is its social function. Education is not just for the benefit of the individual but for the benefit of society. Like it or not, we in higher education have accepted the social function of certifying competence of our students entering into an increasingly complex world. The certifying function has become especially important since the introduction of vocational programs into university curricula.

5. In Johnson v. University of Pittsburgh (1977), the court said (7) "We have repeatedly approved the use of statistical proof where it reached proportions comparable to those in this case to establish a prima facie case of racial discrimination in jury selection cases. Statistics are equally competent in proving employment discrimination. We caution only that statistics are not irrefutable. They come in an infinite variety and, like any other kind of evidence they may be rebutted. In short, their usefulness depends on all of the surrounding facts and circumstances." (8) The court further said in Footnote # 20: "Considerations such as small sample size may of course detract from the value of such evidence" (p.1361).

5. In Peters v. Middlebury College (1977), it was maintained that (5) "A professor's value depends upon his creativity, his rapport with students and colleagues, his teaching ability, and numerous other intangible qualities which cannot be measured by objective standards" (p.860).

5. In Fields v. Clark University (1987), the court noted that (10) Fields "attacks the university's use of her student evaluations because they were not gathered and evaluated
according to accepted standards of scientific polling procedures." In response, the court agreed, saying, "She is probably correct. The use made of the student evaluations in her case, however, followed the practice at the defendant's university in other tenure decisions" (p.671).

5. In Cynthia J. Fisher v. Vassar College (1995), the court noted that (7) "statistical analyses may be a part of a plaintiff's effort to establish discriminatory treatment" (p.1209).

5. In Yu Chuen Wei and the Vermont State Colleges Faculty Federation (1995), the court ruled that (4) "The Court need not consider the accuracy of these administrative determinations," and that (24) tenure criteria "are not drawn with mathematical nicety." The board further ruled that (25) "the Dean and the President, both reviewed Grievant's student evaluations carefully. Their failure to take it a step further, and perform a statistical comparison of Grievant's student evaluations with those of other faculty members who have been granted tenure was not arbitrary and was reasonable; (26) Such a comparison is nowhere required by the Contract, [and] (27) we decline to hold such an involved comparison is necessary before a reasonable tenure determination can be made" (p.311).

5. In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court concluded (38) "that the instrument was not perfect, that it had flaws, and that the very limited number of samples (because of the very limited number of courses and students surveyed over the period) impaired its reliability" (p.30). (39) "However, we accept the evidence of Dr. [X] that the instrument has some value, directed toward the specified factors." The court noted that (28) "One problem with the questionnaire is that it solicits bad points as well as good points.
Despite that caveat, we conclude that the inclusion of the qualitative comments was not a significant error" (p.32).

5. In Robert Kramer v. The President of the University of British Columbia (1992), the Board said (19) Given certain Departmental procedures, "there is a danger that some negative class commentary will dominate the discussion and will not be the 'independent' opinion of all of the students. (20) This is especially true in the context of the direction to assess 'effectiveness' versus 'popularity'" (p.10). They further noted, (18) Given that "There was no peer review at all; no member of the Department audited any of Dr. Kramer's lectures. There was, therefore, nothing to guide the Department but the student comments," and "no way to test the accuracy or fairness of the undoubtedly disturbing comments in Asian Studies" (p.10).

5. In University of Regina Faculty Association v. University of Regina (1993), the Board argued (6) that "the University was under an obligation to verify negative comments before acting on them" (p.4).

5. In Christopher Turner v. The President of the University of British Columbia (1993), the Board said, (7) "while not ignoring some student unhappiness with Dr. Turner's teaching style, we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6).

6. This is an important area but will not be dealt with here because student achievement scores as a measure of teaching effectiveness are almost exclusively used on the secondary level of education.

7. In Dyson v.
Lavery (1976), the court concluded that, despite questionable errors, administrative judgements were acceptable because "they were sincere and grounded on some evidentiary basis" (p.111); and (5) "In the absence of a finding that same were sexually motivated, the administration's professional judgment must be respected"
(p.111, all italics added).

7. In William Sypher v. Vermont State Colleges Faculty Federation (1982), the Board found that (7) sufficient evidence exists from which the Dean and President could have reasonably concluded Sypher was not above average in his teaching effectiveness; (8) the Board went on to say that if they adopted the Colleges' view that Sypher was not reappointed because of his teaching effectiveness, no argument advanced by him defending his teaching was likely to persuade the President because his decision was made on the "vigor and variety of student criticisms" (p.135).

7. In Carley v. Arizona Board of Regents (1987), the court ruled (18) the University president was free to consider factual findings made by minority members of the academic freedom and tenure committee and any other evidence which he found relevant in determining whether to deny renewal of teaching contract to non-tenured instructor. The president was not bound by factual findings made by majority members of committee (p.1103).

7. In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), it was noted that (28) The Dean and the President obviously had much experience in reviewing student evaluations, and could reasonably draw on that experience in each tenure review (p.311); judgements "were not arbitrary or capricious and were exercised honestly upon due consideration," and Deans and Presidents have much experience in reviewing student evaluations and could reasonably draw on that experience (p.311).

7. In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court said, (40) The relevance and quality of the scores are "a matter of weight for the various decision-makers, and we assume that they were reasonably aware of the limitations of student evaluations and gave them the weight they deserve" (p.30).

7. In Robert Kramer v.
The President of the University of British Columbia (1992), the board concluded, "In the final analysis, we feel that this review of the Head's comments on teaching, which would be the sole evidence upon which the Dean and the President could rely, shows that it was incomplete and might have been misleading" (p.12-14).

7. In University of Regina Faculty Association v. University of Regina (1993), the Board said teaching was wrongfully evaluated, but upheld denial of tenure on grounds of inadequate scholarship.

7. In Christopher Turner v. The President of the University of British Columbia (1993), the board concluded that (11) "there were sufficient errors of procedure and/or evidence to return the case for reconsideration" (p.11).

8. In Lieberman v. Grant (1979), Lieberman attempted to introduce approximately ten personnel files concerning the tenure proceedings of other faculty in the English department for comparison. (6) Recognizing that such evidence would have had some minimal probative value, the Court exercised its discretion under Fed. R. Evid. 403 and excluded it on the ground that "such probative value would be substantially outweighed by the delay and waste of time, which introduction of such evidence would have necessarily entailed....The plaintiff's case without such evidence seemed almost interminable, consuming 52 trial days over a two-year period. That is long enough" (p.873).

8. In Fields v. Clark University (1987), the court notes but does not admonish the non-separation of student remarks from small seminar courses and those from large lecture classes.

8. In Cynthia J. Fisher v. Vassar College (1995), the district court found (2) that the biology department distorted Fisher's teaching recommendations by (3) "selectively exclud[ing] favorable ratings," by "focus[ing] on the two courses in which Dr.
Fisher had difficulties" and (4) by "applying different standards to her than were applied to other tenure candidates" (p.1209).
8. In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), it was noted that (19) "The statistical comparison demonstrates that Grievant was evaluated higher by students than her [male colleague] with respect to upper level classes, but that (20) [male colleague] was evaluated higher than Grievant in lower level classes. Given (21) this 'mixed' result, the statistical comparison of evaluations does not demonstrate by a preponderance of the evidence that Grievant's students rated her the same, or better, than [male colleague]" (p.305). Wei maintained that (16) her students rated her the same or higher than the male colleague's students rated him. The Board disagreed, saying, (19) "We note that the comparison offered by Grievant is somewhat weak since [male colleague] was tenured in 1988, and those student evaluations of his which were compared with Grievant post-dated his tenure review by a number of years," further saying "we decline to hold such an involved comparison is necessary before a reasonable tenure determination can be made" (p.305).

8. In Dr. Brian Maclean v. President of The University of British Columbia (1991), the Board noted that (19) the reviewing faculty held in-class discussions about his teaching.

8. In Robert Kramer v. The President of the University of British Columbia (1992), Kramer argued that the most significant mistake was the failure to consider all aspects of his teaching. For example, only his teaching in 1989-90 was considered, whereas (9) he had taught a wide range of courses over the previous three years, (10) had three new courses that year, (11) plus a graduate course. Moreover, (17) the department head indicated that his teaching was not up to the departmental "standard." The standard appeared to be the performance of the tenure-track faculty, though Kramer was one of the most junior faculty members (p.8).
(15) Only one of the more than thirty numerically rated questions was used: "Rate instructor bad to good." (16) While a number of negative student comments were quoted in the department Head's letter, there were a number of very positive comments, and these were not mentioned at all.

8. In Christopher Turner v. The President of the University of British Columbia (1993), the Dean said, "there were few students in undergraduate literature courses since 1986/7---(3, 8, and 6 respectively)," thus mistaking student 'response' figures for actual student enrolment. The Board concluded that (5) "This misunderstanding is in our opinion sufficient in itself for a reconsideration, since teaching was the focus..." (p.3), and (7) "we think that the comments and emphasis on the size of Dr. Turner's classes as evidence of poor teaching are open to objection and constitute errors of procedure and/or evidence" (p.6).

9. Given the extensive variation of rulings on SEF cases, from the perspective of a non-legal professional it seems that legal reasoning carries the use of contextual analysis and variables to an extreme, making it possible---and legally justifiable---to rule just about any way a court wants to rule. The logical extension of such reasoning would lead to each case being unique and nonsignificantly related to any other case.

10. In Johnson v. University of Pittsburgh (1977), the court noted that (10) "It has also been pointed out that in some cases difficult courses have to be given to the students and the material is such that it is difficult for even the best teacher to get it across."

10. In Carley v.
Arizona Board of Regents (1987), Carley (7) characterized his professional style as being a "demanding teacher contrary to some student expectations." (8) Because of this, he maintained, his popularity suffered, resulting in low student evaluations. (9) Examination of his student comments indicated that Carley was correct in his assessment, as 61% (49 out of 80) negative student comments focused on these values. The court ignored these findings.

10. In Dr. Brian Maclean v. President of The University of British Columbia (1991), it
was noted that (21) While the knowledge, interest and enthusiasm of Dr. MacLean were acknowledged, "the problem appeared to be one of style or personality."

10. In Robert Kramer v. The President of the University of British Columbia (1992), the Board noted that (26) It was obvious that almost all of the classes were upset about an examination which was considered more geography than Asian Studies, and (27) they didn't like the marking. (28) They also felt the workload was far too heavy for an "introductory" course. The Board apparently only noted this variable.

11. In Johnson v. University of Pittsburgh (1977), the court said, "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists and yet if the students rate them unfavorably can be terminated at any time because of unpopularity" (p.1366-7).

11. In Carley v. Arizona Board of Regents (1987), Carley (8) maintained his popularity suffered, as reflected in his low student evaluations.

11. In Robert Kramer v. The President of the University of British Columbia (1992), he maintained that (14) Student evaluations were considered from the standpoint of his popularity, not his effectiveness.

11. In Brian Maclean v. President of The University of British Columbia (1991), (35) The Faculty Agreement specified that "Evaluation of teaching shall be based on the effectiveness rather than the popularity of the instructor." Courts have ruled in various directions on this issue.

11. In Robert Kramer v. The President of the University of British Columbia (1992), the board noted (21) "As for the 'popularity vs. effectiveness' debate, a discouraging or hostile attitude is a part of effectiveness as much as it is of popularity" (p.8).

11. In Christopher Turner v.
The President of the University of British Columbia (1993), the Board ruled, (8) "while popularity is not competence nor effectiveness, to the extent that it encourages students it has some relation to both" (p.7).

12. While there may well be research showing that being a popular teacher affects learning on elementary and secondary levels of education, I know of no such rigorous research on the post-secondary level. In my view, one of the problems is that all too often we automatically transfer findings from elementary and secondary levels to higher education.

13. In Johnson v. University of Pittsburgh (1977), the court noted that it (5) "has placed little reliance on students' surveys....students in a given course rating a teacher, or professor, some of them as excellent, others as terrible and in between, many who say passable, mediocre etc.... we cannot say it was unreasonable for the tenured faculty to consider this along with other matters" (p.1359). (8) "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists" (p.1366-7). A similar view was expressed in Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995).

13. In Peters v. Middlebury College (1977), the court gave some weight to an administrative devaluing of a set of positive student evaluations of a faculty that said (2) "The department chair sent a letter to the president of the college, saying, The course of action I recommend is not likely to be popular with students who, though they in part recognize her intellectual limitation, are warmly responsive to her enthusiasm, energy, openness and ready human concern" (p.860).

13. In Carley v. Arizona Board of Regents (1987), the court said, (23) "Carley has cited
no authority that relying primarily or solely on student evaluations would be impermissible. We have found none" (p.1105, italics added).

13. In Guam Federation of Teachers v. The University of Guam (1990), the Guam Federation of Teachers challenged the use of SEF in tenure and promotion decisions (Blum, 1990). The Board (1) ruled to remove anonymous student evaluations from professors' tenure files. (2) The union said the use of SEF violated the union's contract with the university, (3) which provides that anonymous documents or those "based on hearsay" should not be included in a faculty member's file. (4) The court further ruled that (5) students should be made aware of the purpose and ramifications of their evaluations, and (6) anonymous student evaluations should not be used.

13. In Robert Kramer v. The President of the University of British Columbia (1992), the Board noted that (18) "The most important perceived error in the teaching evaluation, in the opinion of the Board, is the reliance solely upon the student evaluations and written comments for the 1989 course evaluations. There was no peer review at all; no member of the Department audited any of Dr. Kramer's lectures" (p.10).

13. In University of Regina Faculty Association v. University of Regina (1993) a Canadian Arbitration Board ruled that (3) "With respect to teaching, it is our opinion that the evidence of unsatisfactory performance is very weak indeed ...It is important to note that the basis of the comments, particularly the negative ones in the fall of 1992, were written student assessments... [and] Although these assessments are expressly recognized in Art. 17.19 of the collective agreement, to base important career decisions on them only does not seem justified" (p.4).
The Board further ruled (4) that tenure decisions could not be based solely on assessments which were completed by students who had never been made aware of the ramifications of their statements. (5) "[I]f evaluations are to be used for serious career development purposes those completing them should be aware of the potential consequences of their participation" (p.4). (8) "To base serious career decisions narrowly on student evaluations is not to be encouraged... (9) If teaching is to be seriously evaluated for career purposes, whether for positive or negative purposes it seems incumbent upon Faculties not to rely only on classroom administered evaluations but to broaden the base of assessment" (p.4).

13. In Christopher Turner v. The President of the University of British Columbia (1993), the Board ruled, (9) "while the [Faculty Association] Agreement permits, but does not mandate either student reviews or peer reviews, and the methods of assessment 'may vary', we do conclude that the reliance placed on these very limited student reviews must have been great, since there was no other evaluation referred to. Where there is no other evidence sought, student comments will have an apparent importance and credibility that they may not deserve... (10) We would strongly recommend peer review in the reconsideration which we are requiring" (p.7). The board further noted that (8) "This board has been asked on a number of occasions to pass judgment on the relevance of student evaluations to the [Faculty Association] Agreement criteria for good teaching. Good teaching is an elusive concept. Students may not be good judges during a course; their judgment might be quite different several years later in life" (p.7).

14. In Dyson v. Lavery (1976), a student evaluation ranked her 46th of 48 teachers.

14. In Lieberman v.
Grant (1979), the court noted (4) a compilation of student ratings showed that the cumulative ratings for members of the department ranged from a low of 4.09 to a high of 8.95. She had a cumulative rating of 7.06, which ranked her 12th out of the 15 junior faculty members. The 7.06 figure included the ratings from a previous semester in which the plaintiff received a rating of 8.18. Prior to this rating in the spring of 1972, the plaintiff's cumulative rating was 6.7.
14. In Carley v. Arizona Board of Regents (1987), it was noted that (1) of the 13 faculty
in his department of art, he was ranked fifth, (2) by his chairman he was ranked 7th, (3) student evaluations, however, ranked him last: 13th of 13 (p.1105).
14. In Robert Kramer v. The President of the University of British Columbia (1992), the court noted (24) scores in the other two courses were higher---3.45 in one, 3.91 in another, against a "faculty average" of 4.22. The board further noted, "In the result, one got a 2.82 and one got a 3.07...the difference is statistically invalid in any event" (p.10).
15. In Dyson v. Lavery (1976), the court said (1) "A number of students apparently had voiced displeasure over the quality of her class preparation and presentation" (p.111). (3) "These impressions," said the court, "were largely confirmed after the initial decision to not rehire her had been made, by a student evaluation that ranked her 46th of 48 teachers in the Business Department" (p.111, italics added).
15. In Johnson v. University of Pittsburgh (1977), the court said, (3) "we have the instance referred to in Finding 27" (p.1359, italics added).
15. In Lieberman v. Grant (1979), the court noted (3) "based on complaints received from several students," to the effect that Lieberman's interest in feminism caused her to ignore other themes in literature (p.873, italics added).
15. In William Sypher v. Vermont State Colleges Faculty Federation (1982), (1) some of the student comments noted that, "When students try to disagree he shoots you down and tries to degrade you in front of the class," (p.115), while others said, "encourages student participation as much as possible... encourages student to express their ideas freely and not worrying how 'dumb' it may sound...always wants you point of view."
(p.115) (2) With regard to the numerical ratings, the Board's opinion was that (3) "regardless of a strong majority of students' rating his teaching as above average, (4) the existence of a significant minority of students feeling degraded, humiliated, and embarrassed can reasonably lead an evaluator to question a teacher's effectiveness" (p.115).
15. In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), the Board said, (22) "the statistical comparison does not take account of the comments made by students on the evaluation forms. Grievant's student evaluations are striking in how often mention is made of Grievant's communication difficulties, particularly language difficulties" (p.304-5). The board further noted with respect to comments that while some students had written that she was a "slant eyed bitch," and that she should "go back to China," (30) "We also are not persuaded that the racism evident in the student evaluations of Grievant made student evaluation results unreliable. The percentage of evaluations in which racism by students was evident was approximately one percent of the total evaluations" (p.306).
15. In Robert Kramer v. The President of the University of British Columbia (1992), (2) the department Head viewed Kramer's 1989-90 course evaluations "with some alarm"....(4) Even more disturbing to the department Head was that a considerable number of students in their written comments stated that Dr. Kramer was biased, sarcastic, and hostile to the material and that a number of students had stated that Dr. Kramer's teaching would cause them to stay away from the Asian Studies department. (5) There were also some diametrically opposed positive comments" (p.10).
15. In University of Regina Faculty Association v. University of Regina (1993), the Board argued (6) that the University was under an obligation to verify negative comments before acting on them.
Consequently, (7) "the fact that Dr Jalan had received some negative evaluations from students could not be used to undermine the otherwise generally favorable comments he had received in his annual performance reviews" (p.4).
15. In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court noted that (25) "With respect to the "qualitative" scores---i.e., the "comments," there was a clear error. The qualitative comments from a number of courses were read and
commented on, and conclusions were drawn from them which went into the "file." Both Reviewing faculty read and commented on them, as did the Department Chair in her letter to the Dean. Yet the Dean had clearly stated in a departmental memo that the qualitative comments were not to be used for administrative or promotion purposes. (26) While in the abstract there is no reason why such comments would not be relevant, if the Department had a rule against their use, or in other words if they were "for the professor's eyes only," then it was a significant breach of Departmental rules to use them" (p.31). (27) "In the opinion of the Board, so long as the comments were fairly presented, they offered the PAT [Promotion and Tenure Committee] and others a better balanced view of the teaching qualities and problems of Dr. MacLean than the quantitative statements alone" (p.31). (28) The court noted that "One problem with the questionnaire is that it solicits bad points as well as good points. Despite that caveat, we conclude that the inclusion of the qualitative comments was not a significant error" (p.32).
16. In Johnson v. University of Pittsburgh (1977), the court noted (2) they "approached this question of teaching ability with considerable doubt, in view of the fact that in prior years there does not appear to have been any criticism of her teaching and also in view of the fact that...there was evidence that the department chairman, had informed her after one of her lectures in 1971 what a great lecture it had been;" On the other hand, the court said (3) "we have the instance referred to in Finding 27" (p.1359, italics added).
16. In Fields v.
Clark University (1987), it was observed (3) a few of which, from students in Fields' seminars, were "wildly enthusiastic" about her enthusiasm, commitment and presentations; (4) a few were ambivalent; (5) with a considerable number being extremely negative, particularly (6) with regard to her large lecture classes in basic courses in sociology.
16. In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), moreover, they said, (19) "The statistical comparison demonstrates that Grievant was evaluated higher by students than [her male colleague] with respect to upper level classes, but that (20) [the male colleague] was evaluated higher than Grievant in lower level classes. Given (21) this "mixed" result, the statistical comparison of evaluations does not demonstrate by a preponderance of the evidence that Grievant's students rated her the same, or better, than [male colleague]" (p.305).
16. In Dr. Brian Maclean v. President of The University of British Columbia (1991), it was noted that (20) In general, the in-class peer reports were mixed but favourable. The in-class discussions were more problematic (p.30). (21) While the knowledge, interest and enthusiasm of Dr. MacLean were acknowledged, "the problem appeared to be one of style or personality." It was further noted that (29) "As against the low figures, they disclosed a number of good qualities in Dr. MacLean---enthusiasm for his subject, wide knowledge of the literature, much out of class assistance to students, and a commitment to seeking good work from students" (p.31). (30) The reviewing faculty report noted the comments about Dr. MacLean's "derogatory manner, biased opinion, unwillingness to listen," were matched by "clear, stimulating, very helpful after class." And (31) "some students have told us that the comments made were not representative of the class as a whole and were unduly influenced by the process" (p.41).
(32) "A number of students, both from earlier years and from his current classes, furnished letters of support, and in preparation for the appeal, some furnished affidavits with respect to particular matters such as the 'intimidation' discussion in Soc. 250 and events in Soc. 490 and 520 in the fall of 1989" (p.33).
16. In Robert Kramer v. The President of the University of British Columbia (1992), (16) while a number of negative student comments were quoted in the department Head's letter, there were a number of very positive comments, and these were not mentioned at all.
(25) "We have examined all of these written comments. There was a very wide range of comments. There were not 29 comments saying sarcastic and biased comments; but there were certainly 29 comments which included either cynical, sarcastic, biased, insulting, negative, condescending, belittling, opinionated, arrogant, nihilist, and destructive.... (29) However, it would only be fair to add that there were a number of comments in favour of Dr. Kramer, stating that the student "liked the course immensely," "now interested in Asian Studies;" "helps create a relaxed atmosphere," "really enjoyed him," "very approachable and knowledgeable," "very enthusiastic," "captivates audiences with his humour," "very effective" (p.12). (30) "In the other two courses, both small, both Japanese language, there were also some negative comments" (p.12).
16. In Christopher Turner v. The President of the University of British Columbia (1993), the board noted that (6) "While there is no question of Dr. Turner's competence as a teacher at all levels, teaching evaluations for the last several years show that his effectiveness is marred by what students perceive as excessive formality, lack of enthusiasm and dullness....In a previous promotion attempt, his teaching was briefly described as "very competent" but student evaluations indicate further improvement to be "better than adequate" (p.2).
17. I wish to thank Patrick B. Shaw, Attorney for AAUP, for referring me to Ms. Linda Lott, Administrative Coordinator, Hofstra University Chapter, AAUP, who conducted a search for me of a faculty collective bargaining contract database being developed there. Ms. Lott searched the database with "several key words that relate to academic freedom, teaching methodology and student evaluations. The only word that was identified in some of the contract provisions was 'student evaluation'" (Personal communication, March 21, 1997).
It should be noted that very few explicit references in the contracts to the use of signed/unsigned SEF or the use/nonuse of comments were found in this developing database. Some of the instances found are:
17. At Rider University, the agreement stated "The College may not use course evaluations for purposes of discipline, promotion, or tenure, unless introduced for such purposes by the faculty member."
17. At Western Michigan University, the agreement stated "Only the ratings shall be included in all promotion, reappointment, merit, and tenure recommendations, together with such other evaluations of teaching competence as may be employed by faculty members and made available. Western agrees to consider all the evidence of teaching competence that is presented in evaluating teaching faculty and shall not use unsubstantiated structured comments in personnel decisions." I have already noted the ruling at the University of Guam (Blum, 1990), which cited (1) students not being made aware of the purpose and ramifications of their evaluations, (2) the anonymous nature of student evaluations, (3) the invalid analysis of SEF, and therefore, (4) SEF in effect being anecdotal and hearsay data. Since most SEF results are prepared anonymously, an instructor has no recourse to confront his/her evaluators. As will be addressed below, the anonymous nature of SEF is also beginning to be questioned by arbitration boards.
17. I am informed by a colleague at St. John's University (New York) that, though SEF are mandated, they are not used administratively. I suspect there are many more schools (likely those that have union contracts) that do not use SEF administratively or that limit its use.
I might note here that those who maintain that without the administrative use of SEF there is no quality control over instruction, and that student learning will therefore suffer, should check with the schools that do not use SEF administratively for a reality check on their assumption.
18. In Johnson v. University of Pittsburgh (1977), the court said, "It is also obvious that the court and the administration of universities cannot permit students to exercise a veto over professors who may be world renowned scientists" (p.1366-7), noting, "It is obvious that a professor may be possessed of excellent qualifications as a research scientist and not necessarily be able to prove his or her worth as a teacher," concluding that, (9) "in cases where one has an outstanding scientist of national or international reputation, one may decide to promote and give tenure notwithstanding inability to come across as a teacher, this however is not one of those cases" (p.1366-7).
18. In Yu Chuen Wei v. Vermont State Colleges Faculty Federation (1995), (31) Wei's last claim charged that the College violated the Contract by denying her a promotion, even though both her scholarly performance and professional activities were exceptional. Article 22(E) of the College provides for otherwise granting promotion if the President decides that "performance in one of three areas has been exceptional" (p.314). The Board concluded that "Although Grievant had a significant publication record, most of it was developed before coming to Castleton" (p.315). (33) In terms of exceptional scholarship, Dr. Wei maintained she had solved a significant mathematical problem (apparently published). The Board's response was, (34) "although Grievant claimed to have solved the Erdos conjecture, [the] Dean reasonably concluded that she had not established that she actually had solved the conjecture. Under these circumstances, and given our consideration of the discrimination issue previously discussed, we conclude that (35) Grievant has not established discrimination.
The Colleges reasonably, and based on legitimate reasons, concluded that Grievant had met the tenure standards in this performance area but that her performance was not exceptional" (p.315).
18. In Dr. Brian Maclean v. President of The University of British Columbia (1991), the court said, (34) "while a superior research and publication record cannot overcome a poor teaching record, it might tip the scales where the teaching record was 'on the edge'" (p.10).
19. I would like to again acknowledge the invaluable assistance of Professor William Kaplin (Personal conversation, May 28, 1997). In constructing this section on the generalization of the principles of disparate treatment and impact, I have gone where wiser and more skilled sailors would perhaps have elected not to sail. Without Bill's counsel I would clearly have sailed off the edge of this legal world. As it is, I may be dangerously close. But since my intent is not a strictly legal one I can perhaps be given some latitude.
20. Statistically, a case can be made that the racist comments of one percent represent a larger (but unknown) number of racist attitudes that simply were not overtly stated. Thus even one percent overtly racist comments should be a red flag to look deeper into such a situation. And given a commitment to eradicating racial and gender discrimination, should not the overt one percent (plus) evidence of racism on a SEF be treated more seriously?
21. For example, in Fields v. Clark University (1987), the court noted that (10) Fields "attacks" the university's use of her student evaluations because they were not gathered and evaluated according to accepted standards of scientific polling procedures. In response, the court agreed, saying, "She is probably correct. The use made of the student evaluations in her case, however, followed the practice at the defendant's university in other tenure decisions" (p.671).
22. Given the current attitude of the courts in non-EEOC cases toward accepting the institutional interpretation of the SEF data, for those covered as members of a protected group---assuming sufficient prima facie evidence to do so, and possible illegalities notwithstanding---using one's protected status could be a strategy for ensuring a rigorous analysis of one's SEF data.
23. Once again, numerous legal variables prohibit such generalization. See Kaplin and Lee, 1995:
23. 126.96.36.199. Standards of judicial review and burdens of proof. Postsecondary institutions have numerous processes for making internal decisions regarding the status of faculty, students, and staff, and for internally resolving disputes among members of the campus community. Whenever a disappointed party seeks judicial review of an institution's internal decision, the reviewing court must determine what "standard of review" it will apply in deciding the case. This standard of review establishes the degree of scrutiny the court will give to the institution's decision, the reasons behind it, and the evidence supporting it. Put another way, the standard of review helps establish the extent to which the court will defer to the institution's decision and the value and fact judgments undergirding it. The more deference the court is willing to accord the decision, the less scrutiny it will give to the decision and the greater is the likelihood the court will uphold it. Issues regarding standards of review are thus crucial in most litigation.
23. In turn, standards of review are related to the "burdens of proof" for the litigation. After a court determines which party is responsible for demonstrating that the institution's decision does or does not meet the standard of review, the court allocates the burden of proof to that party. This burden can shift during the course of the litigation (see, for example, Section 188.8.131.52).
Burdens of proof also elucidate the elements or type of proof each party must submit to meet its burden on each claim or defense presented. Such issues are also critical to the outcome of litigation and can become very complicated (see, for example, Section 220.127.116.11).
23. There are many possible standards of review (and likewise many variations of burdens of proof). The standard that applies in any particular litigation will depend on numerous factors: the type of institution subject to the review (whether public or private); the type of claim that the plaintiff makes; the institution's internal rules for reviewing decisions of the type being challenged; the character of the contractual relationship between the institution and the party seeking court review; and the common law and statutory administrative law of the particular state (see this volume, Section 1.3.1), insofar as it prescribes standards of review for particular situations. At a subtler level, the court's selection of a standard of review may also depend on comparative competence---the court's sense of its own competence, compared with that of the institution, to explore and resolve the types of issues presented by the case.
23. If a court is reviewing the substance of a decision (whether the institution is right or wrong on the merits) it may be more deferential than it would be if it were reviewing the adequacy of the procedures the institution followed in making its decision---the difference being attributable to the court's expertise regarding procedural matters and relative lack of expertise regarding substantive judgments (such as whether a faculty member's credentials are sufficient to warrant a grant of tenure).
24. Again I am indebted to Bill Kaplin for this important point.
25. An assumption that students are qualified evaluators of teaching effectiveness I will deal with in the following paper.
26. I am not against using SEF for feedback to faculty. Some method of assessing faculty teaching effectiveness needs to be developed. No profession can be completely self policing. In terms of using SEF to assess teaching effectiveness and its use in tenure, promotion, reappointment, and merit salary increases, however, we need to proceed much more carefully than we have. As an initial general resolution to the problem of their use, I suggest that it be used as a "red flag" that can then set in motion a systematic faculty inquiry into the situation.
27. One teaches in such a classic mode today at one's litigious peril. See Pinsker (1989).
28. Scriven further suggests that "Unfortunately, most of what you see in a classroom are the features of teaching style, and you can't use any of them, because no amount of research can justify you in counting off brownie points for style against demonstrably badly or well-performed duties. You might as well try subtracting 10% of the purse from a professional golfer's pay-out on the grounds that he or she has an inelegant swing. You must get data on all duties; and when you have that, why would you need anything else?" (p.6).
29. SEF has become politicized. At a recent faculty development meeting on campus a well known "consultant" from a business school who has published books and articles on faculty evaluation was brought in by administration to work with faculty on SEF and teaching portfolios. I mentioned Scriven's work to him. The consultant's flip response was to call Scriven's views "fringe." I let it be known that this kind of non scholarly and ad hominem response was not acceptable. Scriven has been critiquing and developing methods of assessment for some time.
He takes a rigorous no-nonsense, yet practical approach to evaluation. For readers who may not know of Scriven's background and work, he is a philosopher of science of international reputation who was co-editor of the foundational series, The Minnesota Studies in the Philosophy of Science. In 1967 he coined the terms formative and summative evaluations that are coin of the realm today in evaluation research (Scriven 1981); he founded and was the first president of what is now the American Evaluation Association. Thus to call Scriven's views "fringe" is at best the epitome of arrogance; at worst, it is more likely due to ignorance.
30. Most rigorous statisticians, however, would as a matter of course agree with Scriven. One of the first things we learn in psychological research is that statistical correlations cannot be automatically applied to individuals. How this is generally transferred even by those who understand it in one domain to a different domain is quite another---yet important educational---question in itself.
31. Research into transfer of learning suggests many traditional methods that appear to produce effective learning may in fact be counterproductive relative to long term transfer of learning. For example, it's generally the case that immediate feedback during learning results in more efficient learning. It therefore seems to logically follow that immediate feedback during learning would result in more efficient transfer of learning. Recent findings, however,
indicate that under certain conditions delayed feedback is more efficient. Other examples point out the difference between understanding typical learning principles on the one hand, and principles of transfer of learning (i.e., generalization of learning and long term application) on the other. Schmidt and Bjork (1992) note "we have repeatedly encountered research findings that seem to violate some basic assumptions about how to optimize learning in real-world settings." For example, increasing the frequency of information about errors to learners during practice improves their performance. The fact is that increasing the frequency of information about errors can work in just the opposite manner for long term retention and for transfer. Further counterintuitive effects come from research showing that increasing the variability of a task during practice depresses performance during training, but may increase transfer of performance after training when conditions are altered from the original training situation. Still other data show that performance on solving a puzzle is virtually perfect with no delay between instruction and application, but rapidly declines as the delay is increased (e.g., where periods of delay are two weeks and one, two, three, and four months). In contrast, performance on a similar puzzle was worse than performance on the same puzzle at first, but stayed relatively constant over a delay of four months. In other words, the transfer effect was much more persistent than the specific effects of learning a particular puzzle (Singley and Anderson, 1989). And so it goes.
32. As I was writing this section (5/10/97), I heard on the NBC national news about Jon Westling (1995), the new President and former Provost of Boston University, in trouble for suggesting that disability laws requiring special standards for students are contributing to lowering academic standards.
I then went on the internet and found the following: "The disability laws are sacred cows, but they must at the very least be tethered so that they cannot be used to force universities to lower academic and other standards."
33. This section should avert the frequent "he-said-she said, you-show-me-yours-I'll-show-you-mine" approach to assessment of the SEF validity literature by administrators and some faculty, where the apparent conflicts in the literature are rationalized away by simply saying "well, some studies show that SEF are largely not valid, but other studies show it is valid."
34. In my first paper (Haskell, 1997a), I suggested that one of the reasons that SEF has not been viewed as a threat to academic freedom and generated more interest is because many do not consider it high status research and do not see its encompassing implications for quality education---even after reading some of the findings on SEF. Many of my colleagues, including those who have basically supported my efforts and work on the issue, have said to me "Well, you've made your point about SEF by making us aware of it and by publishing articles, why don't you now put it aside and get back to your real scholarship."
35. As one scholar (Damron, personal communication April, 1997) who read a draft of this paper observed: from "the legal decisions you review in your paper it is clear that untenured and/or politically incorrect faculty are often considered to be "fair game" by administrators, with literally _any_ superficially plausibly excuse serving as a rationale for dismissal. Use of such strategies reveal that faculty are often regarded as little more than term employees who are as disposable (and replaceable) as tissues. Clearly, there is a very serious ethical issue
here, and a hugely hostile attitude toward academic freedom and faculty in general....the great variety of decisions you've reviewed and their assorted implications for the coherence and ethics of the legal processes that gave rise to them...it seems to me that many judges and arbitration panelists have little sense of how to proceed in hearing[s] involving academics."
References
Abrami, P.C., Dickens, W.J., Perry, R.P., & Leventhal, L. (1980). Do teacher standards for assigning grades affect student evaluations of instruction? Journal of Educational Psychology, 72, 107-118.
Abrami, P.C. (1989). How should we use student ratings to evaluate teaching? Research in Higher Education, 30 (2), 221-27.
Anderson, U., & Schadewald, M. (1991). Analogical transfer and expertise in legal reasoning. Organizational Behavior and Human Decision Processes, 48, 272-290.
Baez, B., & Centra, J. (1995). Tenure, promotion, and reappointment: Legal and administrative implications. ASHE-ERIC Higher Education Report No. 1.
Barnett, L.D. (1996). Are teaching evaluation questionnaires valid? Assessing the evidence. Journal of Collective Negotiations in the Public Sector, 25 (4), 335-349.
Bauer, H.H. (1996). The New Generations: Students who don't study (prepared for the symposium, "The Technological Society at Risk," Annual Meeting, AOAC International, Orlando (FL), 10 September 1996). Email email@example.com for a copy of the article.
Blum, D.E. (1990, October 3). U. of Guam removes evaluations from files. The Chronicle of Higher Education, Section: Personal & Professional, p. A21.
Brian Maclean v. President of The University of British Columbia (1991). Held at Vancouver, B.C. January 28--June 20.
Brimelow, Peter (1996). Devalued diplomas. Forbes, April 22.
Carley v. Arizona Board of Regents, 737 P.2d 1099 (Ariz. App. 1987).
Cashin, W.E. (1995). Student ratings of teaching: The research revisited. IDEA Paper No.
32. Manhattan, KS: Kansas State University, Center for Faculty Evaluation and Development.
Chacko, T.I. (1983). Student ratings of instruction: A function of grading standards. Educational Research Quarterly, 8 (2), 19-25.
Cohen, P.A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51, 281-309.
Cynthia J. Fisher v. Vassar College (1995). United States Court of Appeals for the Second Circuit, Nos. 1179, 1303, 2275, Docket Nos. 94-7737, 94-7785, 94-9125.
Christopher Turner v. The President of the University of British Columbia (1993).
Arbitration Appeals Board Case No.: AI-256. Vancouver, B.C. July 19, 20 and 21.
Cohen, P.A. (1983). Comment on 'A Selective review of the validity of student ratings of instruction.' Journal of Higher Education, 54 (no. 4).
Copeland, J.D., & Murry, J.W., Jr. (1996). Getting tossed from the Ivory Tower. Missouri Law Review, 61, 233-327.
Divoky, J.J., & Rothermel, A. (1988). Student perception of the relative importance of dimensions of teaching performance across type of class. Educational Research Quarterly, 12, 40-45.
Dowell, D.A., & Neal, J.A. (1982). A selective view of the validity of student ratings of teaching. Journal of Higher Education, 53, 51-62.
Dyson v. Lavery, 417 F.Supp. 103 (E.D. Va. 1976).
Dawes, R.M., Faust, D., & Meehl, P.E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.
Dilts, D.A., Samavati, H., Moghadam, M.R., & Haber, L.J. (1994). Student evaluation of instruction: Objective evidence and decision making. Journal of Individual Employment Rights, 2, 73-86.
Ellul, J. (1964). The technological society. New York: Vintage Books.
Faust, D., Guilmette, T.J., Hart, K., Arkes, H.R., Fishburne, F.J., & Davey, L. (1988). Neuropsychologists' training, experience, and judgement accuracy. Archives of Clinical Neuropsychology, 3, 145-163.
Feldman, K.A. (1983). Seniority and experience of college teachers as related to evaluations they receive from students. Research in Higher Education, 18, 3-124.
Fields v. Clark University (1986). Civil Action No. 80-1011-s, United States District Court for the District of Massachusetts, 40 Fair Empl. Prac. Cas. (BNA) 670, March 14; Vacated and Remanded May 8, 1987.
Fighting grade inflation. (1994). Science, 264 (27 May), 1255.
Franklin, J., & Theall, M. (1990). Communicating student ratings to decision makers: Design for good practice. In Theall, M. & Franklin, J. (Eds.), Student Ratings of Instruction: Issues For Improving Practice. San Francisco: Jossey-Bass.
Garb, H.N. (1989).
Clinical judgment, clinical trai ning, and professional experience. Psychological Bulletin 105 387-392. Golding, M.P. (1984). Legal reasoning New York: Alfred A. Knopf. Greenwald, A.G. (1996). Applying social psychology to reveal a major (but c orrectable) flaw in student evaluations of teaching University of Washington, Draft Manuscript, March 1, submitted for publication.Greenwald, A., & Gillmore, G. (1997). No pain, no g ain? The importance of measuring
course workload in student ratings of instructions. Journal of Educational Psychology (forthcoming).
Haskell, R.E. (1997a). Academic freedom, tenure, and student evaluation of faculty: Galloping polls in the 21st century. Education Policy Analysis Archives 5 (6) [Refereed Online Journal]. Available: http://olam.ed.asu.edu/epaa/v5n6.html
Haskell, R.E. (1997b). Abridgement of academic freedom, promotion, reappointment and tenure rights by the administrative use of student evaluation of faculty: (Part II) Views from the court. Education Policy Analysis Archives 5 (17) [Refereed Online Journal]. Available: http://olam.ed.asu.edu/epaa
Hayes, S.C. (1991). The emperor's clothes: Examining the 'delusions' of professional psychology: The healthy skepticism of David Faust. Science 1 22-25.
Holmes, D.S. (1972). Effects of grades and disconfirmed grade expectancies on students' evaluations of their instructor. Journal of Educational Psychology 63 130-133.
Howard, G.S., & Maxwell, S.E. (1982). Do grades contaminate student evaluations of instruction? Research in Higher Education 16 175-188.
Howard, G.S., & Maxwell, S.E. (1980). Correlation between student satisfaction and grades: A case of mistaken causation? Journal of Educational Psychology 72 810-820.
Howard, G.S., Conway, C.G., & Maxwell, S.E. (1985). Construct validity of measures of college teaching effectiveness. Journal of Educational Psychology 77 187-196.
Johnson v. University of Pittsburgh (1977). 435 F. Supp. 1328 (W.D. Pa.).
Kaplin, W.A., & Lee, B. (1995). The law of higher education: A comprehensive guide to legal implications of administrative decision making. San Francisco: Jossey Bass.
Larkin, J., McDermott, J., Simon, & Simon, H. (1980). Expert and novice performance in solving physics problems. Science 208 1335-1342.
Lasch, C. (1979). Schooling and the new illiteracy (Chapter Six), in The culture of narcissism: American life in an age of diminishing expectations. New York: Warner Books.
Lemons, J. (ed.) (1996). Scientific uncertainty and environmental problem-solving. Cambridge, MA: Blackwell Science.
Lemons, J., Shrader-Frechette, K., & Cranor, C. (in press). The precautionary principle: Scientific uncertainty and type-I and type-II errors. Foundations of Science.
Leo, J. (1996). No books, please; We're students. U.S. News and World Report, September 16.
Levi, E.H. (1949). An introduction to legal reasoning. Chicago: University of Chicago Press.
Lieberman v. Grant (1979). 474 F. Supp. 848 (D. Conn.).
Lovelace v. Southeastern Massachusetts University (1986). 793 F.2d 419 (1st Cir.).
Marchant, G., Robinson, J., Anderson, U., & Schadewald, M. (1991). Analogical transfer and expertise in legal reasoning. Organizational Behavior and Human Decision Processes 48 272-290.
Marsh, H.W., & Dunkin, M.J. (1992). Students' evaluations of university teaching: A multidimensional perspective. Higher Education: Handbook of Theory and Research 8 143-233.
Marsh, H.W. (1984). Students' evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology 76 707-754.
Marsh, H.W. (1982). Validity of students' evaluations of college teaching: A multitrait-multimethod analysis. Journal of Educational Psychology 74 264-279.
Marsh, H.W. (1980). The influence of student, course, and instructor characteristics on evaluations of university teaching. American Educational Research Journal 17 219-237.
McKeachie, W.J. (1979). Student ratings of faculty: A reprise. Academe 65 384-397.
Neath, I. (1996). How to improve your teaching evaluations without improving your teaching. Psychological Reports 78 1363-1372.
Peters v. Middlebury College (1977). 409 F. Supp. 857 (D. Vt.).
Pinsker, S. (1989). Teaching in a litigious age. Change 21 (July/August), 50-54.
Powell, R.W. (1977). Grades, learning, and student evaluation of instruction. Research in Higher Education 7 193-205.
Rabinowitz, J. (1993). Diagnostic reasoning and reliability: A review of the literature and a model of decision-making. The Journal of Mind and Behavior 14 297-316.
Rebell, M.A. (1990). Legal issues concerning teacher evaluation. In Millman, J., & Darling-Hammond, L. (Eds.), (pp. 337-355) The new handbook of teacher evaluation. Beverly Hills, CA: SAGE Publications.
Robert Kramer v. The President of the University of British Columbia (1992). Arbitration Appeals Board Case No. AI-245. Held at the University April 22 and 23.
Sacks, Peter. Generation X Goes to College, Chicago & LaSalle (IL): Open Court, 1996.
Singley, M.K., & Anderson, J.R. (1989). The transfer of cognitive skill. Cambridge, MA: Harvard University Press (p. 199).
Scheelhaase v. Woodbury Central Community School District (1973). In Rebell, M.A. (1990). Legal issues concerning teacher evaluation. In Millman, Jason & Darling-Hammond, Linda (Eds.), (pp. 337-355) The new handbook of teacher evaluation. Beverly Hills, CA: Sage Pub.
Schmidt, R.A., & Bjork, R.A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science 3 207-217.
Scriven, M. (1988). The new crisis in teacher evaluation: The improper use of 'research-based' indicators. Professional Personnel Evaluation News (January) p. 4.
Scriven, M. (1987). Validity in personnel evaluation. Journal of Personnel Evaluation in Education 1 923.
Scriven, M. (1991). Evaluation Thesaurus (4th Ed.). Newbury Park, CA: Sage Publications.
Scriven, M. (1993). The validity of student ratings: In Teacher evaluation. Evaluation & Development Group.
Scriven, M. (1995). A unified theory approach to teacher evaluation. Studies in Educational Evaluation 21 111-129.
Scriven, M. (1981). Summative Teacher Evaluation. In J. Millman (Ed.), Handbook of teacher evaluation (pp. 244271). National Council on Measurement in Education. Beverly Hills, CA: Sage Publications.
Simon, W.E. (1996). The dumbing down of higher education. Wall Street Journal, March 19.
Snyder, C.R., & Clair, M. (1976). Effects of expected and obtained grades on teacher evaluation and attribution of performance. Journal of Educational Psychology 68 75-82.
Stern, J., & Flynn, P.D. (1995). Students propose a course of action for grade inflation. The Bucknellian (Feb. 20). [Online] Available:
Sunstein, C.R. (1993). On analogical reasoning. Harvard Law Review 106 741-79.
University of Regina v. University of Regina Faculty Association and Dr. Pradeep Jalan. (1993). Case No.: AI-298 Regina, Saskatchewan, July 9.
Vasta, R., & Sarmiento, R.F. (1979). Liberal grading improves evaluations but not performance. Journal of Educational Psychology 71 207-211.
Westling, J. (1995). Getting Government Out of Higher Education. Boston University, delivered May 3. Available Online: http://www.heritage.org/heritage/library/categories/education/lect533.html
Worthington, A.G., & Wong, P.T.P. (1979). Effects of earned and assigned grades on student evaluations of an instructor. Journal of Educational Psychology 71 764-775.
William Sypher v. Vermont State Colleges Faculty Federation. (1982). 5 VLRB 102.
Yu Chuen Wei and the Vermont State Colleges Faculty Federation. (1995). 18 VLRB 261.

Appendix: A Non-Litigated Case of SEF Used in the Denial of Tenure and Reappointment

The following is a case of the denial of tenure and reappointment primarily on grounds of apparent non-outstanding teaching as measured by SEF. The case illustrates most of the issues discussed in this paper. As in most such denials, Dr. Tichenor did not go to litigation, as litigation is costly given the low odds of the courts finding in favor of faculty. The case is an example of the likely thousands of such cases (given that there are over 3,500 colleges in the U.S.) that do not go to litigation. In most respects this case is probably typical of the inappropriate use of SEF in faculty dismissals.

The case involves Dr. Linda L. Tichenor, assistant professor of biology, who was denied reappointment during the 1996/97 academic year at a small private, student tuition-dependent university. The SEF material presented here was given to me for this purpose by Dr. Tichenor. It is from a draft paper by her that will appear in a special monograph series to be published by The Society of College Science Teaching, edited by Mario W. Caprio.

The unique aspect of this case of non-reappointment due to SEF, ostensibly reflecting teaching ineffectiveness, is that Dr. Tichenor graduated from the Doctorate of Arts program in biology at Idaho State University. The program, established by the Carnegie-Mellon Foundation in the late 1960s, specifically prepares future faculty for college science teaching pedagogy as well as for teaching a broad range of courses within the discipline of life sciences. The program requires a breadth of background in the biological sciences, knowledge of learning processes and pedagogy, awareness of the objectives of an undergraduate education, and the development of a sound educational philosophy. "As a result of my training," says Dr. Tichenor, "my classroom was to become my 'research bench.'" This is fairly unique in higher education. Few higher education faculty have any formal training in teaching (with the exception of faculty in the discipline of Education).
Granted, while a degree in teaching does not guarantee effective teaching, it does attest to skills that most other faculty have not formally acquired. Dr. Tichenor has been awarded research grants and has published her pedagogical views and experiences in college science teaching journals. [1]

As Dr. Tichenor points out, "several national reports have called for reform in college science teaching (Michael 1989; AAAS Report on the National Science Foundation Disciplinary Workshops on Undergraduate Education 1990; Moore 1993; Sigma Xi 1990). The AAAS Report (1990), suggests that conventional science courses do not reflect the practice of science "at its best." The report recommends that pedagogical techniques be directed toward open-ended and investigatory laboratories in order that the teaching of science be driven by real problems rather than contrived textbook exercises and bring the spirit of scientific inquiry to undergraduate studies. The idea of student-designed laboratories supports the pedagogy suggested to foster student inquiry. I implemented these types of laboratories into my physiology course (Tichenor, 1996)."

At her university, it is required that teaching be assessed as "outstanding" for tenure to be granted. A faculty tenure committee rates teaching efficacy from (a) peer reviews, (b) student teaching evaluations, (c) yearly departmental chair reports, and (d) other evidence submitted by the candidate. Dr. Tichenor notes that "Although the committee's final report in my case was positive overall, tenure was denied because my teaching was considered not 'outstanding.' Its decision was supported by the dean of my college." The letter from the faculty Reappointment, Promotion and Tenure committee reads:

Dr. Tichenor has an unusual doctorate (my emphasis) focused on college teaching in the sciences and is very interested in non-traditional teaching methods. In spite of, or maybe due to this, her teaching evaluations are mixed. The review process is exacerbated when consistently mixed student evaluations are at odds with supportive statements made by peer reviewers. Often, the latter reflect peer approval of the philosophy and pedagogical approach of the instructor, while student comments address the instructor's relative success in facilitating the students' learning. Such dichotomy of opinion is especially problematic when attempting to evaluate a candidate for tenure. Contradictory statements
are quite evident in this candidate's teaching evaluations. Her dean described her teaching effort as energetic, innovative and valuable to the University, but concluded that she had not distinguished herself in this preparation.

In reaching their decision, the review committee reportedly did not perform a statistical analysis comparing her SEF with those of other faculty, and selectively picked out certain negative comments by some students.

Of the four criteria of (a) peer reviews, (b) student teaching evaluations, (c) yearly departmental chair reports, and (d) other evidence submitted by the candidate, only SEF was apparently used for the denial decision. Nearly the entire department faculty supported her and protested the tenure denial as a collective body to the Dean and to the President, to no avail. A subsequent grievance committee failed to reverse the negative reappointment decision.

In commenting on her SEF responses, Dr. Tichenor points out that her evaluations in general physiology over a four-year period reflect evidence of difficulties that students have with innovative teaching:

Q: What do you suggest to improve this course?
A: 'Rely more on traditional teaching concepts than on "progressive" new teaching ideas.'
A: 'In my opinion, we could have learned a lot more had we stuck to a conventional lecture format.'
A: 'Shifting to a more traditional style would be an improvement, the way it is set up now is a nice idea, but doesn't work as well as its supposed to.'
A: 'More and better notes would be good.' and finally,
A: 'I would suggest that there be more lecturing and less time sitting in a circle and wasting time staring at our neighbors and then when questions are asked we are told to look it up for the exam.'

On the other hand, some students who have grasped the active learning style more readily may say:

Q: What part of the course was most beneficial to you?
A: 'The fact that we had to learn through our own research, interests, etc. were most beneficial to me. I learned a lot more on my own through independent presentations than I would have merely listening to a lecture on the subjects.'
A: 'Learning groups require people to be able to communicate with one another and therefore teach communication skills not usually taught in school. Learning groups also help people to learn problem solving skills and ways of approaching problems. Learning these skills are very important. When students leave college they are usually filled with a lot of book knowledge but not much knowledge of how to approach solving a problem. Learning groups can do this because the student is responsible for assimilating information and helping other students understand it. Also, group learning is important because it helps students learn to work in a team and take responsibility for themselves and others.'
A: 'The part of the course most beneficial to me was the group discussions in class about the material. We are all able to relate to terms and events that enables the material to be better understood.'

Dr. Tichenor, perhaps typically, observes: "The association of mixed reviews and innovative teaching is probably familiar to many teachers who take the risk of implementing these methods. By innovative, I mean something different from the "teacher-as-textbook" model of teaching." She also, perhaps typically, notes that "I have found that numerical ratings for me are more positive than the open ended questions; and as it turns out, my [numerical]
teaching evaluations are not at all 'mixed' but above average."

In terms of preparing students for a non-lecture teaching format, she has engaged in class discussions "on primary literature, oral presentations, cases studies presented both orally and written formally, student-designed laboratories, peer evaluation, and other active learning strategies. Experiences students have disliked the most have been class periods built upon a discussion of previously assigned readings. Even though I give them a list of discussion questions prior to the class, they sometimes fail to read the discussion material."

Dr. Tichenor concludes: "there are some difficulties with the use of active learning models. Since most students have grown up with the traditional lecture method, many assume that lecturing is the only way to conduct a class. Students may feel cheated by a new approach especially if they are asked to generate their own material for class. This innovative style places new and unfamiliar demands on students. Students may feel a lack of self confidence and feel overwhelmed with the type of work expected of them."

Commentary

This case demonstrates most of the findings outlined above in this paper: (1) assumed validity of SEF, (2) reliance on SEF for administrative assessment of teaching effectiveness, (3) the subjective interpretation of SEF data by (4) untrained evaluators, both administrative and faculty, (5) reliance on SEF over peer evaluation, (6) ignoring the many variables in the implicit comparison of SEF data involved in the final decision that Dr. Tichenor was not outstanding, including (7) student bias variables, (8) selective use of qualitative written student comments (9) over superior numerical averages on the SEF instrument, (10) justification of the finding of unacceptable teaching effectiveness by selective and subjective emphasis on the negative student comments in a mixed series of comments that includes positive ones, (11) the assumed metaphor of student as consumer, (12) the assumption that students should have a "vote" in what are appropriate teaching methods, which in turn assumes that (13) students are qualified to do so, and (14) the assumption that SEF validly measures teaching effectiveness.

There is an interesting set of interrelated ironies involved in this case. The university, which prides itself on being a teaching institution in apparent contradistinction to research oriented universities, denied tenure and reappointment to Dr. Tichenor, a faculty member who not only (a) has a rare doctoral degree that specifically prepared her for college science teaching, but (b) was engaged in what the literature suggests is one of the most effective teaching methods (collaborative and student-centered teaching), [2] (c) was creatively trying to improve on those methods, (d) had been awarded a related grant, and (e) had published articles on her teaching in professional journals. A final irony is that Dr. Tichenor has accepted a new appointment at a large state university where part of her duties will be to assess teaching.

Given the teaching orientation of the university, it would seem reasonable to expect support and encouragement for such teaching activity, especially in light of her numerical SEF score being above average. The question remains as to why this irony exists. While the answer is certainly complex, one of the answers is to retain student tuition dollars. I will deal in more detail with this issue in my final paper.

Appendix Endnotes

1. Tichenor, Linda L. (1996). Student-designed physiology laboratories: Creative instructional alternatives at a resource-poor New England university. Journal of College Science Teaching 26 (3), 175-181; Tichenor, Linda L. and Joseph Kakareka. (1995). An
interdisciplinary teaching approach by integrating cell biology and biochemistry: A scientific learning community at the University of New England. Journal of College Science Teaching 25 (2), 144-149.

2. I might note that I am not a supporter of collaborative, student-centered teaching methods for certain populations of students such as are currently found at many colleges.

About the Author

Robert E. Haskell

Robert E. Haskell has been teaching college and university level courses for over twenty years. He earned his Ph.D. from the Pennsylvania State University in Psychology and Social Relations, and his M.A. and B.A. from San Francisco State University. His areas of research and teaching include transfer of learning, analogical reasoning, and small group dynamics. Major publications include four books, the latest of which is The Future of Education and Transfer of Learning: A Cognitive Theory of Learning and Instruction For The 21st Century (forthcoming), and numerous presentations, chapters, and research articles in national and international journals. He also serves on several editorial review boards, and is Associate Editor of The Journal of Mind and Behavior. He is former Chair, and currently Professor of Psychology, Department of Social and Behavioral Sciences, University of New England.

Professor of Psychology
University of New England
Biddeford, Maine 04005
UNE Home Page: http://home.maine.rr.com/une/
E-mail: firstname.lastname@example.org

Copyright 1997 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is http://olam.ed.asu.edu/epaa

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, Glass@asu.edu or reach him at College of Education, Arizona State University, Tempe, AZ 85287-2411. (602-965-2692)

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Andrew Coulson, email@example.com
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, firstname.lastname@example.org
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Marshall University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Richard M. Jaeger, University of North Carolina--Greensboro
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary P. McKeown, Arizona Board of Regents
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, email@example.com
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, Arizona State University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, University of California at Davis
Jay D. Scribner, University of Texas at Austin
Michael Scriven, email@example.com
Robert E. Stake, University of Illinois--UC
Robert Stonehill, U.S. Department of Education
Robert T. Stout, Arizona State University