Identifier: E11-00192
Educational policy analysis archives.
Vol. 8, no. 48 (October 01, 2000).
Tempe, Ariz.: Arizona State University; Tampa, Fla.: University of South Florida.
October 01, 2000
"Put teaching on the same footing as research?": teaching and learning policy review in Hong Kong and the U.S. / Orlan Lee.
Arizona State University.
University of South Florida.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives
Volume 8 Number 48, October 1, 2000, ISSN 1068-2341

A peer-reviewed scholarly electronic journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright 2000, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

"Put Teaching on the Same Footing as Research?" Teaching and Learning Policy Review in Hong Kong and the U.S.

Orlan Lee
Hong Kong University of Science & Technology

Abstract

The Research Assessment Exercises (RAEs) in hugely expanded universities in Britain and Hong Kong attempt mammoth-scale ratings of "quality of research." If peer review on that scale is feasible for "quality of research," is it less so for "quality of teaching"? The lessons of the Hong Kong Teaching and Learning Quality Process Reviews (TLQPRs), of recent studies on the influence of grade expectation and workload on student ratings, of attempts to employ agency theory both to improve teaching quality and raise student ratings, and of institutional attempts to refine the peer review process, all suggest that we can "put teaching on the same footing as research" and include professional regard for teaching content and objectives, as well as student ratings of effectiveness and personality appeal, in the process.

...in the winter term of 1992, the Simon School faculty passed a
resolution that determined: "[T]o establish a faculty committee to evaluate teaching content and quality on an on-going basis. The intent of the proposal is to put the evaluation of teaching on the same footing as the evaluation of research. The committee will have the responsibility to evaluate both the content and presentation of each faculty member on a regular basis to be determined by the committee.... The output of this process should be reports designed to provide constructive feedback to faculty and evaluations to be considered in promotion, tenure, and compensation decisions." (Faculty Meeting Minutes, University of Rochester, William E. Simon Graduate School of Business Administration, February 26, 1992, cited: Brickley and Zimmerman 1997, p. 5, emphasis added).

Introduction

"Put teaching on the same footing as research?" I can hear my scholarly colleagues ask, "You mean another attempt to credit those who do 'teaching' to the detriment of their 'research'?" No, my friends, what I understand from the quote in the box is that administration would measure the quality of "teaching" on the same basis they demand from "research."

In 1997, in response to growing concern about maintaining quality of teaching and learning in expanding institutions of higher education (not only in Hong Kong, but worldwide), the University Grants Committee (UGC) (Hong Kong) undertook a study of the process by which teaching and learning quality was to be evaluated in Hong Kong institutions of higher education. This became known as the Teaching and Learning Quality Process Review (TLQPR) of 1997.

A series of institutional studies addressed critical problems bound to arise in an atmosphere of democratic interest in promoting expansion of economic opportunity and social mobility by means of wider access to higher education.
It also revealed concerns within the institutions and the academic profession at large regarding free exercise of the functions of research and teaching, and their survival in light of calls for greater public accountability.

The UGC panel assigned to conduct the Teaching and Learning Quality Process Review (TLQPR) of the author's own University expressed its concern about the institution's reliance, almost exclusively, on mean quantified scores of student responses to course surveys to assess the quality of teaching and learning. This has also been a significant problem in teaching quality assessment in U.S. institutions since adoption of formalized "student evaluation" mechanisms as the result of student protest movements in the late 1960s and the 1970s.

No doubt, every teacher likes to be appreciated by his or her students. Similarly, every student has an interest in minimizing risk in evaluation of his or her own course performance. But surely this situation describes a source of conflict of interest (likely on both sides) as much as a demonstration of the "validity" of "student evaluation" of teaching and learning on the theory that "the customer is always right."

A considerable volume of published research in this area attributes a "validity" to figures that are allegedly replicable because of their apparent "consistency and stability." Yet, we are also told that: "The literature on validity, though extensive, remains very
fluid and not perfectly conclusive." Still other researchers find that teaching ratings and learning are only "weakly related."

Some authorities on the literature tell us that in part this predicament arises from research concentrating on "construction of instruments to yield items and subscales which [are] intended to measure student learning outcomes." Yet they also report that others have found "content validity," i.e., "positive relationships between student ratings and achievement."

Chief factors that would establish "validity," these experts tell us, are that evidence suggests that students and instructors seem to agree on what constitutes "effective teaching" and on the qualities of "an ideal professor." This conclusion must be flawed if, as the present author suspects, the literature of education theory and practical experience of student responses indicate that these two do not always share agreement on what "achievement" is, what "good teaching" is, and perhaps even on what "education" itself should aspire to.

This article compares the presumption of "validity" of "student evaluation" of teaching quality with the results of recent studies at the University of Washington on the influence of grade expectation and workload on student ratings, with the results of attempts at the University of Rochester to employ agency theory both to improve teaching quality and raise student ratings, and with the peer review model employed at the City University of Hong Kong.

I. Concerns about Quality of Teaching and Research in Expanding Institutions in Times of Contracting Budgets

In the Plenary Address of an International Conference on the Application of Psychology to the Quality of Learning and Teaching held in Hong Kong, Professor Robert J.
Sternberg of Yale University (Sternberg, 1998) warned that universities that have used IQ tests, and other standardized measures of practical intelligence or practical experience, as sole standards of university admissions have created self-confirming systems. "Only those with high IQs succeed, because only those with high IQs are admitted." The "tragedy" of this self-selection as a "social goal," he said, is that "in our emphasis on skills that benefit the individual, we have created societies in which ... the optimization of our individual outcomes at the expense of common well-being is becoming ever more pervasive."

The point of this paper is similar: if by "Quality of Teaching and Learning" we mean what style of Teaching and Learning is most popular with our students, or most satisfies the expectations they bring with them from their schools, or what they believe most readily facilitates their immediate needs in getting jobs or obtaining professional certification, that is what they will confirm to us in student ratings.

If, on the other hand, our goal is to contribute to modifying the tendency to the rote learning and recitation method, and to promoting critical thinking and general education (as the Vice Chancellors of both sponsoring institutions of the Hong Kong Conference, the University of Hong Kong and the Hong Kong University of Science & Technology, urged in their opening addresses), then we had better attempt to balance student input with reasonable professional efforts to meet those expectations.

In response to numerous and growing concerns about maintaining quality of teaching and research in expanding institutions of higher education, not only in Hong Kong, but worldwide (see, e.g., Clark, 1995) (Note 1), the University Grants Committee
(UGC) (Hong Kong) (Note 2) has undertaken studies that will affect the funding of both the research and teaching sides of university functions. Three Research Assessment Exercises (RAEs), studies of the research being done in Hong Kong universities, were carried out in 1994, 1996, and 1999. A study, not of teaching and learning quality as such, but of the process for reviewing the quality of teaching and learning in Hong Kong institutions of higher education, the Teaching and Learning Quality Process Review (TLQPR) (see: Massy; French), followed in 1997, and a second is proposed for 2000-2001.

Both sets of studies addressed critical problems bound to arise in an atmosphere of democratic interest in promoting expansion of economic opportunity and social mobility by wider access to higher education. Both also reveal concerns within the institutions and the academic profession at large regarding free exercise of the functions of research and teaching, and their survival in light of calls for greater public accountability. The author has already described some of the professional concerns arising in the Research Assessment Exercises, the RAEs (see: Lee, 1998). The following discussion will address similar concerns with respect to the TLQPR. Whereas the author has expressed some reservation with respect to the former (the RAEs), he is generally in agreement with the latter (the TLQPR), especially as it affects his home university.

II. Measuring Teaching and Learning Quality

The announcement of an International Conference on the "Application of Psychology to the Quality of Learning and Teaching" (Hong Kong, June, 1998) indicated that it "strongly emphasize[d] cutting-edge research on the application of psychological principles to improving learning and teaching quality, with the aim of developing a global perspective on learning and achieving motivation" (HKU; HKUST, 1997).
With research on psychology of teaching and learning so highly specialized that a paper submitted to the Hong Kong conference required at least one of 27 keyword codes to classify it before it could be considered, it would appear that there are at least that many psychological perspectives alone from which to evaluate quality of teaching and learning. No wonder the TLQPR was troubled to find institutions with only student ratings in place.

II. A. Standardized Student Ratings Surveys

II. A. 1. Sole Use of "Student Evaluations"

It is understandable, in light of the multiplicity of just the psychological perspectives on teaching and learning, that the UGC (Hong Kong) panel assigned to conduct the 1997 Teaching and Learning Quality Process Review (TLQPR) of the author's own University expressed its concern about our University's reliance, almost solely, on mean quantified scores of students responding to semester surveys to assess the quality of teaching and learning in our various courses: "There appears to be little systematic monitoring of teaching and learning quality [at HKUST] other than through the [student] teaching evaluation questionnaires..." (TLQPR, 1998, para. 16). This phenomenon is doubtless far more pervasive than only at HKUST, or only in Hong Kong. The problem surely reflects not only that universities do not know better ways to
evaluate teaching, but probably also that they have no clear idea of what they want to accomplish in their courses either.

Despite the University response to the TLQPR, this imbalance was still reflected in the subsequent HKUST Faculty Handbook, 1997, where, after indication that review of faculty performance for retention or promotion would involve consideration of "research, teaching, and service," it is made clear that, unlike the case with "research" and "service":

Reviews of teaching performance rely to a greater or lesser extent on student evaluations. (HKUST, 1997, p. 169, emphasis added).

The appearance of being responsive to student concerns is such a preoccupation with university administrations that follow the American model that finding a professionally acceptable method of evaluating what reasonable people recognize to be the essential characteristics of good teaching continues to elude them. One of the leading American authorities on "student evaluation," who has great hopes of reforming the prevailing system, concedes privately:

Most universities in the USA give lip service to using information other than student ratings for teaching evaluation. However, at most places the information obtained by other means (teaching portfolios, peer evaluation) is rarely put into a form that permits ready use for evaluation. Consequently most places end up relying primarily on student ratings.

That was precisely the HKUST administration's response to the TLQPR. Despite elaborate verbal acknowledgment of the existence of all other means of evaluating teaching in theory, the official "Progress Report to the University Grants Committee" (2 March, 1998) comes full circle to student ratings, and essentially concedes that at HKUST there is nothing else: students evaluate teaching.
The university administration then lists "repeat offenders" and "monitors" faculty "accountability":

A more formal use of the student evaluation results to monitor Department accountability for teaching performance was introduced in the past year. It involves the identification, by the Academic Affairs office, of a group of instructors with particularly poor records of performance in the previous year. Department Heads were provided with a list of any faculty members in their own Departments who have been so identified, and asked to take appropriate corrective actions to help these instructors improve. In subsequent years, Department Heads will have to provide, for any instructor who turns up on the list as a "repeat offender," details on what actions, if any, were taken, and a statement of planned future actions to address the problems. (TLQPR Progress Report, 1998, p. 2).

Surely, every teacher likes to be appreciated by students. But is that why our University relies almost exclusively on that one measure, what our students say about us, to assess our teaching competence? I doubt it seriously.

In Hong Kong, as elsewhere, institutional growth accompanied growth of student population. A subsequent dramatic change in the rate of student population growth, together with declining economic growth, means that there is now a heightened awareness of inter-institutional competition for student applicants (see, e.g., JUPAS, 1997), which leads inevitably to greater sensitivity to student tastes and student demands, doubtless one of the chief sources of the "student evaluation of teaching" movement in the first place (cf. Imrie, 1996).

Institutional growth, especially in Hong Kong, had been phenomenal in recent
years (see: UGC, 1996). We are told that full time equivalent enrollments (FTEs) in higher education increased from 42,000 in 1990-91 to 62,000 in 1995-96, or an increase of roughly 47% in only five years, giving rise to concerns about how institutions would be able to maintain the quality of teaching and learning (HKU, 1997, para. 3), but also about how new institutions would fare in regard to competition for student enrollments.

II. A. 2. Why Is There No Other Established Measure?

Over the years, there has been a great deal written about the overemphasis on, and inherent conflict of interest in, "student evaluation" of professional performance, for which there is no parallel in any other profession (see: Appendix: "Conflict of Interest," 1974-82, and "Formative" and "Summative" uses, 1970s). But how did it happen that there was no existing institutional system of measurement of teaching and learning effectiveness in the first place, one that would have addressed quality of teaching and learning concerns suitably, prior to the massive expansion of the use of "student evaluation"?

Ask any college or university teacher and you are bound to get a sense of why: "Academic freedom" (Note 3) (cf. Flexner, 1967), i.e., from the perspective of what the Germans call "Lehrfreiheit," the "freedom to teach without interference." None of us is particularly fond of having other colleagues, or administrators, poking their noses into how or what we teach.

As a consequence of our profession's concern with generations of political and ideological attempts to control what we can do or say in the classroom, we have been brought up with an academic legacy of resistance to thought control and, therefore, have developed no universally accepted mechanism or standard for assessing what we do professionally, or how well we do it in the classroom.
Consequently, the teaching profession was an easy target for institutions seeking to satisfy reformist demands in this area in the late 1960s and early 70s. For this reason, and because of our even greater subservience to those in the education schools, in teaching technology, and in educational testing, we have allowed new professions to arise which specialize in telling those of us who teach "how to do it better" (cf. UGC, 1996, p. 8) (Note 4).

All of us in the academic world know that our students will observe and react to our flaws and weaknesses as much as to our strengths. Yet, when it comes to assessment of our professional performance and abilities, most of us expect the same courtesy in evaluation as is accorded to other professionals (cf. Appendix, "Consumerism," 1976-91) (Note 5), and to our students:

evaluation by those who understand what we are attempting to do;
evaluation by those who have a professional understanding of what we should do;
evaluation without conflict of interest;
as well as, of course, evaluation for effectiveness.

II. A. 3. Need for Student Feedback

There is no need to convince the present author, at one time or another a candidate for five university degrees, that students often have valid opinions and cogent arguments. Which one of us, as a student or a faculty member, has not sat through
lectures, and even whole courses, that we would be ashamed to have given ourselves? Simply being boring is a malady that even the best of us suffers from at times. These are concerns which certainly should not be silenced, and perhaps also deserve some greater outlet for discussion on all campuses.

The Harvard Crimson Confi-Guide once served a function like this. At one time the independent Harvard University student newspaper gathered and published student comments on their Harvard courses; a short web search revealed that they still do. But that is all it purports to be. It makes no pretense of being a "survey," of being "scientific," or even of being "quantitative" in its results. It refers to itself as embodying "Irreverent and honest appraisals of your favorite (and not so favorite) Harvard courses":

Be very careful what you do with this guide. Read. Enjoy. Laugh out loud. The goal of the Confidential Guide to Courses is ... to help students by giving them the lowdown on classes. Is it good? Is it a gut? Does the professor give interesting lectures? Are the exams difficult?

This guide generally succeeds in providing that information, but that doesn't mean the articles have all the answers. They are meant to be helpful, but they can't necessarily be taken at face value.

Each article is an opinion piece written by a student who took the class recently. The author can say whatever he or she wants, no matter how big the chip on his or her shoulder. It's important to remember that different people can come away from the same class with different impressions... (Confi-Guide 1998).

Instructors know, or ought to know, that they can get feedback from their students on how effective their teaching style is. Some do this by survey; some by private chat; some by instinct. But this does not mean that every student comment is good as gold or ought to be taken to heart.
A professional person has to know for himself or herself what to make of such comments. That is not what standardized testing or survey research does, however. As we all know, you cannot argue with the question where you already know that the tested population is so large that the examiners, or the survey experts, are only looking for a positive or negative response pre-defined to carry specific conclusory meaning. That may sound like poor survey or test writing. Nevertheless, practically speaking, any teaching rating questionnaire will call for these same up or down responses. Professor Wilbert McKeachie, probably the most authoritative figure in the student ratings genre, writes critically of this technique:

... effective teachers come in many shapes and sizes. Scriven (1981) has long argued that no ratings of teaching style (e.g. enthusiasm, organization, warmth) should be used, because teaching effectiveness can be achieved in many ways. Using characteristics that generally have positive correlations with effectiveness penalizes the teacher who is effective despite less than top scores on one or more of the dimensions usually associated with effectiveness. Judging an individual on the basis of characteristics, Scriven says, is just as unethical as judging an individual on the basis of race or gender (McKeachie, 1997, p. 1218).

With all respect, there is something disingenuous about this admission. Those who have done most to promote the concept of "validity" of measures here admit they may be accurate only for what they measure literally. Then they argue that they do not measure what administrators are known to want to apply their quantifiable results for. They give
teaching assessment committees a howitzer and tell them to use it like a smart bomb:

Almost as bad as dismissal of student ratings ... is the opposite problem: attempting to compare teachers with one another by using numerical means or medians. Comparisons of ratings in different classes are dubious not only because of between-classes differences in the students but also because of differences in goal, teaching methods, content, and a myriad of other variables (McKeachie, 1997, p. 1222).

In other words, (1) ratings are considered "valid," yet (2) the quantified results relate only to individual performance. That is, they may presumably be used for "formative" and "summative" purposes, i.e., to advise that particular instructor how to improve teaching and, ultimately, to advise the personnel committee how to judge effectiveness of that instructor. However, whereas results are expressed in quantified form, the scores for identical qualities are to be considered "not comparative."

It may be that schools with great sophistication in the use of student survey scores express such a qualification as to how student numerical ratings are to be applied, at least publicly. In practice, however, I do not see any hesitation in considering an 80% rating of one instructor equivalent to an 80% rating of another. At the author's University, for example, both get congratulatory letters from the Dean. Similarly, with a 40% rating for two years in a row, any instructor is bound to be considered a "repeat offender."

Accordingly, with regard to survey sophistication at HKUST, we are forewarned: "Note that the descriptions of the ratings should not be taken literally" (HKUST, 1998). Read further, however, and one is told that: "The average scores for all courses is in the range 60-70, so that the 'average' course has an 'above average' rating" (HKUST, 1998).
Does this mean that our administrators are so sophisticated about statistical and survey measures that they count these scores for no more than a simple exercise in measuring student opinion? Not on your life. We already know from Section II.A.1 above that "Reviews of teaching performance rely to a greater or lesser extent on student evaluations," and that "repeat offenders" will be dealt with.

Let me say first of all that the Hong Kong University of Science & Technology would rate itself as among the top universities in Asia, if not in the world. But "the average scores for all courses," judged by our students, we are told here, are rated between D+/C- and C+/B-. Heaven help the instructors whose average grades for their own students actually looked like that! But perhaps you may say that our students are more honest about us than we are about them.

What is the source of this disparity in ratings between faculty of students and students of faculty? Grade inflation can also have varying sources, since, according to this report at least, it is not simply producing higher faculty ratings. Presumably the faculty believe that they are achieving better results with students than students give them credit for. Does it go too far to suggest that the two may have different concepts or goals of teaching and learning in mind, and that that is what their respective grades and ratings scores are measuring?

This disparity in concepts and goals of education will be dealt with further below (at Section II.A.6). In this connection, however, let us take a closer look at something else Wilbert McKeachie alludes to in passing in his paper in the "Current Issues" section of the American Psychologist (November, 1997) devoted to controversy over findings in the students' ratings research. McKeachie is willing to admit exactly the inherent contradiction of goals and objectives in student evaluation of teaching:
There are ... two problems that detract from the usefulness of ratings for improvement.... Many students prefer teaching that enables them to listen passively: teaching that organizes the subject matter for them and that prepares them well for tests.... Cognitive and motivational research, however, points to better retention, thinking, and motivational effects when students are more actively involved in talking, writing, and doing.

This inherent conflict of interest notwithstanding, McKeachie justifies the continued reliance on the ratings survey system on the basis of what it is conceptually intended to achieve, i.e., "feedback":

The second problem is the negative effect of low ratings on teacher motivation.... A solution for both of these problems is better feedback (McKeachie, 1997, p. 1219:1).

Only one set of convictions can conceivably attempt to justify knowingly relying on a system of assessment that you concede is based on conflict of interest: (1) the persuasion that an institutional system of measurement of teaching effectiveness is mandatory for personnel decisions; and (2) that no professional measurement compares in "validity" (as we shall see shortly, he says as much) with student ratings.

Here, I suspect, we do have the root of the dichotomy in the grading and ratings problem: "Many students prefer teaching that enables them to listen passively ... and that prepares them well for tests," and judge faculty on that basis. On the other hand, many faculty members are persuaded that "retention, thinking, and motivational effects" are greater "when students are more actively involved in talking, writing, and doing." I suspect that they also tend to grade on the belief that they are achieving results of this kind. While each scoring system may be perfectly honest as far as what it purports to measure is concerned, as McKeachie says, "... the two problems detract from the usefulness of ratings for improvement," i.e., for the much vaunted "formative" effect. McKeachie, further on, gingerly admits that the two systems simply do not relate to each other: "However, student ratings are not perfectly correlated with student learning..." (McKeachie, 1997, p. 1219:2).

The "solution for both of these problems [may be] better feedback." However, while educational technologists may believe that they are promoting feedback, there is in reality little communication about these matters in large public institutions, either between faculty and students, or between each among themselves. Student ratings are an educational technology product that, regardless of the mildly qualified claims of those who argue "validity," provides academic administrators with what purport to be quantitative measurements of teaching effectiveness, and that is precisely how the survey technologists expect them to be used:

But what about the use of student ratings for personnel decisions? Here again the authors of the articles in this Current Issues section [of American Psychologist November, 1997] provide reassurance. All of the authors (and I join them) agree that student ratings are the single most valid source of data on teaching effectiveness. In fact, as Marsh and Roche (1997) point out, there is little evidence of the validity of any other sources of data. (McKeachie, 1997, p. 1219:2).
II. A. 4. Attractiveness of "Student Evaluation" Surveys

The beauty of student ranking surveys for a college or university administration is that they are cheap, and that they purport to offer exact quantitative and, like it or not, comparative figures between faculty members. On their face, they appear to be the unqualified ranking by a representative sampling of students taking a course, without need for discursive explanations, whether moral, legal, or professional. The president of the author's university also reports that instructors have been fired because of low ranking in student evaluation surveys: "... In terms of system, all courses are evaluated by students and the results are disclosed on the World Wide Web; unsatisfactory teaching performance has resulted in many cases of contract nonrenewal or salary bar..." (Woo, 1997).

In a note in reaction to the foregoing observations, the President seems to take a more balanced view: "We certainly cannot just rely on student evaluation scores. Good teachers often get remembered only long after the students have graduated." This was despite subsequent publication of the "Report to the University Grants Committee" (2 March, 1998) cited above. Obviously the President has sensibilities as a teacher as well as an administrator.

II. A. 5. Crucial Variables and Consistency and Stability of Results

With the exception of some variables that are at times crucial (Note 6), such as prior subject interest, class size, time of day a course is taught, rank of the instructor, grades expected, and course load, which educational measurement investigators acknowledge affect student ratings of faculty in some way (cf. Appendix), there have been a number of student ratings researchers who have argued that the student survey system is "consistent and stable." That is, they argue, similar ratings are seen to be attributable to the same faculty members, regardless of the subject matter they teach, and from year to year.
Moreover, some investigators report close correlations with more professional-appearing reviews by peers, administrators, and alumni (cf. Appendix).

Yet, while such correlations between results of different groups of survey subjects may exist at times, other researchers tell us that teaching ratings and learning are only "weakly related" (Gramlich; Greenlee, 1993). To the extent that this is true, it would tend to link the rating with the faculty member's teaching style or personality, and would tend to obviate one supposed major purpose of ratings, i.e., that they are "formative," that they can be used to assist the instructor to achieve improvement either in the teaching itself, or in its reception by students.

Nevertheless, some researchers in this area attribute a "validity" to figures that are supposedly replicable because of their apparent "consistency and stability." Yet, the same authority tells us: "The literature on validity, though extensive, remains very fluid and not perfectly conclusive" (Arubayi, 1987, p. 270).

In what A.G. Greenwald has called "the best of the largest group of construct-validity studies" (Greenwald, 1997, p. 1184) there seemed to be evidence to support correlational validity between student ratings in multisection courses. Here the results of student ratings were compared for different instructors giving different sections of the same course, where similar or identical examinations were given to different sections with students of similar ability (Abrami; Cohen; d'Apollonia, 1988). The present author, who has, heretofore, limited himself to reviewing the literature on this subject, must interject at this point that he has observed completely unforeseen but sharply conflicting statistical results on this particular kind of experiment. The
II. A. 6. Is There Validity If There Is No Agreement on Outcomes?

The same authority on the literature who argued "validity" because of apparent "consistency and stability" tells us that part of the predicament of "fluidity in research results" lies in the research concentrating on "construction of instruments to yield items and subscales which were intended to measure student learning outcomes" (Arubayi, 1987). He reports that others have found "content validity," i.e., "positive relationships between student ratings and achievement" (Arubayi, 1987).

Other factors that would establish "validity," this expert tells us, are that:

Evidence suggests that students and instructors seem to agree as to what leads to good teaching. Similarly, . . . very close similarity between the perceptions of students . . . on what constitutes a[n] "ideal professor." If students can agree with their instructors as to what constitutes effective teaching and the qualities of an ideal professor then one might be sage to conclude that students are mature enough to rate or evaluate instructors and instruction (Arubayi, 1987, p. 270f. emphasis added).

Reliance on near-exclusive use of "student evaluation" of teaching is bound to arouse concern for those of us in Hong Kong, where there are also faculty members to be found who, while deeply attached to the region, their students, and the subject matter of their fields, do not share agreement with their students on what "achievement" is, what "good teaching" is, and perhaps even on what "education" itself represents. In no way does it dispose of the issue to say that those faculty members are themselves out of joint, and that the situation will be cured by localizing expatriates out and putting local people in their place. The definitions of "education" and "achievement" are not simply heritage and culture-bound.
An institution like the Hong Kong University of Science & Technology is overwhelmingly staffed by PhDs from the world's leading universities. Are we to believe that they are prepared to abandon the educational values they hold for themselves (and upon which they want their own research and career accomplishments to be judged) when they instruct their students?

"We ought to teach every course the same way we would teach majors in the United States," our University President Woo Chia Wei is reported to have opined, somewhat at odds with what as an administrator he seems to be telling us. Are we to believe that there is one set of values for the world, and another for our own students?

How would I teach in the U.S.? Like an Ivy League graduate would be expected to:

Evaluating how we GATHER FACTS;
Establishing how we DEFINE A PROBLEM;
IDENTIFYING ISSUES and METHODS leading to various SOLUTIONS of a problem;
STRESSING REASONING over factual information;
STRESSING HOW WE REACH CONCLUSIONS, NOT OPINIONS (Lee, 1997).

Does this form of teaching offer an advantage to Hong Kong and to China? Many of us believe it does, not least of all the Vice Chancellors who keynoted the international conference in Hong Kong on Teaching and Learning Quality.
By no means do all Western educated scholars in Hong Kong pursue this method. But those who do know that this style of teaching is not the mainstream tradition of the region. The instructor dedicated to this approach is, therefore, faced with a deliberate choice: either to attempt to bring his or her students out of their protection of silence and anonymity to develop discursive verbal abilities (Lee, 2000), or to abandon what he or she believes is both sound practice and attainable with persistence, in order to pursue the more accepted purely didactic approach that will gain him or her better ratings.

Many of our students are afraid that departure from their accepted learning habits (and how such a change in them will be received by their peers) will create a disadvantage to them in competing: first with their own classmates for grades, then with their fellow graduates for jobs. They are, therefore, more at home with the standardized testing and curved grading results aspect of the American heritage, believing that they must receive and repeat exact information to be "testable," and that it is, therefore, "unfair" to them to introduce new standards of teaching and learning that suddenly give away their "place on the curve."

These conclusions are not based upon a formal scientific survey, but do derive from years of listening to student comments, both personal and anonymous. However, more formal case studies in Hong Kong have produced similar results. In a case study on law student learning in English at the University of Hong Kong, for example, three language use researchers conclude: "...by the time students reach the end of their secondary education and probably well before that point, they have internalised a set of unstated survival strategies for choosing which language to use [Cantonese or English] or, indeed, whether to communicate at all in a given situation." (Corcos; Churchill; Lam, 1998).
They refer to a set of implicit socio-cultural rules derived by an earlier researcher in this area:

If you want to talk to another student in a friendly way and without seeming superior, you must not use English;
Do not show off your language proficiency in front of your peers;
You should deny such proficiency if anyone praises you;
You must hesitate and show difficulty in arriving at an answer when called upon by the teacher;
You must not answer the teacher voluntarily or enthusiastically in English;
You must not speak in fluent English (Wong, 1984, as cited).

Similar defenses to class response techniques apply in other parts of the world (even in some parts of the U.S. where "class participation" is established doctrine). However, in Hong Kong, university instruction in English, a foreign language, though still the basis for official and business communication, serves as cover for non-participation. Actually, response in Cantonese is no better if students are not accustomed to verbal reasoning.

II. B. Measurement and Enhancement of Teaching by Peer Review

Of course you listen to your students, and you adjust to whoever comes. But is that all there is? If better teaching and enhanced learning are desired, experience tells us that they can be encouraged or cultivated; the elements are all well-known. (Note 7)

We may agree that there is a difference between encouraging enhanced quality of teaching and learning, and merely conducting a survey to see whether teaching conforms to students' established expectations. However, encouraging better teaching by whatever
method may involve changing incentives and investing greater resources, and may, therefore, discourage administrators from pursuing such a course too vigorously in times of contracting budgets. But testing is cheap, and appears to satisfy the student constituency.

II. B. 1. Changing Incentives from Research to Teaching

The process by which incentive structure can be changed in a university environment has been described in the literature in the same terms as changes in incentive structure in business. This process was employed in efforts to reinforce the teaching and learning environment at the William E. Simon School of Business Administration at the University of Rochester, and apparently in other leading American business schools, when the administrations determined that environmental factors affecting them, leading to competition for public funding and for student applicants, were similar to those described at the outset of this paper as leading to the Research Assessment Exercises (RAEs) and Teaching and Learning Quality Process Review (TLQPR) in Hong Kong (see: Brickley; Zimmerman, 1997; the following relies on that report).

The birth rate has long been declining in the United States, leading, over the years, to declining numbers of children in schools, and, as a result, declining numbers of students in colleges and universities. In the late 1980s this reduction in numbers of applicants was also felt in the graduate schools of business, combined with a lower demand for MBAs as a result of economic conditions.

Competition for applicants among American business schools first led to enhanced spending on public relations, then on scholarships, and, finally, on incentives to improve the teaching environment.
At about that time, Business Week began publishing a biennial list of top-20 business schools, and asked graduating students and recruiters to rate the schools according to opportunities 1) either in class or in extracurricular activities, and 2) to nurture and improve your skills in leading others (Byrne; Leonhardt, 1996).

Research emphasis, so important in the competitive standing of former years, received no special mention, and seemed to have fallen by the wayside in a competition fired expressly by students' interests. Concern with media rankings seems to have been quite intense. The Simon School at Rochester was, for example, listed in the Business Week top-20 business schools in 1988 and 1990, but not in 1992. As a result, a number of business schools, including Rochester, were led to serious reconsideration of their academic programs, emphasizing enhanced incentives to improve teaching. A faculty report at Rochester called for efforts to:

. . . increase teaching incentives, and make the change clearly visible to applicants, students, administrators and faculty ("MBA Program Status Report," University of Rochester, William E. Simon Graduate School of Business Administration [June 14, 1991] cited: Brickley and Zimmerman, 1997. Cf. also: "The Report of the Task Force on Improvement," M.I.T., Sloan School of Management [May 7, 1991]).

To meet the demands of that situation, the School of Business Administration at the University of Rochester determined to become more competitive in the market for business school applicants. In the process, they determined to enhance their standing as a
top-20 business school by seeking to attract student applicants by an enhanced teaching and learning environment, a significant change from the emphasis on advanced Research in the 1980s, when the applicant level was strong and rising.

II. B. 2. Changing to a Peer Review Measurement System

It is interesting to observe that at about the same time as The Simon School at Rochester was engaged in the process of re-assessing its system for teaching evaluation, a similar process was underway at the City University of Hong Kong, for different reasons. In 1993, the year before full university status was conferred on the then City Polytechnic, the Academic Board (now the Senate) established a Quality Assurance Committee which laid down guidelines for, among other things, teaching evaluation (QAC, 1993). While emphasizing that teaching evaluation must include "student feedback as a substantial primary element in the process," the Guide makes clear that teaching evaluation must also be an institutional determination: "conforming with stated policy and principles," "based on all available evidence," "fully documented," and "accessible":

Teaching evaluation must conform to the Principles stated. . . . Teaching evaluation schemes must be documented. . . . The primary purpose of any teaching evaluation scheme should be to improve teaching. Teaching evaluation schemes must include student feedback as a substantial primary element. . . . Where a scheme is designed to evaluate teaching for assessment purposes, evidence must be included from other appropriate sources such as peer review, individual reflection, expert observation, etc., in addition to student feedback. . . . Those entrusted with using the information from teaching evaluations for decision-making related to career progression should be skilled in interpreting and drawing together the different sources of information. . . . In all cases the staff member being evaluated must be fully consulted. . . .
Provisions should exist for regular review of the . . . evaluation schemes and of the institution's evaluation procedures (QAC, 1993, p. 1f.)

(The first paragraph is taken from "policy," the remainder from "principles." The Guide is undated, but acknowledges Hall; Cedric; Fizgerald, 1994, as the source from which its principles were developed.)

This policy has been applauded in the TLQPR at City University. Yet, both from the TLQPR, and from faculty comments, one gets the impression that this system has not been fully implemented at City University either.

In both cases cited above, recourse to a peer review measurement system was motivated by new roles of the institution, calling for greater attention to the teaching and learning mission. On the other hand, both institutions (or their faculties?) were remarkably sensitive to the implication that either matters of professional competency or career decisions might be driven purely by reaction to data arising solely from student inputs. Clearly, both institutions were acutely attentive to the importance of maintaining ultimate institutional responsibility for professional decision-making, and correspondingly, professional information gathering.

As a result of the situation described in the foregoing section, the Simon School made a significant decision to change from dependence solely on the student quantitative
rating system for course and instructor, to a highly organized qualitative peer review system. Based on the evidence of the cited study that teaching ratings and learning were only "weakly related" (Gramlich; Greenlee, 1993), and on the concern that "some instructors game student ratings by reducing course work loads and cutting analytic content," or "...hand out cookies, bagels, and wine and cheese the last day of class when student ratings are administered" (Brickley; Zimmerman, p. 5), in the winter term of 1992 the Simon School faculty passed a resolution that determined:

[T]o establish a faculty committee to evaluate teaching content and quality on an on-going basis. The intent of the proposal is to put the evaluation of teaching on the same footing as the evaluation of research. The committee will have the responsibility to evaluate both the content and presentation of each faculty member on a regular basis to be determined by the committee. . . . The output of this process should be reports designed to provide constructive feedback to faculty and evaluations to be considered in promotion, tenure, and compensation decisions. ("Faculty Meeting Minutes," University of Rochester, William E. Simon Graduate School of Business Administration [February 26, 1992], Brickley; Zimmerman, p. 5, emphasis added).

In the case of City University of Hong Kong, the faculty Quality Assurance Committee (QAC) took a more systematic approach: in a manner befitting its role in determining future guidelines for policy of a major university, it devoted its early efforts to outlining statements of principles on quality and quality assurance.
While these principles clearly were to acknowledge the role of students and other "stakeholders," e.g., employers and professional bodies, they were not to be construed in such a way as would utterly disenfranchise the teaching faculty: "The systems of quality assurance must be capable of operating independently of the participation of particular individuals and have an integrity which enables judgements to be formed that are unaffected by other managerial imperatives." (QAC, 1993, p. 4)

What is recognizable from the City University statements and principles is that these derive from faculty deliberations and are not simply imposed from above. In this respect, they are unique in circumscribing the activities of the whole institution:

Quality assurance policies should embrace all activities of the institution (QAC, 1993, p. 4).

These principles not only recognize the institution's public roles and obligations to students and other "stakeholders," they declare that they will apply "in all aspects of the staff's role including teaching, research, and administration" (QAC, 1993, p. 4).

II. B. 3. Implementation of the Peer Review System

As long as an informal quantitative student rating of course and faculty member was the only goal, it could be accomplished with comparative ease by passing out and collecting questionnaires at the end of the semester. If the evaluation of teaching were now to be put "on the same footing as evaluation of Research," then an objective means of qualitative measurement of the work of the course and the faculty member had to be found. For this purpose, the Rochester Business School faculty established a "Committee on Teaching Excellence" (CTE). The Committee developed a set of procedures, following the example of psychoanalysis by first setting about evaluating six of the courses taught by members of the Committee itself:
By the end of the 1993 academic year the CTE established a process, that except for minor changes, remains in effect through 1997. This process includes benchmarking the class with other top business schools: using a two-person evaluation team to observe lectures, review material and conduct student focus groups; video taping several classes; full committee discussion of the course; and a final written report which goes to the instructor and the Dean's office and which is included in the faculty member's personnel file. . . . In addition to evaluating nine individual courses each year, the CTE held several seminars to discuss teaching. These forums allowed faculty to share their experience on various topics including: teaching cases, using computer-based presentation packages, and managing class discussion ("cold" calling). These seminars in the 1995 academic year were the first faculty seminars devoted to teaching (Brickley; Zimmerman, 1997, p. 5, emphasis added).

Evaluating the teaching process, which involves analysis of quality of inputs or preparation and materials, form of classroom delivery, and measurement of effect upon students and their achievement, is necessarily a time intensive effort for all Committee members. The opportunity cost to evaluate one course was estimated at (US)$15,000.

In the case of the City University of Hong Kong, as well, the section of the CityU Policy and Guide for Developing Teaching Evaluation Schemes dealing with peer review specifically refers to evidence drawing on the following topics, and calls for citation of evidence in each case:

1. subject expertise: (up-to-dateness of content material);
2. module design: (relationship between content and objective, sequence, etc.);
3. enhancing student learning: (activities included, assessment requirements, etc.);
4. module organisation: (variety of experiences, reading lists, availability of materials, etc.);
5. supporting departmental goals: (from departmental objectives);
6. research supervision (QAC, 1993, sec. 2.2.2).

The guidelines conclude with the admonition that any peer review scheme must emphasize "expertise," "integrity," and "training" (QAC, 1993, sec. 2.2.2), both in the collection of data and its interpretation. No doubt this system, as well, must require a considerable "opportunity cost" that the institution considers is justified.

III. An Assessment System that Dwells on the Past? Or Education Policy with Increased Incentives for Teaching?

It should not be necessary here to enumerate the extent of the literature on opinion survey research. Neglect of comparative validation of an investigator's particular empirical method, or neglect of the potential impact of preexisting biases (both among the research subjects and among the investigators), would ordinarily arouse sufficient consternation among scholars of the field that such results would receive little credibility. As the foregoing has suggested, however, there has been little attempt to obtain general agreement on the standards of psychometric validity of student ratings of
teaching despite the fact that investigators are well aware that their findings are being put to practical use in so-called "formative" and "summative" evaluation of members of their own profession.

Very simply, there appear to be two camps: 1) Those who treat student ratings as a reasonable "input" to "formative" and/or "summative" teaching assessment, along with all other professionally accepted indices; and 2) Those who consider that student ratings are the "valid" and sufficient basis for "formative" and "summative" evaluation of teaching by themselves. Institutions that employ student ratings alone tend to be interested primarily in quantitative and comparative results, i.e., numerical values that can be employed across the board to gauge and reward faculty performance.

Within the context of the empirical research reports, however, little interest is shown in qualitative criticism of the formulation of survey questions in student opinion surveys, and little attention is given to the impact of value systems in interpretation of survey questions. The foregoing has shown that leading authorities in the area, e.g., Scriven and McKeachie, recognize the danger of confusing "characteristics that generally have positive correlations with effectiveness" with either "effectiveness" per se, or as all there is to be said for good teaching, or, more important, what teaching policy should aspire to.

Recognizing the needs of students in acquiring the skills to comprehend and master the subject matter of their field, and response of the instructor to the needs of a particular body of students, is certainly one aspect of good teaching.
But formation of forward looking education policy cannot endlessly avoid the necessity of considering the obligation of the instructor, and of the institution, to the public and to the profession of teaching, to pursue clear educational goals which reflect the ambitions of our civilization and not simply those of any one generation of students whose priority is solely admission to professional qualification.

III. A. Haskell's Survey of the Literature of Psychometric Validity of Student Ratings and of Whether There is a Cause of Action for Violation of Academic Freedom for Reliance on Student Ratings in Personnel Decisions to the Exclusion of Everything Else

The serious omission of a qualitative discussion of psychometric validity of student ratings has been addressed in a comprehensive, at times rambling, series of four articles, a study of the literature of student ratings theory by Robert E. Haskell, Professor of Psychology at the University of New England in the United States (Haskell, 1997a,b,c,d).

Haskell is clear about his own personal position, "SEF [student evaluation of faculty] is deceptive regarding its negative implications for higher education" (1997b, p. 3), and that the present system ". . . sets up a conflict of interest between the instructor and quality of education . . . [the] opposite of the original intent of SEF which was the improvement of instruction" (1997a, p. 16). It is inescapable that these considerations must return to the forefront of academic discussion at the turn of the century as democratization of access to higher education, now combined with increasing budgetary constraint, forces institutions to concentrate on issues of "quality and accountability."

Haskell's contribution lies in providing a kind of qualitative comparative survey of the ratings literature. He also recognizes that improper use of student ratings can result, and has resulted, in litigation over abuse of process in renewal, salary, and tenure
decisions. He has attempted to study the possible remedy of use of the issue of violation of "academic freedom" in such litigation where litigants have attempted to identify academic freedom with freedom of speech, which enjoys unqualified protection under the American Constitution.

Haskell points out the conspicuous disregard of faculty rights throughout the period in which reliance on student ratings of faculty has been associated with student and minority rights causes: "A recent booklet on The Law of Teacher Evaluation (Zirkel, 1996) contains no mention of SEF cases. Nor does a recent comprehensive legal guide for educational administrators (Kaplin and Lee, 1995), nor do other reports (Poch, 1993) on the legalities of academic freedom, tenure and promotion" (Haskell, 1997b, p. 2).

Haskell's insight into the value of considering how the courts have reacted to cases based on student ratings could have led to a more significant contribution if his results had been more systematic and analytical. The second article, particularly, would have benefited from closer collaboration with a person trained in handling this kind of material.

The colossal labor represented by this vast qualitative review of the literature of the field notwithstanding, the value of the author's discussion of judicial opinion is practically limited to the enumeration of 78 cases where the issue of over-reliance on, or neglect of, student ratings has been raised. Some of the cases are properly cited, others are not. High level court reports are listed side by side with low level. There is no attempt to distinguish between cases where reference to ratings would support the faculty member's case but is ignored, and cases where negative results are relied on to make decisions that should have been supported by professional opinion. There is little analysis of whether arguments for use of ratings on either side were well-taken.
There is, furthermore, no distinction made between decisions based upon use of ratings, and mere obiter dicta or comments in passing mentioning ratings. Nevertheless, from Haskell's investigation of this problem we can begin to recognize that the concept of "academic freedom" does not seem to have been developed very far by the American courts themselves as a First Amendment (i.e., freedom of speech) category in connection with student ratings. (Note 8) On the other hand, there appear to be a number of efforts to combine complaints supported by reliance on student ratings with a theory of discrimination on the basis of sex or race, which is statutorily based and has a more consistent jurisprudence. Courts have developed measures such as "disparate impact" of policies on protected groups to support claims of illegal discrimination.

Haskell makes the valid point that whereas some lower courts have, in the past, distinguished between "freedom of speech," that was protected, and "action" in connection with expression of opinion, that was not protected (notably in Lovelace v. S.E. Mass. Univ., 793 F.2d 419 [1st Cir. 1986]), the U.S. Supreme Court has overtaken them (Haskell, 1997d, p. 5). In 1989, the U.S. Supreme Court ruled that flag burning could be seen as political expression, and would, in that sense, be protected under the First Amendment (Texas v. Johnson, 491 U.S. 397; see also: United States v. Eichman, 496 U.S. 310).

On the other hand, there appears to be no American case law expressly protecting what the Germans call "Lehrfreiheit," i.e., freedom to teach with respect to methodology, coverage or organization of material, and grading. Indeed the cases cited suggest that some courts would allow interference in this area on the basis of institutional or public policy. A teacher's right to say, or teach, what he or she believed to be professionally defensible would be protected.
Of course, the requirement that a faculty member's expression of opinion be professionally defensible is clearly a limitation that would not
apply to others: students, for example, or student ratings. Students, and other interested members of the public, can say whatever comes into their heads, providing that it is not outright defamation.

Perhaps because of lack of a sufficient number of appeals one does not learn whether any of these cases has led to a rule adopted either in the American state or federal courts. However, we do learn that numerous judicial reservations can be cited against relying on student ratings alone, to the exclusion of professional opinion, in faculty personnel decisions (Haskell, 1997b, passim). Impressively, the Canadian examples cited seem to stress the need for balance between student ratings and professional assessment more than the American cases.

At the same time, we see the courts' hesitation to interject themselves into institutional decision-making. Haskell quite accurately characterizes the courts' unwillingness (unlike juries) to inquire into substantive criteria an institution applies for personnel evaluation as long as the procedural safeguards appear adequate, i.e., that the standard is applied generally to all faculty members (Haskell, 1997c, p. 4), even though such criteria may appear to be incompetent when applied for the purpose. That was the case for a schoolteacher previously renewed over a 10 year period but terminated because her pupils ranked too low on the Iowa Test of Basic Skills (ITBS) and Iowa Test of Educational Development (ITED). If measuring teaching effectiveness of the teacher on the basis of the performance of her pupils in standardized testing could be shown to be totally absurd or incompetent, the teacher might have been successful in thwarting dismissal.

On the other hand, if a political decision, or public policy, calls for such a measure of teaching effectiveness, courts tend to leave judgment to the political arm, public policy, or simply institutional practice.
Yet, we must take care in characterizing judicial perspective. For, whereas course content and grading standards may be treated as a matter of institutional policy (Haskell, 1997d, p. 7), we also hear: "assignment of a letter grade is protected speech" (Haskell, 1997d, p. 6):

[B]ecause the assignment of a letter grade is symbolic communication intended to send a specific message to the student, the individual professor's communicative act is entitled to some measure of First Amendment protection. (Parate v. Isibor, 868 F.2d 821, at 828 [6th Cir. 1989]) (Note 9)

More disturbing is an allegation of professional incompetence in use of ratings by institutions which should know better, such as:

According to Thompson (1988, p. 217), "Bayes Theorem shows that anything close to an accurate interpretation of the results of imperfect predictors is very elusive at the intuitive level. Indeed, empirical studies have shown that persons unfamiliar with conditional probability are quite poor at doing so (that is interpreting ratings results) unless the situation is quite simple." It seems likely that the combination of less than perfect data with less than perfect users could quickly yield completely unacceptable practices, unless safeguards were in place to insure that users knew how to recognize problems of validity and reliability, understood the inherent limitations of ratings data and knew valid procedures for using ratings data in the context of summative and formative evaluation (Franklin & Theall, 1990, pp. 79f.) (Haskell, 1997c, p. 6).

It asks a great deal of a court to assess an argument of this kind. Yet, there appears
to be accumulating evidence that educational institutions, which are capable of evaluating psychometric standards, choose to ignore such weaknesses in favor of the efficiency of continued, unquestioned reliance on student polling results. All in all, we see that the diversity of judicial opinion may be comparable to the diversity of opinion in the psychometric survey discipline. Yet what does appear from these citations is that, while courts have not equated freedom of speech with academic freedom in all its manifestations, nor created a protected zone around assessment of teaching effectiveness, they have, from time to time, expressed clear reservations about reliance on student ratings in personnel decisions to the exclusion of everything else.

III. B. Should Forward Looking Education Policy Concentrate on Goals and Incentives to Improve Teaching?

The two authors of the study on the shift to peer review of teaching at the Simon School of Business at Rochester tell us that there was a very rapid adjustment to changes in incentives, reflected by a correspondingly rapid rise in student teaching evaluations:

During the 1990s, there was a substantial environmental shift that increased the importance of teaching relative to academic research at top business schools. The Simon School, like other business schools, changed its performance evaluation and reward systems to increase the emphasis on teaching. One might have expected the effects of these changes to be gradual, given the human capital constraints implied by the composition of existing faculty. Our results, however, suggest a very rapid adjustment to the changes in incentives. Average teaching ratings increased from about 3.8 to over 4.0 (scale of 5) almost immediately. Teaching ratings continue to rise after the changes in incentives, suggesting additional learning and turnover effects (Brickley; Zimmerman, 1997, p. 21).
They believe this dramatic effect was owed to incentives rather than peer review. Whereas they had found that "Some evidence suggests that research output fell" (Brickley; Zimmerman, 1992, abstr.), they continue that, thereafter, "... we find some evidence that faculty substituted research for teaching following the incentive changes" (Brickley; Zimmerman, 1997, abstr.).

On the other hand, these authors find that, in the long run, peer review may support "quality," the declared objective of efforts in Hong Kong associated with the TLQPR, and with the City University QAC. But they are forced to recognize an inherent conflict of interest when it comes to recognition of these efforts in student ratings:

... Intense peer review of classes had no obvious effect on either teaching ratings for the evaluated classes or subsequent classes. One possible reason peer review is not associated with higher student evaluations in the reviewed or subsequent courses might be due to the complementary nature of performance evaluation and compensation [citing: Milgrom; Roberts, 1995]. The Deans' office did not formally announce that CTE reviews would explicitly enter the compensation policy of the School. An alternative explanation of the lack of statistical association is that "good" teaching as perceived by faculty evaluators and by students are
orthogonal. For example, faculty evaluations value courses with more intellectual rigor and greater work loads, whereas students value courses with more current business content, more entertaining lectures, and lower work loads (Brickley; Zimmerman, 1997, p. 22, emphasis added).

The turnaround process is described for us in terms of agency theory by the two faculty members of the Simon School:

Agency theory suggests that the principal is interested in both the amount of effort exerted by the agent, as well as the agent's allocation of effort across tasks. As environments change, firms are expected to adjust incentive contracts on both dimensions. For example, the 1990s witnessed significant developments in information technology, which lowered the costs of measuring performance. These changes potentially help to explain why many firms increased their use of incentive compensation over this period. Similarly, changes in competition and technology motivated numerous firms to increase their focus on quality over quantity, for example, through the adoption of TQM programs (Brickley; Zimmerman, 1997, p. 22, emphasis added). (Note 10)

Changing incentives and "focus on quality over quantity" to concentrate more on teaching and learning, particularly in an environment which esteems research and/or technological development higher, is, perhaps, just as likely to involve more than merely issuing letters of congratulation to those who score high on student ratings polls.

IV. Open Decisions Openly Arrived At

Teachers may be stung by what students say if they ask for their students' opinions and find that they are significantly out of keeping with their own expectations.
Of course, students have a right to their own opinions. But teachers would be foolish to let themselves become ruled by everything students have to say, especially on those occasions when what they have to say derives from wholly different concepts of educational goals and/or is based on teaching practices contrary to wise learning patterns. They are students, and students test what they are thinking by saying it aloud. If there are legitimate differences about teaching and learning, they must be addressed by the institution as well as the individual instructor. On the other hand, if low "student evaluation" figures reflect that an instructor comes into a class drunk, or is on drugs, perhaps does not come at all, or does not prepare, or preys upon those in his or her charge, then that instructor ought to be fired; you do not put his or her name up on the World Wide Web!

But it is not students who post their opinions on the web. It is a university administration, which does this in place of deeper thought or due diligence. If a student calls me a fool, it may be an inept way to open a conversation about what fools are. If a university administrator calls me a fool, he robs me of my right to teach.

Is there an inherent problem in recognizing a qualitative measurement for rating of teaching? For putting teaching evaluation "on the same footing as evaluation of research"? Isn't that what universities do? In the 1996 Research Assessment Exercise (RAE) in Hong Kong, we are told, the research "output" of all research academics in the territory's then seven traditional "tertiary" institutions, covering 14,000 publications of
3,300 academic personnel, was assessed by 110 experts, many chosen worldwide, and all in less than nine months. If there is a way of obtaining assent of universities to standards for a monumental task of that kind, there must surely be an acceptable means of, at least, setting the standards for a professional teaching and learning quality review.

There is a reason, however, why the CityU Policy and Guide for Developing Teaching Evaluation Schemes takes such a judicious stand on the collecting of concrete evidence for teaching evaluation: this is a step that cannot be undone. And there is a reason why it calls for "expertise," "integrity," and "training," and applies the "quality" standards to the administration as well as the faculty. Too often these decisions are made behind closed doors not simply to protect confidentiality, but because ill-defined standards applied in secret leave no trace.

There may be a right of appeal. But no appeal ever corrected injustice that should not have been done in the first place. If we know the standards of "quality," and they are as clear as, for example, those in the CityU Policy and Guide or those pursued by the Committee on Teaching Excellence at the Simon School, then let the sun shine in.

Notes

1. These concerns are well illustrated and documented by Clark. He considers the difficulties facing universities around the world from loss of funding for research and emphasis on mass education. He describes the situation in universities in the United States, Britain, France, Germany and Japan, also as they form a model for their areas of cultural influence.

2. The UGC is an advisory committee appointed by the Chief Executive of the Hong Kong Special Administrative Region (SAR). Although the UGC has neither statutory nor executive powers, it administers public funds to the eight leading institutions of higher education in Hong Kong through its Secretariat, which is "staffed by civil servants."
3. The ideals of "academic freedom" derive from many sources. They were formalized as a prerequisite of the research and teaching functions of the modern university by Wilhelm von Humboldt in the establishment of the University of Berlin in 1810. These ideals of "Lernfreiheit," the "freedom of inquiry, or advanced study," and "Lehrfreiheit," "the freedom to teach what one perceives to be the principles of one's special field," became institutional ideals not only of the German universities (until 1933, and again in the Federal Republic), but also, in a way, of the American graduate schools created on the German model. Intellectually, they derive from the same background of the European philosophers of the Renaissance and the Enlightenment that led to the creation of political institutions in the United States of America. (Cf. Flexner, 1967).

4. Importance of Educational Technology: All technology has to recommend itself to users to be adopted. There have been enormous changes in business and the professions, including education, as the result of improvements in technology in the last generation. Angela Castro of the Social Sciences Research Centre of the University of Hong Kong writes on adoption of new technology:

I do not believe professional development can be externally imposed on an individual, it must come from a personal prioritising of needs and values. If that passionate conviction is there, then the individual will seek ways to improve him/herself. (Castro, 1996)

Even the authors of the "TLQPR Review" cannot resist referring to the fear of "Educational development units" being cast in the role of 'teach police' (TLQPR Review, 1996, p. 8).
5. Of course there are some who believe that, even in education, "the customer is always right." See: "Consumerism" in Appendix.

6. Other variables (sex of the student, sex of the instructor, personality of the student, and mood of the student) have also been studied in this context. More will be said about "personality" and "mood" of the student as they appear in Hong Kong student culture below.

7. Elements of Better Teaching Defined: e.g., breadth and depth of subject matter covered, development of understanding by students, amount and quality of such understanding retained, development of case material and textbooks, etc., and cooperation and collegiality among teachers and between teachers and students.

8. The authors of the Basic Law (i.e., the mini-Constitution) of Hong Kong had the foresight to include reference to the concept of "academic freedom," which "institutions" may retain and enjoy:

Art. 137: Educational institutions of all kinds may retain their autonomy and enjoy academic freedom. They may continue to recruit staff and use teaching materials from outside the Hong Kong Special Administrative Region. Schools run by religious organizations may continue to provide religious education, including courses in religion. Students shall enjoy freedom of choice of educational institutions and freedom to pursue their education outside the Hong Kong Special Administrative Region.

As is apparent, however, even with statutory protection of a specific right, it cannot be foreseen how a court might interpret that right, or indeed whether a court might limit that right to what is immediately ascertainable within the four corners of Art. 137 itself.

9. With respect, this decision should not be written in stone either. On the one hand, what a faculty member ought to be able to bring to an institution is professional perspective on course design and grading standards.
Yet, whereas a professional person should certainly enjoy a right to expression of professional opinion with respect to a grade, he or she cannot be said to have a right to create or destroy a career with that opinion. Even judicial decisions are subject to appeal.

10. On the application of agency theory, they refer to: Holmstrom, B., and Milgrom, P. (1991); and Feltham, G., and Xie, J. (1994). For focus on quality over quantity, see also: Wruck, K., and Jensen, M. (1994); and Brickley, J.; Smith, C.; Zimmerman, J. (1997).

References

Abrami, P.C.; Cohen, P.A.; d'Apollonia, S. (1988). Implementation Problems in Meta-Analysis, Review of Educational Research 58: 151-79.

Aleamoni, L.M. and Graham, M.H. (1978). The Relationship between CEQ Ratings and Instructor's Rank, Class Size, and Course Level, Journal of Educational Measurement 34: 189-202.

Archibold, R.C. (1998). Payback Time: Give Me an 'A' or Else, The New York Times Week in Review, May 24, 1998, and http://www.nytimes.com/library/review/052498students-evaluate-review.html, a review of the situation on American university
campuses.

Arubayi, Eric A. (1987). Improvement of Instruction and Teacher Effectiveness: Are Student Ratings Reliable and Valid? Higher Education 16: 267-78, at 270.

Arubayi, Eric (1985). Subject Disciplines and Student Evaluation of a Degree Programme in Education, Higher Education 14: 683-91.

Avi-Itzhak, T. (1982). Teaching Effectiveness as Measured by Student Ratings and Instructor Self-Evaluation, Higher Education 11: 629-37.

Barnoski, R.P. and Sockloff, A.L. (1976). A Validation Study of the Faculty and Course Evaluation (FACE) Instrument, Educational and Psychological Measurement 36: 391-400.

Brickley, James A. and Zimmerman, Jerold L. (1997). Changing Incentives in a Multitask Environment: The Case of Teaching at a Top-25 Business School. Working Paper, William E. Simon Graduate School of Business, University of Rochester.

Brickley, J.; Smith, C.; Zimmerman, J. (1997). Managerial Economics and Organizational Architecture, Irwin.

Byrne, J. and Leonhardt, D. (1996). The Best B-Schools, Business Week (October 21, 1996).

Castro, Angela (1996). Professional Development in IT for Tertiary Academics. Available: http://www.ugc.edu.hk/UGCweb/inte.

Clark, Burton R. (1995). Places of Inquiry: Research and Advanced Education in Modern Universities, Berkeley: University of California Press.

Corcos, R.; Churchill, D. and Lam, A. (1998). Enhancing the Participation of Law Students in Academic Tutorials, in Kember, D.; Lam, D.-H.; Yan, L.; Yum, J.C.-K.; Liu, S.B., Case Studies of Improving Teaching and Learning from the Action Learning Project, Hong Kong: Hong Kong Polytechnic University, p. 358.

Crittenden, K.S.; Norr, J.L.; Lebailly, R.K. (1975). Size of University Classes and Student Evaluations of Teaching, Journal of Higher Education 46: 461-70.

Danielson, A.L. and White, R.A. (1976). Some Evidence on the Variables Associated with Student Evaluations of Teaching, Journal of Higher Education 7: 117-19.

Downie, N.W. (1952).
Student Evaluation of Faculty, Journal of Higher Education 23: 495-96, 503.

Doyle, K. and Whitely, S. (1974). Student Ratings as Criteria of Effective Teaching, American Educational Research Journal 11: 259-74.

Feltham, G., and Xie, J. (1994). Performance Measure Congruity and Diversity in Multitask Principal/Agent Relations, Accounting Review 69: 429-53.
Franklin, J. and Theall, M. (1990). Communicating Student Ratings to Decision Makers: Design for Good Practice, in Theall, M. and Franklin, J., eds., Student Ratings of Instruction: Issues for Improving Practice, San Francisco: Jossey-Bass.

Flexner, Abraham (1967). Universities: American, English, German, with an introduction by Robert Ulich (New York: Teachers College Press, 1967).

Gage, N.L. (1974). Students' Ratings of College Teaching: Their Justification and Proper Use, in N.S. Glasman and B.R. Killait, eds., Second UCSB Conference on Effective Teaching, University of California at Santa Barbara, pp. 72-86.

Gage, N.L. (1961). The Appraisal of College Teaching: An Analysis of Ends and Means, Journal of Higher Education 32: 17-22.

Gayles, A.R. (1980). Student Evaluation in a Teacher Education Course, Improving College and University Teaching 28: 128-31.

Gillmore, G.M. (1973). Estimates of Reliability Coefficients for Items and Subscales of the Illinois Course Evaluation Questionnaire, Research Report, No. 341 (Urbana, Ill.: Measurement and Research Division, Office of Instructional Resources, University of Illinois).

Gramlich, E. and Greenlee, G. (1993). Measuring Teaching Performance, Journal of Economic Education Winter: 3-13 (based on a study of over 15,000 economics students who study the same subject matter, are graded in a common examination, and have instructors who each receive a similar form of student evaluation).

Greenwald, A.G. (1997). Validity Concerns and Usefulness of Student Ratings of Instruction, American Psychologist 52:11: 1182-86, at p. 1184.

Greenwald, A.G. and Gillmore, G.M. (1997a). Grading Leniency is a Removable Contaminant of Student Ratings, American Psychologist 52:11: 1209-17.

Greenwald, A.G. and Gillmore, G.M. (1997b). No Pain, No Gain? The Importance of Measuring Course Workload in Student Ratings of Instruction, Journal of Educational Psychology 89:4: 743-51.

Greenwald, A.G. and Gillmore, G.M. (1997c).
Supplement to UW's December 4 Press Release, http://weber.u.washington.edu/~agg/paingain/supplement.Html.

Guthrie, E.R. (1954). The Evaluation of Teaching: A Progress Report (Seattle: University of Washington).

Hall, Cedric and Fizgerald, C. (1994). Student Summative Evaluation of Teaching: Code of Practice, Wellington, N.Z.: Association of University Staff of New Zealand.

Harris, E.L. (1982). Student Ratings of Faculty Performance: Should Departmental Committees Construct the Instruments, Journal of Educational Research 76: 100-106.
Harvard Crimson (1998). Confi-Guide 1/25/98, at http://www.thecrimson.harvard.edu/cgi-bin/ . .

Haskell, R.E. (1997a,b,c,d). Academic Freedom, Tenure and Student Evaluation of Faculty, Education Policy Analysis Archives 5: 6, 17, 18, 21; a: Galloping Polls in the 21st Century; b: Views from the Court; c: Accuracy and Psychometric Validity; d: Analysis and Implications of Views from the Courts in Relation to Academic Freedom, Standards, and Quality Instruction. Available: http://epaa.asu.edu/epaa/v5n6.html and v5n17.html, v5n18.html, v5n21.html.

Hillery, J.M. and Yuk, G.A. (1974). Convergent and Discriminant Validation of Student Ratings of College Instructors, JSAS Catalog of Selected Documents in Psychology 4: 26.

Hogan, T.P. (1973). Similarity of Student Ratings across Instructors, Course, and Time, Research in Higher Education 1: 149-54.

Holmstrom, B., and Milgrom, P. (1991). Multitask Principal-Agent Analysis: Incentive Contracts, Asset Ownership and Job Design, Journal of Law, Economics & Organization (special issue) 7: 24-52.

Hong Kong University of Science & Technology (HKUST) (1998). Progress Report to the University Grants Committee (2 March, 1998). Available: http://www.ust.hk/%7Ewebaa/TLQPR/account.htm, p. 2.

Hong Kong University of Science & Technology (HKUST) (1998). Course Evaluations, http://www.ust.hk/~webaa/courseeval/#SOURCE.

Hong Kong University of Science & Technology (HKUST) (1997). Faculty Handbook, 1997.

Imrie, Bradford W. (1993). Professional Development as Quality Assurance, a summary of official thinking of the last 30 years leading up to the TLQPR. Available: http://www.ugc.edu.hk/documents/tlqpr/INQAAHE.html.

Joint University Programmes Admissions System (JUPAS) (1997). 1997 Admissions Grades Achieved by the 'Median & Lower Quartile' Applicants in Programmes Offered by the 7 Institutions.

Kaplin, W.A. and Lee, B. (1995).
The Law of Higher Education: A Comprehensive Guide to Legal Implications of Administrative Decision Making, 3rd ed., San Francisco: Jossey-Bass.

Kennedy, R.W. (1975). Grades Expected and Grades Received: Their Relationship to Students' Evaluations of Faculty Performance, Journal of Educational Psychology 57: 109-15.

Kohlan, R.G. (1973). A Comparison of Faculty Evaluations Early and Late in the Course, Journal of Higher Education 44: 587-95.
Lee, O. (1998). 'Accountability' and 'Quality' in the Research Assessment Policy of the University Grants Committee (Hong Kong), which appeared as: Scrutineers perpetuate reign of error, in The Times Higher Education Supplement, August 14, 1998, Commonwealth Section, p. xiii.

Lee, O. (1997). Hong Kong Business Law in a Nutshell, New York: Juris Publishing, pp. xiii ff.

Lee, O. with the multimedia assistance of She, James (1999). 'I Want To See - Not To Be Seen!' Teaching 'Moot Court' Debating Skills through Interactive Multimedia, in James, Jeff, ed., Quality in Teaching & Learning in Higher Education, Hong Kong: University Grants Committee, 2000.

Marsh, H.W. (1983). Students as Evaluators of Teaching, International Encyclopaedia of Education: Research and Studies (New York: Pergamon Press).

Marsh, H.W. and Overall, J.U. (1981). The Relative Influence of Course Level, Course Type, and Instructor, on Students' Evaluations of College Teaching, American Educational Research Journal 18: 103-11.

Marsh, H.W. and Roche, L.A. (1997). Making Students' Evaluations of Teaching Effectiveness Effective: The Critical Issues of Validity, Bias, and Utility, American Psychologist 52: 1187-97.

Massy, William F. and French, Nigel J. (1993). Teaching and Learning Quality Process Review: A Review of the Hong Kong Programme (hereafter, TLQPR Review), cited from the UGC internet home page, http://www.ugc.edu.hk/documents/papers/wfm_njf5.html.

McKeachie, Wilbert J. (1997a). Student Ratings: The Validity of Use, American Psychologist 52: 1218-25.

McKeachie, W.J. (1997b). Student Ratings of Faculty: A Reprise, Academe 65: 384-97.

Milgrom, P. and Roberts, J. (1995). Complementarities and Fit: Strategy, Structure, and Organizational Change in Manufacturing, Journal of Accounting and Economics 19: 179-208.

Murray, H.G. (1980). Evaluating University Teaching: A Review of Research (Toronto: Confederation of University Faculty Associations).

Nichols, A. and Soper, J.C. (1972).
Economic Man in the Classroom, Journal of Political Economy 80: 1069-73.

Payne, D.A. and Hobbs, A.M. (1979). The Effect of College Course Evaluation Feedback on Instructor and Student Perceptions of Instructional Climate and Effectiveness, Higher Education 8: 525-33.

Perry, R.R. and Baumann, R.R. (1973). Criteria for Evaluation of College Teaching: Their Reliability and Validity at the University of Toledo, in A.I. Sockloff, ed.,
Proceedings: Faculty Effectiveness as Evaluated by Students (Philadelphia: Temple University).

Poch, R.K. (1993). Academic Freedom in American Higher Education: Rights, Responsibilities, and Limitations, ASHE-ERIC Higher Education Report No. 4.

Pohlmann, J.T. (1975). A Description of Teaching Effectiveness as Measured by Student Ratings, Journal of Educational Measurement 12: 49-54.

Quality Assurance Committee (QAC), City University of Hong Kong (1996). CityU Policy and Guide for Developing Teaching Evaluation Schemes, http://www.cityu.edu.hk/QAC/scheme.htm.

Quality Assurance Committee (QAC), City University of Hong Kong (1996). Statement of Principles on Quality and Quality Assurance. Available: http://www.edu.hk/QAC/aboutQAC.htm

Riggs, R.O. (1975). The Prevalence and Purposes of Student and Subordinate Evaluations among AACTE Member Institutions, Journal of Teacher Education 26: 218-21.

Rosenshine, B.; Cohen, A.; Furst, N. (1974). Correlates of Student Preference Ratings, Journal of Economic Education 4: 90-99.

Seldin, F. (1976). New Ratings Names for Professors, The Peabody Journal of Education 53: 254-59.

Scott, C.A. (1977). Student Ratings and Instructor-Defined Extenuating Circumstances, Journal of Educational Psychology 69: 744-47.

Schwab, D.P. (1975). Course and Student Characteristic Correlates of the Course Evaluation Instrument, Journal of Applied Psychology 60: 742-47.

Scriven, M. (1981). Summative Teacher Evaluation, in J. Millman, ed., Handbook of Teacher Evaluation, Beverly Hills, CA: SAGE, pp. 244-71.

Sternberg, Robert J. (1998). Plenary Address, Practical Intelligence: Wisdom, Schooling, and Society, International Conference on the Application of Psychology to the Quality of Learning & Teaching, Hong Kong, June 13-18, 1998.

Sullivan, A. and Skanes, G. (1974). Validity of Student Evaluation of Teaching and the Characteristics of Successful Instructors, Journal of Educational Psychology 66: 584-90.
TLQPR of Hong Kong University of Science & Technology (1996).

University Grants Committee (UGC) (1996). Higher Education in Hong Kong - A Report by the University Grants Committee, Hong Kong, November, 1996.

University of Hong Kong (HKU), Department of Psychology, and Hong Kong University of Science & Technology (HKUST), Division of Social Science (1997). Call
for Submissions, International Conference on the Application of Psychology to the Quality of Learning and Teaching, Hong Kong, June 13-18, 1998.

University of Hong Kong (HKU) (1997). Response to the TLQPR Report, http://www.hku.hk/acad/hku-tlqpr/response.htm, 11/23/97.

University of Washington (1997). Press Release: Student Evaluations Don't Get a Passing Grade: Easy-Grading Professors Get Too-High Marks, New UW Study Shows, http://www.washington.edu/newsroom/news/k120497.html

Walker, B.D. (1968). An Investigation of Selected Variables Relative to the Manner in which a Population of Junior College Students Evaluate their Teachers, Diss., University of Houston, Dissertation Abstracts 29: (1969) 3474B.

Weyrauch, W.O. (1971). The 'Basic Law' or 'Constitution' of a Small Group, Journal of Social Issues 27: 49, examining the emergence of community standards of behavior in a group isolated for a nutritional experiment, although many members saw themselves beyond culturally imposed rules.

Weyrauch, W.O. (1969). Governance Within Institutions, Stanford Law Review 22: 141 (1969), reviewing Rubenstein and Lasswell, The Sharing of Power in a Psychiatric Hospital (1966).

Wong, C. (1984). Sociocultural Factors Counteract the Instructional Efforts of Teaching through English in Hong Kong, Seattle: University of Washington.

Woo Chia Wei (1997). The President's Progress Report, HKUST Newsletter, Fall, 1997, and email 9/10/97.

Wruck, K., and Jensen, M. (1994). Science, Specific Knowledge, and Total Quality Management, Journal of Accounting and Economics 18: 247-87.

Zirkel, P.A. (1996). The Law of Teacher Evaluation, Bloomington, IN: Phi Delta Kappa Educational Foundation.

About the Author

Orlan Lee
Hong Kong University of Science & Technology
School of Business & Management; and
Visiting Fellow, Clare Hall, University of Cambridge

A.B. (hons.), Harvard; M.A., Yale; Ph.D., Freiburg (Germany); JurisDr., Pennsylvania; LL.M., Virginia.

Dr.
Lee teaches business law and cyberlaw in the School of Business and Management of the Hong Kong University of Science & Technology, and is also Visiting Fellow at Clare Hall, the college of advanced studies at the University of Cambridge. He is trained in both the civil law and common law systems, and as a social scientist, and has had extensive practical field experience. He has published widely on emergence of law and issues of public policy.
Acknowledgments

This article was prepared in part with support of a Direct Allocation Grant of the University Grants Committee of Hong Kong, and with the rare opportunity for research and reflection provided by Clare Hall, of the University of Cambridge. The author would like to thank Mr. Chan Tai Yat and Mr. Ng Yiu Fai for their assistance in research and production of the manuscript.

Appendix

Divergent Findings

Those Discussing the Conflict of Interest in Student Evaluation:
Gage, N.L. (1974); Harris, E.L. (1982).

Those Studying the Widespread Use of Student Evaluation for Formative and Summative Purposes:
In the 1970s, the American Council on Education surveyed 669 American colleges and universities and found 65% using such student ratings; 35% used these for so-called "summative" purposes, i.e., for faculty hiring, tenure, termination or promotion. See: Payne, D.A. and Hobbs, A.M. (1979). Obviously this form of questionnaire was even more at home in schools of teacher education, where 86% of the American Association of Colleges for Teacher Education (AACTE) reported using these measures. See: Riggs, R.O. (1975).

Those Advocating "Consumerism" in Education:
Seldin, F. (1976); Gayles, A.R. (1980); Arubayi, Eric (1985).

Those Attributing High Rating to Impact of Prior Interest in Subject:
Marsh, H.W. (1980); Greenwald, A.G. (1997).

Those Believing that Ratings are Consistent for the Same Faculty Members from Year-to-Year:
Guthrie, E.R. (1954).

Those Finding that Smaller Class Size Produced Higher Ratings:
Danielson, A.L. and White, R.A. (1976); Crittenden, K.S.; Norr, J.L.; Lebailly, R.K. (1975); Scott, C.A. (1977); Perry, R.R. and Baumann, R.R. (1973); Avi-Itzhak, T. (1982).

Those Still Arguing that Class Size Has NO Effect:
Aleamoni, L.M. and Graham, M.H. (1978).

Those Finding Student Ratings Correlate with Professional and Alumni
Evaluation:
Marsh, H.W. (1983); Murray, H.G. (1980).

Those Finding that Time of Day Affects the Survey (Afternoon Ratings Lower than Morning):
Nichols, A. and Soper, J.C. (1972).

Those Finding that Lecturers are Rated Lower than Professors:
Downie, N.W. (1952); Gage, N.L. (1961); Walker, B.D. (1968).

Those Finding that Students at Lower Levels Tend to Rank Lecturers Less Favorably than Professors:
Downie, N.W. (1952); Gage, N.L. (1961); Pohlmann, J.T. (1975); Kohlan, R.G. (1973).

Those Finding that Students at Lower Levels Do NOT Tend to Rank Lecturers Less Favorably than Professors:
Hillery, J.M. and Yuk, G.A. (1974).

Those Finding that "Grades Expected" Affect Ratings:
Barnoski, R.P. and Sockloff, A.L. (1976); Kennedy, R.W. (1975); Schwab, D.P. (1975); Sullivan, A. and Skanes, G. (1974); Hillery, J.M. and Yuk, G.A. (1974); Perry, R.R. and Baumann, R.R. (1973); Rosenshine, B.; Cohen, A.; Furst, N. (1974).

Those Finding that "Grades Expected" Do NOT Affect Ratings:
Doyle, K. and Whitely, S. (1974).

Those Finding that Ratings Are Consistent for the Same Faculty Members Regardless of Subject Matter Taught:
Marsh, H.W. and Overall, J.U. (1981); Gillmore, G.M. (1973); Hogan, T.P. (1973).

Those Finding that Teaching Ratings and Learning are Only "Weakly Related":
Gramlich, E. and Greenlee, G. (1993).

Those Who Surveyed the Literature on Validity:
Arubayi, Eric A. (1987); McKeachie, W.J. (1997b); Haskell, R.E. (1997a,b,c,d).
Current Research Returning to the Conclusion that Grades Expected and Course Workload are Dominant Factors:
Greenwald, A.G. (1997); Greenwald, A.G. and Gillmore, G.M. (1997a); Greenwald, A.G. and Gillmore, G.M. (1997b); University of Washington (1997); Greenwald, A.G. and Gillmore, G.M. (1997c); Archibold, R.C. (1998).

Those Discussing the Disparity in the Concepts of Teaching and Learning:
Lee, O. with She, James (2000); Haskell, R.E. (1997a,b,c,d).

Copyright 2000 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, at College of Education, Arizona State University, Tempe, AZ 85287-0211 (602-965-9644). The Commentary Editor is Casey D. Cobb.
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, firstname.lastname@example.org
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, New York University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, Ann Leavenworth Center for Accelerated Learning
Jay D. Scribner, University of Texas at Austin
Michael Scriven, email@example.com
Robert E. Stake, University of Illinois-UC
Robert Stonehill, U.S. Department of Education
David D. Williams, Brigham Young University

EPAA Spanish Language Editorial Board

Associate Editor for Spanish Language: Roberto Rodríguez Gómez, Universidad Nacional Autónoma de México, firstname.lastname@example.org

Adrián Acosta (México), Universidad de Guadalajara, adrianacosta@compuserve.com
J. Félix Angulo Rasco (Spain), Universidad de Cádiz, felix.email@example.com
Teresa Bracho (México), Centro de Investigación y Docencia Económica-CIDE, bracho dis1.cide.mx
Alejandro Canales (México), Universidad Nacional Autónoma de México, canalesa@servidor.unam.mx
Ursula Casanova (U.S.A.), Arizona State University, casanova@asu.edu
José Contreras Domingo, Universitat de Barcelona, Jose.Contreras@doe.d5.ub.es
Erwin Epstein (U.S.A.), Loyola University of Chicago, Eepstein@luc.edu
Josué González (U.S.A.), Arizona State University, josue@asu.edu
Rollin Kent (México), Departamento de Investigación Educativa-DIE/CINVESTAV, rkent@gemtel.com.mx, firstname.lastname@example.org
María Beatriz Luce (Brazil), Universidad Federal de Rio Grande do Sul-UFRGS, lucemb@orion.ufrgs.br
Javier Mendoza Rojas (México), Universidad Nacional Autónoma de México, javiermr@servidor.unam.mx
Marcela Mollis (Argentina), Universidad de Buenos Aires, mmollis@filo.uba.ar
Humberto Muñoz García (México), Universidad Nacional Autónoma de México, humberto@servidor.unam.mx
Angel Ignacio Pérez Gómez (Spain), Universidad de Málaga, aiperez@uma.es
Daniel Schugurensky (Argentina-Canadá), OISE/UT, Canada, dschugurensky@oise.utoronto.ca
Simon Schwartzman (Brazil), Fundação Instituto Brasileiro e Geografia e Estatística, email@example.com
Jurjo Torres Santomé (Spain), Universidad de A Coruña, jurjo@udc.es
Carlos Alberto Torres (U.S.A.), University of California, Los Angeles, torres@gseisucla.edu