Education Policy Analysis Archives

Volume 7, Number 5, February 17, 1999. ISSN 1068-2341

A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, College of Education, Arizona State University. Copyright 1999, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article if EPAA is credited and copies are not sold. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Some Comments on the Ad Hoc Committee's Critique of the Massachusetts Teacher Tests

Howard Wainer
Educational Testing Service

"It is a trite but true observation, that examples work more forcibly on the mind than precepts: And if this be just in what is odious and blameable, it is more strongly so in what is amiable and praise-worthy." Henry Fielding, Joseph Andrews, 1742

The critique of the Massachusetts Teacher Tests (MTT) by Haney and his colleagues is deserving of comment, both because of the impact of the MTT and because of the evocative manner in which the tale is told. Their emphasis on examples makes for a forceful argument, and I fear that my reliance on precepts may look meager by comparison. Nevertheless, I hope that some of the observations that follow contribute to the more reasoned assessment of these instruments and their use, not just in Massachusetts but in the many other states where similar programs are being developed or contemplated.

It seems clear from some of the data that the Ad Hoc Committee have presented that the MTT is not up to snuff, although it is a bit of a puzzle why its reliability isn't higher. Even a little bit of pre-testing (and Spearman-Brown) would have shown what test length is required for standard reliability. Perhaps time and economic pressures intervened and a less than fully developed instrument was rushed to the field? As the old
saying goes, if it's worth doing, it's worth doing badly.

The reliability in this whole process that is of greatest interest is the reliability associated with the pass/fail decision (i.e., the reliability that emanates from the standard error of measurement around a score of 70). This can be easily calculated from the raw data by noting the inverse of the information function after fitting an Item Response Theory (IRT) model. If the raw data were available, I could do it myself. Considering the high stakes associated with these tests, there is more than the usual obligation on someone's part to make these raw data available.

It can be noted in Figure 1 that 32 of the 66 teachers who failed initially passed on retesting. This does not necessarily speak to unreliability of the tests. It is possible that grit, determination and plenty of quick preparation accounted for the elevation of the scores on the second administration. One wishes to see the other half of the four-fold table (the counts of those who passed the first time and who passed or failed the second time), but such data are never available--passing once is a ticket out of the testing system. One could contrive an approximation (in the spirit of split-halves) by generating two scores (e.g., odd and even item scores) and constructing the 2-by-2 table (pass/fail vs. score-1/score-2) using 35 as the passing score. Of course, one would want to do this for all candidates, not just those who failed the first time. The extent to which this table is diagonal (i.e., approaching zeroes in the off-diagonals) is the reliability of the test for the decision. Of course it is an underestimate (too short a test), and one needs to apply something like the Spearman-Brown "prophecy formula" to estimate the reliability of the decision based on the full test.

A comment made about "Peter" in the examinee vignettes seemed a bit strong.
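The split-half approximation just described can be sketched in a few lines of code. This is only an illustrative sketch: the item-response matrix here is hypothetical (the actual MTT item-level data are not public), and the half-test cutoff of 35 simply follows the passing score of 70 mentioned above.

```python
import numpy as np

def split_half_decision_table(responses, half_cutoff=35):
    """Split-half check of pass/fail decision consistency.

    responses: (n_candidates, n_items) 0/1 matrix of item scores
    (hypothetical data for illustration).
    half_cutoff: pass mark applied to each half-test score
    (35, i.e., half of the full-test passing score of 70).
    """
    odd = responses[:, 0::2].sum(axis=1)   # score on odd-numbered items
    even = responses[:, 1::2].sum(axis=1)  # score on even-numbered items

    pass_odd = odd >= half_cutoff
    pass_even = even >= half_cutoff

    # 2-by-2 pass/fail table; counts concentrated on the diagonal
    # indicate a consistent pass/fail decision.
    table = np.array([
        [np.sum(pass_odd & pass_even),  np.sum(pass_odd & ~pass_even)],
        [np.sum(~pass_odd & pass_even), np.sum(~pass_odd & ~pass_even)],
    ])
    agreement = np.trace(table) / len(responses)

    # Half-test score correlation, stepped up to full length with the
    # Spearman-Brown prophecy formula: rho_full = 2r / (1 + r).
    r = np.corrcoef(odd, even)[0, 1]
    rho_full = 2 * r / (1 + r)
    return table, agreement, rho_full
```

The diagonal agreement rate addresses the decision directly, while the Spearman-Brown step-up corrects the half-test correlation for the shortness of each half, as the text suggests.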
Peter scored in the 91st percentile on the GRE-V and "between the 80th and 85th percentile in Reading". These two results were described as "quite at odds". In fact, they seem fairly consistent, especially considering the somewhat lower reliability of the MTT. We see this by using Kelley's (1947) equation for estimating true score from observed score, X:

    estimated true score = rho * X + (1 - rho) * mean,

where rho is the test's reliability. Working in percentiles (mean = 50) with rho = 0.7, Kelley's equation yields 0.7 * 91 + 0.3 * 50 = 78.7. So we would predict (from Peter's observed score of 91) that his true score is 79. Thus on retesting, getting 80-85 is not out of line. In fact, if the reliability were not 0.7 but rather 0.73, the true score estimate would be 80; if the reliability were 0.85, we would expect a true score of 85. (See also Wainer, 1999.)

One might even say that the Ad Hoc Committee exaggerated the discrepancy between Peter's GRE and MTT scores for effect. (Indeed, a more temperate, less polemical tone throughout the article would have been more persuasive for me and, perhaps, most readers.) This same bit of exaggeration shows itself again in Recommendation 1: "No exam at all is better than an unreliable exam...." Whether or not no exam is better than any exam depends on (i) how unreliable that exam is, (ii) what the selection ratio is, and (iii) the nature of the cost function associated with errors of each type.

For example, suppose one has a test that is 92% accurate (4% that should "pass" actually fail and 4% of those that should "fail" actually pass), and suppose further that one has a selection ratio of 95 to 100--that is, one expects to pass 95% of all applicants--and false passes are as bad as false failures. Under these circumstances, the test is a bad idea: simply passing everyone misclassifies only the 5% who should have failed, which is 62.5% of the 8% error rate that would be obtained if the test were used. But suppose the cost function is different. Suppose a person who fails improperly can take the test again and pass, whereas a person who
passes improperly is installed forever to do irreparable damage to our children. Now, using the test with its obvious imperfections seems like a better idea. What cost structure would one wish to impose on a licensing exam for heart surgeons? Airline pilots?

And what about validity? The ancient Chinese tests had an enormous selection ratio--one in a "gazillion." Consequently, the test did not need much validity to be worthwhile, though it did need at least some. Of course enormous numbers of worthy candidates failed, but the odds of an unworthy one passing were small enough to be ignored for all practical purposes. The same structure manifests itself in such tests as the National Merit Scholarship test, in which 1,500 "winners" are chosen from more than 1,500,000 applicants. This 1 in 1,000 selection ratio means that a very large number of worthy kids do not win, but that all winners are truly wondrous.

These are the three key issues that must be decided before making such a statement as "no exam at all is better than an unreliable exam," where "unreliable" means reliability in the 0.7 range.

I conclude, then, on the basis of my reading and my own biases, that the MTT needs work, but certainly provides some information to guide decisions. And, if the selection ratio is high enough, the MTT might even be good enough. (I tend to put a high cost on allowing an incompetent teacher into the classroom and a relatively low cost on asking a marginally passing teacher to take the test again.) But that's just me. I hope that the Massachusetts Department of Education will now make enough data available for a more proper test analysis and, perhaps, publishing the Ad Hoc Committee's critique will hasten that day.

References

Kelley, T. L. (1947). Fundamentals of Statistics. Cambridge: Harvard University Press.

Wainer, H. (1999). Is the Akebono School failing its best students? An Hawaiian adventure in regression.
Educational Measurement: Issues and Practice, 18, 26-33.

About the Author

Howard Wainer
Email: firstname.lastname@example.org

Howard Wainer received his Ph.D. from Princeton University in 1968, after which he was on the faculty of the University of Chicago. He worked at the Bureau of Social Science Research in Washington during the Carter Administration, and is now Principal Research Scientist at the Educational Testing Service. He was awarded the Educational Testing Service's Senior Scientist Award in 1990 and was selected for the Lady Davis Prize. He is a Fellow of the American Statistical Association. His latest book, Visual Revelations, was published by Copernicus Books (a division of Springer-Verlag) in 1997.

The World Wide Web address for the Education Policy Analysis Archives is
http://epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, email@example.com, or reach him at College of Education, Arizona State University, Tempe, AZ 85287-0211 (602-965-9644). The Book Review Editor is Walter E. Shepherd: firstname.lastname@example.org. The Commentary Editor is Casey D. Cobb: email@example.com.

EPAA Editorial Board

Michael W. Apple, University of Wisconsin
Greg Camilli, Rutgers University
John Covaleskie, Northern Michigan University
Andrew Coulson, firstname.lastname@example.org
Alan Davis, University of Colorado, Denver
Sherman Dorn, University of South Florida
Mark E. Fetler, California Commission on Teacher Credentialing
Richard Garlikov, email@example.com
Thomas F. Green, Syracuse University
Alison I. Griffith, York University
Arlen Gullickson, Western Michigan University
Ernest R. House, University of Colorado
Aimee Howley, Ohio University
Craig B. Howley, Appalachia Educational Laboratory
William Hunter, University of Calgary
Richard M. Jaeger, University of North Carolina--Greensboro
Daniel Kallós, Umeå University
Benjamin Levin, University of Manitoba
Thomas Mauhs-Pugh, Green Mountain College
Dewayne Matthews, Western Interstate Commission for Higher Education
William McInerney, Purdue University
Mary McKeown-Moak, MGT of America (Austin, TX)
Les McLean, University of Toronto
Susan Bobbitt Nolen, University of Washington
Anne L. Pemberton, firstname.lastname@example.org
Hugh G. Petrie, SUNY Buffalo
Richard C. Richardson, Arizona State University
Anthony G. Rud Jr., Purdue University
Dennis Sayers, Ann Leavenworth Center for Accelerated Learning
Jay D. Scribner, University of Texas at Austin
Michael Scriven, email@example.com
Robert E. Stake, University of Illinois--UC
Robert Stonehill, U.S. Department of Education
Robert T. Stout, Arizona State University
David D. Williams, Brigham Young University

EPAA Spanish Language Editorial Board

Associate Editor for Spanish Language: Roberto Rodríguez Gómez, Universidad Nacional Autónoma de México, firstname.lastname@example.org
Adrián Acosta (México), Universidad de Guadalajara, adrianacosta@compuserve.com
J. Félix Angulo Rasco (Spain), Universidad de Cádiz, felix.email@example.com
Teresa Bracho (México), Centro de Investigación y Docencia Económica-CIDE, bracho@dis1.cide.mx
Alejandro Canales (México), Universidad Nacional Autónoma de México, canalesa@servidor.unam.mx
Ursula Casanova (U.S.A.), Arizona State University, casanova@asu.edu
José Contreras Domingo, Universitat de Barcelona, Jose.Contreras@doe.d5.ub.es
Erwin Epstein (U.S.A.), Loyola University of Chicago, Eepstein@luc.edu
Josué González (U.S.A.), Arizona State University, josue@asu.edu
Rollin Kent (México), Departamento de Investigación Educativa-DIE/CINVESTAV, rkent@gemtel.com.mx, firstname.lastname@example.org
María Beatriz Luce (Brazil), Universidad Federal de Rio Grande do Sul-UFRGS, lucemb@orion.ufrgs.br
Javier Mendoza Rojas (México), Universidad Nacional Autónoma de México, javiermr@servidor.unam.mx
Marcela Mollis (Argentina), Universidad de Buenos Aires, mmollis@filo.uba.ar
Humberto Muñoz García (México), Universidad Nacional Autónoma de México, humberto@servidor.unam.mx
Angel Ignacio Pérez Gómez (Spain), Universidad de Málaga, aiperez@uma.es
Daniel Schugurensky (Argentina-Canadá), OISE/UT, Canada, dschugurensky@oise.utoronto.ca
Simon Schwartzman (Brazil), Fundação Instituto Brasileiro de Geografia e Estatística, email@example.com
Jurjo Torres Santomé (Spain), Universidad de A Coruña, jurjo@udc.es
Carlos Alberto Torres (U.S.A.), University of California, Los Angeles, torres@gseisucla.edu