EDUCATION POLICY ANALYSIS ARCHIVES
A peer-reviewed scholarly journal
Editor: Sherman Dorn
College of Education
University of South Florida

Copyright is retained by the first or sole author, who grants right of first publication to the Education Policy Analysis Archives. EPAA is published jointly by the Colleges of Education at Arizona State University and the University of South Florida. Articles are indexed in the Directory of Open Access Journals ( and by H.W. Wilson & Co.

Volume 13 Number 37 August 30, 2005 ISSN 1068–2341

Response to "What Do Klein et al. Tell Us About Test Scores in Texas?"1

Stephen P. Klein, Laura S. Hamilton, Daniel F. McCaffrey, Brian M. Stecher
RAND

Citation: Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2005, August 30). Response to "What do Klein et al. tell us about test scores in Texas?" Education Policy Analysis Archives, 13 (37). Retrieved [date] from

We have reviewed the article by Toenjes (2005). Below we summarize our responses.

First, Toenjes incorrectly describes the focus of our study. As we note in our paper, the findings from the 20-school analysis merely triggered and supplemented our statewide analyses. Most of our article examined the stark discontinuity between TAAS and NAEP trends for the entire state of Texas (namely, the meteoric increase in scores and the narrowing of the gap between racial/ethnic groups on TAAS not being reflected on NAEP). The huge discrepancy between NAEP and state test scores in Texas is well-documented, it is still present, and it has been described in other studies (Haney, 2000; McCombs et al., 2005; Peterson & Hess, 2005). Toenjes’ comments about the representativeness of the 20 schools are irrelevant to our statewide analyses, and that analysis was the focus of our paper.

Second, Toenjes appears to have misunderstood the purpose of our 20-schools analysis.
The intent was not to make conclusions about relationships between achievement and socioeconomic status statewide. Rather, the purpose was to examine whether, in this particular sample for which scores on two different tests were available, the TAAS functioned in the same way as an alternative test that many other states use, namely, the Stanford 9. The fact that the TAAS and Stanford 9 showed radically different relationships with SES does not tell us anything about what those relationships would be in a larger sample, and we never claim that it does. In particular, we do not, as Toenjes claims, use the 20-schools study to conclude that TAAS lacks validity. Instead, we used the findings with the 20 schools to show that in these particular schools, the scores on a commonly used test of mathematics achievement did not correspond to scores on the TAAS. The difference between these two tests was much larger than can plausibly be explained by differences in content coverage, and therefore raises questions about the meaning of TAAS scores for these schools. As we state clearly, “We are therefore reluctant to draw conclusions from our findings with these schools or to imply that these findings are likely to occur elsewhere in Texas. Nevertheless, they do suggest the desirability of periodic administration of external tests to validate TAAS results. This procedure, which is sometimes referred to as ‘audit testing,’ could have been incorporated into the study of the Metropolitan Achievement Test discussed previously.” This is a much more cautious conclusion than Toenjes claims we make.

Third, Toenjes misunderstands the relationship between our study and the Grissmer, Flanagan, Kawata, & Williamson (2000) study. Toenjes states that we concluded that the observed increases in Texas student academic performance reported by Grissmer et al. (2000) were highly suspect. Our paper contains no such conclusion. We focused on whether the TAAS scores were suspect, whereas Grissmer et al. examined gains in NAEP. As we clearly state in our paper, “these studies differed in the questions they investigated, the data they analyzed, and the methodologies they employed.” The studies do not in fact produce conflicting findings; they simply address different questions.

1 Accepted under the editorship of Sherman Dorn. Send commentary to Casey Cobb (

References

Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: RAND, MR-924-EDU.

Haney, W. (2000). The Myth of the Texas Miracle in Education. Education Policy Analysis Archives, 8 (41). Retrieved June 21, 2005, from

McCombs, J. S., Kirby, S. N., Barney, H., Darilek, H., & Magee, S. J. (2005). Achieving state and national literacy goals, a long uphill road: A report to Carnegie Corporation of New York. Santa Monica, CA: RAND, TR-180-EDU.

Peterson, P. E., & Hess, F. M. (2005). Johnny can read…in some states. Education Next, Summer 2005, 52-53. Retrieved June 21, 2005, from http://www.educationnext.org/20053/pdf/52.pdf.

Toenjes, L. A. (2005). What do Klein et al. tell us about test scores in Texas? Education Policy Analysis Archives, 13 (36). Available at http://e


About the Authors

Stephen P. Klein
Laura S. Hamilton
Daniel F. McCaffrey
Brian Stecher
RAND
Email:

Dr. Stephen P. Klein is a Senior Research Scientist at RAND, where for the past 30 years he has led studies on health, criminal justice, military manpower, and educational issues. His current projects include analyzing licensing examinations in teaching and other professions, delivering computer performance tests over the Web, and measuring the effects of instructional practices and curriculum on student performance.

Dr. Laura S. Hamilton is a Senior Behavioral Scientist at RAND, where she conducts research on educational assessment and the effectiveness of educational reform programs. Her current projects include a study of systemic reforms in math and science, an investigation of district and school responses to standards-based accountability, and a study of teachers’ and principals’ use of information from a value-added assessment system.

Dr. Daniel F. McCaffrey is a Senior Statistician at RAND, where he works on studies of health and educational issues. His recent research has focused on the study of value-added model estimates of teacher effects by evaluating the literature and statistical methods for value-added modeling, by developing new models and algorithms for estimating effects, and by conducting empirical evaluations of the various estimates. He is also studying the effect of a value-added assessment program on student outcomes and educational practice.

Dr. Brian Stecher is a Senior Social Scientist in the Education program at RAND. Dr. Stecher's research emphasis is applied educational measurement, including the implementation, quality, and impact of state assessment and accountability systems and the cost, quality, and feasibility of performance-based assessments in mathematics and science.


