E11-00025
Educational policy analysis archives. Vol. 2, no. 10 (July 11, 1994).
Tempe, Ariz. : Arizona State University ; Tampa, Fla. : University of South Florida. July 11, 1994
On the academic performance of New Jersey's public school children : fourth and eighth grade mathematics in 1992 / Howard Wainer.
Arizona State University.
University of South Florida.
Education Policy Analysis Archives (EPAA)
Education Policy Analysis Archives
Volume 2 Number 10, July 11, 1994, ISSN 1068-2341

A peer-reviewed scholarly electronic journal. Editor: Gene V Glass, Glass@ASU.EDU. College of Education, Arizona State University, Tempe AZ 85287-2411

Copyright 1993, the EDUCATION POLICY ANALYSIS ARCHIVES. Permission is hereby granted to copy any article provided that EDUCATION POLICY ANALYSIS ARCHIVES is credited and copies are not sold.

On the Academic Performance of New Jersey's Public School Children: Fourth and Eighth Grade Mathematics in 1992

Howard Wainer
Educational Testing Service
email@example.com

Abstract: Data from the 1992 National Assessment of Educational Progress are used to compare the performance of New Jersey public school children with those from other participating states. The comparisons are made with the raw mean scores and after standardizing all state scores to a common (National U.S.) demographic mixture. It is argued that for most plausible questions about the performance of public schools the standardized scores are more useful. Also, it is shown that if New Jersey is viewed as an independent nation, its students finished sixth among all the nations participating in the 1991 International Mathematics Assessment.

Introduction

"...teaching is validated by the transformation of the minds and persons of the intended audience." (Bressler, 1991)

On the Genesis of this Report

In January of this year I was approached by three members of the research staff of the New Jersey Education Association (NJEA) with a proposition. They wanted me to do some research that would show that "New Jersey's teachers were doing a good job." They pointed out that NJEA was an advocacy group and that their principal goal was furthering the interests of their members.
I told them I understood this, but that "research" means that we don't already know the answer, and that it was ETS policy that there could be no control exercised by a client over the freedom to publish by its researchers (except for usual concerns about protecting the privacy of
examinees). So whatever I found, good or bad, would be written up and disseminated in the same way. They agreed.

Then we discussed the character of the research. They began with the notion that I somehow collect and summarize information that could be formed into some sort of index of teaching quality. I didn't (and don't) know how to characterize the quality of teaching indirectly. I know that some teachers are justly renowned for their bravura performances as Grand Expositor on the podium, Agent Provocateur in the preceptorial, or Kindly Old Mentor in the corridors. I also knew that these familiar roles in the standard faculty repertoire should not be mistaken for teaching, except as they are validated by the transformation of the minds and persons of the students. Thus I was committed to measuring the success of the schools by the performance of the students.

They agreed and suggested that I compare New Jersey's students with other states on their SAT scores. I pointed out there is a substantial literature (much of it by my favorite author, e.g., Wainer, 1986a,b; 1989a,b) that demonstrates quite conclusively that one cannot make effective comparisons among states using the scores of any college admissions test because of the self-selected character of the sample of students who choose to take it. I suggested that the data collected in the NAEP state assessment would be just the thing. They expressed concern that such an investigation would not have as high a profile since SAT scores are often in the news. I agreed that this might be the case, but that we could begin to change this by making a big deal about the importance of NAEP. I pointed out that NAEP had much to recommend it in addition to its being drawn from a representative sample; that it touched several grades, several different subjects, and allowed international comparisons were three reasons that occurred to me at the moment.
The project was a small one, as these things go at ETS. It could be completed in only a few weeks of my time, since most of the results were already available in one NAEP publication or another. My only contribution was in selecting the results that seemed suitable, figuring out what were the most likely questions these results would be asked to answer, placing them into a form that would allow them to properly answer those questions, and displaying them to make the results accessible.

The report was written and went through the usual peer review process at ETS, pursuant to which it was revised and appeared as an ETS Research Report (RR-94-29). Then, in the June 15, 1994 issue of Education Week there was an extended commentary by Chester Finn, a founding partner of the Edison Project, a branch of the Whittle Corporation that is seeking to privatize public schools. He felt that I had painted too rosy a picture of the performance of New Jersey's schools. He was particularly incensed because I had presented state school performance both before and after statistical standardization to a common demographic mixture. He seemed to feel that allowing any formal statistical adjustment opened the door to all sorts of mischief. His alternative, as I understand it, is to leave the summary scores alone and allow individual users to weigh their component pieces subjectively. This is what Samuel Johnson has referred to as "the ancient method." (Note 1)

It is true that an incorrect statistical adjustment will promote incorrect inferences. It is also true that not adjusting when one ought to will also promote incorrect inferences. The issue that we must address in this instance is what questions NAEP data are most likely to be asked to illuminate. Once this is determined, what kind of statistical adjustment is needed, if any, will become clear.

Let me illustrate this issue with a simple, but real example: the 1992 NAEP 8th grade math assessment.
Nebraska's average score was 277; New Jersey's average score was 271. On the face of it, it appears that 8th grade students in Nebraska do better in mathematics than their counterparts in New Jersey. We note further, however, that when we examine performance by ethnic group we find:
            White  Black
Nebraska    281    236
New Jersey  283    242

How can this be? Even though Nebraska does better overall than New Jersey, New Jersey's students in both of the major ethnic groups outperform their Nebraska counterparts. This is an example of what statisticians have long called Simpson's Paradox (Wainer, 1986c; Yule, 1903). It is caused by the differences in the ethnic distributions in the two states.

            White  Black
Nebraska    87%    5%
New Jersey  61%    16%

Each state's mean score is the sum, over ethnic groups, of the mean score within each group weighted by that group's proportional representation in the population. Thus Nebraska's mean is composed of the White mean weighted by 87% and the much lower Black mean weighted by only 5%. In New Jersey Whites represent a much smaller segment of the population and so are given a smaller weight in the calculation of the overall mean.

If we standardize all states to a common demographic mixture, say the demographics of the United States as a whole, we find that New Jersey's standardized mean is 274 and Nebraska's is 270. Which is the right number? Finn suggests that it is the unstandardized figure. To answer this we have to know what question the number will be answering.

If the question is of the sort, "I want to open a business in either New Jersey or Nebraska. Which state will provide me with a population of potential employees whose knowledge of mathematics is, on average, higher?" the unadjusted mean scores provide the proper answer.

If the question is, "I want to place my child in school in either New Jersey or Nebraska. In which state is my child likely to learn more mathematics?" the standardized scores give the right answer. One can see this immediately by imagining a sequence of questions that someone trying to help the parent phrasing the above question might ask. "Does your child have a race? If your child is White, he/she is likely to do better in New Jersey. If he/she is Black, he/she is likely to do better in New Jersey."
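The reversal described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it uses just the two groups whose means and shares are quoted in the text and renormalizes their shares to sum to one, so it will not reproduce the published state means (277 and 271) or the published standardized means (274 and 270), which rest on all groups. The function name `weighted_mean` is hypothetical, and the national shares are taken from the White and Black columns of Table 1.

```python
# Simpson's Paradox with the two groups quoted in the text.
# Assumption: shares are renormalized over White and Black only, so the
# results illustrate the reversal but are not the report's published means.

def weighted_mean(group_means, group_shares):
    """Sum of group means weighted by shares renormalized to total 1."""
    total_share = sum(group_shares[g] for g in group_means)
    return sum(group_means[g] * group_shares[g] for g in group_means) / total_share

means = {
    "Nebraska":   {"White": 281, "Black": 236},
    "New Jersey": {"White": 283, "Black": 242},
}
own_mix = {
    "Nebraska":   {"White": 0.87, "Black": 0.05},
    "New Jersey": {"White": 0.61, "Black": 0.16},
}
# Common standard mixture: national shares of the same two groups (Table 1).
national_mix = {"White": 0.69, "Black": 0.17}

# Unstandardized: each state weighted by its own demographic mixture.
raw = {state: weighted_mean(means[state], own_mix[state]) for state in means}
# Standardized: every state weighted by the same national mixture.
std = {state: weighted_mean(means[state], national_mix) for state in means}

for state in means:
    print(f"{state}: raw {raw[state]:.1f}, standardized {std[state]:.1f}")
```

With each state's own mixture Nebraska comes out ahead; with the common mixture New Jersey does, which is the ordering that the within-group means imply.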
Presenting the data in a disaggregated way allows these sorts of questions to be answered specifically, but if a single, overall number is needed to summarize the performance of a state's children, for questions like this, one must standardize.

I contend that NAEP data are gathered to illuminate the performance of schools. Different schools face tasks of differing difficulty depending upon the particular mix of students that attend. If we want to make comparisons among schools that are about the schools and not about the mix of students we must standardize. Not doing so is wrong and misleading, exactly the opposite, if I may be permitted the obiter dictum, of what Mr. Finn suggests.

The Report in Question

The most critical measure of any educational system is the performance of its students. But what yardstick should be used to accomplish this measure? The fact that modern education has many goals suggests that we must measure the extent of its success in a variety of ways. This report describes the first of a series of researches that will attempt to characterize the performance of New Jersey's public school system. We will do this through comparative and
absolute measures, the primary instrument of which will be the data gathered during the course of the National Assessment of Educational Progress (NAEP).

NAEP is a Congressionally mandated survey of the educational achievement of American students and of changes in that achievement across time. Although this survey has been operational for nearly 25 years, it was only in 1988 that Congress authorized adding state level surveys to the national assessment. This was begun on a trial basis with states participating on a voluntary basis. In 1990, 37 states, two territories and the District of Columbia participated in the first Eighth Grade Math Assessment. In 1992 seven more states joined the state assessment, yielding 44 jurisdictions. The 1992 assessment was expanded to also include the Fourth grade. In this report we shall focus attention only upon the 41 states in the assessment. Guam, the Virgin Islands and the District of Columbia will be explicitly excluded because they are sufficiently different from the states in their size, character and composition so as to distort most comparisons.

The assessment methodology is technically sophisticated. Through the use of linking items and item response theory, the performance of all students participating in the assessment can be placed on the same numerical scale. Measuring students' growth is thus straightforward. Subtracting 4th grade scores from 8th grade scores yields the growth obtained. Consequently the expansion of the assessment to the fourth grade provides us with two important bits of information. First, a measure of how much mathematics Fourth graders know. Second, a measure of how much mathematics is learned between 4th and 8th grade. Note that having a measure of the gain obtained (about 49 NAEP points on average) helps us to interpret the NAEP scale. Spread over the four years between the two grades, that gain is roughly 12 points per year. It tells us that if one state trails another by about 12 points this is about the same as the average gain in one year of school.
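The rule of thumb above is simple division, and a minimal sketch makes the conversion explicit. It assumes, as the text implicitly does, that the 49-point average gain accrues evenly over the four school years between 4th and 8th grade; the function name `points_to_years` is hypothetical.

```python
# Convert a gap in NAEP points into approximate years of schooling.
# Assumption: growth is linear between grade 4 and grade 8.

AVERAGE_GAIN = 49      # average NAEP points gained from grade 4 to grade 8 (from the text)
YEARS_BETWEEN = 8 - 4  # school years separating the two assessed grades

POINTS_PER_YEAR = AVERAGE_GAIN / YEARS_BETWEEN

def points_to_years(point_gap, points_per_year=POINTS_PER_YEAR):
    """Express a score gap as a number of average school years."""
    return point_gap / points_per_year

print(f"Average gain per year: {POINTS_PER_YEAR:.2f} points")
print(f"A 12-point gap is about {points_to_years(12):.2f} years")
```

Under that linearity assumption a 12-point gap is just under one school year, which is why the text treats the two as equivalent.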
Thus, when we compare California's mean 8th grade NAEP score of 261 to New Jersey's score of 273, we can interpret the 12 point difference as indicating that the average California 8th grader performs about the same in mathematics as the average New Jersey 7th grader would have. This helps give additional meaning to the numerical scale.

More meaning still for the eighth grade math assessment is yielded by comparing performance on it with performance of 13 year old students in the 1991 International Assessment. Because the NAEP Math assessment was coordinated with the International Assessment both sets of scores can, with reasonable confidence, be placed on the same scale. This was accomplished by having a common sample of examinees for both assessments. As we shall see, the performance of New Jersey's students compares favorably with those from the developed nations.

The Mathematics assessment contained tasks for the students drawn from the framework provided by the Curriculum and Evaluation Standards for School Mathematics, developed by the National Council of Teachers of Mathematics. The content and the structure of the assessment has been widely praised as being representative of the best that current knowledge and technology allows. A full description of the 1992 Mathematics Assessment is found in NAEP 1992: Mathematics Report Card for the Nation and the States (Mullis, Dossey, Owen, & Phillips; 1993).

The NAEP State Assessment Sample

Within each state 100 public schools are carefully selected to be representative of all public schools in that state. Within each school at least 30 students are chosen at random to be tested (in larger schools this number can be as large as 90). Students (usually of foreign birth) whose English language proficiency is deemed to be insufficient to deal with the test are excluded from the sampling frame.

The Results

All results are reported on a uniform scale that can meaningfully characterize the
performance of students over a very wide range of proficiency. This scale can be used in a normative manner, for example comparing one state with another, or one state with the nation as a whole. Or it can be used as an absolute measure, since expert judges have provided a correspondence between score levels and specific proficiencies. These proficiencies are denoted Basic, Proficient and Advanced. What is required to perform at each of these levels obviously increases as the student progresses through school, but the levels are always referred back to the five NAEP content areas. These are: (1) numbers and operations, (2) measurement, (3) geometry, (4) data analysis, statistics, and probability, and (5) algebra and functions.

For example, a score of 211 is characterized as "Basic Level" fourth grade performance. "Basic Level" is defined as "showing some evidence of understanding the mathematical concepts and procedures of the five NAEP content areas." The second level is called "Proficient" and is located at score 248 and reflects being able to "consistently apply integrated procedural knowledge and conceptual understanding to problem solving in the five NAEP areas." The highest performance level is termed "Advanced," is located at score 280 and reflects the ability to "apply integrated procedural knowledge and conceptual understanding to complex and nonroutine real-world problem solving in the five NAEP areas."

The mean performance of all participating states for the 8th grade assessment is shown in Figure 1.

Figure 1. A stem & leaf display of the 1992 NAEP State Assessment in 8th Grade Mathematics. These results are the raw (unstandardized) means from each state.
New Jersey ranks 14th among all participants.

NAEP 1992 Mathematics Assessment
Overall Proficiency - 8th Grade Mathematics (unstandardized)

283  Iowa, North Dakota
282  Minnesota
281
280
279
278  Maine, New Hampshire
277  Nebraska, Wisconsin
276
275
274  Idaho, Wyoming, Utah
273  Connecticut
272  Colorado, Massachusetts
271  New Jersey, Pennsylvania
270  Missouri
269  Indiana
268
267  Michigan, Oklahoma, Virginia, Ohio
266  Delaware
261  Kentucky
260  California, South Carolina
259  Florida, New Mexico, Georgia
258  West Virginia, Tennessee, North Carolina
257  Hawaii
256
255  Arkansas
254
253
252
251  Alabama
250
249  Louisiana
248
247
246  Mississippi

The results shown in Figure 1, while accurately reflecting the actual mean performance within each state, may not be appropriate for certain kinds of state-by-state comparisons. The student populations of each state differ in their demographic make-up. As such, some states face more difficult challenges in educating their populations than others. One obvious example of such a situation occurs in states like California, New Jersey and Florida that have large immigrant populations whose children, even if they do not participate directly in the assessment, require a larger share of instructional resources than native English speakers. In addition, the various subpopulations of students in each state often perform very differently from one another. For example, in Figure 2 are displayed the mean performance of students in different parts of the country broken down by race/ethnicity. Figure 2 has two important messages:

1. There are very large differences in performance by ethnic group. These differences are much larger than the geographic variation observed.

2. New Jersey's students perform better than the national average and all regional averages for all groups. Thus although it is true that New Jersey's African-American and Hispanic students do worse than White students, they do better than African-American and Hispanic students in any region.

Figure 2. A stem & leaf depiction comparing the performance of New Jersey's students, broken down by race/ethnicity, with similar groups from all other parts of the country. (Samples of "Asian/Pacific Islanders" were insufficient to obtain accurate estimates for any other regions than the West.)

(Note to Reader: The horizontal spacing of the entries in this figure is significant. If you view this figure in a proportional font such as Times or New York it will be distorted. The proper spacing is retained in Courier.)
NAEP 1992 Trial State Assessment
Subgroup comparisons of NJ with other parts of the Nation
Race/Ethnicity, Grade 8 Mathematics

Score  White      Asian/Pacific Islander  Hispanic        African-American
297               NJ
296
295
294
293
292
291
290
289
288
287               NATION
286               West
285
284
283    NJ
282
281
280    Central
279    Northeast
278
277    West
276    NATION
275
274
273
272
271
270
269    Southeast
................
247                                       NJ
246                                       Central & West
245                                       NATION
244
243
242                                                       NJ
241                                       Northeast
240
239                                       Southeast       Northeast & Central
238
237
236                                                       NATION
235
234                                                       West
233                                                       Southeast

In addition to the widely different performances of the various demographic subgroups, the distribution of these groups is not uniform across all states. A brief summary of these distributions is shown in Table 1. As is evident, New Jersey's racial/ethnic distribution is rather close to that of the nation as a whole. The central states are the most deviant in the sense that they have a substantially larger proportion of their student population that is White.

Table 1. The national and regional racial/ethnic distribution.

NAEP 1992 Trial State Assessment
Percentage Race/Ethnic Representation in NJ Compared to that in other parts of the Nation
Race/Ethnicity, Mathematics

           White  Black  Hispanic  Asian  Other
NATION     69     17     10        2      2
Northeast  68     20     9         2      1
Southeast  63     29     5         1      2
Central    79     11     7         1      2
West       65     11     16        5      3
NJ         67     14     13        5      1

If we wish to use such data to draw inferences about the relative efficacy of a state's schools, it is considered good practice to statistically adjust for the demographic differences. Why is it helpful to make such adjustments? It is beyond the goals of this report to investigate fully why there are differences in performance by demographic group, although there is a rich literature of fact and conjecture that attempts to do so (Note 2).

However, to understand why we need to make a statistical adjustment it is important to provide some explanation. To do so requires that we draw the important distinction between education and schooling. The school is only one agency among many (family, church, neighborhood, mass media) that provides children with windows on the world. Mass schooling was invented because families could no longer perform essential educational functions. "Once upon a time schools could proceed on the only partly fictitious assumption that, in their efforts to teach children, they were supported by relatively stable families, and by neighborhoods that enforced elementary standards of civility." (Note 3) Not only is this much less true now than in the past, but also it is less true within some demographic groups than others. NAEP measures education and not just schooling, but inferences about the differences among states are explicitly about schools. To add some validity to those inferences we must try, statistically, to place all states on a level playing field with respect to their children's nonschool educational opportunities. Adjusting for differences in demographic groupings is a crude beginning.
What follows is a methodology that recognizes that such differences exist, and a statistical technology that attempts to partition state differences due to demographics from those due to differences in school performance.

Interpretable comparisons through statistical standardization

The between-state comparisons that are implicit in Figure 1 can yield misleading inferences if one is not acutely aware of the differences in the demographic make-up of all of the constituent units. This is of such a complex nature that it is impossible to keep things straight without some formal adjustment. One accepted way to accomplish this adjustment is termed "standardization" (Mosteller & Tukey, 1977, p. 223). The basic idea of standardization is to choose some specific demographic proportions as the standard and then estimate each state's mean proficiency on that specific mixture. In this instance it is sensible to choose the configuration of the entire United States as the standard mixture. Thus the estimated score for each state will be the answer to the question, "What would the national average be if all children went to school in this state?"

How is this adjustment accomplished? It is very simple in theory, although sometimes, because of peculiarities in sampling weights, a bit tricky to execute. We take the mean score obtained in a state for a particular subgroup and multiply it by that subgroup's proportional representation in the standard (national) mixture. Do this for all subgroups and sum the products; the resulting score is the adjusted one. So far we have reported New Jersey's scores for four racial/ethnic groups. As we have seen, because New Jersey's demographics are so much like the national standard this
sort of adjustment would have little effect. A greater effect would be on more disparate areas (e.g., the central US).

Is it sufficient to adjust simply on the basis of this one demographic variable? No, although if we adjust on too many variables, and so include some irrelevant ones, no damage will be done, since irrelevant variables will typically not show differences in performance. In this paper we adjust on three variables. These are:

1. Race/ethnicity
   Five categories: White, African-American, Hispanic, Asian/Pacific Islander, Other

2. Type of community
   Four categories: Extreme Rural, Advantaged Urban, Disadvantaged Urban, Other

3. Limited English Proficiency
   Two categories: Yes, No

This resulted in dividing each state up into forty pieces corresponding to the forty possible combinations (5 x 4 x 2) and calculating the mean proficiency within each of those 40 groups. These 40 means were then weighted by their representation in the entire nation, yielding a standardized score for each state. The results of this standardization are shown in Figure 3.

Figure 3. After standardization New Jersey ranks fourth among all participating states in the 1992 8th grade mathematics assessment.

NAEP 1992 Trial State Assessment
(Standardized for demographic differences)
Grade 8 Mathematics

278  North Dakota
277
276
275  Iowa, Minnesota
274  New Jersey, Maine, New Hampshire
273  Idaho
272  Connecticut
271  Massachusetts, Wisconsin
270  Nebraska, Wyoming, Utah
269  Texas
268  Colorado, Pennsylvania, New York, Virginia, Missouri
267  Arizona, California, Maryland, Indiana, Michigan
266  NATION
265  South Carolina, Oklahoma, Ohio
264  Delaware, New Mexico, Florida
263  Rhode Island
262  Georgia
261
260  Kentucky
259  North Carolina
258
257  Hawaii, Tennessee
256
255  Alabama, Louisiana, Arkansas, West Virginia
254  Mississippi

After standardization to national demographic norms we find that although New Jersey's
mean score has not changed much, ten other states, with more homogeneous student populations (e.g., Massachusetts, Wisconsin, Iowa, Idaho, North Dakota) that had previously been slightly higher are now ranked equal to or below New Jersey.

What is the point of standardization? There are many reasons. So far we have mentioned just one: making comparisons between states on the basis of their children's performance on the same tasks and not on the differences in the demographic structure of their population. A second, and oftentimes more important, use of standardized scores is in easing the difficulties in making inferences about changes that occur within a state across time. When changes do occur the standardized scores assure that the change reflects changes in the students' performance and not changes in the demographic structure of the state. We expect that as time goes on this will be the aspect of greatest value of the standardization. (Note 4)

A natural question to ask is, "At what age do the differences observed among the states manifest themselves?" If we see the same difference between two states in 4th grade as we do in 8th, it implies that the lower scoring state needs to place more emphasis on learning in lower grades. If the difference observed in 4th grade grows proportionally larger in 8th it means that the deficit is spread throughout the years of school and a more systemic change is needed for improvement. Trying to make inferences of this sort based on just two time points is risky, but is certainly instructive. Shown in Figure 4 are the standardized scores for the 1992 4th grade math assessment. A comparison with the 8th grade rankings shown in Figure 3 indicates that the positions established in 4th grade are maintained and the differences observed between states increase. The range of 24 NAEP points observed between the relatively extreme states of North Dakota and Mississippi in 8th grade was 13 NAEP points in 4th grade.
One way to interpret this is that the average Mississippi 4th grader was a year behind the average North Dakota 4th grader in math, and by the time they both reached 8th grade this deficit had increased to two years.

Figure 4. The standardized scores for the 41 states in the 1992 4th grade math assessment.

NAEP 1992 Trial State Assessment
(Standardized for demographic differences)
Grade 4 Mathematics

227  New Hampshire
226  Maine
225
224  Connecticut
223  New Jersey, Iowa, Wisconsin
222  North Dakota, Pennsylvania
221  Minnesota, Texas, Wyoming, Virginia, Massachusetts
220  Nebraska, Missouri, New York
219  Maryland, Georgia
218  Colorado, Idaho, Indiana, Michigan, Delaware, Oklahoma
217  NATION, Utah, Ohio, Arizona
216  South Carolina
215  New Mexico, North Carolina
214  Rhode Island, Florida
213  Kentucky
212  California, West Virginia
211  Hawaii
210  Tennessee, Alabama, Arkansas, Louisiana
209  Mississippi
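Because both grades are reported on one scale, the standardized Grade 4 scores can be subtracted from the standardized Grade 8 scores state by state. The sketch below does this for four states whose values are read from Figures 3 and 4, and also recomputes the North Dakota to Mississippi range in each grade. The selection of states and the variable names are illustrative choices, not part of the report.

```python
# Standardized state means read from Figure 3 (grade 8) and Figure 4 (grade 4).
grade8 = {"North Dakota": 278, "New Jersey": 274, "California": 267, "Mississippi": 254}
grade4 = {"North Dakota": 222, "New Jersey": 223, "California": 212, "Mississippi": 209}

# Estimated growth: grade 8 score minus grade 4 score for the same state.
gain = {state: grade8[state] - grade4[state] for state in grade8}

# Gap between the two relatively extreme states in each grade.
range8 = grade8["North Dakota"] - grade8["Mississippi"]
range4 = grade4["North Dakota"] - grade4["Mississippi"]

for state, points in sorted(gain.items(), key=lambda item: -item[1]):
    print(f"{state}: {points} points gained")
print(f"North Dakota minus Mississippi: {range4} points in grade 4, {range8} in grade 8")
```

At roughly 12 points per school year, a gap that grows from 13 to 24 points corresponds to the one-year and two-year deficits described in the text.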
By subtracting the scores shown in Figure 4 from those in Figure 3 we obtain estimates of the average growth exhibited in each state. This result is shown in Figure 5 below. All states' scores are standardized to the demographic structure of the nation as a whole. Thus were these results longitudinal rather than cross-sectional, we would be able to interpret the changes as due entirely to growth and not demographic changes. As they are now constituted these changes in scores are due to differences in performance and not to demographic differences in the two grades.

Figure 5. Standardized estimates of change in mathematics performance seen by state between 4th and 8th grade in the 1992 assessment. New Jersey's gain was the seventh largest.

Gain in Mathematics Proficiency from 4th to 8th grade
(Scores are standardized to entire US population)

56  North Dakota
55  California, Idaho
54  Minnesota
53  Utah
52  Iowa
51  New Jersey
50  Arizona, Colorado, Florida, Massachusetts, Nebraska
49  NATION, Indiana, Michigan, New Mexico, Rhode Island, Wyoming, South Carolina, Ohio, Texas
48  Connecticut, Maine, Wisconsin, New York, Maryland, Missouri
47  Kentucky, Oklahoma, Virginia, Tennessee, New Hampshire
46  Delaware, Hawaii, Pennsylvania
45  Alabama, Arkansas, Louisiana, Mississippi
44  North Carolina
43  Georgia, West Virginia

International Comparisons

As mentioned previously, the 1991 International Assessment contained enough NAEP items to allow accurate comparisons. The most newsworthy result was that the United States finished near the bottom in this assessment, finishing ahead of Jordan but behind all of the participating developed nations. This was (properly) viewed with alarm. But, as we have seen in the preceding figures, there is tremendous variation within the United States. Shown in Figure 6 are the results of this assessment augmented by the inclusion of New Jersey (standardized to national demographics).
As is evident, New Jersey's students' performance was sixth among all nations participating in the assessment. Further details of the International Assessment can be found in Salganik, Phelps, Bianchi, Nohara, & Smith (1993).

Figure 6. Placing New Jersey explicitly into the 1991 International Assessment shows that its students performed above the average level of most developed nations.

International 1991 Mathematics Assessment
(Predicted Proficiency for 13 year olds)

285  Taiwan
284
283  Korea
282
281
280
279  Soviet Union, Switzerland
278
277  Hungary
276
275
274  New Jersey
273  France
272  Italy, Israel
271
270  Canada
269  Ireland, Scotland
268
267
266  Slovenia
265
264
263  Spain
262  United States
...
246  Jordan

Interpretation of this figure is helped by remembering that, on average, students advance roughly 12 NAEP points a year. Thus the average student in Taiwan and Korea is about a year ahead of the average New Jersey student, who is within a month or two of the other developed nations.

Thus we see rather dramatically that because of the great diversity within the United States looking at just an overall figure for the entire country provides an incomplete and, for some purposes, misleading picture. Because New Jersey's score is standardized to the demographic structure of the entire nation one can interpret this result as what the nation's location would have been if all of the states' educational systems performed as well as New Jersey's.

Within State Variation

We have seen that the variation among states (roughly 30 NAEP points from highest to lowest) makes interpretation of a national mean of limited value. In the same way, the variation within states dwarfs the variation between them. In most states the average score obtained by the lowest 10% of the students is more than 90 points below the score obtained by the top 10%. (Note 5) 90 points is an enormous gulf. Before trying to understand the reasons for this great disparity (with an eye toward developing strategies for ameliorating it) it will be useful to continue this series of comparisons for one important segment of the population: the very top.

In Figure 7 is a comparison of the performance of the top 5% in the 1992 8th grade math assessment with the top 5% of the various OECD countries. We see immediately that New Jersey's top 8th grade students compare favorably with their counterparts throughout the world.
The United States as a whole has also improved relative to the other countries, but still lags the other developed nations by 3 to 12 months.

Figure 7. New Jersey's top students rank third in the world in the 8th grade math assessment.

International 1991 Mathematics Assessment (Predicted Proficiency for 95th %ile of 13 year olds)
345 Taiwan
...
335 Korea
328 New Jersey
326 Hungary
324 Soviet Union
322 Switzerland
319 France
317 Israel, Italy
316 Ireland
315 Scotland, Canada
312 United States
311 Slovenia
306 Spain
...
296 Jordan

Summary & Conclusions
This report is the beginning of a series that examines the performance of New Jersey's school children relative to other children within the United States and world-wide. The measure of performance used was the 1992 National Assessment of Educational Progress mathematics exams and their linked versions used in the 1991 International Assessment. This is done in the ardent belief that the efficacy of schools must be measured by the performance of their students.

We chose NAEP for several reasons, three of which were:
1. It is composed of test items that satisfy the best of current wisdom with respect to both their content and their form.
2. The psychometric model underlying the scoring of NAEP yields a single scale on which not only the fourth and eighth graders can be characterized, but also the 13 year olds from the OECD countries around the world.
3. The students sampled by the NAEP are drawn in a principled way from the populations of interest. This is in sharp contrast to the sorts of self-selected samples that are represented by state means of such college admission tests as the SAT and the ACT. It is well known that trying to draw inferences of useful accuracy from such self-selected samples is impossible (Wainer, 1986a, b; 1989a, b).

We concur with prevailing expert opinion that of all broad-based tests NAEP provides the most honest and accurate estimates of the performance of the students over the broad range of jurisdictions sampled.

We found that, based on the unstandardized results of the 1992 Mathematics Assessment, New Jersey was among the highest performing states. Once these results were standardized to reflect a single (national) demographic composition, New Jersey's rank among the participating states rose to fourth. The United States finished next to last when the performance of its students was compared with that of the students in the other 14 participating OECD nations in the 1991 International Assessment.
New Jersey's students were ranked sixth on the same assessment when their performance was placed on the same scale. However, New Jersey's best students, its top 5%, when compared with the performance of the top 5% of all other OECD nations, ranked third, trailing only Taiwan and Korea.

Notes

This research was supported by the New Jersey Education Association. I am pleased to acknowledge their help. Furthermore I am grateful for the advice and help of John Fremer, Gene Johnson, Philip Leung and John Mazzeo.

1. Samuel Johnson: "To count is modern practice, the ancient method was to guess"; but even Seneca was aware of the difference: "Magnum esse solem philosophus probabit, quantus sit mathematicus."
2. The Coleman report (Coleman et al., 1966) remains the most encyclopedic of such investigations, summarizing, as it does, the performance of more than 645,000 children in 4,000 public schools. It arrives at the not surprising conclusion that family and economic variables drive educational achievement.
3. This quote and much of the surrounding logic comes from Marvin Bressler's delightful and wise 1992 essay, "A teacher reflects."
4. A caveat: Big changes as a result of a statistical adjustment tell us that great care must be exercised in making inferences. A careful comparison of Figure 1 and Figure 3 reveals that most states do not change very much. This is evidence that the standardization is generally behaving as it ought, for if one disagrees with the structure of the adjustment one can still be content that it
isn't changing anything drastically. A notable exception to this would be the District of Columbia, whose small size and atypical demographic structure would combine to yield an enormous shift. Inferences about the meaning of its standardized location ought not be the same as those drawn about the states. For the more important purpose of tracking changes in a jurisdiction's performance over time, it is probably prudent to develop a special standardization for each of the four most unusual jurisdictions (DC, Hawaii, Guam, Virgin Islands).
5. In all but one of the OECD countries this gulf between the 10th percentile and the 90th is somewhat smaller, about 70 points. Taiwan is the lone exception, with a difference of 96 points.

References

Bressler, M. (1991). Reflections on teaching. In Teaching at Princeton. Princeton, NJ: Princeton University.
Bressler, M. (1992). A teacher reflects. Princeton Alumni Weekly, 93(5), 11-14.
Coleman, J. S., et al. (1966). Equality of Educational Opportunity. Washington, DC: U.S. Office of Education.
Finn, C. E. (June 15, 1994). Drowning in Lake Wobegone. Education Week, pp. 31, 35.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Reading, MA: Addison-Wesley.
Mullis, I. V. S., Dossey, J. A., Owen, E. H., & Phillips, G. W. (1993). NAEP 1992: Mathematics Report Card for the Nation and the States. Report No. 23-ST02. Washington, DC: National Center for Education Statistics.
Salganik, L. H., Phelps, R. P., Bianchi, L., Nohara, D., & Smith, T. M. (1993). Education in States and Nations: Indicators Comparing U.S. States with the OECD Countries in 1988. NCES Report No. 93-237. Washington, DC: National Center for Education Statistics.
Wainer, H. (1986a). Drawing inferences from self-selected samples. New York: Springer-Verlag.
Wainer, H. (1986b). Five pitfalls encountered while trying to compare states on their SAT scores. Journal of Educational Measurement, 23, 69-81.
Wainer, H. (1986c).
Minority contributions to the SAT score turnaround: An example of Simpson's paradox. Journal of Educational Statistics, 11, 229-244.
Wainer, H. (1989a). Eelworms, bulletholes & Geraldine Ferraro: Some problems with statistical adjustment and some solutions. Journal of Educational Statistics, 14, 121-140 (with discussions). Reprinted in Shaffer, J. P. (Ed.) (1992). The role of models in nonexperimental social science (pp. 129-148). Washington, DC: American Educational Research Association & American Statistical Association.
Wainer, H. (1989b). Responsum. Journal of Educational Statistics, 14, 187-200. Reprinted in Shaffer, J. P. (Ed.) (1992). The role of models in nonexperimental social science (pp. 195-207). Washington, DC: American Educational Research Association & American Statistical Association.
Yule, G. U. (1903). Notes on the theory of association of attributes in statistics. Biometrika, 2, 121-134.

About the Author

Howard Wainer

Howard Wainer received his Ph.D. from Princeton University in 1968, after which he was on the faculty of the University of Chicago. He worked at the Bureau of Social Science Research in Washington during the Carter Administration, and is now Principal Research Scientist at the Educational Testing Service. He was awarded the Educational Testing Service's Senior Scientist Award in 1990 and was selected for the Lady Davis Prize. He is a Fellow of the American Statistical Association. His latest book, Visual Revelations, will be published by Copernicus Books (a division of Springer-Verlag) in April of 1997.

Copyright 1994 by the Education Policy Analysis Archives