USF Electronic Theses and Dissertations
Record Information
Author: Hunsader, Patricia D.
Title: Lessons learned about boys' and girls' mathematical problem solving: the solution processes, performance, linguistic explanations, self-efficacy, and self-assessment of fifth-grade students of varying reading and mathematics abilities [electronic resource] / by Patricia D. Hunsader.
Publication: [Tampa, Fla.]: University of South Florida, 2005.
Dissertation note: Dissertation (Ph.D.), University of South Florida, 2005.
Description: Document formatted into pages; contains 482 pages. Includes vita. Includes bibliographical references. Title from PDF of title page.
Format: Text (electronic dissertation) in PDF format.
System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web.
Adviser: James R. King, Ph.D.
Subjects: Mathematics; Problem solving; Gender; Reading ability; Mathematics ability; Self-efficacy; Self-assessment; Self-regulation.
Call number: LB1139.23 (ONLINE)
OCLC number: 173666306
Genre: Dissertations, Academic; USF; Early Childhood Education; Doctoral.
Host item: USF Electronic Theses and Dissertations.
Online access: http://digital.lib.usf.edu/?e14.1185

Lessons Learned About Boys' and Girls' Mathematical Problem Solving: The Solution Processes, Performance, Linguistic Explanations, Self-Efficacy, and Self-Assessment of Fifth-Grade Students of Varying Reading and Mathematics Abilities

by

Patricia D. Hunsader

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Childhood Education
College of Education
University of South Florida

Co-Major Professor: James R. King, Ed.D.
Co-Major Professor: Denisse R. Thompson, Ph.D.
Jeffrey D. Kromrey, Ph.D.
Susan Homan, Ph.D.
Gladis Kersaint, Ph.D.

Date of Approval: June 6, 2005

Keywords: mathematics, problem solving, gender, reading ability, mathematics ability, self-efficacy, self-assessment, self-regulation

Copyright 2005, Patricia D. Hunsader

DEDICATION

This work is dedicated to my family. To my husband and best friend Mike, your unfailing support and encouragement are what prompted me to begin this journey, and you have remained steadfast to the end. After nearly twenty-five years together, you still listen to me go on and on and on. Thank you. To my children, Jared, Zach, Johnny, Tyler, and Alyssa, thanks for reminding me every day in all your little ways that there is more to life than work. Together, you make every day of my life a joyous adventure.

ACKNOWLEDGEMENTS

I would like to thank my committee for truly being my dissertation team. To Dr. King, thank you for your wisdom and encouragement. Your belief in me inspires me to keep reaching. To Dr. Thompson, thank you for applying your phenomenal editing skills to so many versions of this manuscript. You have given so generously of your most precious resource, your time. Dr. Kromrey, for someone of such great intellect, you touch me with your humility.
Thank you for always being kind and gentle in your corrections. To Dr. Homan, without you, I would not have even begun this program. Thanks for being my advocate from the first day to the last. To Dr. Kersaint, thank you for teaching me so much. I could not imagine writing this literature review without the experience you provided. I see pieces of what I have gained from each of you throughout this manuscript. Each one of you has been a blessing to me.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1. INTRODUCTION
  Statement of the Problem
  Purpose of the Study
  Research Questions
  Significance of the Study
  Limitations of the Study
  Assumptions of the Study
  Definitions
  Organization of the Manuscript
CHAPTER 2. REVIEW OF LITERATURE
  Gender Differences in Mathematics
    Gender Differences in Mathematical Performance
      Studies Focused on Students in the United States
      Cross-National Studies
    Gender Differences in Test Item Performance
      Item Format
      Item Content
    Gender Differences in Strategy Use
    Gender Differences in Affect
    Summary of Gender Differences in Mathematics
  Reading in Problem Solving
    The Relationship Between Reading and Mathematics
    Challenges of Reading Mathematical Text
      Complexity of Word Problems
      Comparison to Other Forms of Text
      Vocabulary
      Symbols
      Other Challenges
    Readability
    Reading Comprehension and Word Problems
    Highlighted Research on Reading and Problem Solving
      Problem Wording Research
      Disabilities Research
      Accommodations Research
      Reading Comprehension Research
    Gender Differences in Reading/Literacy
      Sources of Boys' Underachievement
      Highlighted Research on Gender Differences in Reading/Literacy
    Summary of Reading Research
  Self-Regulated Learning (SRL)
    The Concept and Nature of SRL
    Individual Differences in SRL
    Classroom Applications of SRL
    SRL in the Field of Mathematics
      Proficient Problem Solvers
      Less Proficient Problem Solvers
    Highlighted SRL Research Findings
    Summary of Self-Regulated Learning
  Self-Efficacy
    Self-Efficacy Measurement and Relationship to Performance
    Self-Efficacy Versus Self-Concept
    Self-Efficacy's Role in Self-Regulation
    Self-Efficacy and Math-Related Career Choices
    Intervention Efforts for Performance and Career Choices
    Gender and Self-Efficacy
    Self-Efficacy Summary
  Self-Assessment
    Highlighted Research on Self-Assessment
    Summary of Self-Assessment
  Written Communication in Problem Solving
    Challenges and Benefits of Written Communication
    Metacognition in Writing
    Writing as Assessment Data
    Highlighted Research on Writing and Problem Solving
    Gender Differences in Writing
    Summary of Writing
  How the Review of Literature Informed the Proposed Study
CHAPTER 3. METHODS
  Design
  Population and Sample Selection
  Participant Demographics
    Participation Levels by School and Gender
    Treatment of the Data
    Final Sample Demographics
  Instrumentation
    Description of the Testing Instrument
    Measuring Self-Efficacy and Self-Assessment
    Development of the Test
    Validity
  Reliability of the Data
    Cronbach Alpha
    Interrater Reliability Training and Scoring
    Member Check Interviews
      Problem-Solving Processes
      Self-Efficacy
      Self-Assessment
  Data Collection
    Organization of Testing
  Analysis of Data
    Research Question One
    Research Questions Two and Three
    Research Questions Four and Five
  Summary
CHAPTER 4. FINDINGS
  Question One: Findings for Problem-Solving Processes
    Coding and Emergent Themes
    Descriptive Statistics
    Inferential Statistics
  Question Two: Findings for Mathematical Performance
    Descriptive Statistics
      Descriptive Analysis by Question
        Analysis of Overall Sample
        Analysis by Group
      Descriptive Analysis of Total Mathematics Performance Scores
      Correlations Between Mathematics Performance and Other Variables
    Inferential Statistics
      Factorial ANOVA Results
      Multiple Regression Analysis
        Introduction
        Descriptive Statistics
        Multiple Regression Results
        Assumptions
        Conclusion
  Question Three: Findings for Linguistic Explanations
    Descriptive Statistics
      Descriptive Analysis by Question
        Analysis of Overall Sample
        Analysis by Group
      Descriptive Analysis of Total Linguistic Explanation Scores
    Inferential Statistics
      Factorial ANOVA Results
      Multiple Regression Analysis
        Introduction
        Multiple Regression Results
        Assumptions
        Conclusion
    Qualitative Analysis of Students' Linguistic Explanations
  Question Four: Findings for Self-Efficacy
    Descriptive Statistics
      Descriptive Analysis of Total Self-Efficacy Scores
        Overall Analysis of Groups
        Analysis of Overall Sample by Question
        Analysis of Groups by Question
      Descriptive Analysis of Self-Efficacy Mean Bias
        Overall Analysis of Groups
        Analysis of Overall Sample by Question
        Analysis of Groups by Question
    Inferential Statistics
      Factorial ANOVA Results
      Multiple Regression Analysis
        Introduction
        Multiple Regression Results
        Assumptions
        Conclusion
    Qualitative Analysis of Students' Rationale for Self-Efficacy Ratings
  Question Five: Findings for Self-Assessment
    Descriptive Statistics
      Descriptive Analysis of Total Self-Assessment Scores
        Overall Analysis of Groups
        Analysis of Overall Sample by Question
        Analysis of Groups by Question
      Descriptive Analysis of Self-Assessment Mean Bias
        Overall Analysis of Groups
        Analysis of Overall Sample by Question
        Analysis of Groups by Question
    Inferential Statistics
      Factorial ANOVA Results
      Multiple Regression Analysis
        Introduction
        Multiple Regression Results
        Assumptions
        Conclusion
    Teachers' Assessment of Their Students' Performance
CHAPTER 5. SUMMARY, CONCLUSIONS, IMPLICATIONS, AND RECOMMENDATIONS
  Summary of the Study
  Conclusions and Implications
    Research Question One: Problem-Solving Processes
      Problem-Solving Process Conclusions
      Problem-Solving Process Implications
    Research Question Two: Mathematical Performance
      Mathematical Performance Conclusions
      Mathematical Performance Implications
    Research Question Three: Linguistic Explanations
      Linguistic Explanation Conclusions
      Linguistic Explanation Implications
    Research Question Four: Self-Efficacy
      Self-Efficacy Conclusions
      Self-Efficacy Implications
    Research Question Five: Self-Assessment
      Self-Assessment Conclusions
      Self-Assessment Implications
  Recommendations for Practice
  Recommendations for Future Research
REFERENCES
APPENDIX A. FCAT Mathematics Performance Scoring Rubric
APPENDIX B. Amended Rubric for Scoring Linguistic Explanations
APPENDIX C. Pilot Study Rubric for Scoring Linguistic Explanations
APPENDIX D. Sample Student Work from Pilot Study to Illustrate Scoring
APPENDIX E. FCAT Achievement Levels
APPENDIX F. Pilot Study
APPENDIX G. Teacher Rating Sheet
APPENDIX H. Parental Informed Consent
APPENDIX I. Informed Consent Cover Letter
APPENDIX J. Problem Solving Test Instrument
APPENDIX K. Item-Specific Rubrics for Mathematics Scoring
APPENDIX L. Item-Specific Rubrics for Linguistic Explanation Scoring
APPENDIX M. Solution Processes Identified for Each Question
APPENDIX N. Descriptive Statistics for Mathematics Performance by Question by Group
APPENDIX O. Descriptive Statistics for Linguistic Explanations by Question by Group
APPENDIX P. Descriptive Statistics for Self-Efficacy by Question by Group
APPENDIX Q. Descriptive Statistics for Self-Efficacy Mean Bias by Question by Group
APPENDIX R. Descriptive Statistics for Self-Assessment by Question by Group
APPENDIX S. Descriptive Statistics for Self-Assessment Mean Bias by Question by Group
APPENDIX T. Multiple Regression Summary Tables Including Interactions
ABOUT THE AUTHOR

LIST OF TABLES

Table 1. Results of Mathematical Gender Differences Studies Grouped by Topic
Table 2. Results of Reading Studies Grouped by Topic
Table 3. Results of Self-Regulation in Mathematics Studies
Table 4. Results of Self-Efficacy Studies Grouped by Age of Participants
Table 5. Results of Self-Assessment Studies
Table 6. Results of Writing Studies Grouped by Topic
Table 7. Categorization of Student Ability
Table 8. Participation Levels of Boys and Girls by School
Table 9. Cell Distribution of Students for Factorial ANOVA and Multiple Regression Analyses
Table 10. Mean FCAT Scores for Sample Grouped by School and by Gender
Table 11. Description of Test Questions Selected for the Study
Table 12. Summary of Data Sources and Methods
Table 13. Categories of Themes and Problem-Solving Processes
Table 14. Descriptive Statistics for Coding of Problem-Solving Process Themes
Table 15. Chi-Square Results for Problem-Solving Process Themes
Table 16. Problem-Solving Process Themes for the Pencil Problem
Table 17. Descriptive Statistics for Mathematics Performance Scores
Table 18. Total Mathematics Performance Scores by Group
Table 19. Teacher Rating of Student Preparedness by Question
Table 20. Factorial ANOVA Results for Total Mathematics Performance Scores
Table 21. Summary of Multiple Regression Analysis for Mathematics Performance
Table 22. Descriptive Statistics for Linguistic Explanation Scores
Table 23. Total Linguistic Explanation Scores by Group
Table 24. Factorial ANOVA Results for Total Linguistic Explanation Scores
Table 25. Summary of Multiple Regression Analysis for Linguistic Explanations
Table 26. Total Self-Efficacy Scores by Group
Table 27. Descriptive Statistics for Self-Efficacy Scores
Table 28. Self-Efficacy Mean Bias Scores by Group
Table 29. Self-Efficacy Mean Bias Scores by Question
Table 30. Factorial ANOVA Results for Self-Efficacy Mean Bias Scores
Table 31. Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias
Table 32. Total Self-Assessment Scores by Group
Table 33. Descriptive Statistics for Self-Assessment Scores
Table 34. Self-Assessment Mean Bias Scores by Group
Table 35. Self-Assessment Mean Bias Scores by Question
Table 36. Factorial ANOVA Results for Self-Assessment Mean Bias Scores
Table 37. Summary of Multiple Regression Analysis for Self-Assessment Mean Bias
Table 38. Categorization of Pilot Students
Table 39. Pilot Study Summary of Performance Rubric Scores by Item
Table 40. Summary of Pilot Data by Gender
Table 41. Summary of Pilot Data by Item

LIST OF FIGURES

Figure 1. Box and Whisker Plot of Total Performance Scores Grouped by Gender, Mathematics Ability, and Reading Ability
Figure 2. Total Mathematics Score Interaction of Reading Ability and Gender
Figure 3. Box and Whisker Plot of Total Linguistic Explanation Scores Grouped by Gender, Mathematics Ability, and Reading Ability
Figure 4. Box and Whisker Plot of Self-Efficacy Mean Bias Scores Grouped by Gender, Mathematics Ability, and Reading Ability
Figure 5. Box and Whisker Plot of Self-Assessment Mean Bias Scores Grouped by Gender, Mathematics Ability, and Reading Ability

Lessons Learned About Boys' and Girls' Mathematical Problem Solving: The Solution Processes, Performance, Linguistic Explanations, Self-Efficacy, and Self-Assessment of Fifth-Grade Students of Varying Reading and Mathematics Abilities

Patricia D. Hunsader

ABSTRACT

The purpose of this nonexperimental, causal-comparative study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. The investigation addressed five research questions. First, to what extent are students' mathematical problem-solving processes related to gender? Second, to what extent is children's performance in mathematical problem solving related to gender, reading ability, and mathematics ability? Third, to what extent is the quality of students' linguistic explanations of their problem-solving processes related to gender, reading ability, and mathematics ability? Fourth, to what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem and their actual performance related to gender, reading ability, and mathematics ability? Finally, to what extent is students' ability to assess their own performance on mathematical problem-solving tasks related to gender, reading ability, and mathematics ability? After being briefed in the use of the self-efficacy and self-assessment reporting scales, the students from 16 fifth-grade classrooms were tested with a 12-item mathematical problem-solving test. The final sample consisted of 237 students, 129 boys and 108 girls. All student responses were scored for performance and linguistic explanation using holistic rubrics, and were coded according to the solution process employed. The results indicate that gender does not play a significant role in students' choice of problem-solving processes. As expected, mathematics ability was significantly related to performance, as was reading ability. Gender was not found to be a significant predictor of performance.
Reading ability and mathematics ability were both strongly related to the quality of students' linguistic explanations of their problem-solving processes, but gender was not. Boys consistently exhibited higher levels of self-efficacy, but girls were more accurate in their self-efficacy feelings. Reading ability was also found to be a significant predictor of the accuracy of students' self-efficacy feelings, but mathematics ability was not. Reading ability was found to be the strongest predictor of the accuracy of students' self-assessment, with gender also showing a significant relationship.

CHAPTER 1
INTRODUCTION

Unlike times past, when a teacher's judgment alone was regarded as a complete and accurate assessment of student achievement, today's emphasis has shifted away from classroom evidence of learning toward basing conclusions on standardized test scores. Although future high-stakes assessments in some states will begin to incorporate science and other content areas, the emphasis to date has been strictly on reading, writing, and mathematics (Hunsader, 2002a). Reading and mathematics cannot be separated in the testing arena because mathematical standards, and the tests designed to assess them, have de-emphasized strict computation in favor of contextualized word problems. These word problems rely on students' ability to read and comprehend the problem situation before solving it (National Council of Teachers of Mathematics [NCTM], 2000). Before the advent of the NCTM Standards (1989, 2000), the traditional mathematics classroom, with its traditional computational assessments, stood alone as a subject. However, current practices in the reform-influenced mathematics classroom connect mathematics to all subjects, and the ability to read, reason, and communicate is essential to success.
The No Child Left Behind legislation calls for more testing than ever before, and the focus on standards-based testing has increased the proportion of problems that are set in real-world contexts and require linguistic justification of answers. In today's high-stakes testing environment, success in mathematics depends heavily on the integration of skills and processes previously confined to literacy (Adams, 2003). No longer can the teacher of mathematics treat the subject as an island, isolated from other content areas. The reforms encouraged by the National Council of Teachers of Mathematics in its standards publications (NCTM, 1989, 2000) call for de-emphasizing the teaching of discrete skills and focusing instead on a more integrated problem-solving approach. "Unless students can solve problems, the facts, concepts, and procedures they know are of little use" (NCTM, 2000, p. 182).

Problem-solving tasks differ in quality from simple exercises. Simple exercises are generally presented in a strictly symbolic format, requiring only the selection and application of an effective algorithm to complete the solution process. Word problems for which the student has developed such a high level of proficiency that they have become routine would also be considered simple exercises. True problems, however, not only require students to read and comprehend the problem situation, but also require them to analyze alternative solution paths, select a productive strategy, and finally communicate, verbally, in writing, or both, the steps used to solve the problem. Whereas exercises are characterized by one correct answer, generally arrived at through one specific algorithm, true problems may have multiple correct answers or multiple ways of arriving at a single correct answer.
In contrast to exercises, problems may require students to make judgments about procedures, monitor their thinking throughout the solution process, and explain their thinking in writing (Hong, 1995; Krulik & Rudnick, 1987; McIntosh & Draper, 1996; Schoenfeld, 1989).

The mathematics reform movement draws attention to the supportive relationship between mathematics and reading, and holds that the two should be taught and learned together. The National Council of Teachers of Mathematics argues that "knowing mathematics is doing mathematics" (1989). Adams (2003) follows that "doing mathematics is reading mathematics" (p. 794). Because words, symbols, and numerals give mathematics its substance and framework, these same words, symbols, and numerals must be used to communicate the problem situation to students so that students can use them to perform procedures, solve problems, and explain processes. Reading, in this sense, acts as a gatekeeper to problem solving. In support of this notion, correlational studies show a substantial relationship between children's mathematical and reading performance (Flem & Sovik, 1993; Hecht, Torgeson, Wagner, & Rashotte, 2001; Sovik, Frostad, & Lie, 1994).

Reform-based problem solving, involving contextualized problems without a prescribed algorithm, requires that students be self-regulated learners who are "metacognitively, motivationally, and behaviorally active participants in their own learning" (Zimmerman, 1990, p. 4). Metacognitively, self-regulated learners set their own goals, monitor their progress, and self-evaluate throughout the learning process. In terms of motivation, self-regulated learners feel confident about their ability to be successful and are intrinsically interested in learning. Behaviorally, self-regulated learners actively work to create environments that are conducive to learning (Zimmerman, 1990).
Students' level of self-efficacy and their ability to assess their own work with some accuracy are important elements in determining the extent to which they successfully engage in self-regulatory strategies.

The current economy of the United States requires that workers be able to apply their mathematical knowledge to solve novel and authentic problems (Kramarski, Mevarech, & Arami, 2002). Unfortunately, large numbers of students of all ages fail to demonstrate proficiency in problem solving (Neef, Nelles, Iwata, & Page, 2003), and do not have the skills necessary to solve mathematics-related problems in their everyday lives or in future work situations (Bottge, 1999). One way to bring emphasis in the classroom to these critical skills is for states to continue to push for educational accountability through high-stakes assessments that measure problem-solving skills. As the reform movement and high-stakes testing change the face of mathematics, it is necessary to ensure that all children, regardless of personal characteristics, are still provided opportunities to be successful in mathematical problem solving (NCTM, 2000). This does not imply that all students should receive the same instruction, but that efforts should be made to provide strategic instruction that affords all children the opportunity to succeed mathematically.

Unfortunately, girls and boys have not historically reached the same levels of achievement in mathematics. In past generations, where differences existed between girls and boys, girls scored higher than boys on lower-level basic skills while boys outscored girls on higher-level reasoning skills, including problem solving (Fennema, 1974). Maccoby (1966) found that gender differences in reasoning ability favoring boys existed as early as the upper elementary grades. These differences continued through high school, with boys consistently outperforming girls on measures of higher-level mathematical reasoning.
In recent years, the disturbing trend of boys' underachievement in literacy has been gaining attention. Studies show that boys account for 75 to 85 percent of those labeled at-risk for poor achievement progress in literacy (Rowe, 2000), and boys are three to five times more likely than girls to be labeled reading/learning disabled (Young & Brozo, 2001). As reading and writing become more crucial in the mathematics classroom, this development may affect boys' performance in high-level reasoning activities, such as problem solving, that rely on literacy skills. Business Week (Conlin, 2003) reports that from kindergarten to graduate school, girls now outperform boys in reading and overall grades, and are catching up to boys in mathematics. Although a connection between performance in reading and mathematics has not been established here, it certainly warrants further inspection.

The research relating reading to problem solving is fairly extensive, but critical gaps remain. For example, in most studies relating reading ability to problem-solving ability (Fan, Mueller, & Marini, 1994; Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999; Helwig, Rozek-Tedesco, & Tindal, 2002; Johnson, 2000; Jordan, Kaplan, & Hanich, 2002; Jordan & Montani, 1997; Tindal, Heath, Hollenback, Almond, & Harniss, 1998), the problems were read aloud to the students. Although this method gives insight into how accommodations may affect poor readers, it does not allow any conclusions to be drawn about poor readers' independent performance on problem-solving tasks. Another weakness in the research is that most studies have either used multiple examples of the same problem type in their tests (Bernardo, 1999; Fan, Mueller, & Marini, 1994; Leong & Jerred, 2001), or have administered tests containing questions significantly below the students' reading and mathematics level (Sovik, Frostad, & Heggberget, 1999).
The result is a wealth of data that is either severely limited in its generalizability or does not allow the researcher to distinguish differences in the problem-solving performance of poor readers versus good readers. Given the trend of boys' underachievement in literacy, the most striking gap in the research is the dearth of studies that include gender as a variable in the relationship between reading ability and problem solving.

The NCTM Standards (2000) have also led to increased use of writing activities in mathematics, and students are now often expected to explain their thought processes while solving problems in the classroom and on high-stakes tests (Gurganus & Del Mastro, 1998). This writing may lead to improved conceptual understanding (Quinn & Wilson, 1997), and may reveal misconceptions and gaps in students' thought processes that traditional assessments do not expose (Krulik & Rudnick, 1994; MacGregor, 1990). However, there is scant research that seeks to determine the differential impact on boys and girls of including linguistic explanations as a criterion in the assessment of problem solving. This is cause for concern because in the elementary grades girls tend to score higher than boys on measures of writing proficiency (Knudson, 1995).

Gender-related differences in general problem-solving ability have been empirically demonstrated, but much of the research has focused on the product of problem solving (the answer) rather than the solution process (the steps taken to arrive at the answer). When the focus is strictly on product, gender differences in the elementary and middle grades are not evident (Fennema, Carpenter, Jacobs, Franke, & Levi, 1998; Tarte & Fennema, 1995). However, when the solution processes children use during problem solving are examined, gender differences appear as early as first grade (Carr, Jessup, & Fuller, 1999; Fennema et al., 1998).
Gender-related differences in problem-solving processes beyond the primary grades have received very little attention in the literature. When solution processes have been studied, they have been examined in relation to boys' and girls' ability to follow a prescribed problem-solving heuristic (Zambo, 1990), rather than in an attempt to uncover inherent differences in their choice of problem-solving processes.

The relationship between affect and mathematics achievement has garnered much attention in the research because of the prevalence of affective differences between boys and girls. Girls typically have less favorable beliefs about their mathematical ability than boys, and boys tend to exhibit more confidence than girls in their mathematical ability, regardless of their actual achievement (Stipek & Gralinski, 1991; Seegers & Boekaerts, 1996; Vermeer, Boekaerts, & Seegers, 2000). None of these studies, however, attempted to relate boys' and girls' feelings of self-efficacy to actual performance on a per-item basis, even though high self-efficacy has been shown to affect performance positively (Maddux, 2002; Pajares, 1996). Nor did any of them attempt to gauge students' ability to assess their own performance on individual problems.

Statement of the Problem

The problem investigated in this study was the lack of knowledge about how gender, reading ability, and mathematics ability are related to children's problem-solving processes, their proficiency in providing linguistic explanations of those solution processes, the accuracy of their feelings of self-efficacy, and their ability to assess their own work. To understand more clearly the root of differences between boys and girls, researchers must pay as much attention to how boys and girls think, and to the processes they use to solve problems, as they do to achievement scores.
As literacy becomes more integrated into mathematics instruction and assessment, the question arises as to whether this change will have the same impact on girls and boys. Whereas boys have historically outperformed girls in high-level mathematics involving multistep problem solving and reasoning (National Assessment of Educational Progress [NAEP], 1986), the recent trend of boys' underachievement in reading (Rowe, 2000; Young & Brozo, 2001) may be a precursor to a reversal of this tendency. In addition, the increased use of contextualized problems calls for greater use of self-regulatory skills. How boys and girls respond to problem situations, in their feelings of self-efficacy and in their ability to self-assess their work, may further exacerbate gender differences. In the current educational environment, students' performance on high-stakes tests is used to make important decisions about their academic future. Because mathematical problem solving is an important element of these high-stakes tests, researchers must work to understand how gender, reading ability, and mathematics ability affect students' mathematical problem-solving behavior.

Purpose of the Study

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. Previous research has shown that boys and girls think differently when solving problems, but because most of this work has been conducted with first-, second-, and third-grade children, little is known about how these differences may manifest themselves in complex problem-solving situations in the upper elementary grades.
The problems used in this study were situated in real-world contexts, and included both a range of difficulty and a variety of possible operations. In keeping with current findings on the relationship between reading and problem solving, this study included reading ability as a distinct independent variable. Because reform efforts call for students to explain or justify their thinking in writing, all test questions instructed students to provide a linguistic explanation of their work, and those explanations were holistically scored and analyzed. Boys and girls differ in their feelings of self-efficacy, and because self-efficacy has been shown to affect performance, students were provided with a means of recording their level of self-efficacy for every problem tested. Finally, because students' self-regulatory behaviors are critical to problem solving, and their ability to assess their own work is critical to self-regulation, students were also provided with a means of assessing their performance on every problem tested. Together, the information garnered from these elements of the study provides a unique contribution to the body of knowledge about gender differences in problem solving.

Research Questions

Very little research has examined gender differences in students' problem-solving processes, performance, self-efficacy, self-assessment, and linguistic explanations, or the impact that their reading and mathematics ability have on these issues. For this reason, the research questions for this study were as follows:

1) To what extent are students' choices of mathematical problem-solving processes related to gender?
2) To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
3) To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
4) To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem-solving test, as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
5) To what extent is students' ability to assess their own performance on mathematical problem-solving tasks, as measured by a self-reported rating scale, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?

Significance of the Study

The primary goal of mathematics instruction is to teach students to solve practical problems (NCTM, 1989, 2000) that will help prepare them for full participation in society (Miller, 1993). Solving word problems is difficult because it entails not only mathematics skills but also reading comprehension, in translating words into symbols and then into corresponding operations (Neef, Nelles, Iwata, & Page, 2003). NCTM (2000) calls for equity in mathematics education, with high expectations and strong support for all students, girls and boys alike. Yet boys and girls have historically differed in their problem-solving performance, and current statistics show that boys are beginning to lag behind girls in literacy skills. Additionally, the literacy skill of reading comprehension has proven to be a critical element of the problem-solving process.
For these reasons, teachers, teacher educators, and curriculum developers who understand the differential relationship between reading ability and boys' and girls' problem-solving processes will be better equipped than those who do not to provide mathematics education that is relevant to today's world and equitable to both genders. Also, because reform-based assessments and high-stakes tests often require linguistic explanations of problem-solving processes, it is important to examine whether potential differences in boys' and girls' ability to provide these explanations bear a differential relationship to their achievement on these tests when scored with a holistic rubric.

Self-efficacy and self-assessment have been shown to affect students' self-regulatory behaviors and performance in mathematics (Hassmen & Hunt, 1994; Koivula, Hassmen, & Hunt, 2001; Maddux, 2002; Pajares, 1996), and boys and girls differ both in their feelings of self-efficacy and in their ability to self-assess (Pajares & Miller, 1997; Pallier, 2003; van Kraayenoord & Paris, 1997; Zimmerman & Martinez-Pons, 1990). The implication is that teaching mathematics so that students actually learn it involves much more than content alone. If girls' and boys' performance is indeed differentially tied to these elements of self-regulation, teachers of mathematics, mathematics teacher educators, and even the parents of mathematics students will be well served by any new information that helps elucidate the relationships among self-efficacy, self-assessment, gender, and ability.

Limitations of the Study

The following list is provided to acknowledge and clarify the limitations of this study that may affect the generalizability of the findings:

1. Number and Operations is the only mathematics strand tested through the 12-item instrument that was developed for this study.
If the instrument had also included items related to algebra, geometry, measurement, and/or data analysis and probability, the study might have yielded different results, and those results might generalize differently to the field of mathematics education.
2. Participation in the study was voluntary, which may have produced a biased sample of the population. However, because motivation is an important element in students' willingness to respond carefully to all 12 extended-response items, the use of nonvolunteers could plausibly have distorted the results more than the use of volunteers.
3. Complete coverage of the Number and Operations strand through extended-response questions would have required an unreasonably long test. The researcher acknowledges that the 12 items included in the test did not measure all possible dimensions of Number and Operations.
4. The time required to analyze data from the testing event precluded conducting the targeted follow-up interviews of students immediately. The time lapse between the testing event and the interviews may have limited students' ability to recall what they were thinking during the problem-solving process.
5. The smiley-face Likert scales used by students to record their level of self-efficacy and self-assessment for each problem limited their ability to express a full range of thoughts regarding self-efficacy and self-assessment.
6. Complete FCAT scores were unavailable for forty-four students in the sample, thereby reducing the statistical power of the study.

Assumptions of the Study

The following list serves to clarify the premises under which the study was conducted:

1. The researcher, in the process of test administration, did nothing to bias the results of the evaluation procedure.
2. Children's responses to the 12-item extended-response test provided sufficient information about students' thinking to allow an accurate categorization of their problem-solving processes.
3. The structure of the targeted follow-up interviews, which allowed students to review their own work before making statements about what they were thinking during the problem-solving process, allowed some conclusions to be drawn as to what students were thinking during the actual testing event.
4. The use of previous standardized test scores together with teacher evaluation to categorize students as high, middle, or low readers, and as high, middle, or low mathematics students, provided sufficient accuracy to draw conclusions about these groups.
5. The 12 items chosen for the test instrument from fourth-grade 1992, 1996, and 2003 National Assessment of Educational Progress (NAEP) released items and fifth-grade Florida Comprehensive Assessment Test (FCAT) released items were appropriate for testing students in the first half of their fifth-grade year.
6. Although the sample did not consist of an exactly even number of boys and girls, based on the pilot study there were sufficient numbers of each to draw conclusions about their problem-solving performance, solution processes, self-efficacy, self-assessment, and linguistic explanations.

Definitions

The following is a list of the terms and operational definitions that were used throughout this study:

Attribution: The reasons students give for their success or failure in an endeavor.
Calibration: The degree to which students' judgments of their capability reflect their actual competence (Pajares & Miller, 1997).
Confidence bias: "a systematic error of judgment made by individuals when they assess the correctness of their responses to questions relating to intellectual or perceptual problems" (Pallier et al., 2002).
Gender: Although some believe the dichotomization of gender to be artificial and the nature of gender to be socially constructed, for the purposes of this study gender is limited to the categories of male and female as determined strictly by genetics.
Linguistic explanations: The text provided by the student in an effort to explain her or his mathematical work on each problem. In the assessment process, symbols and numerals embedded in textual explanations were treated as text.
Mathematical problem: A task for which the student has no prescribed or memorized rules or methods for solving (Hebert et al., 1996).
Problem-solving performance: The level of accuracy and completeness of problem-solving processes and solutions as measured by the Florida Comprehensive Assessment Test (FCAT) 4-point rubric.
Problem-solving processes: The steps taken to solve a problem, as evidenced by students' written work on the problem-solving test and through discussion during the targeted follow-up interviews.
Self-assessment: A judgment of the success of one's performance on a task. In this regard, self-assessment might be considered to play an essential role in students' progress toward autonomy (Laveault, Leblanc, & Leroux, 1999). For this study, students' self-assessment is recorded by circling one of five faces on the smiley-face Likert scale presented below the workspace for each question.
Self-efficacy: A judgment of one's ability to perform a task within a specific domain (Bandura, 1997). For this study, students' level of self-efficacy is recorded by circling one of five faces on the smiley-face Likert scale presented after the problem text for each question.
Self-efficacy expectations: A person's belief concerning his or her ability to perform a given task or behavior successfully.
Self-regulated learning: The outcome of choosing to engage in self-directed metacognitive, cognitive, affective, and behavioral processes and skills (McCombs & Marzano, 1990).
Statistically significant: Describes a result that, if chance alone were operating (that is, if the null hypothesis were true), would be expected with less than a five percent probability (Ary, Jacobs, & Razavieh, 1996).
Triadic theory of social cognition: Students' efforts at self-regulation are determined by three factors: their personal processes, the environment, and their behavior (Bandura, 1986).

Organization of the Manuscript

This manuscript has been organized into five chapters. Chapter 1 identifies the problem and places it in context for the study; the research questions, limitations, assumptions, and definitions are also included. Chapter 2 reviews the literature relevant to the research questions. Research strands include (a) gender differences in mathematics; (b) reading in problem solving, and gender differences in reading/literacy; (c) elements of self-regulation, specifically self-efficacy and self-assessment, and their relationship to mathematics; and (d) written communication in problem solving, including gender differences in writing. At the end of each major section of the literature review is a table that summarizes the research studies discussed in that section. Chapter 3 presents the methods that were used to conduct the study. The development process for the 12-item extended-response test is described, as well as the procedures for acquiring research participants, administering the test, conducting the targeted follow-up interviews, and analyzing the data. Chapter 4 summarizes the findings of the study, reporting the descriptive statistics and the findings derived from inferential data analysis. Chapter 5 presents the conclusions of the study, the implications of the study results, and recommendations for classroom practice and future research.
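The definitions of calibration and confidence bias given in Chapter 1, together with research questions 4 and 5, rest on comparing a student's rating of a problem with the score that the same response earned. The Python sketch below shows one simple way such a per-item comparison could be computed. It is an illustration under stated assumptions, not the scoring procedure used in this study: it assumes smiley-face ratings coded 1 to 5 and rubric scores coded 0 to 4, and the function names (`normalize_rating`, `normalize_score`, `calibration`) are hypothetical.

```python
"""Per-item calibration sketch (hypothetical; not the study's actual procedure).

Assumptions: each smiley-face rating is coded 1-5, and each rubric
score is coded 0-4. Both are rescaled to 0.0-1.0 before comparison.
"""
from __future__ import annotations

from statistics import mean


def normalize_rating(rating: int) -> float:
    """Map a 1-5 smiley-face rating onto the 0.0-1.0 range."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return (rating - 1) / 4


def normalize_score(score: int) -> float:
    """Map a 0-4 rubric score onto the 0.0-1.0 range."""
    if not 0 <= score <= 4:
        raise ValueError(f"score out of range: {score}")
    return score / 4


def calibration(ratings: list[int], scores: list[int]) -> tuple[float, float]:
    """Return (bias, mean_gap) across a student's items.

    bias > 0 means ratings exceed performance on balance (overconfidence),
    bias < 0 means underconfidence. mean_gap is the mean absolute
    difference per item; 0.0 means perfectly calibrated.
    """
    if len(ratings) != len(scores) or not ratings:
        raise ValueError("need equal-length, non-empty item lists")
    gaps = [normalize_rating(r) - normalize_score(s) for r, s in zip(ratings, scores)]
    return mean(gaps), mean(abs(g) for g in gaps)


if __name__ == "__main__":
    # Hypothetical student: four problems, ratings vs. rubric scores.
    print(calibration([5, 3, 4, 2], [4, 3, 1, 4]))
```

For the hypothetical student in the usage line, the call returns `(-0.125, 0.375)`: slight underconfidence on balance, but an average gap of 0.375 on the 0-to-1 scale. Reporting the signed bias and the absolute gap separately matters because overconfident and underconfident items cancel out in the signed mean.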
CHAPTER 2
REVIEW OF LITERATURE

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter presents four research strands in four sections. The first section reviews gender differences in mathematics. The second focuses on reading as it relates to problem solving, and examines gender differences in reading ability. The third reviews self-regulation as a concept, how it relates to mathematics, and how self-efficacy and self-assessment, as elements of self-regulation, are related to mathematics. The fourth provides an overview of written communication in problem solving, followed by a review of the literature on gender differences in writing.

Gender Differences in Mathematics

NCTM (2000), in its publication Principles and Standards for School Mathematics, supports the belief that all children, regardless of personal characteristics or background, should have opportunities to study and learn mathematics. Unfortunately, girls have historically not reached the same levels of achievement in mathematics as boys. Although girls are currently outpacing boys in reading, taking more advanced courses in high school, and reporting higher educational aspirations (American Association of University Women, 2002; Young & Brozo, 2001), they continue to lag behind boys in enrollment and achievement in advanced mathematics courses (Carr, Jessup, & Fuller, 1999). It is important that our educational system do everything in its power to ensure that girls and boys are equally prepared to meet the mathematical challenges they will face in the workplace and in society (Croom, 1997).
The purpose of this section of the literature review is to examine research findings regarding gender differences in students' mathematical lives, including performance, item format and content, strategy use, affect, and the broader context provided by cross-national studies. By understanding the factors that generate gender differences, teacher educators will be equipped to incorporate gender-specific pedagogical principles into coursework for preservice teachers of mathematics. New teachers will in turn be prepared to provide instruction that allows both boys and girls to be mathematically successful. Although some studies of gender differences do not find statistically significant differences in achievement until the high school years, this review was conducted under the premise that students' mathematical experiences in the earlier grades influence the development of the differences found in later grades. It is therefore important not only to examine research findings that convey the extent of gender differences in achievement throughout children's school experience, but also to examine differences in non-achievement-based factors that may shape or exacerbate achievement differences.

Gender Differences in Mathematical Performance

Studies Focused on Students in the United States

The study of gender differences in mathematical performance has captured a fair amount of attention in the literature since the early 1970s. In 1974, Fennema reviewed published studies and concluded:

"No significant differences between boys' and girls' mathematics achievement were found before boys and girls entered elementary school or during early elementary years. In upper elementary and early high school years significant differences were not always apparent. However, when significant differences did appear they were more apt to be in the boys' favor when higher-level cognitive tasks were being measured and in the girls' favor when lower-level cognitive tasks were being measured" (pp. 136-137).

No conclusions were drawn in this review about high school learners because of the limited research available on students of that age. Only a few years later, though, Fennema and Carpenter (1981) reported that the 1978 NAEP results indicated gender differences in performance at the high school level, with males outperforming females, especially on high cognitive-level tasks.

In a meta-analysis of 100 studies conducted between 1967 and 1987, Hyde, Fennema, and Lamon (1990) concluded that there was a slight female superiority in elementary and middle school, a moderate male superiority in high school, and a larger male advantage in college and later adulthood. They stress that their findings in no way give credence to the global conclusion that boys excel in mathematics, because the overall differences were small and decreasing. In fact, the magnitude of the differences found in studies conducted in 1974 or later was less than half that found in earlier studies. They also caution against making any general statements about gender differences in performance because the pattern of differences they found was complex, with gender differences varying according to task and age. Whereas girls are superior to boys in computation, there do not appear to be any gender differences in understanding mathematics concepts. Although boys and girls differ in their problem-solving abilities, the differences that favor boys do not appear until the high school years.

Although Park, Bauer, and Sullivan's (1998) study viewed a much narrower segment of the population than the meta-analysis, namely high-ability students, their findings corroborate and bring up to date some of what Hyde et al. (1990) found in their meta-analysis.
Just as in the Hyde et al. meta-analysis, girls outperformed boys in mathematics computation, and boys outperformed girls on problems requiring application of concepts. In contrast to the meta-analytic findings, however, boys scored higher than girls on items testing mathematics concepts. Thompson, Strackbein, and Williams (1992) sought to determine whether specific intervention efforts could alter gender differences in achievement. Their experimental study used an intervention to improve students' mental computation. All students benefited from the eight-week training, but the intervention had a much greater impact on girls' scores than on boys' scores in both computation and problem solving. The study shows that interventions using appropriate instructional strategies can help balance boys' and girls' mathematics performance, which provides incentive for further study. Leahey and Guo (2001) drew on data from the National Longitudinal Survey of Youth (NLSY) and the National Education Longitudinal Study (NELS). They found small but significant differences in favor of girls until age 11, at which point boys' scores began to rise at a faster rate. Across all age groups, the standard deviation of boys' scores exceeded that of girls' scores. They concluded that two factors may have contributed to the higher representation of males in mathematics-related careers: the higher representation of boys at the extreme top of the mathematics ability scale, and boys' faster score growth. Ai's (2002) four-year longitudinal study confirmed these findings on boys' growth rates, but only among low-ability students. At the beginning of the study, the seventh-grade students were placed in high- and low-ability groups based on previous test scores. Gender differences in mathematical growth varied according to initial ability status.
Within the low group, girls started higher than boys, but boys improved at a slightly faster rate. In the high-ability group, boys and girls did not differ in initial status or growth rate. The methodological issue of greatest concern in this study is that students' scores from the seventh-grade administration of the test served as both the sole grouping variable and the outcome variable. This grouping method may explain why the study's findings for the high-ability group conflict with those of Leahey and Guo. Despite the differences that exist between boys and girls in mathematics, the representation of women in university mathematics programs has been growing and is nearing parity. The percentage of mathematics degrees awarded to women increased from 22.6 percent in 1950 to 47.2 percent in 1991 (Chipman, 1996). Chipman (1996) claims that "there is no other field of study which comes so close to proportional representation of men and women" (p. 285). Leahey and Guo (2001) remain concerned about the underrepresentation of women in mathematics-related careers, but the growing number of women receiving mathematics degrees may soon bring an end to that inequity.

Cross-National Studies

The previous section focused strictly on studies conducted in the United States. A review of gender differences in mathematics performance would not be complete, however, without placing the topic in the broader context of cross-national studies. The Second International Mathematics Study (SIMS) was sponsored by the International Association for the Evaluation of Educational Achievement. It was completed in 1982 with eighth- and twelfth-grade students in nineteen countries. Among eighth-graders, boys performed better than girls in seven countries, there were no significant differences in eight countries, and girls outperformed boys in four countries (Baker & Jones, 1993).
Baker and Jones also examined the gender stratification of opportunity in each country. Their analysis indicated that gender differences in mathematics performance decrease as females gain more access to advanced training and the workplace. In addition, the occupational status of women in each country significantly predicted the size of the gender difference in test scores. Girls in countries that offer more opportunity for women scored higher than girls in countries that offer less. Baker and Jones (1993) then compared the SIMS data with data from the First International Mathematics Study (FIMS), conducted in 1964, to see whether gender differences in mathematical performance had declined over time. Nine countries participated in both studies, and in eight of those countries boys' advantage decreased. In 1964, boys scored significantly higher than girls in eight of the nine countries. By 1982, four of these countries had moved toward parity between the sexes, and in two additional countries girls actually outperformed boys. These results indicate that gender differences in mathematics performance are decreasing around the world. Moreover, test scores were closely tied to variations in women's access to higher education and the labor market. This pattern seriously challenges the notion that gender differences in mathematics have a biological basis. It also corroborates the work of Walkerdine (1998), who rejects the idea that one gender is genetically predisposed to perform better in mathematics. Walkerdine instead attributes observed gender differences to the effects of environment and sex-role stereotyping over time. Ethington (1992) conducted a path analysis on a subsample of United States students from the SIMS data to determine whether the factors that affect performance differ by gender. Not surprisingly, prior achievement had the greatest influence on current achievement for both males and females.
The personal value placed on mathematics was the only other variable that affected male performance. In contrast, female performance was influenced by family help, perception of gender stereotyping in mathematics, and perception of the difficulty of mathematics. The direction of the family-help effect was surprising. Other things being equal, higher achievement was more likely among females who were more independent and received less help from their families. The same was true of females who did not stereotype mathematics as a male domain and who viewed mathematics as less difficult. Another unexpected result was that neither socioeconomic status nor perception of parents' attitudes, goals, and expectations for success directly affected performance for either gender. Fierros (1999) examined gender differences in mathematics achievement using data from the Third International Mathematics and Science Study (TIMSS). The data were gathered during the 1994–1995 school year at five grade levels in forty countries and involved over half a million students. Across countries at the eighth-grade level, there were few significant differences in mean achievement by gender, and the differences that did exist tended to favor males. At the twelfth-grade level, males did significantly better on measures of mathematical literacy (applying mathematics to everyday problems) and on advanced mathematics. In the United States sample, eighth- and twelfth-grade males and females did not differ significantly in any of the ability groupings. This contrasts with the international data, in which high-ability males significantly outperformed high-ability females at both grade levels. The Program for International Student Assessment (PISA) is a new system of international assessments sponsored by the Organization for Economic Cooperation and Development. It measures 15-year-olds' abilities in reading literacy, mathematics literacy, and science literacy (Lemke et al., 2001).
The assessment was first conducted in 2000 and is scheduled to be repeated every three years. In its first administration, boys outperformed girls in mathematics literacy in 16 of the 31 countries that took part in the mathematics portion. The United States was among the countries in which the differences were not statistically significant. In summary, gender differences in mathematics achievement continue to exist in some countries, but the balance is shifting away from male dominance. Where differences remain, they are most pronounced among high-ability students (Leder, 1990). These achievement differences appear to stem primarily from women's lack of access to higher education and the labor market. In countries where women have equal opportunity, such as the United States, the differences in achievement that existed at the time of the FIMS are no longer significant. Males and females in this country now achieve similar scores on international assessments. Their performance on the primary predictors of that achievement, however, continues to differ by gender.

Gender Differences in Test Item Performance

One sign that mathematics reform is taking hold is a shift in assessment. Purely computation-based, multiple-choice assessments are giving way to assessments that include open-ended problem-solving tasks situated in real-world contexts (National Assessment Governing Board, 1995; Willingham & Cole, 1997). This shift raises questions about how test item format and content may differentially affect boys' and girls' performance and attitudes. Multiple-choice and constructed-response formats require different skills, and these skills may differ by gender. For example, some studies have shown that girls have higher verbal fluency than boys (Halpern, 1992), which gives girls an advantage on constructed-response tasks that require writing.
This advantage has been shown to affect scores on the Advanced Placement U.S. History test. Girls consistently outscore boys on its constructed-response items, but there is no difference in scores on its multiple-choice items (Breland, Danos, Kahn, Kubota, & Bonner, 1994).

Item Format

The literature has given insufficient attention to how this issue plays out in mathematics. The few studies that have examined gender differences related to mathematics test item format and content yield findings that are not stable across grade levels, and their results are not consistent. Webster and Henriksson (2000) used data from a Swedish TIMSS sample and found that middle-school-aged females outperformed their male counterparts on both multiple-choice and open-ended items. The researchers had hypothesized that open-ended items requiring communication would give girls an advantage. In fact, the gap between boys and girls was narrower for open-ended items than for multiple-choice items. In contrast, Fierros (1999), using United States data from TIMSS, found no significant gender differences on multiple-choice, short-answer, or constructed-response items for eighth-grade students. Among twelfth graders, however, males outperformed females on the constructed-response items related to verbal fluency. These contradictory findings point to the need for further research. Wilson and Zhang (1998) reported similar results: on a statewide assessment, boys significantly outperformed girls on constructed-response items at two of four grade levels. On multiple-choice items, boys outscored girls in grades 3 and 8, but there were no significant differences in grades 5 or 10. In a follow-up to the 1998 study, Zhang, Wilson, and Manon (1999) took a closer look at the solution processes boys and girls used on constructed-response questions. Their assessment included two extended-response tasks.
The egg task challenged students to determine how many egg cartons would be required to hold the eggs that two children had colored. The jellybean task was more difficult. It required students to estimate the total number of jellybeans in a jar, given the number of small cups the jar of beans could fill and the count of beans in a few of the cups. On the jellybean task, the researchers found no significant gender differences in problem-solving sophistication or performance. On the egg task, more boys than girls used the most sophisticated approach, yet girls received higher scores. Pomplun and Capps (1999) compared students' holistic and objective scores on two kinds of questions: those requiring communication, and those requiring either reasoning or problem solving but not communication. At the seventh- and tenth-grade levels, girls received higher holistic scores than boys on the constructed-response items requiring communication, even though boys gave more correct answers. This happened because girls created more figures and provided more mathematical reasoning to report their solution procedures, and the holistic scoring rubric credited them for this effort. In the seventh-grade sample, boys received higher holistic and objective scores on reasoning questions, but girls outscored boys in problem solving. At the tenth-grade level, gender differences on these items were minimal. The one exception was that boys received higher objective scores on questions requiring reasoning but not communication. Lane, Wang, and Magone (1996) studied gender differences on constructed-response items that asked students to show their solution strategies and/or explain their reasoning. The two tasks that favored males included a figure in the problem situation, and the four tasks that favored females did not. Girls also provided more complete reasoning for their answers and left fewer responses blank.
As in the Pomplun and Capps (1999) study, male students were at a disadvantage in holistic scoring because they did not show their solution processes or explain them as completely. Taken together, the results of Pomplun and Capps (1999) and Lane et al. (1996) suggest that girls may be more independent problem solvers than boys. Boys performed better when they were given a diagram or figure, yet girls were more likely to create their own figures. Three studies compared how boys and girls solve problems as measured against an established problem-solving heuristic. McCoy (1994) found no significant gender differences in the extent to which students employed the four steps of Polya's problem-solving process. Zambo and Follman (1994) first examined how middle school girls and boys would perform on questions presented on an algorithmic worksheet outlined with nine problem-solving steps. They found a small but significant difference in scores in favor of girls. They hypothesized that the algorithmic nature of the worksheet was more useful to girls than to boys. In a follow-up study, Zambo and Hess (1996) compared girls' and boys' performance on two forms of the test, one unformatted and one formatted with the nine problem-solving steps. There were no significant gender differences on either form of the test. However, girls who took the unformatted test before the formatted test improved more than boys did. Conversely, girls who took the formatted test before the unformatted test declined in performance compared to boys.

Item Content

Two studies examined whether gender labeling and content could explain gender differences in performance. Walsh, Hickey, and Duffy (1999) manipulated the gender labeling of items (female character, male character, or gender-neutral). They wanted to see whether this labeling would differentially affect the performance of middle school and university-level males and females.
Overall scores were higher on male-labeled items, but the labeling did not produce significant gender differences at either age level. The only significant difference appeared at the college level. Women scored lower than men when told that the test had previously shown gender differences, but scored the same as men when told that the test was merely comparing Canadian and American students. Chipman, Marshall, and Scott (1991) obtained similar results with undergraduate students. They found no gender differences on feminine and neutrally worded questions, but a small, significant difference in favor of males on questions with masculine characters. It therefore appears that gendered wording affects boys' performance more than girls' performance, and that girls are more generalized in their thinking. Girls appear to have an advantage on items requiring linguistic explanation. More research is therefore needed on the role that gender differences in verbal fluency play in the holistic scoring of constructed-response items. In addition, because girls and boys perform differently depending on item format, tests that contain only one type of problem may not fairly assess students' capabilities. The only study with elementary-aged participants (Wilson & Zhang, 1998) showed that gender differences by item format develop over time. More research with young children is needed to understand the roots of the differences that emerge in the middle school years. The limited and contradictory findings on how gendered wording affects girls' and boys' performance also point to the need for further study.
Gender Differences in Strategy Use

Several researchers have sought to determine what gender differences, if any, exist in young children's problem-solving strategies. They have also asked to what extent these differences may foreshadow the performance differences that ultimately emerge in middle and high school. Significant and consistent differences, if found, may indicate the need for a gender-differentiated curriculum. For the purposes of this discussion, strategies are defined as the plans, methods, or maneuvers a student uses during the problem-solving process. Fennema et al. (1998) investigated gender differences in the problem-solving strategies students used as they progressed from first through third grade. Their interview data revealed no significant differences in the number of correct answers in grades one and two. In third grade, however, boys solved significantly more complex problems than girls. The most important finding was that strategy use differed significantly by gender in all three years of the study. Girls tended to use more concrete strategies, such as modeling with manipulatives or counting. Boys used more abstract strategies, such as invented algorithms. Invented algorithms are procedures a child constructs without specific instruction, using his or her number sense and understanding of the problem situation (Van de Walle, 2001). Carr, Jessup, and Fuller (1999) observed similar results with first-grade students. Boys used retrieval strategies (pulling information from memory) correctly more often than girls did. Girls relied more on overt strategies, such as counting with manipulatives or fingers. The researchers also analyzed how parents and teachers might contribute to these differences, and found that both teachers and parents were more likely to direct boys to use retrieval strategies.
Carr and Davis (2001) followed up on this earlier work by examining whether the differences in first-grade boys' and girls' strategy use reflected ability or preference. When given free choice, boys and girls were equally able to solve the problems correctly. However, girls were more likely than boys to use manipulatives. Girls also tended to use the most elementary strategies for addition and subtraction, representing every item being counted and counting each item separately. When the researchers controlled strategy by requiring students to use retrieval or manipulatives on specific problems, boys outperformed girls. Boys were equally able to use retrieval strategies and manipulatives, while girls were less able to use retrieval strategies. Early gender differences in strategy use therefore appear to reflect differences in both ability and preference. Gallagher and DeLisi (1994) found that gender differences in strategy use continue through high school, even in testing situations where overall scores do not differ. They used a think-aloud protocol with high-ability students working on the mathematics portion of the Scholastic Aptitude Test. Females did better on conventional problems, and males did better on unconventional problems. Females relied more heavily on conventional strategies taught in the classroom, whereas males were more likely to use unconventional strategies. These findings indicate that gender differences in strategy use seen as early as first grade may persist through high school. A pair of follow-up studies by Gallagher et al. (2000), however, produced different results. In a small study of high-ability students, females on multiple-choice items were more successful with conventional problems than with unconventional ones, yet males' performance did not vary with problem type.
Conventional problems are routine textbook problems that can be answered by algorithmic methods. Unconventional problems require an unusual use of a familiar algorithm, estimation, or insight. On free-response items, females did better on unconventional problems than on conventional ones, whereas males did better on conventional items than on unconventional ones. Comparing the sexes, males outscored females on conventional items, but scores on unconventional items did not differ significantly. In a larger study of students of varied ability, Gallagher et al. found no interaction between gender and either item format or problem type. Additional work is needed to explain why the differences found in the small sample were not replicated in the larger sample. It is also unclear why performance by problem type (conventional versus unconventional) was not consistent across item formats (multiple-choice versus free-response).

Gender Differences in Affect

The impact of affect on students' mathematical lives has drawn a great deal of research attention for two reasons. Boys and girls show many affective differences in this content area, and affective factors strongly influence children's problem-solving performance (McLeod, 1989). The most studied element of affect is student attributions, that is, perceptions of the relationship between actions and outcomes. A study conducted in the Netherlands (Seegers & Boekaerts, 1996) found that upper-elementary girls were more inclined than boys to attribute failure to lack of ability, and that a fear of making mistakes intensified this tendency. Girls also held less favorable beliefs about their mathematical ability than boys did. Highly task-oriented boys were less likely to attribute failure to lack of capacity, but the opposite was true for highly task-oriented girls.
Boys were more confident than girls about their mathematical abilities, regardless of their actual achievement. Another article on the same study (Vermeer, Boekaerts, & Seegers, 2000) reported that girls persisted longer than boys during applied problem solving. The sample therefore did not support the previously held belief in girls' learned helplessness. Stipek and Gralinski's (1991) study of third- and eighth-grade students showed that boys attributed a good outcome to ability significantly more often than girls did. Girls, in turn, attributed a poor outcome to lack of ability significantly more often than boys did. Girls were also less likely to believe that hard work would lead to success, and eighth-grade girls were more likely to hold this belief than third-grade girls. In third grade, boys and girls had similar expectations for success, but by eighth grade, boys had become more optimistic than girls. Overall, girls rated their ability lower than boys did and expected to do less well on tests. Forgasz and Leder's (1996) study of seventh-grade students corroborates most of these findings. Compared to males, females attributed success to ability less often and to effort more often, attributed failure to task difficulty more often, and rated their achievement levels lower. Studies of third-grade students produced similar results: girls attributed their success to effort more than boys did (Michaels, 2002), and boys attributed their success to ability more than girls did (Farrand, 2002).
Tartre and Fennema (1995) conducted a longitudinal study that followed a random sample of students through grades 6, 8, 10, and 12. They found no significant gender difference in achievement, but males stereotyped mathematics as a male domain significantly more than females did. This stereotyping was not related to achievement, but it increased during each year of the study. For females in the middle grades, less stereotyping was positively correlated with mathematics achievement. The possible relationship between male stereotyping and the high proportion of males in advanced mathematics classes warrants further analysis.

Unlike many other aspects of gender differences in mathematics, research findings on affective differences between the sexes are fairly consistent. In general, girls are less likely than boys to attribute success to ability, but more likely to blame lack of success on low ability. Girls are less mathematically confident than boys, but their achievement is higher when they do not stereotype mathematics as a male domain. As the years progress, boys increasingly see mathematics as their domain, regardless of their achievement. It remains to be seen whether this view contributes to the declining number of girls in advanced classes or simply reflects it.

Summary of Gender Differences in Mathematics

Table 1 summarizes the research studies on gender differences in mathematics highlighted in this section. In the United States, girls tend to do as well as or better than boys in the elementary years, after which boys outperform girls (Ai, 2002; Hyde et al., 1990; Leahey & Guo, 2001; Park et al., 1998). Cross-national studies show that gender differences are gradually decreasing over time. Males still outperform females, however, especially in countries where women have less opportunity (Baker & Jones, 1993; Fierros, 1999; Lemke et al., 2001).
On item content, the gendered wording of questions did not produce consistent gender differences in performance (Chipman et al., 1991; Walsh et al., 1999). Most studies, however, found differences according to item format (Fierros, 1999; Lane et al., 1996; Pomplun & Capps, 1999; Webster & Henriksson, 2000; Wilson & Zhang, 1998; Zambo & Follman, 1994; Zambo & Hess, 1996; Zhang et al., 1999). Few differences in achievement exist in the early school years, but young children show strong gender differences in their strategy use during problem solving (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998). Gender differences in affect were fairly consistent across studies, with boys attributing success more to ability and showing more confidence than girls. Girls, on the other hand, tend to attribute their success to effort and their failure to lack of ability (Farrand, 2002; Forgasz & Leder, 1996; Michaels, 2002; Seegers & Boekaerts, 1996; Stipek & Gralinski, 1991; Tartre & Fennema, 1995).

Table 1
Results of Mathematical Gender Differences Studies Grouped by Topic

Author/Date | Participants | Results

Performance Differences in the United States
Ai, 2002 | 3,116 7th graders | Low-ability boys started lower than girls but improved faster. No difference between high-ability boys and girls.
Hyde et al., 1990 | Meta-analysis of 100 studies (3,985,682 students) | Slight female advantage in elementary and middle school, with a moderate male advantage in high school and beyond.
Leahey & Guo, 2001 | NLSY: 4,126 children ages 4–13; NELS: 9,787 children ages 14–18 | Girls outscore boys until age 11, then boys' growth exceeds girls'. Boys' standard deviation in scores is higher than girls' at all ages.
Park et al., 1998 | 4,661 4th graders (2,421 girls, 2,240 boys); 2,541 6th graders (1,268 girls, 1,273 boys) | Girls outscored boys in computation. Boys outscored girls on problems of mathematics applications.
Thompson et al., 1992 | 86 4th graders (47 girls, 48 boys) | Girls benefited more than boys in problem solving and performance from intervention training in mental computation.

Performance Differences in Cross-National Studies
Baker & Jones, 1993 | 77,000+ 8th graders in 19 countries; FIMS and SIMS data | Gender differences are decreasing over time, especially in countries where women have opportunity.
Ethington, 1992 | 746 U.S. 8th graders; SIMS data | Prior achievement and value affect males' scores. Prior achievement, family help, stereotyping, and perception of difficulty affect females' scores.
Fierros, 1999 | 8th and 12th graders in 34 countries; TIMSS data | 12th-grade males did better on mathematics literacy and advanced mathematics. High-ability males outperformed high-ability females.
Lemke et al., 2001 | 15-year-olds in 32 countries; PISA data | Boys outscored girls in mathematics literacy in 29 of 31 countries. The gender differences were not significant in the U.S.

Gender Differences in Test Item Performance
Chipman et al., 1991 | 256 undergraduates (128 females, 128 males) | No gender differences on feminine or neutrally worded questions. Men did slightly better than women on masculine-worded questions.
Fierros, 1999 | 8th and 12th graders in 34 countries; TIMSS data | 12th-grade males scored higher than females on constructed-response items. No significant differences on other items at the 8th- or 12th-grade levels.
Lane et al., 1996 | 460 6th and 7th graders (250 girls, 210 boys) | Boys scored higher on constructed-response items that included a figure; girls did better on items with no figure. Girls showed their work more than boys.
McCoy, 1994 | 90 2nd and 3rd graders (44 girls, 46 boys) | No significant gender differences in students' use of the steps of Polya's problem-solving process.
Pomplun & Capps, 1999 | 438 7th and 10th graders (219 girls, 219 boys) | Girls received higher holistic scores on items requiring communication even though boys provided more correct answers.
Walsh et al., 1999 | 63 7th and 8th graders (27 girls, 36 boys) and 174 undergraduates (96 women, 78 men) | No gender differences on questions worded with masculine, feminine, or neutral characters. Men outscored women when told the test had previously found gender differences.
Webster & Henriksson, 2000 | 8,851 Swedish 6th–8th grade students (4,334 girls, 4,517 boys) | Girls outperformed boys on multiple-choice and open-ended items.
Wilson & Zhang, 1998 | 29,809 students in grades 3, 5, 8, and 10 | On constructed-response items, boys scored higher at grades 5, 8, and 10. On multiple-choice items, boys scored higher at grades 3 and 8. No significant differences at other grades.
Zambo & Follman, 1994 | 302 6th and 8th graders (149 girls, 153 boys) | Girls performed better on a test formatted with nine problem-solving steps.
Zambo & Hess, 1996 | 155 6th graders (77 girls, 78 boys) | Girls benefited from the formatted test, but declined when moving from a formatted to an unformatted test.
Zhang et al., 1999 | 300 3rd graders (150 girls, 150 boys) | On one of two constructed-response items, girls outscored boys even though boys used a more sophisticated approach.

Gender Differences in Strategy Use
Carr et al., 1999 | 92 1st graders (46 girls, 46 boys) | Boys used more retrieval strategies; girls used more overt strategies.
Carr & Davis, 2001 | 84 1st graders (42 girls, 42 boys) | When strategy use is controlled, boys outscore girls because they are more proficient with retrieval strategies.
Fennema et al., 1998 | 82 children followed for 3 years, 1st through 3rd grades (38 girls, 44 boys) | Few differences in performance, but significant differences in strategy use in grades 1–3. Girls used more concrete strategies; boys used more abstract strategies.
Gallagher & DeLisi, 1994 | 47 high-ability high school students (22 girls, 25 boys) | Girls used more conventional strategies and performed better than boys on conventional problems. Boys used more unconventional strategies and did better than girls on unconventional problems.
Gallagher et al., 2000 | Study 1: 28 high-ability high school students (14 girls, 14 boys). Study 2: 154 high school students (94 girls, 60 boys) | Study 1: on multiple-choice items, girls did better on conventional items, while boys scored the same on both problem types; on free-response items, girls did better on unconventional items and boys did better on conventional items. Study 2: no interaction of gender with problem type (conventional vs. unconventional) or item format (multiple-choice vs. free-response).

Gender Differences in Affect
Farrand, 2002 | 62 3rd graders | Boys attribute success to ability more than girls.
Forgasz & Leder, 1996 | 782 7th graders (396 girls, 386 boys) | Girls are more likely than boys to attribute failure to lack of ability and success to effort.
Michaels, 2002 | 109 3rd graders | Girls attribute success to effort more than boys.
Seegers & Boekaerts, 1996 | 186 students ages 11–12 (96 girls, 90 boys) | Boys are more confident than girls; girls are more likely to attribute failure to lack of ability.
Stipek & Gralinski, 1991 | 194 3rd graders (94 girls, 100 boys); 279 8th graders (143 girls, 136 boys) | Boys attribute success to high ability. Girls attribute failure to lack of ability.
Tartre & Fennema, 1995 | 60 students (32 girls, 28 boys); data collected at grades 6, 8, 10, and 12 | Boys stereotype mathematics as a male domain significantly more than girls.

Reading in Problem Solving

The Relationship Between Reading and Mathematics

Mathematical language differs from ordinary language in many key respects, yet there are strong similarities between literacy and mathematical literacy.
Just as the intricacy and value of mathematical language are only now being recognized, the complexity of reading went unrecognized until early in the twentieth century, when reading was still considered a simple skill of connecting individual words to understand text. Thorndike (1917) was one of the first educational psychologists to delve deeper into the process of reading comprehension. He acknowledged the similarities between reading comprehension and mathematical problem solving:

Understanding a paragraph is like solving a problem in mathematics. It consists in selecting the right elements of the situation and putting them together in the right relations, and also with the right amount of weight or influence or force for each. The mind is assailed as it were by every word in the paragraph. It must select, repress, soften, emphasize, correlate and organize, all under the influence of the right mental set or purpose or demand. (pp. 327–328)

The language of mathematics shares many characteristics with other languages: symbols and rules are uniform and consistent, understanding increases with practice, novice learners need translations and interpretations, meaning is influenced by symbol order, and communication requires encoding and decoding (Wakefield, 2000). When we miss opportunities to teach mathematics as a language, our students may fail to notice the underlying concepts that lead to understanding (Adams, 2003). Empowering students mathematically requires helping them connect the language used to communicate mathematics with their construction of mathematical knowledge (Earp & Tanner, 1980; Miller, 1993). The language of mathematics interacts with everyday language in problem solving. Story problems rarely include the mathematical terms add, multiply, divide, or subtract, but the language used to describe the problem situation implies these operations and guides the reader through the solution process (Capps & Pickreign, 1993; Hegarty, Mayer, & Monk, 1995).
This connection between problem solving and mathematical language must be kept in mind when assessing students' problem-solving performance. If a word problem is solved incorrectly, there is no unambiguous way to conclude that the error is due to a lack of mathematical proficiency rather than a failure in reading comprehension (Clarkson & Williams, 1994).

We again see the connection between reading and problem solving in various problem-solving heuristics. Older heuristics (Polya, 1957; Schoenfeld, 1983) used either the word "read" or the word "understand" in the first step, but did not focus on the connection between the two. In contrast, Krulik and Rudnick's (1996) more recent problem-solving heuristic begins with the step "read and think." This heuristic highlights the inseparable link between reading the problem and the initiation of critical thinking. During this stage, the reader translates the problem into her/his own language, examines and evaluates the facts of the problem, determines what question is being asked, and connects the parts of the problem to form a visualization of the situation.

Challenges of Reading Mathematical Text

Complexity of Word Problems

Children are more likely to have language-related difficulties in the mathematics classroom than in any other content area (MacGregor, 1990). Although most students enter the mathematics classroom knowing how to read, few have developed the strategies necessary to use their reading skills to learn mathematics content (Barton, Heidema, & Jordan, 2002). Even students who are successful readers in other content areas and have strong computational skills may struggle with reading mathematics text and working word problems (Manning, 1999; Muth, 1997). Word problems are unique in that they require the use of two language systems simultaneously; students must read while at the same time thinking abstractly about mathematics (Manning, 1999).
Students often struggle when they are asked to read and solve word problems, even when they have no difficulty solving the problem once it is translated for them (McIntosh, 1997; Schell, 1982). Reading the words in problems requires much more than decoding: students must analyze the information in the problem; translate and use symbols; identify necessary, insufficient, or extraneous information in the text; and employ a strategy that will aid them in solving the problem (Schell, 1982). These challenges are confirmed by the finding that children perform ten to thirty percent worse on arithmetic word problems than on comparable problems presented in numerical format (Kiplinger, Haug, & Abedi, 2000). Another difficulty stems from the fact that reading and mathematics skills do not develop at the same rate in young children. For problems presented in numerical format with the operation signs included, children who understand the algorithm can go right to work. When the same problem is embedded in words and sentences, students must first comprehend the language of the text before they can employ an appropriate algorithm (Choi & Hannafin, 1997; Fuentes, 1998).

Comparison to Other Forms of Text

Learning to read story problems is a very different task from learning to read narrative stories or expository text. Reading mathematical text requires high-level thinking and comprehension skills beyond much of what is necessary for reading text in other content areas (Reehm & Long, 1996). The language of expository text is usually straightforward and serves to explain or elaborate a point. The reader's goal is to sift and filter the information to determine the most essential elements. In contrast, the language of story problems often hides, implies, or assumes the mathematical concepts (Fuentes, 1998). The reader's goal must be to expand the compact text and unpack its meaning.
Every word and abstract symbol must be read and fully understood; there are rarely context clues to point to the meaning of unfamiliar words (Fuentes, 1998; Hunsader, 2002a), and many of the vocabulary words are either unique to mathematics or have different meanings than the same words in everyday usage (Fuentes, 1998). Story problems may be much more similar to narrative text than to expository text in that they tell a story. However, very little is known about how narrative text structure functions in the problem-solving process. Research into problem wording (Bernardo, 1999; Fan et al., 1994; Leong & Jerred, 2001) makes clear that a relationship exists between text structure and children's ability to solve problems, but the nature of that relationship is relatively unexplored territory. Reading story problems also requires comprehension strategies that may not be used in other content areas. Comprehending mathematical text requires an understanding of not just the words, but also the signs, symbols, and graphics. Mathematical information is often presented in unfamiliar ways: not only right to left, but also left to right (number lines and long division), top to bottom (tables and fractions), bottom to top (vertical multiplication), and even diagonally (graphs) (Adams, 2003; Barton, Heidema, & Jordan, 2002). In addition, whereas the author's main idea in a language arts passage is often the first sentence, the main idea of a story problem is often not presented until the end. Readers have to sift through all of the problem details before knowing the point of the problem (Barton et al.). For these reasons, mathematics is widely acknowledged as "the most difficult content area reading material, with more concepts per word, per sentence, and per paragraph than any other area" (Schell, 1982, p. 544). Yet teachers of mathematics often report that they are the least prepared to teach students how to read to learn.
"I'm a math (science) teacher, I wasn't trained to teach reading" (Barton et al., p. 24). Just as with reading other forms of text, comprehension of mathematical story problems is aided when students can create visual pictures in their heads of the problem situation, interact with the problem by calling on prior knowledge and experience, and bring meaning to the text instead of expecting meaning to be inherent in the words (Tovani, 2000).

Vocabulary

Mathematical vocabulary also makes mathematical text harder to read (Jones, 2001; Thompson & Rubenstein, 2000). Some mathematical terms are shared with everyday English and may or may not have the same meaning (Thompson & Rubenstein). Mathematical terms such as quotient, divisor, and rhombus name concepts that have no unique real-world representation (Miller, 1993). Because these terms are found only in mathematics, students have no opportunity to acquire or practice the use of the terms outside the mathematics classroom (Capps & Pickreign, 1993; Thompson & Rubenstein; Usiskin, 1996). Some words have more than one mathematical meaning, and still others, such as volume, yard, count, product, and range, have different meanings in mathematics than in everyday language (Thompson & Rubenstein; Wardrop, 1993). Other mathematics words are homophones of everyday English words with vastly different meanings; examples include plane vs. plain, one vs. won, sum vs. some, and whole vs. hole (Adams, 2003; Thompson & Rubenstein). Educators must remember that mathematical understanding and vocabulary learning are interdependent, and that instruction that does not address vocabulary is incomplete (Thompson & Rubenstein).

Symbols

Mathematical vocabulary and mathematical symbols go hand in hand.
To understand mathematical vocabulary, students must learn to recognize symbols, connect each symbol to the vocabulary word that names it, remember that the word and the symbol have the same meaning and are pronounced the same, and finally, understand the concept behind the word and the symbol (Reehm & Long, 1996). It is the relationship between these elements that allows students to make sense of the problem context and translate the text of the problem into the symbols used to solve the problem (Adams, 2003). Symbols embedded in mathematical text communicate meaning that must be interpreted during the reading process (Adams, 2003). This is no small task, because mathematical symbols such as +, ×, <, and % are like a foreign language to children. As an added complication, there is no phoneme-grapheme relationship to aid in decoding because, linguistically, the symbols are ideographs (Reehm & Long, 1996). The context of mathematical language and symbols can also have a great impact on comprehension and understanding. For example, the numeral 2 is interpreted to have a very different meaning in each of the following contexts: 2, 24, ½, 10², and 32 (Capps & Pickreign, 1993). Therefore, the meaning of each symbol is largely determined by its context, and it must be instantiated (Anderson et al., 1976) as a more specific exemplar of the general concept of the numeral. If students are unable to decenter from a fixed meaning for a symbol, understanding will be negatively impacted.

Other Challenges

Not only do the distinctive features of mathematical text, together with its vocabulary and symbols, make mathematical problems difficult to comprehend, but numerous other characteristics further complicate children's reading process. To begin, mathematics text is presented in a variety of formats: pictures, graphs, symbols, and words (Reehm & Long, 1996). The wording of a word problem also relates to its difficulty.
Some problems are not worded consistently or explicitly (Mosenthal & Kirsch, 1993), so the mathematical nature of the problem may not be readily apparent (Adams, 2003). Given a picture with five birds and three worms, the question "How many more birds are there than worms?" is much more difficult than the question "How many birds will not get a worm?" The difficulty is clearly not in decoding, because the individual words in both questions are of similar difficulty, but the linguistic complexity of the first question results in many more students giving incorrect responses (Reed, 1999). Comprehension of the semantic information contained in the problems is informed by children's existing schemata, which represent their knowledge of story problem discourse and of the topic and context of the problems (Van Dijk & Kintsch, 1977). Comprehension of mathematical story problems, just as with expository or narrative text, involves an interaction among the reader, the text, the author, and the context (Tierney & Mosenthal, 1985). Therefore, if the way a story problem is worded elicits recall of a similarly worded problem or a previous real-world experience, the child will be more likely to comprehend the problem. The question "How many birds will not get a worm?" may be more easily understood than "How many more birds are there than worms?" because it resonates with young children and their real-world experiences with sharing and being included. The second question requires students to make a comparison judgment of "more than," a situation that is less common in early childhood experience. The differences in the wording of the two questions also signal different means of finding the answer. The question "How many birds will not get a worm?" signals the child to form a one-to-one correspondence between birds and worms, then count only the remaining two birds.
The question beginning "How many more" eventually signals the need for a comparison between the numbers five and three, but the question cannot be fully understood until the end of the sentence, where the child encounters the words "than worms." Although from an adult's perspective the two problems involve the same mathematical computation, from the child's perspective the linguistic presentation of the two questions creates two distinctly different problems that are solved in different ways. Because one-to-one correspondence and counting are early number concepts that can be understood at an earlier developmental level than the comparison involved in finding "how many more than" (Van de Walle, 2001), the question asking how many birds will not get a worm is more easily answered. The ability to read word problems to determine what information is necessary, extraneous, or missing is a skill that requires instruction, guidance, and practice (Adams, 2003). The density and complexity of mathematical text (Wardrop, 1993) make the process of reading and constructing an appropriate mental model (Hegarty, Mayer, & Monk, 1995) as much an issue in problem solving as performing the mathematical operations described in the reading (Mosenthal & Kirsch, 1992). For students to become independent learners, the structure of mathematical language must be explicitly taught, just as the structure of narrative and expository text is explicitly taught (Reehm & Long, 1996).

Readability

No discussion of the complexity of reading in mathematics would be complete without a mention of readability. Although research and writing on mathematics readability were abundant from the late 1940s through 1980, the scarcity of such work since that time, especially in the last ten years, is disturbing.
Current trends in mathematics instruction emphasize solving problems in context, and research has solidified the connection between reading comprehension and mathematical problem solving; together, these make the rationale for research into readability, especially as it relates to mathematics, more compelling now than ever before. Both mathematical and nonmathematical vocabulary, the number of words in the problem, the length of individual sentences, the length of the problem statement, and the grammatical structure of the problem all affect readability and difficulty (Clarkson & Williams, 1994; Fan, Mueller, & Marini, 1994; Kiplinger, Haug, & Abedi, 2000). Most readability formulas (Dale & Chall, 1949; Flesch, 1948; Fry, 1969) rely on a limited number of factors and are designed only for running text of at least 100 words. When these formulas are used, the rating process excludes mathematical text presented through symbols rather than English, and it fails to acknowledge the complexity of reading text that is densely packed with mathematical jargon (Kane, 1970). Two formulas designed for short passages such as mathematical word problems are the Short Passage Readability Formula (Fry, 1990), which is simply an adaptation of the Fry Readability Graph, and the more comprehensive Homan-Hewitt Readability Formula (Homan, Hewitt, & Linder, 1994), which uses a combination of words per sentence, clauses per sentence, word length, and word familiarity (Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999). Although these formulas give a rough estimate of a problem's readability level, they do not account for all of the complexities of mathematical text described earlier. The fault lies not with the readability formulas themselves, but with their application to contexts for which they were not intended.
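To illustrate how few factors such formulas weigh, Flesch's (1948) Reading Ease score is shown below in its standard published form (it is not given in the studies reviewed here). It depends only on average sentence length and average word length in syllables; the worked example uses a hypothetical 20-word, 2-sentence word problem containing 26 syllables, chosen purely for illustration.

```latex
\mathrm{RE} = 206.835
  - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right)
  - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

% Hypothetical problem: 20 words, 2 sentences, 26 syllables
\mathrm{RE} = 206.835 - 1.015(10) - 84.6(1.3)
            = 206.835 - 10.15 - 109.98
            = 86.705
```

Higher scores indicate easier text. Nothing in the formula registers symbols such as +, ×, or %, or whether a word like product or volume carries its mathematical meaning, and a 20-word problem falls far short of the 100-word minimum these formulas assume, which is consistent with Kane's (1970) criticism.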
Hater and Kane (1975) worked to develop a method that would be easy to apply, yet appropriate for assessing the readability of mathematical English. They adapted the cloze procedure by replacing every fifth word or symbol in the text, tables, and figures with a blank, and tested the method using five passages with over 1,700 middle and high school students. They found it to be a reliable means of assessing the reading difficulty of mathematical English. The challenge with this method lies in its impracticality. Unlike its predecessors, it is not a formula that can be applied to a passage to determine a grade-specific level of difficulty. Assessing the readability of any passage requires administering a cloze test to a large number of participants and analyzing the results, a daunting task for most applications.

Reading Comprehension and Word Problems

Traditional problem solving focused simply on finding the correct answer. With the newer focus on real-world problem contexts, problem solving now requires that students comprehend text well enough to interpret meanings embedded in the context (Basurto, 1999). Unfortunately, students often embark on problem-solving expeditions without realizing the importance of reading for comprehension (Bratina & Lipkin, 2003; Kiplinger, Haug, & Abedi, 2000). Current notions of reading involve much more than decoding text. The focus is now on how the reader creates meaning as a result of the transaction between the text and the reader (Rosenblatt, 1978; Ruddell, 1997). The meaning the reader creates depends on his/her prior knowledge (Draper, 2002; Fuentes, 1998; Rosenblatt, 1978) and experience with the information presented in the text, together with the reader's ability to make sense of the signs and symbols inherent in the text.
Students engaged in problem-solving activities may need assistance with reading the text if they lack mathematical content knowledge or an understanding of how to interpret, use, or manipulate mathematical signs and symbols. Because mathematics teachers are the most knowledgeable about reading and creating mathematical text, they are in the best position to help students engage in this kind of literacy (Draper, 2002). Simply being able to read the words in a problem does not yield comprehension if the reader does not understand the unique meanings implied by the mathematical context, language, or symbolism (Capps & Pickreign, 1993; Pau, 1995). Reading is composed of the interaction of two distinct processes: decoding and comprehension. Reading comprehension requires the reader to reach far beyond the literal meaning of the printed words. It includes such activities as determining the main idea; inferring the writer's purpose; drawing inferences; and summarizing, integrating, and synthesizing the information (Sovik, Frostrad, & Heggberget, 1999). As with any type of text, students will have greater comprehension of mathematical text if they have a set purpose for reading, make predictions, monitor their comprehension, and summarize the information contained in the reading in their own words (Draper, 2002). For skilled readers, decoding is nearly automatic, allowing most of the reader's attention to be focused on comprehension. In contrast to pure computation problems, word problems can produce a significant number of errors when students are unable to comprehend the entire problem and its context. Arithmetical errors in word problems may therefore indicate a lack of understanding of the text rather than faulty arithmetic.
Poor reading comprehension can be related to the child's inability to draw inferences from the text, failure to use working memory to integrate new information into existing schemata, or failure to monitor his or her own thinking and refer back to the text to gain understanding (Sovik, Frostrad, & Heggberget, 1999). Problem-solving assessments that require students to rely on their reading skills to demonstrate their mathematical competency may create a serious disadvantage for low-ability readers who struggle with comprehension (Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999). Students who use direct translation as a problem-solving strategy focus on the key facts and key words in each statement as they read. Direct translation relies on identifying the key numbers and relational terms in the text of a mathematical story problem and using them to calculate a solution; all other information is ignored. This method works well for problems that are consistently worded, with all numbers listed in the order they must be used for computation and all relational words correctly implying the operation to be used. However, when problems are inconsistently worded, direct translation leads to errors (Bernardo, 1999; Hegarty, Mayer, & Monk, 1995; Reed, 1999). An example of an inconsistently worded problem is: "Matt has three more dollars than Nancy. Matt has six dollars. How many dollars does Nancy have?" With direct translation, the student would focus on the numbers three and six and the relational word "more," incorrectly concluding that the solution requires adding three and six to get nine. Students who develop a holistic view of the problem using a problem-model approach will comprehend the relationship implied in the problem and understand that, because Nancy actually has less money than Matt, the problem requires subtraction (Hegarty et al.).
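The contrast between the two strategies can be made concrete with a small sketch. Everything in it is hypothetical and illustrative: the function names, the key-word table, and the rule "apply the first relational key word to the first two numbers" are assumptions meant to mimic the direct-translation behavior described above, not a model taken from Hegarty, Mayer, and Monk (1995).

```python
import re

# Hypothetical key-word table for a naive "direct translation" solver:
# each relational word is mapped to the operation it seems to imply.
KEYWORD_OPS = {
    "more": lambda a, b: a + b,
    "fewer": lambda a, b: a - b,
    "less": lambda a, b: a - b,
}

# Small lookup so number words in the text are recognized as numbers.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def extract_numbers(text):
    """Return the numbers (digits or number words) in the order they appear."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return [int(t) if t.isdigit() else NUMBER_WORDS[t]
            for t in tokens if t.isdigit() or t in NUMBER_WORDS]


def direct_translation(text):
    """Apply the first key word's operation to the first two numbers,
    ignoring everything else in the problem."""
    numbers = extract_numbers(text)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KEYWORD_OPS:
            return KEYWORD_OPS[word](numbers[0], numbers[1])
    raise ValueError("no relational key word found")


def problem_model_nancy(matt_total, difference):
    """Problem-model reading of the Matt/Nancy problem:
    Matt = Nancy + difference, so Nancy = Matt - difference."""
    return matt_total - difference


inconsistent = ("Matt has three more dollars than Nancy. "
                "Matt has six dollars. How many dollars does Nancy have?")
consistent = ("Nancy has six dollars. Matt has three more dollars "
              "than Nancy. How many dollars does Matt have?")

print(direct_translation(inconsistent))  # 9 (wrong: Nancy has 3)
print(problem_model_nancy(6, 3))         # 3
print(direct_translation(consistent))    # 9 (correct: Matt has 9)
```

The key-word solver adds three and six for both problems. It happens to be right for the consistently worded version and wrong for the inconsistently worded one, which is the pattern the paragraph describes: direct translation succeeds only when the wording lines up with the required operation.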
Highlighted Research on Reading and Problem Solving

The research that relates reading to problem solving falls into four main categories: problem wording, reading disabilities, accommodations, and comprehension.

Problem Wording Research

The wording of a mathematical story problem appears to affect the difficulty of the problem. Fan, Mueller, & Marini (1994) used three different wordings of problems requiring students to find the difference between two disjoint sets and found that problems worded in a format requiring a static comparison between two disjoint sets (compare) were significantly more difficult than those requiring the student to make two disjoint sets equal (equalize) or to find element correspondence between two disjoint sets (won't get). An example of a compare problem is, "John has 9 apples. Ann has 4 apples. How many more apples does John have than Ann?" (p. 359). An equalize problem might ask, "Fred has 9 buckets. Betty has 5 buckets. How many more buckets does Betty have to get to have as many buckets as Fred?" (p. 360). An example of a won't get problem is, "children went to the store to buy hats. There were only 5 hats in the store. How many children would not get a hat?" (p. 360). Because the only difference among the three types of problems was the problem text, it can be concluded that problem wording affects the difficulty of problems. One element of the methodology that seriously reduces the generalizability of these results is that the problems were read aloud to the students. Although this method allowed the researchers to ensure that decoding would not confound the results, it also means that the results cannot be used to draw conclusions about how problem wording affects problem solving in an environment where students must read for themselves. Hegarty, Mayer, & Monk (1995) compared the reading processes of students who make errors in solving inconsistently worded problems with those of students who do not.
By definition, an inconsistently worded problem is one for which using the numbers in the order they are presented and applying an operation based simply on a key word such as "more" will not result in a correct solution. By monitoring students' eye fixations during problem solving, the researchers were able to establish a correlation between the amount of attention paid to specific elements of the problem and problem-solving performance. They found that unsuccessful problem solvers fixated on numbers and relational terms significantly more than successful problem solvers did, confirming their hypothesis that poor problem solvers rely on direct translation of text to solve problems. Successful problem solvers spent less time reexamining the problem, and when they did look back, it was primarily to examine the context rather than numerical details. This implies that the successful problem solvers were attempting to develop a model of the problem to aid in the solution process. Although this study successfully draws conclusions about the behaviors of the adult problem solvers who participated, it provides no evidence that the same behaviors govern children's problem-solving experiences. Further work is needed to determine whether children's problem-solving performance would differ based on whether they used direct-translation or problem-model strategies. Leong and Jerred's (2001) work helps fill the gap revealed in Hegarty et al.'s (1995) work regarding consistency of language, and also extends into how the adequacy of information in mathematical problems affects students' proficiency in problem solving. In agreement with Hegarty et al., they found that students scored significantly better on consistently worded questions than on inconsistently worded ones.
There were also significant main effects for ability level and adequacy of information, with students performing better on questions with just enough information than on those with extraneous information. Because the study was conducted with elementary-aged students who independently read the questions, the results have much greater applicability to the classroom than those of Hegarty et al. They also highlight the critical role played by language and reading comprehension in mathematical problem solving, because variations in wording consistently affected students' performance. Taking a different approach to the relationship between reading and problem solving, Bernardo (1999) studied the effect on performance of rewording word problems to make them more explicit. Although his study involved second- to fourth-grade students whose primary language was Filipino, the results have important implications for English speakers as well. The participants were given a test containing similar problems presented in four formats: conventional set in English, conventional set in Filipino, reworded set in English, and reworded set in Filipino. There was a statistically significant main effect for problem wording, with students doing much better on the reworded problems, as well as a statistically significant effect for language, favoring the problems presented in Filipino. The interaction effect was also statistically significant, indicating that for problems written in English, rewording did not provide much benefit, but for problems written in Filipino, rewording resulted in higher achievement. The results indicate that poor performance in mathematical problem solving may reflect students' reading comprehension ability more than their mathematical ability.
The implication is that students may be more successful with word problem solving, and better able to demonstrate their competency, when the complexity of the text does not exceed their reading comprehension ability.

Disabilities Research

Mathematics disabilities have historically garnered much less attention in the research than reading disabilities (Ackerman & Dykman, 1995; Fuchs & Fuchs, 2002; Jordan & Hanich, 2000), but several studies conducted in the last ten years are shedding light on the relationship between the two. Jordan and Montani (1997) examined the computation and problem-solving skills of two subgroups of students with mathematics disabilities: those with reading difficulties and those without. On both number facts and story problems, students with mathematics disabilities who were good readers scored lower than normally achieving students in timed conditions, but not in untimed conditions. Children with disabilities in both mathematics and reading performed worse than normally achieving students on all tasks, timed and untimed. As a follow-up to the work by Jordan and Montani (1997), Jordan and Hanich (2000) studied the performance of second-grade children on cognitive tasks associated with mathematics teaching. All children were assessed individually on tasks including number facts, story problems, place value, and written calculations. The results for the story problem tasks are of interest here. The story problems were read aloud to the students, and a written version was also provided. The students with mathematics and reading disabilities and those with only mathematics disabilities performed significantly worse than both the normally achieving students and the students with reading disabilities only. Those with only mathematics disabilities performed better than the group with both disabilities.
The only task in the study that differentiated between the mathematics-disability-only group and the normally achieving group was the story problem section. The results of this study are interesting, but several elements of its design must be considered before drawing any conclusions. Because the problems were read aloud, the study cannot reveal the full impact of reading disabilities on children who must read and solve mathematical problems independently. The sample was relatively small, and the researchers did not attempt to gather information about the teaching practices in the three classrooms from which the participants were drawn. If, in fact, the practices were vastly different, using the individual child as the unit of analysis would violate the assumption of independence, posing a serious threat to the robustness of the results. What this study does accomplish is to highlight the necessity of differentiating students with mathematics disabilities from those who also have reading disabilities. Much of the earlier research on this topic confounded these two subgroups, making its findings difficult to interpret.

Jordan, Kaplan, and Hanich (2002) built on this work with a longitudinal study that followed students through their second- and third-grade years, aimed at understanding the long-term development of students with mathematics disabilities, with and without comorbid reading disabilities. Based on their previous work, the researchers predicted that students with both mathematics and reading disabilities would gain mathematical proficiency at a slower rate than children with only mathematics disabilities, because they could not use reading as a compensatory strategy.
Students were assessed with the Woodcock-Johnson Psycho-Educational Battery-Revised at the outset of the study and assigned to one of four groups: mathematics disability only, mathematics and reading disability, reading disability only, or normally achieving. The mathematics tasks used in assessments throughout the study were read aloud to the students. Regardless of instructional method, children with both disabilities appeared to learn at a slower rate than those with only a mathematics disability. The most interesting finding is that students who began the study with only reading disabilities completed it with mathematical performance levels equal to those of students who started the study with mathematics disabilities. The researchers concluded that children with specific reading disabilities are at risk of developing associated mathematics disabilities as they progress through elementary school, reasoning that some mathematics skills appear to be acquired in a language-specific format. As with Jordan and Hanich's (2000) earlier work, the findings must be interpreted in light of the fact that the students were not required to read the problems themselves.

The findings of Ackerman and Dykman's (1995) research are similar to those of Jordan et al. (2002). Ackerman and Dykman studied elementary-aged poor readers, ages seven to twelve, who were classified according to test scores as either reading disabled (RD) or reading and arithmetic disabled (RAD). All students were then given a battery of psychoeducational tests. The RAD students scored lower than the RD students on measures of sequential memory, and both groups scored lower than adequate readers. These differences in achievement between RD and RAD students were significantly larger among upper-elementary students, suggesting that mathematical retardation among poor readers increases with age.
Fuchs and Fuchs (2002) saw a weakness in the existing research on the relationship between mathematics disabilities and reading disabilities in problem solving: previous research was largely confined to students' responses to simple, one-step arithmetic word problems. To address this limitation, they studied students with mathematics disabilities, with and without comorbid reading disabilities, on a range of mathematical problem-solving tasks: simple arithmetic story problems (one-step story problems involving sums or minuends of 9 or less), complex story problems (word problems requiring students to select a problem-solving strategy, find relevant information, and use a correct operation), and real-world story problems (presented in a multiparagraph narrative including graphic and tabular information and requiring students to answer four questions about the problem situation). Students in this study were required to read the problems on their own. An analysis of the three problem types revealed that real-world problems had significantly more words, sentences, words per sentence, verbs, numbers, and mathematical steps than complex story problems, and that arithmetic problems had the fewest of all. The accuracy of all students' responses decreased dramatically across the three problem types, with the highest scores on arithmetic problems, followed by complex problems and then real-world problems. The students with mathematics disabilities alone scored 75% on the arithmetic story problems, 14% on the complex story problems, and 12% on the real-world story problems, while the students with both mathematics and reading disabilities scored 55%, 8%, and 5%, respectively. In all cases, the students with reading disabilities scored significantly lower than those without, indicating that reading ability is a significant predictor of mathematical problem-solving ability across problem types.
In a study of deaf children 8 to 12 years of age, Pau (1995) found that the reading level of word problems was directly related to children's problem-solving abilities. Kelly and Mousley (2001) studied both deaf and hearing college students to determine the effect of their measured reading levels on their ability to solve mathematics problems presented in both numeric and word problem formats. No significant differences were found on the simple problems, but on the more challenging word problems, deaf students at all reading ability levels performed significantly lower than hearing students, and deaf students with high reading ability performed significantly better than deaf students with middle and low reading ability.

Accommodations Research

The issue of providing testing accommodations to students with learning disabilities is a matter of serious debate due to the high-stakes nature of many current assessments. The goal of any accommodation is to level the playing field by ensuring that a student's disability in one area does not impair his or her ability to demonstrate achievement in another. One of the most contentious accommodations currently being debated is reading the mathematics portion of a test aloud to students with reading disabilities. Tindal, Heath, Hollenbeck, Almond, and Harniss (1998) studied the impact of this accommodation on both regular and special education students. The control group received the standard administration, reading the items independently from the booklet, while students in the accommodation treatment had the questions read aloud twice. Students in the regular education curriculum showed no preference for either test format, but students receiving special education services scored significantly higher with the read-aloud accommodation. These findings seem to support the use of this accommodation in mathematics testing situations.
Johnson (2000) studied the same phenomenon and found that the accommodation did not affect the performance of students without disabilities, while its effect for students with reading disabilities approached significance. The small group sizes (fewer than 40) limited the power of the statistical test to detect a difference; therefore, the results must be interpreted with caution. However, the consistency of these findings with those of Tindal et al. (1998) indicates that reading the mathematics questions aloud to students with reading disabilities does not affect the validity of the test.

Fuchs, Fuchs, Eaton, Hamlett, and Karns' (2000) work extended previous findings by focusing on how accommodations affect performance on different problem types. In their study, students with and without learning disabilities performed best on concept-applications problems when the questions were presented orally. On the problem-solving test, however, students with learning disabilities performed significantly better under oral presentation, but the presentation format had no effect on nondisabled students. The concept-applications problems contained significantly fewer words than the problem-solving items, suggesting that reading was an important access skill only for the latter. For this reason, students' performance on the concept-applications problems is counterintuitive and deserves further study.

Helwig et al.'s (1999) findings contradict those of Fuchs et al. (2000). Helwig et al. found that mathematical problems containing a multitude of challenging language factors tended to be solved more easily when read aloud than when students read the problems themselves. The accommodation did not have a significant impact on students' performance with word problems containing only a few words.
Within the high mathematics group, the difference between hearing the problem read aloud and reading it independently was much greater for low-ability readers than for proficient readers. The low mathematics students significantly preferred having the items read aloud, but their lack of mathematical proficiency apparently precluded them from benefiting from the accommodation. The researchers concluded that because the effect of the accommodation among strong mathematics students discriminated between those with high and low oral reading fluency, part of what was being tested was students' word identification skills. Although reading word problems aloud was not found to be beneficial for a majority of students, this accommodation may allow those who excel in mathematics yet struggle with reading fluency to demonstrate their true mathematical abilities.

In a follow-up study, Helwig, Rozek-Tedesco, and Tindal (2002) sought to extend their earlier work by administering both standard and read-aloud versions of a mathematics achievement test to a large sample of students across a range of grade levels. They hypothesized that students with low reading skill would perform better when test items were read aloud, especially for items that were linguistically complex. The main effect of reading ability was significant in all cases; however, the results showed only limited evidence, and only at the elementary level, that reading the test items aloud was an effective accommodation. The researchers, surprised by these results, pointed to the possibility that students may have failed to take advantage of the read-aloud accommodation, choosing instead to work on their own. No observations were conducted during testing to ensure that students actually followed the pace of the oral reading.
In addition, even the most linguistically complex items on this assessment were not as difficult as the items used in other studies that found the accommodation to make a significant difference for poor readers. It should also be noted that the low readers scored extremely low on the test, indicating that they may not have had the mathematics skills necessary to take advantage of the accommodation.

Because some of these studies produced conflicting results, and considering the high-stakes nature of any policies that may be enacted regarding testing accommodations, further research is still needed. Future studies must work to solidify the connection between problem types and the resulting impact of accommodations for both learning disabled and average-achieving students. In addition, sample sizes must be large enough to ensure that statistical tests are able to detect any differences that may exist.

Reading Comprehension Research

A growing body of practice-based literature points to the relationship between reading comprehension and problem-solving performance, yet the relationship is not reinforced enough in the classroom (Tovani, 2000), and empirical research on this topic remains scant. Borasi, Siegel, and Fonzi (1998) found that encouraging students to talk, write, draw, and enact texts provided them with concrete ways of constructing meaning from the mathematical texts they read. They were correct in hypothesizing that instructional strategies drawn from the reading education literature, specifically transactional reading theory (Rosenblatt, 1978), would allow mathematics teachers to support sense-making and discussion in their classrooms. Although this study focused on the reading of mathematical narrative texts rather than mathematical problems, the researchers suggested that the results were promising enough to warrant further research into the applicability of these reading strategies to problem solving.
Morales (1998) examined whether children's understanding of the semantic structure of word problems would affect their ability to solve the problems accurately. Using a four-way ANOVA, the study found that students who have a faulty conception of what a problem is about have more difficulty solving it. As a result, the researcher concluded that greater instructional emphasis should be placed on the comprehension component of problem solving.

The final study to examine comprehension looked at the impact of fourth-grade students' reading comprehension ability on their choice of strategies in solving arithmetic word problems (Sovik, Frostad, & Heggberget, 1999). Students with good arithmetic ability, regardless of reading ability, chose similar strategies for solving the addition and subtraction problems. For the multiplication and division problems, the group with more proficient reading comprehension and high arithmetic ability used significantly more sophisticated strategies than the proficient readers with poor arithmetic ability. With the more mathematically difficult problems, proficient readers seemed to profit more from their reading ability than with items that tested simpler arithmetic concepts. Although this study does show some relationship between reading ability and strategy selection, no firm conclusions can be drawn from the results, because no effort was made to include the readability level of the question stems as a variable or to provide a range of reading difficulty within each test set. The addition and subtraction items were well below grade level in both reading and arithmetic difficulty. A typical example is "Jens had 2 apples and Ola had 7 apples. How many apples did they have in all?" (p. 380). Because of this, it is not surprising that nearly all students in all groups answered all of the questions correctly and that the results showed no significant difference between groups.
Gender Differences in Reading/Literacy

Because reading ability directly affects children's achievement in problem solving, a review and discussion of gender differences in reading is in order. Recent statistics on boys' performance in literacy show that boys account for 75 to 85 percent of students labeled at risk of poor achievement progress in literacy (Rowe, 2000), and that girls consistently outperform boys in reading and writing (National Center for Education Statistics, 2000). Other studies show that boys are three to five times more likely than girls to be labeled reading/learning disabled (Young & Brozo, 2001), with a boy-to-girl ratio of 4:1 in learning disabled programs (Vogel, 1990), and that significantly more boys than girls are enrolled in remedial reading classes (Alloway & Gilbert, 1997). The same pattern is found across the globe, including in Canada (Gambell & Hunter, 2000), the United Kingdom (Barrs, 2000), and Australia (Rowe, 2000). In fact, the Program for International Student Assessment (PISA) found that girls scored significantly higher than boys on a test of reading literacy in all 32 of the participating countries (Lemke et al., 2001).

Is this a new phenomenon? Research into the history of American education reported by Young and Brozo (2001) shows that as early as the late 1800s there was concern that boys were not doing as well as girls in school. Throughout history, a pattern emerges in which boys' failures were blamed on factors external to the boys themselves: pedagogy, methods, texts, or teachers. When boys were successful, all credit was given to their intrinsic intelligence and ability. In contrast, girls' successes were attributed to pedagogy, methods, texts, or teachers, while their failures were blamed on the girls themselves and their lack of ability. Concern over boys' academic performance resurfaced periodically until the 1960s when, for the first time, the focus shifted to girls.
The result has been that girls now consistently outperform boys on all measures of academic success (Young & Brozo, 2001).

Sources of Boys' Underachievement

There are many roots of underachievement in boys, the first of which is social. Boys have a strong desire for membership in a Discourse of masculinity ("Discourse" with a capital D meaning ways of being, thinking, acting, talking, and reading; Gee, 1996). This Discourse comes into play in the literacy classroom when boys, wishing to be viewed by their peers as a certain sort of boy, read, write, and think like other boys who are members of that particular Discourse of masculinity (Young & Brozo, 2001). Unfortunately, the most prevalent masculine Discourse among school-aged boys includes a caveat that "real boys don't do schoolwork," including literacy work. Jackson (1998) found that even academically successful boys felt the need to hide their commitment to academic work in order to be accepted by their peers. Boys come to school with a fairly fixed preference for certain kinds of literature (Young & Brozo, 2001), and denying them engagement with texts that appeal to them denies them access to meaningful literacy. When boys are not engaged with such texts, we risk their developing the idea that reading is not a masculine undertaking (Hunsader, 2002b).

The second root of the problem is biological. According to Gur (as cited in Mulrine, 2001), director of the University of Pennsylvania's Brain Behavior Laboratory, women's brains are on average 11 percent smaller than men's, yet there is no gender difference in IQ. The female brain is simply more finely developed, with a larger corpus callosum, the bundle of nerves that provides communication between the two sides of the brain. The result is that females' verbal skills are superior to those of males.
Boys have more white matter in the brain, making for superior gross motor skills, and their high volume of spinal fluid allows their brains to sustain more blows without injury. In addition, the portion of the brain that controls emotions is less developed in men than in women (Mulrine, 2001). The very things at which the male brain excels (gross motor skills, visual and spatial skills, and exuberance) are often things that are not well received in educational settings. This lack of reinforcement of boys' natural strengths has led to their perception that teachers like girls better (Dobson, 2000). Elementary classrooms, especially in the primary grades, use management techniques designed primarily by women to fit the temperament, learning styles, and natural tendencies of girls, and this is taking a toll on boys (Dobson, 2000; King, in press; Palardy, 1998). Gender is overtly and covertly integrated into school culture, primarily through the gender of the teacher. In the early grades, boys and girls are expected to master literacy while embedded in a teacher-centered classroom environment (Boggs, 1999). Given the overwhelming preponderance of female teachers in the early grades, this teacher-centered classroom environment is more accurately termed a female-centered environment.

The third source of difficulty for boys in the literacy classroom is the increased demand for operational literacy in schools (especially verbal reasoning and written communication skills), as evidenced by changes in both school-based and standardized assessment. These are all areas in which girls have a distinct advantage (Rowe, 2000). For many boys, especially those termed "late bloomers," the verbal reasoning and literacy demands placed on them are often constrained by performance parameters that limit their ability to demonstrate what they know and can do.
Rowe (2000) reports that boys' reluctance to read, exacerbated by their recent increase in solitary play (video games), contributes to their literacy underachievement. In addition, the fact that girls' social lives revolve around verbal communication, while boys' social experiences revolve around play, serves to widen the gap between the sexes in school-based literacy activities (Rowe, 2000). Newkirk (2002) acknowledges that boys currently have low status in the educational arena, but refuses to blame popular culture for their plight. Instead, he posits that because boys are avid consumers of visual media, and TV is the primary entertainment for economically disadvantaged children, teachers have an ethical obligation to use these media creatively to enhance boys' literacy development.

The fourth cause of underachievement in literacy is something of a catch-22. Research indicates that the most significant predictor of student literacy achievement is student attentiveness. The most influential indicator of student attentiveness is gender, with girls being significantly more attentive than boys (Hill & Rowe, 1998; Rowe, 2000). The implication of this finding is enormous given the current move away from short-answer assessments and activities toward those that require extended periods of attentiveness. This change in pedagogy may be inadvertently exacerbating the differences in literacy achievement between boys and girls (Hunsader, 2002b). Interestingly, the relationship between attentiveness and literacy achievement holds in reverse as well. Research shows that students' early growth in reading skills reduces inattentive behavior, thereby improving academic achievement. The underachievement of boys leads to acting-out behaviors, inattentiveness, low self-esteem, and an unwillingness to participate in schooling.
Conversely, it is literacy achievement that most strongly reduces inattentiveness and improves both the educational and behavioral outcomes of students, especially boys (Rowe, 2000).

Highlighted Research on Gender Differences in Reading/Literacy

A significant portion of the empirical research relating gender and reading comes out of Canada. In a large-scale study of students in Saskatchewan, male students were weaker than females in reading across all grade levels studied. Attitudinal data indicated that girls had higher self-concepts as readers and more positive attitudes toward reading than boys (Gambell & Hunter, 1999). A longitudinal study conducted in Eastern Canada tracked the reading achievement of a group of students as they progressed from first through sixth grade (Phillips, Norris, Osmond, & Maynard, 2002). In first through third grade, there was a higher proportion of boys in the below-average category and a higher proportion of girls in both the average and above-average categories. An interesting shift occurred by the end of fourth grade, however, with the relationship between gender and reading ability losing statistical significance. A similar phenomenon was found in British Columbia in a study of fourth, fifth, and sixth graders' ability to read science text (Williams & Yore, 1985). Although overall results showed girls outscoring boys, there was a grade-by-sex interaction, with boys outscoring girls by grade six.

New Zealand, a country known around the world for its high standards and achievement in literacy (Guthrie, 1981), has recently experienced a downward trend in its international literacy standings, caused in part by the declining literacy achievement of its boys (Wagemaker, 1993). Wilkinson (1998) conducted an extensive study of the literacy habits and abilities of grade four students in New Zealand.
Although girls outscored boys, three factors served to reduce the gender gap: female teachers, teachers who had spent more years in preservice training, and the provision of fewer reading materials in the classroom. The researchers hypothesized that female teachers may be more responsive to individual differences and more likely to address reading difficulties faced by boys at an early age. This hypothesis reveals a recent change in attitude toward the preponderance of female teachers in the elementary grades. In the 1960s and 1970s in the United States, female teachers were blamed for feminizing boys and harming their educational performance (Smith, 1973). The finding on teacher training indicated that education helped teachers cater to individual differences between boys and girls. The counterintuitive finding about classroom reading materials may reflect a tendency of girls to make greater use of classroom materials when they are available, so that when materials are scarce, the advantage girls gain from using them diminishes.

In a study relating reading to mathematics, Tartre and Fennema (1995) found that verbal skill was a significant predictor of mathematics achievement for boys, but not for girls. The researchers did not draw any conclusions about the cause of this phenomenon, but one possible hypothesis is that boys' underachievement in literacy has made verbal skill a major factor in determining how boys perform in other content areas.

Two studies used survey data to understand the relationship between gender and literacy. Pottorff, Phelps-Zientarski, and Skovera (1996) studied the gender perceptions of elementary and middle school students about literacy in hopes of beginning to understand the causal relationship between gender and literacy achievement. The results indicated that both genders viewed girls as more capable readers and writers than boys.
Although boys viewed themselves more positively than girls viewed them, only a small percentage of boys saw themselves as better at these activities than girls. The findings were consistent across grade levels and demographic populations. Commeyras (1999) surveyed literacy educators to assess their beliefs about, and interest in, the sources of gender differences in literacy. She found that more educators believed that boys' and girls' behavior results from a combination of biological and sociocultural determinants than from either cause alone, and that most teachers indicated some interest in knowing more about gender in literacy education. Acceptance that gender differences are, in part, socially constructed, coupled with educators' interest in gaining more knowledge, indicated teachers' belief in their ability to enact change and their interest in doing so.

Summary of Reading Research

Table 2 summarizes the results of the research studies examined in this section. The linguistic complexity of word problems appears to affect students' problem-solving success (Bernardo, 1999; Fan et al., 1994; Leong & Jerred, 2001), and students with reading disabilities are at a disadvantage when mathematical problems are presented in a written context (Ackerman & Dykman, 1995; Fuchs & Fuchs, 2002; Jordan & Hanich, 2000; Jordan & Montani, 1997; Jordan et al., 2002). Studies of read-aloud accommodations did not have consistent findings, but generally speaking, reading disabled students benefited from having word problems read aloud to them in testing situations (Fuchs et al., 2000; Helwig et al., 1999; Helwig et al., 2002; Johnson, 2000; Tindal et al., 1998). Regarding gender, in most studies girls scored higher in reading and had more positive attitudes about reading than boys (Gambell & Hunter, 1999; Pottorff et al., 1996; Wilkinson, 1998).
Two studies had slightly different results, one finding similar reading scores among boys and girls by the end of fourth grade (Phillips et al., 2002), and one finding that girls lost their advantage by sixth grade (Williams & Yore, 1985).

Table 2
Results of Reading Studies Grouped by Topic

Author/Date | Participants | Results

Problem Wording
Bernardo, 1999 | 283 Filipino students, grades 2-4 | Students scored higher on problems reworded in simpler language.
Fan et al., 1994 | 29 1st graders | Problem wording affected the difficulty of the problems.
Hegarty et al., 1995 | 38 undergraduate psychology students | Poor problem solvers relied on direct translation. Good problem solvers developed a mental model of the problem.
Leong & Jerred, 2001 | 91 3rd-5th graders | Students performed better on consistently worded problems than on inconsistently worded ones.

Students with Mathematics and/or Reading Disabilities
Ackerman & Dykman, 1995 | 65 students, ages 7.5-12 years | Students with disabilities in reading and math scored lower on a test of sequential memory than students with reading disability only.
Fuchs & Fuchs, 2002 | 40 4th graders with math disabilities but average intelligence | Students with math and reading disabilities scored lower on story problems than those with only math disabilities.
Jordan & Hanich, 2000 | 76 2nd graders | Students with math and reading disabilities scored lower than all other groups. Math-disability-only students were weak on story problems.
Jordan & Montani, 1997 | 48 3rd graders, half with mathematics disabilities | Students with math disabilities struggled with problem-solving efficiency. Students with math and reading disabilities also struggled with problem conceptualization.
Jordan et al., 2002 | 180 children studied through their 2nd- and 3rd-grade years | Students who began the study with only reading disabilities finished at the same level as those who started with math disabilities.
Kelly & Mousley, 2001 | 44 deaf and hearing college students | Deaf students scored lower than hearing students on word problems.
Pau, 1995 | 12 deaf children, 8-12 years old | Deaf students' reading level was predictive of their problem-solving ability.

Testing Accommodations for Disabled Students
Helwig et al., 1999 | 247 6th graders | Students performed better when problems with challenging language were read aloud.
Helwig et al., 2002 | 1,343 students, grades 4, 5, 7, and 8 | Reading test items aloud benefited only the elementary students.
Johnson, 2000 | 115 4th graders | Reading test items aloud did not affect the performance of nondisabled students but approached significance for reading disabled students.
Fuchs et al., 2000 | 373 4th graders, half with learning disabilities | Read-aloud accommodation helped reading disabled but not nondisabled students.
Tindal et al., 1998 | 481 9- to 12-year-old students | Read-aloud accommodation did not affect regular education students but helped special education students.

Reading Comprehension Research
Borasi et al., 1998 | 4 secondary math classrooms | Use of transactional reading strategies improved sense-making and text discussion.
Morales, 1998 | 199 Mexican-American students, grades 2-5 | Students with faulty conceptions of problems struggled with solution processes.
Sovik et al., 1999 | 102 4th graders | Students with good arithmetic ability, regardless of reading ability, chose similar problem-solving strategies.

Gender Differences in Reading
Commeyras, 1999 | 1,530 literacy educators | Student behavior is determined by genetics and societal influences. Educators desire to know more about gender and literacy.
Gambell & Hunter, 1999 | 3,214 students in grades 5, 8, and 11 | Girls scored higher in reading and had higher self-concepts and more positive attitudes about reading than boys at all grade levels.
Phillips et al., 2002 | 187 students in grades 1-6, longitudinal study | Girls performed better until the end of fourth grade, when achievement leveled out.
Pottorff et al., 1996 | 730 students in grades 2, 4, 6, and 8 | Girls are viewed as better at reading and writing. Few boys feel they are better at reading and writing than girls.
Tartre & Fennema, 1995 | 60 students, longitudinal study of grades 6-12 | Verbal skill was a significant predictor of mathematics achievement for boys but not for girls.
Wilkinson, 1998 | 2,949 grade 4 students | Girls had higher reading scores, but differences were reduced by female teachers, highly educated teachers, and scarcity of classroom reading materials.
Williams & Yore, 1985 | 358 4th, 5th, and 6th graders | Although girls outscored boys overall in reading, 6th-grade boys outscored girls.

Self-Regulated Learning

The Concept and Nature of Self-Regulated Learning

The topic of self-regulated learning (SRL) is a relatively new but important focus of study among educational psychologists. If researchers can begin to answer the questions of how, when, and why students take over the direction of their own learning, instructional strategies can be developed to support and encourage those behaviors. Zimmerman's (1990) definition of SRL focuses on three main components. He states that self-regulated learners are "metacognitively, motivationally, and behaviorally active participants in their own learning" (p. 4). Self-efficacy and self-assessment, two key variables of this study, are intertwined in these three components of SRL and will be discussed separately later in this section.

Metacognitively, self-regulated learners set their own goals, monitor their progress, and self-evaluate throughout the learning process. In terms of motivation, self-regulated learners feel confident about their ability to be successful and are intrinsically interested in learning. Behaviorally, self-regulated learners actively work to create environments that are conducive to learning. Self-regulated learners are unique in their awareness of the connection between the learning strategies they employ and their success in accomplishing their learning goals. They persistently monitor the effectiveness of their learning strategies and make changes when they fall short of reaching their goals. They also independently determine which learning outcomes are worthy of their interest and effort, and make judgments about their level of strategy use based on the degree to which they desire an outcome. Hence, most SRL theories consider learning and motivation to be two interdependent processes (Zimmerman, 1990).

Unfortunately, getting students to use strategies correctly is much more complex than simply teaching them the strategies. The process of selecting and monitoring strategy use requires high-level reasoning that in many cases is developmentally acquired (Zimmerman, 1990).
If educational psychologists were able to determine when instruction for each self-regulatory strategy was developmentally appropriate, and to develop concrete methods for teaching students to self-regulate, educators would be more likely to accomplish the goal of helping all children become independent learners.

McCombs and Marzano (1990) voiced their concern that confusion about the interrelationships among the metacognitive, cognitive, affective, and motivational aspects of SRL is negatively affecting the ability of research to produce useful conclusions. This concern is well founded, since the leading theorists in the field hold very divergent views about the nature of SRL. McCombs and Marzano define self-regulated learning as "the outcome of choosing to engage in self-directed metacognitive, cognitive, affective, and behavioral processes and skills" (p. 52). They see SRL as the result of both will and skill; to promote the development of SRL, interventions must therefore target both of these elements. Winne (1995) sought to gain a deeper understanding of SRL by examining empirical research on the subject that was not commonly cited in the literature. He acknowledged that self-regulating learners are more effective learners, and hoped to uncover specific interventions that would encourage self-regulatory behavior. He undertook his investigation for three main reasons. First, he believed that because so much of a student's learning time is spent alone, understanding how learners develop and use SRL strategies, namely the strategies that students rely on when working alone, may help researchers develop better resources for SRL strategy instruction. Second, understanding how complex SRL behaviors develop in the absence of direct training may aid in the development of methods to support these behaviors. Finally, because SRL exists as a phenomenon, it is worthy of the effort required to understand it more fully.
Winne (1995) enlisted the help of Pat, a senior honors psychology student, to uncover the features unique to self-regulating learners. He found that self-regulating learners are aware of the challenges that may hinder their learning, are deliberate in using the strategies that will enable them to achieve their learning goals, and are in control of their attitudes and thinking. His review of research led him to conclude that when learners exert effort to accomplish a difficult task and succeed, they are more likely to expend high levels of effort on other tasks perceived to be difficult. He also found that low-ability learners avoid self-regulatory behaviors when faced with a task that demands application of a new skill. Sound knowledge of a specific domain enables students to approach a novel task with more confidence, and therefore increases the likelihood that they will engage in self-regulatory behaviors. Self-regulated learning involves knowledge, beliefs, and learned skills, and can therefore be influenced by environmental changes. Winne concluded that the solitary nature of much of students' learning effort was just cause for focusing more research attention on the details of inherent SRL. In reaction to Winne's (1995) writing, Alexander (1995) focused on two elements of SRL that she believed were not properly portrayed. First, she posited that self-regulation that occurs in solitude has not been proven to be markedly different from that which occurs in social situations. Second, she argued that Winne failed to delineate the degree to which SRL is a general characteristic of the learner versus being tied to specific situations or content areas.

Alexander (1995) stated that the SRL behaviors that occur when learning alone, learning with others, and learning in the presence of others may not be as distinguishable as Winne (1995) portrayed them.
She claims that even learners who appear to be working alone may actually be engaging in a private mental dialogue, with themselves as either audience or teacher. Likewise, it should not be assumed that learners' engagement in a social learning situation precludes the internal dialogue that might occur in solitude. Alexander also contradicts Winne's claim that self-regulating learners exert higher levels of effort than those who are less self-regulating. She notes that a learner effectively engaging in SRL may determine that his or her effort is not warranted by the task. Mindless engagement in trivial tasks should not be considered a positive quality associated with SRL; on the contrary, self-regulated learners should selectively engage in tasks that are meaningful. In addition to these differences in point of view, Winne (1995) and Alexander (1995) also disagree on the nature of the self-regulatory behaviors associated with levels of competence in specific domains. Winne states that "self-regulated cognitive engagement is a deliberate, judgmental, adaptive aptitude of expert proportion" (p. 191). Alexander counters that for students who are novices in a particular domain, even their greatest efforts at self-regulatory behavior are not likely to be of expert quality. She claims that self-regulatory efforts are likely "less frequent, more difficult, and less rewarding" (p. 192) than when the learner is competent in the domain. She believes that future research in SRL must take into account the context in which the learning occurs, together with the learner's interest and competency level in that context. Pressley (1995) also responded to Winne's (1995) article with his own thoughts on the nature of SRL. Pressley views SRL as much more complex than Winne does: it develops over years and decades, and real expertise is not likely to develop until one has been a scholar at a high level for a long period of time.
He also argues that self-regulatory strategies must develop over a long period of time before they can be transferred to new contexts, for four reasons. First, for new strategies to be integrated, they must prove themselves superior to older, more deeply ingrained strategies. Second, simply knowing how to use a strategy does not mean that the learner will know when and where to use it. Third, learning a strategy does not guarantee that the learner will fully comprehend its usefulness. Finally, learning a strategy does not guarantee that the learner will know how to modify it for use in a new situation. Pressley does not subscribe to Winne's optimism that teaching self-regulation strategies will result in their use. Pressley (1995) and Alexander (1995) agree in arguing against Winne's view of the solitary nature of much of students' learning experience. Pressley notes that even study that appears to be solitary may be influenced by many social inputs. He points out that although seatwork seems solitary, it is largely influenced by recent instruction that occurred in a social context: "The student who appears to be working alone is not thinking alone" (p. 210). Also, the child who is working alone often has access to social support simply by raising her or his hand. In support of Winne's (1995) work, Pressley recommends adherence to some of the teaching principles Winne encouraged. Students should be provided with varied opportunities to realize that their effort is worthwhile. Students should be able to practice new strategies until they become routine before they are expected to use them in self-regulatory behaviors. They must come to understand that learning is difficult and requires effort, even for the most able learners, and that knowledge is subject to multiple perspectives.
Individual Differences in SRL

Winne (1996) followed his original article with new writing on individual differences in self-regulated learning and the elements that contribute to those differences. He begins by breaking the SRL process into two phases. In the first, the learner works to understand the task, outline goals, and make plans for accomplishing those goals; it is in this first phase that feelings of self-efficacy come into play. The second phase involves carrying out the plan and adjusting self-regulatory strategies in response to both metacognitive monitoring and feedback from external sources; it is during this second phase that self-assessment plays an important role. Winne (1996) points to the lack of research using individual differences as an object of study. He cautiously outlines five potential sources of individual differences in the metacognitive aspects of SRL that may be fruitful sites for future study. All students vary in their domain knowledge, and an expert in one area may have less need for self-regulatory behavior than a novice. Similarly, all learners differ in their knowledge of cognitive strategies. No two learners are identical in their ability to perform these strategies, which depends on the degree to which they have used the strategies and developed automaticity. Every learner differs in his or her ability to regulate tactics and strategies. Finally, students' dispositions and styles vary.

Classroom Applications of SRL

Paris and Paris (2001) contribute to the field of knowledge about SRL through their work on ways to apply research findings in the classroom. They point to the changing perspective in educational psychology, which increasingly seeks to link its work to interventions that can be implemented in the classroom for the benefit of students. They list six stages of instructional practice that support the development of self-regulatory behaviors.
Teachers should begin by activating students' prior knowledge, followed by discussion and cognitive modeling of strategies. These stages are followed by mnemonic memorization, supported performance, and independent performance. The stages should not be considered strictly linear, as it may be necessary to return to one or all of them as students develop their SRL strategies. Successful interventions in the development of self-regulation tend to provide a variety of strategies with guidance on how, when, and why they may be used. Students need to see that their success in performing a task is the direct result of their self-regulatory strategies rather than of either their innate ability or the skill of the teacher. Students should learn to value the input of their peers on the use of strategies, and strategies should become a regular part of activities throughout the curriculum. The optimum classroom environment for the development of SRL is characterized by open-ended tasks that allow for some student choice, encouragement of collaboration, expectation of high performance, and facilitation of students' self-efficacy in dealing with tasks. Students need to be given the opportunity to develop their self-regulatory behaviors indirectly through classroom experiences, directly through instruction, and also through practice (Paris & Paris, 2001). Assessment is another key to students' motivation to engage in self-regulatory behaviors. Self-assessment is related to the cognitive, motivational, and affective aspects of SRL. As students develop their ability to monitor their actions, they become more effective and accurate in assessing their own performance across a wider range of behaviors. They become better able to assess their level of comprehension, their level of effort, and their own interests.
When students become proficient in detecting quality in their performance, their feelings of self-efficacy and perceptions of ability increase (Paris & Paris, 2001). Both internal and external factors affect students' self-assessment processes. Internal factors include self-appraisal, the evaluation of one's own ability, and self-management, the planning that results from monitoring one's ongoing behavior. External factors that affect self-assessment include the curricula and assessment activities selected by teachers. Closed tasks that do not allow students to pursue knowledge independently, and objective tests such as multiple-choice and true-false items, do little to promote SRL. In contrast, authentic assessment is by nature possible only in response to more open-ended and engaging tasks. Portfolios also provide a context in which students can assess not only their current work but also their growth over time (Paris & Paris, 2001).

SRL in the Field of Mathematics

Cognitive theory has succeeded in breaking down complex tasks such as mathematical problem solving into component processes and strategies. The ultimate goal of this effort is to improve instructional methods and student achievement. Current research has found that students who are successful in mathematical problem solving are strategic and actively use strategies to improve their performance. More specifically, good mathematics students possess and use metacognitive knowledge to direct their learning and problem solving. They know when, how, and where to use specific strategies, and they continually monitor their use of these strategies so that they can make appropriate modifications throughout the problem-solving process. These students also possess sufficient mathematical content knowledge to cue them when to use a particular strategy and when strategic activity may be unnecessary (Braten, 1998).
Students who can activate and persistently use their strategies, metacognition, and domain-specific content knowledge in mathematical problem solving are said to be practicing self-regulation. Children's self-directed or private speech becomes the primary instrument of self-regulation (Braten, 1998). During self-regulation, children in effect become their own teachers, using an unspoken literacy as their private guide through the problem-solving process.

Students must know how to monitor and regulate their own actions, develop self-awareness, and reflect on their own actions. These metacognitive behaviors are best learned and exercised in the context of real problem-solving events (Leinhardt & Schwartz, 1997). Metacognitive instruction has been found to benefit problem solving, reading comprehension, and writing; when instruction is provided for one of these processes, there is a positive carryover effect to the others (Kramarski, Mevarech, & Arami, 2002). The major goals of metacognitive instruction in mathematical problem solving are to help students comprehend the problem, construct connections between previous and new knowledge, use appropriate strategies to solve the problem, and reflect on the process and the solution (Kramarski, Mevarech, & Arami, 2002). Metacognitive strategy instruction should include work on both self-monitoring and self-regulation: self-monitoring enables students to focus attention on their learning processes, and self-regulation helps students become goal-directed (Cardelle-Elawar, 1995).

Proficient Problem Solvers

A look at the cognitive and metacognitive characteristics of good problem solvers provides a picture of the goals teachers should set for every student. To begin, good problem solvers focus their attention on the problem and persist in considering all relevant information.
They assess all the information before drawing a conclusion, and test or evaluate all their potential solutions before moving on (Kelly & Mousley, 2001). During the problem-solving process, good problem solvers make much greater use of self-explanations, the practice of mentally talking through the problem situation, than do poor problem solvers (Reed, 1999). Children's ability to generate frequent and powerful self-explanations of a problem situation significantly predicts their success. The best learners take a holistic view of problems, focusing on the structural rather than the surface features of a problem, and are able to think ahead in the problem-solving process (Goldman & Hasselbring, 1997; Kramarski, Mevarech, & Arami, 2002; Wong, Lawson, & Reeves, 2002).

Less Proficient Problem Solvers

In contrast to the thinking processes of proficient problem solvers are the cognitive and metacognitive behaviors of less proficient problem solvers. Low-performing students often exhibit patterned behaviors when faced with a problem-solving task. They may read too quickly to gain full comprehension, may not reorganize information, may fail to realize that there may be more than one possible path to solving the problem, or may be uncertain how to calculate or verify their solution and give up (Cardelle-Elawar, 1995). Students with metacognitive deficits are less likely to be able to use strategies effectively to represent problems and execute solutions. This deficit impedes progress on tasks requiring considerable strategic activity, such as mathematical problem solving (Montague, 1997). Less proficient problem solvers often do not see a task as a whole, but instead focus on only parts of it. They often score low on working memory tasks and have difficulty reorganizing information and distinguishing between relevant and irrelevant information (Passolunghi & Siegel, 2001; Kramarski, Mevarech, & Arami, 2002).
Language and cognitive skills are interdependent, with the result that students with language difficulties often have trouble with memory, attention, selection of appropriate strategies, and generalization of learning to new tasks. The metacognitive skills necessary for successful problem solving, such as self-regulation, awareness of one's own knowledge, and the use of cognitive strategies, are often lacking in students with language-based difficulties (Gurganus & Del Mastro, 1998; Landi, 2001; Stevens & Zaharias, 1997). These students seem to be aided by instruction in processing problem schemata so that they can recognize and represent the situation described in a problem (Jitendra & Xin, 1997). Cognitive strategy instruction, which addresses students' reading comprehension and problem-solving deficiencies, seems to be a promising approach to helping students with learning disabilities apply their skills and become better problem solvers. Specific cognitive strategies include visualization, verbal rehearsal, paraphrasing, summarizing, and estimating. To benefit from this instruction, students must possess adequate reading and mathematical computation skills (Montague, 1997). For students who have a strong repertoire of problem-solving strategies but use them ineffectively, instruction in metacognitive strategies such as self-instruction, self-monitoring, or self-assessment may improve their ability to solve mathematical problems (Cardelle-Elawar, 1995; Montague, 1997).

Highlighted Research Findings in Self-Regulated Learning

Self-regulated learning has been defined differently by different researchers. For the purposes of this study, self-regulated learning is defined as students being "metacognitively, motivationally, and behaviorally active participants in their own learning" (Zimmerman, 1990, p. 4). Studies in various domains have determined that metacognitive knowledge is a good predictor of students' strategy use.
However, very few studies have focused on the content area of mathematics, and only a small portion of those have focused on young elementary-aged children. These studies have found that teaching young children metacognitive knowledge about strategies has the potential to improve their mathematical performance. Even high school students continue to benefit from this training, because they often fail to monitor their own progress in problem solving (Braten, 1998). Cardelle-Elawar (1994) conducted a study to determine the effect of metacognitive instruction on the mathematics achievement and attitudes of low-achieving third- through eighth-grade students. The results indicate that metacognitive training in self-monitoring and self-regulation was effective in facilitating the problem-solving processes of low-achieving students. The process of self-inquiry developed in the treatment group seemed to build students' understanding by helping them reflect on the processes required to solve problems instead of being passive followers of procedure. Malpass (1996) investigated the effects of self-regulated learning, self-efficacy, goal orientation, and worry on performance on high-stakes mathematics tests. Students were assessed on these four variables using a questionnaire after they completed the Advanced Placement calculus exam. The results indicate that learning goal orientation was moderately related to self-regulated learning, and that self-regulated learning positively, but marginally, affected mathematics achievement. Regarding self-efficacy, Malpass found that it mediated the relationship between prior and current mathematics achievement and was related to self-regulation. The relatively small sample, the fact that most participants were Asian-American, and the fact that mathematics achievement was measured in a high-stakes environment may have compressed the correlations and produced the appearance of a weak relationship between self-regulation and achievement.
Fuchs et al. (2003) hypothesized that helping students to self-regulate would have a positive impact on their problem-solving abilities. They noted that previous research in mathematics has focused primarily on computational skills. Given the reform emphasis on problem solving, and because complex problem solving requires metacognition and perseverance, they chose to focus their research on problem solving rather than computation. They randomly assigned third-grade teachers to one of three conditions: control (teacher-designed instruction), transfer (teaching rules for problem solving, teaching for transfer, and cumulative review), and transfer-plus-SRL. In the transfer-plus-SRL treatment, students scored and tracked their own work, were encouraged each day to try to outscore their performance on the previous day, and were periodically invited to share with the class how they had transferred their learning in mathematics to other situations. The results indicated that improvement in transfer varied as a function of condition, with the transfer-plus-SRL group scoring highest, followed by the transfer group and then the control group. On measures of self-regulation, a significant effect for treatment was found for all measures. On the questions assessing students' perceptions of their learning, effort, and improvement, the transfer-plus-SRL group outscored the transfer group. These results indicate that mathematical problem solving may be strengthened by explicit transfer instruction, and that instruction designed specifically to increase students' self-regulatory behaviors promotes both self-regulatory processes and learning.

Summary of Self-Regulated Learning

The results of the highlighted studies on self-regulated learning are summarized in Table 3. Given all that is known about self-regulated learning in a general sense, efforts to determine how to encourage and support this behavior appear warranted.
In the field of mathematics, with the current focus on problem solving, these efforts may further help teachers guide students toward becoming independent learners. Whether we subscribe to Winne's (1995) notion of the solitary nature of learning, or to Pressley's (1995) and Alexander's (1995) view of the social nature of even seemingly solitary work, the impact of students' ability to regulate their own learning should not be underestimated (Cardelle-Elawar, 1994; Fuchs et al., 2003; Malpass, 1996). As both self-efficacy and self-assessment play a vital role in self-regulation and are key elements of this research, an in-depth review of both topics follows, with special emphasis on their relationship to mathematics.

Table 3
Results of Self-Regulation in Mathematics Studies

Author/Date | Participants | Results
Cardelle-Elawar, 1994 | 489 students in grades 3-8 | Training in self-monitoring and self-regulation improved the performance of low-achieving students.
Fuchs et al., 2003 | 395 3rd grade students | In the treatment that included SRL, students were better able to transfer learning to new problem-solving situations than in the treatment without SRL.
Malpass, 1996 | 144 gifted high school students | SRL positively affected mathematics achievement. Self-efficacy is highly and negatively related to worry.

Self-Efficacy

Self-efficacy is defined as personal beliefs about one's capabilities to learn or perform specific behaviors and skillful actions. Researchers hypothesize that feelings of self-efficacy influence choice of activities, effort, persistence, and achievement. Compared with students who are not confident about their learning capabilities, those with high self-efficacy for accomplishing a task work harder, participate more readily, persist longer when difficulties are encountered, and achieve at a higher level (Bandura, 1986; Pintrich & Schrauben, 1992).
Students who feel efficacious about learning are more persistent and more likely to select effective strategies (Zimmerman, 1989). These relationships also hold in reverse: proficient students tend to possess high self-efficacy for accomplishing academic tasks and believe they have the capacity to learn and solve problems at designated levels by using task-appropriate strategies (Bandura, 1986; Braten, 1998).

Mathematics self-efficacy is distinguished from other measures of mathematics attitudes in that it measures an individual's confidence in her or his ability to perform a specific task successfully, as opposed to a general feeling of confidence in the domain of mathematics (Hackett & Betz, 1989). Self-efficacy is context-dependent, depends on a mastery criterion of success rather than normative criteria, and is assessed prior to actual performance, so it can play a causal role in academic functioning (Hanlon & Schneider, 1999). Bandura (1997) believes that self-efficacy expectations are a major determinant of whether a student will attempt a particular task, how much effort he or she will be willing to expend, and how much persistence he or she will display in tackling the task in the face of obstacles. These self-efficacy beliefs mediate the influence of other determinants of academic outcomes, such as innate ability. Students' confidence in their own capabilities affects what they do with the knowledge and skills they possess. Therefore, cognitive theorists posit that students' academic performance is governed in large part by the confidence they exhibit when approaching academic tasks (Pajares & Miller, 1997). Bandura (1986) also cautioned that because self-efficacy judgments are domain and context specific, measures of self-efficacy should be specifically tailored to the domain and task being analyzed. He also noted that self-efficacy judgments should ideally be measured immediately before engagement with the task being assessed.
"The relation between efficacy beliefs and action is revealed most accurately when they are measured in close temporal proximity. The closer in time, the better the test of causation" (Bandura, 1997, p. 67). Bandura (1986) also argued that the most functional efficacy judgments are those that slightly exceed what a student can actually accomplish, and that excessive overconfidence can result in "serious, irreparable harm" (p. 394).

Self-Efficacy Measurement and Relationship to Performance

Self-efficacy research in education has focused on two primary areas: the relationship between self-efficacy beliefs and academic motivation and achievement, and the link between self-efficacy and choice of college major or career (Pajares & Miller, 1995). When self-efficacy was still a young field of study, Dowling (1978) created a self-efficacy scale to measure the correspondence between confidence and performance. Her task-specific Mathematics Confidence Scale (MCS) allowed students first to provide judgments about their ability to solve math problems, and later to complete an alternate-forms test of the problems on which their confidence was assessed. Betz and Hackett (1983) then developed the Mathematics Self-Efficacy Scale (MSES), incorporating a measure of task-specific efficacy similar to Dowling's, with the addition of two scales. The first scale measures students' feelings of self-efficacy about performing math-related tasks, such as figuring income tax or the amount of wood needed to complete a project. The second scale measures students' confidence in their ability to earn an A or B in certain math-related courses. The MSES has been used in studies of both academic performance and career choice, but the correlations between the composite MSES score and actual performance were lower than the researchers anticipated (Hackett & Betz, 1989). The study also found that sex differences in mathematical self-efficacy were correlated with sex differences in mathematical performance. However, the researchers found no support for their hypothesis that women's self-efficacy expectations were unrealistically low compared to their actual performance. In contrast to the findings of the MSES study, Pajares and Miller (1994) obtained a correlation of .70 when they compared scores on a form of Dowling's (1978) MCS with performance scores on the same test on which self-efficacy was measured. They also found in a path analysis that task-specific self-efficacy had a stronger direct effect on problem-solving performance than an overall measure of mathematics self-concept, perceived usefulness of mathematics, mathematics background, number of college credits, or gender. Not surprisingly, researchers who have used full-scale MSES scores as a measure of generalized mathematics self-efficacy (Randhawa, Beamer, & Lundberg, 1993) have found much weaker relationships between self-efficacy ratings and performance. In Randhawa et al.'s study, the criterial task, solving mathematical problems, was related to only one of the three scales of the MSES, and the actual problems on the performance measure were different from those on which students judged their self-efficacy. The issue of concern in Randhawa et al.'s study is not the finding of a weak relationship between self-efficacy as measured by the MSES and performance, but rather the methodological flaw of attempting to find a correspondence between generalized self-efficacy and performance on a specific set of problems. Bandura (1986) cautioned researchers about the necessity of matching self-efficacy judgments, which are by definition task-specific, to the actual tasks being assessed.
However, his warnings have apparently been disregarded by some researchers who fail to provide a match, in either specificity or content, between the tasks on which self-efficacy is measured and those on which performance is measured. Mathematics self-efficacy is defined as "a situational or problem-specific assessment of an individual's [feelings of] confidence in her or his ability to successfully perform or accomplish a particular task or problem" (Hackett & Betz, 1989, p. 262). In keeping with this definition, and following Bandura's (1986) guidelines for measuring self-efficacy, Pajares and Miller (1995) set out to determine whether students' judgments of their ability to solve mathematics problems were more predictive of their ability to solve those problems than either their confidence in solving other math-related tasks or their confidence in succeeding in math-related courses. In essence, they hoped to show that each of the three scales included in the MSES is most highly correlated with the specific outcome it measures, and that MSES composite scores should not be correlated with performance on specific tasks. In a study of 391 undergraduates, they found that although the three efficacy subscales were related to each other and were significantly related to problem-solving performance, the problem-solving self-efficacy measure had a stronger relationship with problem-solving performance than did either of the other two subscales. The results confirm Bandura's (1986) assertion that the most theoretically appropriate way to assess self-efficacy is through tasks that require the same or similar skills as those needed for the performance task. Pajares and Miller (1997) took their previous research to a new level of specificity by examining whether the form of self-efficacy assessment (multiple-choice or open-ended question format) differentially predicts performance on multiple-choice or open-ended tests.
In all measures of calibration, defined as the degree to which students accurately predict their performance, the format of the self-efficacy instrument did not affect the relationship between self-efficacy and performance. However, for both forms of the self-efficacy instrument, calibration scores were lower for students who took the open-ended test. Boys were overall better predictors of their performance than girls, but the interaction effect led to interesting results: girls with low self-efficacy performed better than boys with low self-efficacy, yet girls with high self-efficacy did not perform as well as boys with high self-efficacy. Bong (2002) sought to confirm Bandura's (1997) claim about the necessity of measuring self-efficacy in close temporal proximity to performance and with a high level of specificity. She found that the more specific measures of self-efficacy, those that measured self-efficacy on tasks very similar in content and context to the actual performance tasks, were superior to general measures in predicting outcomes. Contrary to the hypothesis, however, the relationship between self-efficacy beliefs and achievement scores did not differ much with the temporal distance between their assessments. This study suggests that measuring self-efficacy at a level of specificity appropriate to the target outcome is more important than assessing it close in time to performance. A meta-analytic investigation of the relationship between self-efficacy beliefs and academic outcomes helped to solidify the relationship between self-efficacy and performance (Multon, Brown, & Lent, 1991). The 39 studies included in the analysis were published between 1977 and 1988, included measures of both self-efficacy and performance, and provided enough information to calculate pooled effect sizes. The results of the analysis support the notion of a positive relationship between self-efficacy and both academic performance and persistence.
Multon et al. found a stronger relationship between self-efficacy and performance among low-achieving students than among normally achieving students. The strongest relationship between self-efficacy and performance was produced by the most specific assessment: basic skills. The second strongest relationship was found with the second most specific assessment: classroom-based performance. The weakest relationship existed with the most general assessment included in the study, standardized achievement tests. This finding lends support to Bandura's (1986) belief in the situation-specific nature of self-efficacy, but is cause for alarm with regard to students' relatively poor ability to predict their performance on standardized tests. The finding may be due to the relatively loose match in form, content, and timing between the self-efficacy and performance measures for achievement tests, but in the current climate of high-stakes testing, it certainly deserves further study.

Self-Efficacy Versus Self-Concept

To contribute to the field of cognitive psychology, self-efficacy must be distinguished from other constructs such as academic self-concept. In a confirmatory factor analysis, Lent, Brown, and Gore (1997) found that although self-efficacy is related to measures of generalized mathematics self-concept, it is neither interchangeable with nor subsumed by self-concept. Their results also showed that self-efficacy is a significant predictor of math-related performance. Pajares and Miller (1994) and Pietsch, Walker, and Chapman (2003) found that efficacy beliefs have a stronger association with academic performance in mathematics than self-concept does. In fact, in Pietsch et al.'s study, the path coefficient from self-efficacy to performance was significant, but the path coefficient from self-concept to performance was not.
In keeping with Bandura's (1997) call for specificity when measuring self-efficacy, the students in the study appear to have considered each problem, and their perceived competence to solve it, as a unique event.

Self-Efficacy's Role in Self-Regulation

Bandura's (1986) triadic theory of social cognition posits that students' efforts at self-regulation are determined by three factors: their personal processes, the environment, and their behavior. Students' self-regulated learning is not believed to be an absolute state of functioning; it varies with the academic context, personal efforts to self-regulate, and performance outcomes. The effectiveness of students' self-regulatory strategies depends on the development of a sense of self-efficacy for learning and performing well (Schunk, 1995, 1996). Although self-efficacy has been shown to influence students' levels of self-monitoring and their academic motivation and achievement, little work has been done to relate self-efficacy to students' use of these self-regulated learning strategies. Zimmerman and Martinez-Pons (1990) sought to show that students' level of academic efficacy is predictive of their use of triadic self-regulation strategies. They found that boys surpass girls in verbal efficacy but are comparable to girls in mathematical efficacy. Although girls exhibited more planning, goal-setting, record-keeping, and monitoring behaviors than boys, both boys' and girls' perceptions of mathematical and verbal efficacy were correlated with their use of self-regulated learning strategies. A glaring weakness of this study was the lack of performance data with which to establish the accuracy of students' efficacy perceptions. Academic self-regulation seems to be independently influenced by self-efficacy at the outset of a task (Bandura, 1997) and by self-evaluation at its conclusion (Laveault, Leblanc, & Leroux, 1999). In a small study of middle school students, Laveault et al.
found that although boys and girls performed similarly on mathematics tasks, boys had higher self-efficacy than girls. On completion, girls were more likely than boys to report that the task had been more difficult than expected. Students who clearly understood the task were more confident and set higher goals.

Self-Efficacy and Math-Related Career Choices

Although the relationship between self-efficacy and performance in mathematics is a critical issue, the long-term impact of self-efficacy on students' career choices lends further weight to the necessity of studying this phenomenon. Hackett (1985) proposed that the gap between the numbers of male and female students who select mathematics-oriented college majors may be explained by documented gender differences in mathematics self-efficacy. Cooper and Robinson (1991) found a moderately strong relationship between mathematics self-efficacy and career self-efficacy among college students who had selected math-related majors. They also found that, for this group, there were no significant gender differences in mathematics self-efficacy. Unfortunately, this finding does not illuminate the relationship between mathematics self-efficacy and career choice, because the participants had already selected math-related majors and therefore likely all had high levels of mathematics self-efficacy. Also, although the fact that nearly 80 percent of the participants were male highlights the underrepresentation of women in math-related majors, it limits the generalizability of the researchers' conclusions. The study could have drawn much stronger conclusions about gender differences in efficacy had it used a random sample rather than one composed exclusively of students who had already demonstrated their efficacy stances through their career choice.
O'Brien, Kopala, and Martinez-Pons (1999) used path analysis to examine the contributions of prior mathematics achievement, ethnic identity, gender, and self-efficacy to students' science-related career interests. They found a strong direct link between self-efficacy and career choice; ethnic identity and prior achievement influenced self-efficacy, but gender did not. The only variable in the study significantly influenced by gender was career interest. This research signals a need for further work on the influence of mathematics self-efficacy on career choices.

Intervention Efforts for Performance and Career Choices

Although little is known about the relationships among mathematical self-efficacy, performance, and career choices, even less research has explored the effects of intervention on these relationships. Hanlon and Schneider (1999) worked with a small group of high school graduates in an intensive five-week summer program aimed at improving the accuracy of students' self-efficacy judgments. Throughout the intervention period, students were asked to rate their level of self-efficacy on the current content and were then tested on that content. The intervention did not reduce the discrepancy between students' math quiz scores and their self-efficacy perceptions. However, it is impossible to determine whether the intervention was actually beneficial, because the course material increased in difficulty as the program progressed, possibly making it harder for students to assess their own abilities accurately. Fouad, Smith, and Enochs (1997) attempted to validate a self-efficacy scale for assessing changes in middle school students' career-related self-efficacy, outcome expectancies, and intentions and goals resulting from a career-related self-efficacy intervention.
They found evidence of adequate reliability and validity for use of the instrument in assessing the results of the intervention program, but noted that intervention efforts with minority and female students did not result in significant gains. Schunk and Gunn (1986) conducted the only published study located for this review that addressed the impact of intervention on the mathematical self-efficacy of elementary-aged students. They provided intensive training in solving division problems to determine the effect of the intervention on children's self-efficacy and subsequent performance, as well as how attributions affected children's self-efficacy and skills. They found a strong relationship between the intervention and performance, and between self-efficacy and performance, but not between the intervention and self-efficacy. They also found a negative relationship between effort attributions and self-efficacy, indicating that as students become more confident in their ability to solve a problem, they are less likely to attribute their success to effort. Although the results are interesting, they have limited applicability to the reform-oriented classroom, because the only mathematics task involved was division computation, and computation skills have received reduced emphasis in the classroom as a result of the recommendations of the National Council of Teachers of Mathematics (1989). In addition, the sample was too small to yield stable estimates in multiple regression, and the researchers' classification of algorithmic procedures as strategies is problematic. The term "strategies" implies a deeper level of cognition, more often associated with problem solving than with the simple application of the step-by-step procedures used to solve the computation exercises described in the study.
Gender and Self-Efficacy

The relationship between gender and self-efficacy has not been as fully explored as the relationship between self-efficacy and mathematical performance, and the findings of gender-related research are not consistent (Junge & Dretzke, 1995; Pajares & Miller, 1997). Although some research supports the notion that females are less self-efficacious than males with respect to mathematics content (Laveault et al., 1999) and mathematics courses (Lent, Brown, & Gore, 1997), other studies have found no significant gender differences (Cooper & Robinson, 1991; Hackett & Betz, 1989; Lundeberg, Fox, & Puncochar, 1994; Zimmerman & Martinez-Pons, 1990). The only consistent finding is that when gender differences have been detected, males have been more confident than females in their mathematics ability (Laveault et al., 1999; Randhawa et al., 1993). Lundeberg et al.'s (1994) study of college psychology majors did support the notion that men are more confident than women in mathematics, but because both men and women were overconfident about the accuracy of their answers, it did not support the notion that women generally suffer from low confidence. In addition, women and men answered very differently when estimating their general feelings of confidence versus their confidence in the accuracy of their answers on specific problems. These results confirm the findings of other studies that gender differences in confidence depend on the level of specificity and the content of the questions being asked (Laveault et al., 1999; Randhawa et al., 1993). In a study of mathematically gifted students, Junge and Dretzke (1995) found statistically significant gender differences even though the self-efficacy ratings of most students were high due to their gifted status. The males exhibited higher overall mathematics self-efficacy, and the older students, both male and female, were more self-efficacious than their younger counterparts.
Williams (1994) focused her research on understanding gender differences in the discrepancies between self-efficacy and performance across four subject domains: English, mathematics, reading, and science. Her findings concurred with other studies in establishing a positive relationship between self-efficacy and performance across content areas for both genders (explaining 35% of the variance). Interestingly, the relationship between self-efficacy and performance was stronger for mathematics than for any other content area, for both males and females. Most students' efficacy scores were not congruent with their performance scores, but males' efficacy estimates were more accurate than females'.

Self-Efficacy Summary

Table 4 provides a summary of the studies reviewed in this section. The research on self-efficacy related to mathematics confirms the task-specific nature of the construct (Bong, 2002; Pajares & Miller, 1994; Pajares & Miller, 1995; Randhawa et al., 1993) and establishes self-efficacy as a key element of self-regulation (Bandura, 1997). Mathematics self-efficacy is positively related to academic performance, persistence, and math-related career choices (Multon et al., 1991; O'Brien et al., 1999), but intervention efforts have thus far not proved successful (Fouad et al., 1997; Hanlon & Schneider, 1999; Schunk & Gunn, 1986).
Both males and females tend to be overly confident of their mathematical ability (Junge & Dretzke, 1995; Lundeberg et al., 1994), but males' self-efficacy surpasses that of females (Laveault et al., 1999; Randhawa et al., 1993), and students tend to grow more self-efficacious as they mature (Junge & Dretzke, 1995). For these reasons, continued research efforts to understand the relationships among mathematical self-efficacy, performance, and gender are warranted.

Table 4
Results of Self-Efficacy Studies Grouped by Age of Participants

| Author/Date | Participants | Results |
|---|---|---|
| Studies of College-Aged Students | | |
| Cooper & Robinson, 1991 | 291 college math-related majors (229 men, 61 women) | Self-efficacy is related to performance and career self-efficacy. No gender differences exist. |
| Hackett & Betz, 1989 | 262 college students (109 men, 153 women) | Moderate correlation between mathematics self-efficacy and performance using MSES. No gender differences in self-efficacy. |
| Hanlon & Schneider, 1999 | 17 new college students attending summer program | Intervention in self-efficacy did not prove beneficial. |
| Lent, Brown, & Gore, 1997 | 205 psychology students (54 men, 151 women) | Self-efficacy and self-concept are unique dimensions. Self-efficacy can predict math performance. |
| Lundeberg et al., 1994 | 254 college psych majors (73 men, 181 women) | Most students were overconfident; women had more accurate perceptions of their incorrect answers than men. |
| Pajares & Miller, 1994 | 350 undergraduates (121 men, 229 women) | Self-efficacy had a stronger direct effect on performance than gender, high school math level, or college credits. |
| Pajares & Miller, 1995 | 391 undergraduates (144 men, 247 women) | Self-efficacy ratings of problems were more highly correlated with problem performance than were self-efficacy ratings on tasks or courses. Used MSES. |
| Studies of High-School Students | | |
| Bong, 2002 | 235 high school freshmen (all-girl school) | Specificity of self-efficacy measures more important than temporal proximity. |
| Junge & Dretzke, 1995 | 113 gifted grade 9-12 students (58 boys, 55 girls) | Males and older students were more self-efficacious, though all rated high. |
| Pietsch et al., 2003 | 416 high school students | Math self-efficacy predicted performance, but self-concept did not. |
| Randhawa et al., 1993 | 225 high school students (117 male, 108 female) | Mathematics self-efficacy mediates the relationship between mathematics attitudes and achievement. Boys were more self-efficacious, but did not perform better than girls. Used MSES. |
| O'Brien et al., 1999 | 415 eleventh-grade students (221 boys, 194 girls) | Self-efficacy is strongly related to career choices, and ethnic identity is a predictor of self-efficacy. Gender predicts career interest. |
| Williams, 1994 | 131 college-bound 11th- and 12th-grade students (43 male, 88 female) | Self-efficacy was a stronger predictor of math performance than of performance in other subjects. Most students did not accurately predict their outcomes. |
| Studies of Students in Elementary and Middle Grades | | |
| Fouad et al., 1997 | 361 seventh and eighth graders | The Middle School Self-Efficacy Scale is valid, but intervention efforts for females and minorities were not successful. |
| Laveault et al., 1999 | 45 grade 6-8 students | Boys had higher self-efficacy than girls; on completion, girls thought the tasks were harder than boys did. Performance was equal. |
| Pajares & Miller, 1997 | 327 eighth-grade algebra and pre-algebra students (149 boys, 178 girls) | The test format, not the self-efficacy format, affected calibration scores. Boys were overall better predictors of their performance than girls. |
| Schunk & Gunn, 1986 | 50 upper-elementary students (28 boys, 22 girls) | Intervention efforts and self-efficacy had a positive impact on performance, but the intervention did not increase self-efficacy. |
| Studies of Students Across Age Groups | | |
| Multon et al., 1991 | Meta-analysis | Self-efficacy is positively related to performance and persistence. The strength of the relationship varies according to student ability and type of outcome measured. |
| Zimmerman & Martinez-Pons, 1990 | 180 students (60 fifth-graders, 60 eighth-graders, 60 eleventh-graders) | Girls plan, monitor, and goal-set more than boys. Boys have higher verbal efficacy but the same math efficacy as girls. Efficacy affects use of self-regulatory strategies. |

Self-Assessment

Over a quarter-century ago, Lichtenstein and Fischhoff (1977) helped to revive interest in metacognitive processes in general, and self-assessment in particular, with the question: "Do those who know more also know more about how much they know?" (p. 159). Students' ability to self-assess their knowledge and to identify the skills that need improvement is a valuable self-regulatory strategy (van Kraayenoord & Paris, 1997) and provides teachers with useful assessment data (Charles, Lester, & O'Daffer, 1987). When students become proficient in monitoring and interpreting their actions, they gain the ability to assess their work across a wider range of cognitive skills with more accuracy (Rosenholtz & Simpson, 1984). To self-assess accurately, students must first internalize the standards against which their performance should be judged, and as the accuracy of their assessments increases, their perceptions of ability and efficacy increase (Schunk, 1989). Students who understand their personal strengths and weaknesses, and the strategies they can use to enhance their performance, are better able to plan and monitor their work (van Kraayenoord & Paris, 1997). As a result, self-assessment is distinctive in encompassing all three domains of self-regulated learning: cognitive, motivational, and affective (Paris & Paris, 2001). The National Council of Teachers of Mathematics (1995) also supports the classroom practice of self-assessment in its Assessment Standards for School Mathematics.
According to the Assessment Standards, student self-assessment serves to improve students' confidence in their ability to do mathematics and allows them to become more independent learners of mathematics. The accuracy of students' self-assessment is measured primarily by comparing their self-assessment scores with their actual performance. Confidence bias is defined as the systematic error of judgment made by individuals when they assess the correctness of their responses to questions relating to intellectual or perceptual problems (Pallier, 2003). The correspondence between subjective probability (i.e., a personal assessment of accuracy) and the actual result provides a measure of calibration with which to assess confidence bias. When an individual's average confidence score exceeds his or her average performance score, the bias score is positive and the person is said to be overconfident; a negative bias score indicates underconfidence. Several theoretical models have been developed to explain overconfidence and underconfidence, the most prominent of which are the heuristics and biases approach, the ecological approach, and the individual differences perspective. In brief, proponents of the heuristics and biases approach claim that errors in confidence judgments arise from errors in thinking or procedure (Kahneman & Tversky, 1996). In the ecological approach, overconfidence is attributed to the nature of the questions on general knowledge tests, which provide misleading cues intended to trick students into choosing an incorrect answer. The individual differences perspective, based on the work of Pallier et al. (2002), suggests that miscalibration is caused by an independent metacognitive trait that mediates the accuracy of self-assessment. Within this framework, individuals tend to express a consistent level of confidence, irrespective of their level of accuracy.
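The bias score described above reduces to a simple difference: mean confidence minus mean accuracy. The Python sketch below illustrates that arithmetic under stated assumptions: confidence ratings are expressed on a 0 to 1 scale, each item is scored simply as correct or incorrect, and the names `bias_score` and `classify_bias` (and the optional `tolerance` band) are hypothetical illustrations, not the scoring procedure used by Pallier (2003).

```python
from statistics import mean


def bias_score(confidence, correct):
    """Return mean confidence minus mean accuracy (the confidence bias).

    Assumptions (hypothetical, for illustration only):
    confidence: per-item confidence ratings on a 0-1 scale.
    correct: per-item correctness as booleans, in the same item order.
    """
    if not confidence or len(confidence) != len(correct):
        raise ValueError("need one confidence rating per scored item")
    accuracy = mean(1.0 if item else 0.0 for item in correct)
    return mean(confidence) - accuracy


def classify_bias(bias, tolerance=0.0):
    """Positive bias means overconfident, negative means underconfident."""
    if bias > tolerance:
        return "overconfident"
    if bias < -tolerance:
        return "underconfident"
    return "well calibrated"


# Hypothetical student: average confidence 0.75, but only 2 of 4 items correct.
ratings = [0.9, 0.8, 0.7, 0.6]
answers = [True, False, True, False]
bias = bias_score(ratings, answers)
print(round(bias, 2), classify_bias(bias))  # prints: 0.25 overconfident
```

For this hypothetical student, average confidence (0.75) exceeds average accuracy (0.50), so the positive bias of 0.25 marks overconfidence; a student whose confidence fell below his or her accuracy would receive a negative score.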
Highlighted Research on Self-Assessment

Confidence bias was the focus of two studies by Pallier et al. (2002). The participants in the studies self-assessed the accuracy of their responses to questions covering a wide range of cognitive and perceptual abilities. As in other studies, the confidence factor transcended any single facet of cognitive ability, indicating that the participants possessed a trait that mediated their ability to evaluate the accuracy of their responses. This confidence trait was related, to a small extent, to the individual's cognitive ability and to some aspects of personality, and it was a major determinant of the accuracy of self-assessment. Unfortunately, because no gender data were reported in these studies, it is impossible to draw any conclusions as to how gender contributes to the confidence factor. Although Halpern and LaMay (2000) concluded from a review of current research that no gender differences in general intelligence exist, they acknowledged gender differences in performance on specific tasks. Concerned that an overall measure of IQ may fail to identify confidence biases that exist on tasks involving different cognitive abilities, Pallier (2003) followed up on the earlier work with a pair of additional studies aimed at uncovering gender differences in the accuracy of self-assessment. The college men and women in the first study had nearly identical accuracy scores, but their confidence and bias scores differed significantly by gender. Men were more confident in their performance on both tasks tested, with the result that women were better calibrated on the general knowledge task and men on the visualization task. In short, gender affected the accuracy of self-assessment. The participants in the second study were adults of all ages, and the results were similar to those of the first study: the men exhibited higher confidence than the women, and their confidence increased with age.
As a result, women were better calibrated than men, but the accuracy of all participants' self-assessment declined with age. Hassmen and Hunt (1994) did not find any significant gender differences in the accuracy of students' self-assessment or in their level of confidence, but found that females benefited from being asked to assess their own work. Men outscored women on the test in both the control and treatment groups, but the score gap decreased significantly in the treatment condition, in which students were asked to assess their answers. A follow-up study conducted by Koivula, Hassmen, and Hunt (2001) found similar results, but no interaction between treatment and gender. As in the earlier study, men outscored women on the quantitative test, but all students benefited from self-assessing their responses to test questions. Together, these findings support the notion that self-assessment plays a vital role in self-regulation and is related to improved performance. Wright and Houck (1995) studied high school students to examine possible gender differences in the accuracy of self-estimates of academic ability. On average, males held higher self-assessments of their verbal and mathematical ability than females did. However, contrary to other research findings, both males and females underestimated their performance on the verbal and mathematics tests. One interesting finding for males was that their own self-assessments were more highly correlated with their test scores than were either the teachers' assessments or the students' GPAs. The biggest threat to validity in this study involves the measurement of students' self-assessment, and it may explain why the results were not in line with other research: the students were not asked to self-assess their test performance until two weeks after the tests were administered.
It is possible that the time delay led students to self-assess their general verbal and mathematics ability more than their performance on the test itself. The time delay may also be responsible for the finding that males underestimated their performance, because other similar studies found the opposite relationship. Ideally, the self-assessment should have been either embedded in the test or administered immediately after it. In a study of upper-elementary students, van Kraayenoord and Paris (1997) found that girls were more accurate than boys in self-assessing their work across content areas. Vermeer, Seegers, and Boekaerts (1994) worked with the same age group but obtained students' self-assessments at three stages of the mathematical problem-solving process: orientation, execution, and verification. They found that gender differences in confidence ratings depended on the type of problem being assessed. For an algorithmic problem, girls and boys were equally confident at the orientation and verification stages, but girls were more confident than boys during the execution stage. For word problems, however, boys were more confident than girls at all stages. Among the students who were successful on the word problems, girls showed less confidence in their work than boys until the verification stage, where their ratings were similar.

Summary of Self-Assessment

Most of the self-assessment research summarized in Table 5 points to overconfidence in males and better calibration in females, with both genders showing better performance as a result of assessing their own work (Hassmen & Hunt, 1994; Koivula et al., 2001; Pallier, 2003; Vermeer et al., 1994; Wright & Houck, 1995). Self-assessment, as a self-regulatory strategy, would appear to benefit learners of all ages (Hassmen & Hunt, 1994; Koivula et al., 2001).
Because the development of self-regulated learners is a goal of education, the literature appears remiss in not working to understand the possible relationships among self-assessment, performance, and gender in children younger than high school age. A majority of the research in this area deals with college-age students (Hassmen & Hunt, 1994; Koivula et al., 2001; Pallier et al., 2002; Pallier, 2003), a population that is easy to access for research purposes but is nearing the end of formal education. Research on this age group, although important for adding to the field of knowledge, has less potential to effect changes in pedagogy that may help develop self-regulated learners than does research with younger children. Also, only one of the two studies conducted with elementary-aged children was specific to mathematics (Vermeer et al., 1994), and the strong gender differences found in students' self-assessment at that age, together with the evidence that students' ability to self-assess their work varies by task, certainly indicate the need for further study.

Table 5
Results of Self-Assessment Studies

| Author/Date | Participants | Results |
|---|---|---|
| Hassmen & Hunt, 1994 | 120 undergraduate students | Females benefited from being asked to assess the accuracy of their responses. |
| Koivula et al., 2001 | 574 students: 494 high school and 80 first-year college | Both males and females benefited from being asked to assess the accuracy of their responses on a quantitative test. |
| Pallier et al., 2002 | Study 1: 520 Air Force recruits. Study 2: 107 undergraduate psychology students | Study 1: Confidence is an independent factor and transcends type of cognitive ability. Study 2: Confidence plays a major role in the accuracy of self-assessment. |
| Pallier, 2003 | Study 1: 185 undergraduate psychology students. Study 2: 303 adults ages 17-80 | Study 1: Men assessed themselves higher than women, but gender differences in calibration were task dependent. Study 2: Women were less confident, but more accurate in self-assessment than men. |
| van Kraayenoord & Paris, 1997 | 93 students in grades 3, 4, & 5 | Girls provided more accurate self-assessments than boys, across content areas. |
| Vermeer et al., 1994 | 51 students 11-12 years old | In problem solving, boys were more confident of their work than girls at all stages of the process. |
| Wright & Houck, 1995 | 222 students in grades 9-11 | Males had higher self-assessments than females, but both genders underestimated their performance on the verbal and mathematics tests. |

Written Communication in Problem Solving

Traditional school mathematics has emphasized the receptive aspects of language, requiring students to attend to the way the teacher and the text communicate information. To empower students mathematically requires that they be invited into the conversation and allowed to express their own conceptions and understandings both verbally and in writing (Miller, 1993). Both written and oral language skills support children's problem-solving processes (Kroll & Halaby, 1997), and an open channel of communication with and between students also promotes reading comprehension (Fuentes, 1998). Students need many opportunities to communicate about mathematics in order to write about it effectively (Fortescue, 1994), and having students justify their answers, think aloud, and consider different possibilities for solving problems provides these opportunities (Fuentes, 1998). Writing in mathematics has been receiving increased emphasis in recent years because of its perceived impact on learning (Johnson, Jones, Thornton, Langrall, & Rous, 1998). Writing is fast becoming a major component of teaching and learning, as well as a way to assess students' understanding.
It is being used successfully in problem-solving instruction through writing and solving problems, writing about how to solve a problem, comparing and contrasting various approaches to solving a problem, and preparing reports or plans for more involved problem situations (Pugalee, 1998).

Writing and problem solving are fundamentally linked (Pugalee, 1998). First, writing forces the child to condense inner speech so that it is understandable, structured, and able to be communicated (Vygotsky, 1987). Second, problem solving and writing are by their nature recursive rather than linear processes. In problem solving, as in writing, the student revisits previous decisions to determine how to move forward. Writing during the problem-solving process encourages recursive behavior within the planning, evaluating, revising, and editing stages of the composing process (Pugalee, 1998).

Challenges and Benefits of Written Communication

Writing about mathematical ideas is even more difficult than reading about them. Writing requires that students use language to explore partly formed understandings and refine their ideas. Though it is difficult, writing about mathematics is an invaluable tool for helping students explore concepts, engage in high-level thinking, and actively construct meaning (MacGregor, 1990). The notion that children learn to write by writing has been adopted by many in the field of mathematics and taken to mean that children will learn to write mathematics by writing mathematics. However, without proper guidance, modeling, and scaffolding, requiring students to write about their problem solving will likely yield trivial responses that lack mathematical substance (Pengelly, 1990). When students are properly supported and routinely required to communicate their mathematical understanding in writing, they become more proficient in doing so (Miller, 1993).
One benefit of writing during problem solving is that students who write about all the steps, procedures, and processes involved in solving a problem understand those steps better than students who do not write about their problem solving. The result is a greater likelihood that the student will reflect on, evaluate, and modify each step (Kenyon, 1989; Kroll & Halaby, 1997). Writing also helps students become actively involved in the problem-solving process, providing a space for them to reason about and understand the problem situation as well as clarify and justify their approach (Goldsby & Cozza, 2002; Pugalee, 1998). It helps students focus on the nature of problems and the approaches that are possible rather than on algorithms and facts. Finally, writing provides an avenue for bringing students' everyday world and problem solving together in a meaningful context (Hildebrand, Ludeman, & Mullin, 1999), a connection encouraged by NCTM (2000).

Improved conceptual understanding is another perceived benefit of using writing to communicate about mathematical processes (Quinn & Wilson, 1997). Writing about problem solving helps the writer integrate mathematical concepts into his or her own thought processes (John-Steiner, 1989). Writing involves processes necessary to mathematics learning that otherwise may not be engaged (Johnson et al., 1998), and it helps students create meaning in problem solving (Pugalee, 1997). Writing about mathematical concepts or solutions to problems allows students to refine their thinking about important mathematical concepts through the critical examination that the writing process requires (Woodward, Monroe, & Baxter, 2001).
The writing process allows students to communicate, clarify, organize, formalize, and reflect on their mathematical thinking, increasing their recall and helping them gain a better understanding of mathematical relationships (Burns, 1995; Drake & Amspaugh, 1994; Fortescue, 1994; Johnson et al., 1998).

Metacognition in Writing

When students are required to write about a problem, they are forced to clarify their thoughts (NCTM, 2000), making the writing process an integral part of the thinking process (Kenyon, 1989). Students writing about their problem-solving processes must critically reflect on their thinking from beginning to end, which helps them clarify their thoughts and reflect on their ideas and on what they have learned (Kroll & Halaby, 1997; Krulik & Rudnick, 1994; Powell, 1997). The process of writing aids the development of students' thinking skills in mathematics, helping them see reflection and synthesis as a normal part of problem solving (Muth, 1997; Pugalee, 1997). While students work through the writing process, they practice the very behaviors that produce independent learners who take responsibility for their own learning, namely monitoring and reflecting (Fuentes, 1998; Pugalee, 1998). By making students more aware of their own thought processes, writing promotes a clear understanding of conceptual relationships and an enhanced personal ownership of knowledge (Shepard, 1993). The ability of the writing process to place learners at the center of their own learning makes writing a powerful facilitator of learning anything that involves language (Powell, 1997). Through writing, students improve their reasoning and problem-solving abilities while learning to analyze and reflect on their own thinking (Ciochine & Polivka, 1997; Hartman, 1994).
Writing enables students to reframe information in their own words, deepen their understanding of a problem situation, and order their thoughts (Cook, 1995), as well as convey their own understanding and analyze their own errors (Hartman, 1994).

Writing as Assessment Data

Writing makes children's thinking visible (Hartman, 1994), giving the teacher a window into students' understanding of the process (Cook, 1995). However, caution is needed when using students' work to assess understanding, because written computational work often shows more of what the child cannot do than what she or he can do (LeBlanc, Proudfit, & Putt, 1980). To assess students' conceptual understanding, the teacher must have access to what students were thinking as they worked through the problem. Asking students to explain how they know their answer is correct elicits much more linguistic information about their thinking than simply asking them to show their work. When properly facilitated, children's writing in problem solving can also reveal misconceptions and gaps in understanding that traditional assessments may not expose (Krulik & Rudnick, 1994; MacGregor, 1990). When students are required only to produce correct answers, teachers have no way to differentiate between a student who happens on the correct answer without understanding the problem and a student who fully understands the problem but made a computational error (Goldsby & Cozza, 2002). Some students can produce accurate computational work but cannot articulate their processes in writing because they do not truly understand; they are merely mimicking the process demonstrated by the teacher (Dusterhoff, 1995).
When students are required to justify their processes, their writing can help teachers discern their level of understanding as well as the reasoning behind the mistakes they made during problem solving (Drake & Amspaugh, 1994; Powell, 1997). Teachers equipped with this level of assessment of their students' understanding are prepared to make informed decisions when planning subsequent instruction (Burns, 1995; Dusterhoff, 1995). When using students' writing about problem solving for assessment purposes, the key items to evaluate are how well the student understood the problem, whether the student used productive strategies to solve it, whether the solution was verified, and whether the composition was organized (Ciochine & Polivka, 1997).

As much as writing in the mathematics classroom promises a multitude of benefits, it can also present monumental challenges to students with learning disabilities (Woodward et al., 2001) and to students whose native language is not English (Kang & Pham, 1995). Care must be taken when using student writing to assess and evaluate students with writing disabilities. These students are often unable to communicate their thought processes in writing, have trouble using written language as an organizational tool, and may avoid writing altogether, with the result that their teachers view them as less capable than they really are (Gurganus & Del Mastro, 1998; Peressini & Bassett, 1996).

Highlighted Research on Writing and Problem Solving

The literature is replete with articles discussing the benefits of writing, describing methods for incorporating writing in the mathematics classroom, and encouraging mathematics teachers to make writing a key element of their instructional practice. However, because of the relative dearth of research on writing in problem solving (Johnson et al., 1998), the burden is on teachers to draw their own conclusions based on their practice.
The few studies that have been conducted do provide support for integrating writing into mathematical problem solving. In one such study, Pugalee (2001) used qualitative methods to determine the extent to which students' linguistic descriptions of their problem-solving methods show evidence of metacognitive behaviors. An analysis of the students' writing revealed clear metacognitive frameworks in the orientation, execution, and verification phases of problem solving.

Johnson et al. (1998) studied both writing to learn and learning to write in the context of probability. Using multiple measures, the researchers analyzed changes in the level of students' thinking and writing about problem situations involving probability that resulted from an intervention of ten 45-minute writing sessions held over five weeks. During the sessions, students responded to a problem prompt in their journals, discussed the problem as a class, worked in groups to solve the problem, and then shared their thinking in the whole-class format. The researchers found a significant positive change in students' probabilistic thinking and cognitive level in writing from pretest to posttest. Further analysis showed that the students who made gains in both probability content and writing proficiency benefited from having to write justifications for their solutions and explanations for their drawings. Although the sample sizes in the Pugalee (2001) and Johnson et al. (1998) studies were small, the results support reform efforts to increase writing in mathematics and raise questions about the potential for writing to support the metacognitive behaviors considered important for problem solving.
The results of Rudnitsky, Etheredge, Freeman, and Gilbert's (1995) large-scale writing study indicate that students in the treatment group, who wrote their own problems, performed better on the posttest than either the control group or the group that did problem solving without problem writing. In addition, their overall superiority actually increased on the retention test.

Noticeably absent from the literature are studies that examine how including linguistic descriptions of problem-solving processes as a scoring criterion in high-stakes assessment differentially affects students. Does requiring a linguistic explanation truly differentiate between students who chance upon a correct answer and those who understand the mathematical process, or is it more a measure of writing ability? The former is certainly the goal; if the latter is the case, however, the validity of the measure is questionable, and holistic rubric scores that include linguistic explanation as a criterion may be misleading.

Gender Differences in Writing

Mathematical reform efforts and research evidence point to the need for integrating writing into mathematical problem solving and assessment. As changes are made, especially those with high-stakes ramifications, the possibility that gender differences in writing may favor some students over others must be considered. Several studies point to a distinct female advantage in writing that is consistent across grade levels. Gambell and Hunter (1999) found that male students in elementary, middle, and high school were on average significantly weaker than females in all writing skills. In addition, female students had more positive attitudes toward writing. The issue of attitude is an important one. According to Bandura's (1986) social cognitive theory, children's judgments of what they can accomplish are important arbiters of their academic accomplishment.
This is, in part, because these judgments of confidence, or self-efficacy beliefs, are said to mediate between the other influences children bring to an activity, such as skill and previous performance, and their subsequent performance. Knudson's (1995) study bears this out: she found that elementary students who had a more positive attitude toward writing were more likely to be above-average writers. She also found a significant female advantage in writing ability. Pajares, Miller, and Johnson (1999) likewise found that the self-efficacy beliefs of upper elementary school students were good predictors of their writing performance, even when writing aptitude was controlled. Although there were no significant gender differences in self-efficacy beliefs after controlling for writing aptitude, girls' writing performance exceeded that of boys. Gabrielson, Gordon, and Engelhard (1995) found that gender had a significant effect on high school students' writing content, style, conventions, and sentence formation, with girls outscoring boys on all measures. Gormley's (1993) qualitative study of students' journal writing found gender differences in writing style among sixth-grade students, though the study did not seek to judge the superiority of different styles. Malecki and Jewell (2003) found that girls outscored boys across grade levels on all production-dependent indices (total words written, words spelled correctly, and correct writing sequences) as well as on the production-independent indices (percent of words spelled correctly and percent of correct writing sequences). This finding indicates that girls may have an advantage over boys in assessments that use holistic scoring, since they tend to write more text, more accurately, than boys.
In a study of children's perceptions of boys' and girls' writing competencies, Peterson (2000) found that grade four students did not favor one gender over the other, but eighth graders perceived girls to be better writers. The teachers involved in the study characterized girls as more competent writers than boys at both grade levels. The results regarding teachers mirror those of a previous study (Peterson, 1998), in which teachers characterized girls' writing as more sophisticated, detailed, organized, and developed. Peterson and Bainbridge (1999) also studied teacher perceptions and gendered expectations and found that although teachers attempted to ignore the influence of gender perceptions in their blind assessment of students' narrative writing, they still constructed the writer's gender while reading the narratives, allowing for the perpetuation of gender stereotypes and the inequalities they create.

Summary of Writing

The results of the research reviewed in this section are summarized in Table 6. In brief, having students write their own problems and justifications for their answers appears to improve their problem-solving proficiency (Johnson et al., 1998; Pugalee, 2001; Rudnitsky et al., 1995). Girls and boys differ in their writing styles (Gormley, 1993), and girls score higher than boys on measures of writing skill (Gabrielson et al., 1995; Gambell & Hunter, 1999; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999). Also, both teachers and students hold fairly consistent perceptions of girls' superior writing ability (Peterson, 1998; Peterson, 2000; Peterson & Bainbridge, 1999).
With the expanding role of writing in problem solving, and the advent of holistic scoring rubrics for high-stakes assessments that include linguistic explanation of problem-solving processes as a criterion, these findings related to gender are cause for concern and may signal a future reversal of boys' current superiority over girls in high-level mathematics.

Table 6
Results of Writing Studies Grouped by Topic

Author/Date | Participants | Results
Johnson et al., 1998 | 24 5th-grade students | Students benefited mathematically from having to write justifications for their answers.
Pugalee, 2001 | 20 9th graders | Students' writing about problem solving showed evidence of metacognitive behaviors.
Rudnitsky et al., 1995 | 401 3rd graders | Students who engaged in writing their own problems performed better on a problem-solving test.
Gabrielson et al., 1995 | 34,200 11th-grade students | Gender had a significant effect on all measures of writing ability, with girls outscoring boys.
Gambell & Hunter, 1999 | 3,214 students in grades 5, 8, & 11 | Females outscored males in all writing skills and had more positive attitudes toward writing.
Gormley, 1993 | 36 sixth graders | Girls' and boys' writing styles are distinctly different.
Knudson, 1995 | 430 students in grades K–6 | Girls outscored boys in writing, and students with better attitudes toward writing were more likely to be above-average writers.
Malecki & Jewell, 2003 | 946 1st–8th graders | Girls outscored boys on all measures of fluency and accuracy.
Pajares et al., 1999 | 363 3rd, 4th, & 5th graders | Self-efficacy beliefs about writing were good predictors of writing performance, even when writing aptitude was controlled. Girls' writing performance exceeded boys'.
Peterson, 1998 | 174 6th-grade teachers | Teachers judged girls' writing superior to boys' in detail, organization, and development.
Peterson, 2000 | 386 4th- and 8th-grade students and their 12 teachers | Fourth graders did not perceive either boys or girls to be better writers, but eighth graders perceived girls to be better writers. Teachers saw girls as better writers than boys.
Peterson & Bainbridge, 1999 | 96 teachers | Teachers were unsuccessful in ignoring gender perceptions when grading narrative writing pieces.

How the Review of Literature Informed this Study

From the review of research on gender differences in mathematics, it becomes clear that although studies have not consistently detected significant differences in students' achievement at all grade levels, there are complex differences in how boys and girls respond cognitively and affectively to different types of questions and in the strategies they use to arrive at their solutions. This research sought to add to the body of knowledge about how boys' and girls' choices of strategies for solving mathematical problems differ.

There is extensive research verifying the connection between children's reading comprehension and mathematical performance, but most of that research focuses on children with reading disabilities and how those disabilities can best be accommodated in testing situations. Because of the current focus on problem solving as the heart of all mathematics instruction, and because problem solving depends on textual communication, further research that addresses the link between reading ability and problem-solving performance is still needed. Gender differences in reading ability complicate this relationship and were a topic of interest in this research.

Self-regulated learning is linked to strong performance in mathematics, and self-efficacy, as an element of self-regulation, mediates the relationship between students' attitudes about mathematics and their mathematical achievement across age ranges.
Calibration, a measure of the accuracy of students' self-efficacy beliefs, is an efficient means of observing whether students' level of efficacy exceeds or falls short of their actual performance, and it was a variable of interest in this study.

Research into self-assessment shows that regardless of the accuracy of students' self-assessments, the simple act of assessing their own work improved their performance. Self-assessment has not yet gained much attention in the literature, but this finding, coupled with findings that males tend to be more confident than females in their self-assessments, points to a potentially fruitful area for research. This research examined gender differences in the accuracy of children's self-assessments on a mathematical problem-solving test.

As problem solving becomes more prevalent in high-stakes testing, and students are more often required to provide linguistic explanations of their problem-solving processes, writing in the context of problem solving becomes an increasingly important area for research. Writing, as a general area of research, has historically received scant attention, and writing as it relates to mathematics has received even less. Those who have explored the field found that students benefited from writing their own problems and writing justifications for their answers. This research sought to examine how boys and girls respond differently to the invitation to provide linguistic explanations of their problem-solving processes.

CHAPTER 3
METHODS

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter contains six sections.
The first section describes the design of the study. The second section describes the population and sample selection as well as group categorization. The third section discusses the development and validity of the testing instrument. The fourth section discusses the measures taken to ensure the reliability of the data. The fifth section provides specific details concerning data collection. The final section explains how the data were analyzed and interpreted.

Design

The intent of this study was to address the following research questions:

1) To what extent are students' choices of mathematical problem-solving processes related to gender?
2) To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
3) To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
4) To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem-solving test, as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
5) To what extent is students' ability to assess their own performance on mathematical problem-solving tasks, as measured by a self-reported rating scale, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?

The design of this nonexperimental, causal-comparative study includes both quantitative and qualitative methods, depending on the question being analyzed. The first research question dealt with the different processes students chose to solve problems and was analyzed using qualitative methods coupled with descriptive statistics and chi-square tests. The qualitative analysis was performed from a stance of objectivist grounded theory, allowing the students' responses to define the categories used in the analysis. Grounded theory provided the researcher with a framework for coding data with some standardization and rigor (Patton, 2002). Objectivist grounded theory:

Accepts the positivistic assumption of an external world that can be described, analyzed, explained, and predicted: truth, but with a small 't.' . . . It assumes that different observers will discover this world and describe it in similar ways. (Charmaz, 2000, p. 524)

The remaining four questions were analyzed with quantitative methods using descriptive statistics and a series of factorial ANOVAs coupled with multiple regression. The three independent variables for the factorial ANOVAs were gender (male or female), reading ability (high, middle, or low), and mathematics ability (high, middle, or low). The three independent variables for the multiple regression tests were gender (male or female), reading ability (the students' FCAT reading scale scores), and mathematics ability (the students' FCAT mathematics scale scores).
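Two of the quantities involved in these analyses lend themselves to a short illustration. The sketch below, using hypothetical counts and ratings, computes (a) the Pearson chi-square statistic for a gender × process-theme contingency table of the kind used for the first research question, and (b) a signed mean-bias score of the kind used for questions four and five, taken here as the per-item average of a face-scale rating (converted to 0–4) minus the rubric score for that item. The function names, the sign convention (rating minus score), and all data are assumptions for illustration; this is not the study's own code, since the study ran its tests in SAS. The p-value for the chi-square statistic would come from the chi-square distribution with the returned degrees of freedom, for example via a statistics package.

```python
# Minimal analysis sketch with HYPOTHETICAL data; not the study's SAS code.


def chi_square_independence(table):
    """Pearson chi-square test of independence (statistic only).

    `table` is a list of rows (e.g. boys, girls), each a list of observed
    counts (e.g. one column per problem-solving process theme).
    Returns (chi2 statistic, degrees of freedom, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, min_expected


def mean_bias(self_ratings, performance_scores):
    """Signed mean bias: average of (self-rating - performance score) per item.

    Self-ratings are face-scale choices converted to 0 (saddest) .. 4
    (happiest); performance scores are per-item rubric scores on 0..4.
    Positive values indicate overestimation, negative underestimation.
    """
    if not self_ratings or len(self_ratings) != len(performance_scores):
        raise ValueError("need equal-length, non-empty lists")
    diffs = [r - p for r, p in zip(self_ratings, performance_scores)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical theme counts: rows = boys, girls; columns = three themes
    stat, df, min_exp = chi_square_independence([[30, 45, 25], [40, 35, 25]])
    print(f"chi2 = {stat:.3f}, df = {df}, smallest expected count = {min_exp:.1f}")

    # Hypothetical twelve-item record for one student
    efficacy = [4, 4, 3, 4, 2, 3, 4, 4, 3, 2, 4, 3]
    rubric = [3, 4, 2, 4, 1, 3, 2, 4, 3, 1, 4, 2]
    print(f"self-efficacy mean bias = {mean_bias(efficacy, rubric):+.2f}")
```

Because the bias score averages signed differences, an overestimate on one item can offset an underestimate on another; a positive result indicates only that, on balance, the student's ratings exceeded the rubric scores.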
The five dependent variables were as follows:

1) Problem-Solving Process Themes: a categorical score for each of the twelve test items that identifies the overarching theme of the process used to solve the problem.
2) Mathematical Performance: a total test performance score assessed with the FCAT four-point holistic rubric (possible scores range from 0 to 48).
3) Linguistic Explanations: a total score for the quality of students' linguistic explanations, rated using a three-point rubric (possible scores range from 0 to 36).
4) Self-Efficacy: a measure of mean bias for self-efficacy (the average difference between students' self-reported ratings of self-efficacy and their performance scores). Students' self-reported self-efficacy responses on the five-face scale were converted to a numerical score ranging from zero for the saddest face to four for the happiest face.
5) Self-Assessment: a measure of mean bias for self-assessment (the average difference between students' self-reported scores of self-assessment and their performance scores). Students' self-reported self-assessment responses on the five-face scale were converted to a numerical value ranging from zero for the saddest face to four for the happiest face.

Population and Sample Selection

The school district from which the sample was drawn encompasses an entire west-central Florida county with a population of over 270,000. The sample was limited to the east side of the county in an effort to obtain a fairly homogeneous sample of students who were relatively successful academically. Because the study already included a number of variables of interest, the researcher did not wish to add the socioeconomic status of the schools as an additional variable.
Also, because the researcher's ability to perform in-depth analysis of student work depended on students' ability to produce work that could be analyzed, the researcher chose to limit the study to schools with a history of high performance on the FCAT. The study county is experiencing enormous growth, primarily in the previously undeveloped east side, with the result that most east county residents are upper-middle class. In the older west county areas, very affluent areas abut low-income areas, resulting in a wide range of backgrounds in each school's student population. Although the percentage of students receiving free or reduced-price lunch ranges from 4% to 95% countywide, all three schools in the study sample provide free or reduced-price lunch to 10% or less of their student bodies. The percentage of students with Limited English Proficiency at the three schools ranges from 3.5% to 5.1% (County School Board, 2004). All three schools have an A grade in the state of Florida and ranked in the top three in the county in both reading and mathematics for two of the three grades tested in the 2003 administration of the Florida Comprehensive Assessment Test (FCAT Results, 2004). School grades range from A to F and are based on how well students are doing, how much progress they have made, and how much progress struggling readers are making. All of these areas are measured by the FCAT reading, writing, and mathematics sections (State of Florida Department of Education, 2005).

All schools in the county use mathematics textbooks from the State Adopted Mathematics Textbook List that have also been approved by the county's school district and that address problem solving in keeping with the guidelines of the Sunshine State Standards (State of Florida Department of Education, 1996). For the year of the study, schools 1 and 2 used Houghton Mifflin's curriculum, and school 3 used Macmillan/McGraw-Hill's curriculum.
In these elementary schools, mathematics is generally taught one hour per day, and teachers are provided with a curriculum guide as well as a textbook to structure their instruction.

The decision to test fifth-grade students was made for a variety of reasons. The goal was to study the youngest students who would likely provide the desired data. Because one of the research questions focused on analyzing students' linguistic explanations of their problem-solving processes, primary-grade students were eliminated from consideration, as their writing skills are not yet sufficiently developed. In addition, the simple word problems appropriate for primary-age students would not likely elicit the rich and varied approaches to problem solving the researcher hoped to obtain for analysis. Much work has been done with young students on gender differences in problem-solving processes through the collection of observational data (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998), but upper elementary students have been largely overlooked as participants in these types of studies. Students in middle and high school were not of particular interest because much work has already been done on gender differences in achievement and problem solving at those levels (Ai, 2002; Ethington, 1992; Fierros, 1999; Gallagher et al., 2000; Lane, Wang, & Magone, 1996; Pomplun & Capps, 1999; Webster & Henriksson, 2000; Zambo & Follman, 1994). Also, because the emphasis of this research is on understanding the underlying development of gender differences rather than differences in performance that result from developmental causes, younger students were a more appropriate population to study. Gender differences in achievement often do not appear until the middle to high school years, but the roots of those differences must be present in the elementary grades.
Although most previous research on self-efficacy and self-assessment used middle school through college students, research indicates that upper-elementary students are capable of differentiating their competence across activity domains (Eccles, Wigfield, Harold, & Blumenfeld, 1993; Stipek & Gralinski, 1991). Fifth-grade students provided data sufficient for analyzing gender differences in their problem-solving processes and linguistic explanations, leading to a better understanding of the formative stages of differences that later moderate overall achievement.

The general rule in quantitative research is to use the largest sample possible, particularly when conducting subgroup analyses. Statistical power, the probability of detecting a significant effect when the effect really exists in nature, depends heavily on sample size (Stevens, 1999). The effect sizes likely to be found in most social science research are small (.20) to medium (.50) (Cohen, 1977). At a .05 level of significance (α), with a .30 effect size and eighteen groups (2 levels of gender × 3 levels of reading ability × 3 levels of mathematics ability), the total sample size needed to achieve power of .80 is 468. With an estimated .50 effect size, the sample size needed to achieve power of .80 drops to 189. The three schools included in the study had a total fifth-grade population of 396 students. To have adequate power to detect any significant gender differences that existed, every effort was made to ensure a high percentage of participation. If too few parents had allowed their children to participate, the study would have been underpowered according to the a priori power analysis, and if no significant effects had then been obtained, it would have remained possible that effects truly exist in nature but went undetected because of the low number of participants.
The independent variables in this study were gender, reading ability, and mathematics ability. Gender data were gathered during test administration from students' self-reports on the test cover page. Each student's reading and mathematics abilities were categorized according to the following procedure. At the time of test administration, each classroom teacher was provided with a form (Appendix G) on which to rate each participating student's reading and mathematics ability as high, middle, or low, based on professional judgment. Research evidence shows that teacher judgments of student ability are reliable, accurate, and free of gender bias (Desoete, Roeyers, & Buysse, 2001; Feinberg & Shapiro, 2003; Helwig, Anderson, & Tindal, 2001). However, because the study was conducted in the first half of the school year, the teachers were at a disadvantage in assessing their students. For that reason, students' prior-year mathematics and reading level scores on the Florida Comprehensive Assessment Test (FCAT) were included as criteria in the classification procedure. On the FCAT, a level score of 3, 4, or 5 is considered indicative of meeting adequate yearly progress; therefore, all students who scored 3 or above were categorized as either high or middle for this study. A score of 5 indicates very high performance, so all students with an FCAT score of 5 were categorized as high, regardless of teacher rating. Students who scored 4 were categorized as high only if they also received a teacher rating of high. Students with an FCAT score of 4 and a middle or low teacher rating, together with all students with an FCAT score of 3, were categorized as middle. A level score of 1 or 2 indicates that the student had not met the grade-level expectations of the Sunshine State Standards (State of Florida Department of Education, 1996) and was in danger of failing.
There are rare instances when, due to illness or other stresses, a competent student receives an unsatisfactory FCAT score. A teacher rating of high for a student with an FCAT score of 1 or 2 is an indicator of this type of situation. For this reason, a student who received an FCAT score of 1 or 2 and a teacher rating of high in a content area was categorized as middle for the purposes of this study. Students who scored 1 or 2 and received a middle or low teacher rating were categorized as low. For a complete explanation of the five FCAT achievement levels, refer to Appendix E. Table 7 illustrates how the categorization criteria were applied.

Table 7
Categorization of Student Ability

| Teacher rating | FCAT 5 | FCAT 4 | FCAT 3 | FCAT 2 | FCAT 1 |
|---|---|---|---|---|---|
| High | High | High | Middle | Middle | Middle |
| Middle | High | Middle | Middle | Low | Low |
| Low | High | Middle | Middle | Low | Low |

The researcher acknowledges that this categorization process has limitations and may force students who differ only slightly into separate ability categories. The process does, however, allow for statistical tests that compare groups of students, so that conclusions can be drawn and discussed in language that will hopefully benefit teachers and teacher educators. To offset the limitations of the categorization procedure, the data were also analyzed through multiple regression, with reading and mathematics ability treated as continuous variables. Students' gender, along with their FCAT scale scores (ranging from 100 to 500) for mathematics and reading, served as the independent variables in these further analyses for questions two through five. Statistical Analysis Software (SAS) was used to perform all of the statistical tests.

Participant Demographics

Participation Levels by School and Gender

The population sampled in this study consisted of 396 fifth-grade students from three schools in a west-central Florida school district. All students in the sixteen fifth-grade classes in the three schools were invited to participate.
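The decision rules summarized in Table 7 can be restated as a small function, which makes it easy to check that every combination of FCAT level and teacher rating maps to exactly one category. This is an illustrative sketch: the function name `categorize_ability`, the lowercase rating labels, and the input validation are assumptions, not part of the original procedure.

```python
from typing import Literal

Rating = Literal["high", "middle", "low"]


def categorize_ability(fcat_level: int, teacher_rating: Rating) -> Rating:
    """Combine a prior-year FCAT level (1-5) with a teacher rating,
    following the decision rules summarized in Table 7."""
    if fcat_level not in (1, 2, 3, 4, 5):
        raise ValueError(f"FCAT level must be 1-5, got {fcat_level!r}")
    if teacher_rating not in ("high", "middle", "low"):
        raise ValueError(f"unknown teacher rating: {teacher_rating!r}")

    if fcat_level == 5:
        return "high"  # very high performance, regardless of teacher rating
    if fcat_level == 4:
        return "high" if teacher_rating == "high" else "middle"
    if fcat_level == 3:
        return "middle"  # meets expectations; the teacher rating does not change it
    # Levels 1 and 2: a high teacher rating suggests an atypical test day.
    return "middle" if teacher_rating == "high" else "low"


if __name__ == "__main__":
    # Print the grid in the same layout as Table 7 (rows: teacher rating).
    for rating in ("high", "middle", "low"):
        row = [categorize_ability(level, rating) for level in (5, 4, 3, 2, 1)]
        print(rating.ljust(7), row)
```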
Of the total population, 286 students, comprising 163 boys (71% of the boys in the population) and 123 girls (73% of the girls in the population), elected to participate in the study and obtained parental permission to do so. All 286 students who obtained parental permission were tested using the test instrument developed for the study (Appendix J). Teacher ratings of mathematics and reading ability were obtained for all participating students. According to the teachers, only 3 of the 286 students were classified as ESOL (English for Speakers of Other Languages), suggesting that language proficiency issues did not play a significant role in the study. None of the ESOL students requested or received any accommodations during testing.

Table 8 shows the participation levels of boys and girls by school. Although more boys than girls participated in the study, this is primarily because there were more boys than girls in the study schools. School 3 had the highest participation rate of the three schools at 77.6%, followed by School 1 at 71.2% and School 2 at 67.6%. School 2 is the newest of the three schools and experiences the highest rate of influx of students new to the area.

Table 8
Participation Levels of Boys and Girls by School

| School | Boys: total n | Boys: participants n | Boys: % | Girls: total n | Girls: participants n | Girls: % | All: total n | All: participants n | All: % |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 67 | 49 | 73.1 | 43 | 30 | 69.8 | 111 | 79 | 71.2 |
| 2 | 77 | 46 | 59.7 | 65 | 50 | 76.9 | 142 | 96 | 67.6 |
| 3 | 84 | 68 | 81.0 | 60 | 43 | 71.7 | 143 | 111 | 77.6 |
| Total | 228 | 163 | 71.5 | 168 | 123 | 73.2 | 396 | 286 | 72.2 |

Treatment of the Data

Because of scheduled class changes in one school, two students did not finish the test. Since this represented an anomaly that would likely result in outlying scores, these students were eliminated from the study. Complete prior-year FCAT reading and mathematics scores could not be obtained for 45 of the remaining 284 students.
All students were tested for this study between the second week of November and the first week of December, and requests for test scores were made within one week of the completion of testing. Complete FCAT scores were unavailable for three groups of students: those who had not attended a Florida school in fourth grade (no scores were available), those who moved away between the time they were tested and the time the test scores were requested, and those who attended a different Florida school in fourth grade. For this last group, the cumulative folders contained only FCAT level scores and Norm-Referenced Test scores; the scale scores used as independent variables in all regression analyses were not reported. To be considered for inclusion in the analysis, each student had to finish the test, be rated for mathematics and reading ability by her or his teacher, and have FCAT reading and mathematics scores on record with the school. Only 239 of the 286 participants (131 boys and 108 girls) met all three criteria.

The next step in finalizing the sample was to identify all students whose scores were outliers. For the purposes of the ANOVA analysis, an outlier was defined as a student whose score on any dependent variable was more than 2.7 standard deviations from the group mean. In a normal distribution, fewer than 7 in 1,000 scores (about 3.5 in 1,000 in each tail) meet this criterion (Glass & Hopkins, 1996). A review of the box-and-whisker plots and descriptive statistics for the sample groups revealed a total of six student scores that met the outlier criterion for the dependent variables of mathematics performance, linguistic explanation, self-efficacy mean bias, and self-assessment mean bias. Further analysis found that the six outlying scores were attributable to only three students. Student 1 was a boy in the low reading ability group and the middle mathematics ability group.
His scores were outliers for both total mathematics performance and self-efficacy mean bias. Student 2 was a boy in the high ability group for both mathematics and reading. A review of his raw data indicated that he marked the saddest face for self-efficacy and self-assessment on every test item, yet he scored relatively well on all questions. As a result, his self-efficacy and self-assessment mean bias scores were extremely low. Student 3 was the only student among the 286 participants who was categorized in the high mathematics group and the low reading group. His mathematics performance and linguistic explanation scores were extremely high relative to the other low readers, making his scores outliers on both variables. Because these three students' scores were anomalous and would not contribute to an understanding of the population as a whole, their inclusion in the data analysis could have led to spurious results. For that reason, they were eliminated from the analysis. One of the three outliers had already been excluded from the analyses because of missing FCAT scores. When the remaining two outlying students were removed from the multiple regression analysis, no other student's score approached a Cook's D value of 1. Cook's D is a measure of how much the regression coefficients would change if a participant were eliminated, and any value greater than 1 is considered large (Stevens, 1986). This finding confirms that only the outliers identified from the descriptive statistics needed to be removed.

There were three instances of missing data for the students, two for self-efficacy and one for self-assessment. In all three cases, the missing data were estimated using the following process. The first step was to identify all items for which the student achieved the same mathematics rubric score as on the item that had missing data.
The average of the student's self-efficacy or self-assessment scores for all the identified items was then used to estimate the value of the missing score. The only instance of missing data for the teachers was in their ratings of their students' preparedness to complete each item successfully. Due to a copying error, one teacher's rating sheets contained only the first seven of the twelve test questions. As a result, no ratings were obtained at the time of test administration for the last five test items for this teacher. The purpose of acquiring these data at the time of test administration was to ensure that the teachers' assessments of their students' preparedness could be matched with the students' performance. Because of the time lapse between the test administration and the discovery of the missing data, the researcher determined that omitting the data from the analysis would be a better course of action than acquiring new data that might not be trustworthy.

Final Sample Demographics

The final sample for the analyses consisted of 237 students (108 girls and 129 boys). Listwise deletion was used so that the sample size was the same throughout the study, allowing conclusions to be drawn across research questions and analyses. Of the 284 students who completed the test, a total of 47 were eliminated from the analysis: 45 for missing data and 2 for being outliers. More boys than girls were deleted from the sample, but there were also more boys than girls in the study schools and in the original participant pool. The mean scores for performance, linguistic explanations, self-efficacy, and self-assessment of the eliminated students were compared with those of the remaining sample. The students who were omitted from the study scored an average of 28.99 points for performance, compared to 26.76 points for the remaining sample.
The mean linguistic explanation score for the deleted students was 16.01, compared to 14.2 for the final sample. The deleted students were slightly less self-efficacious, with a mean total score of 33.68 compared to the sample mean of 35.73. However, the deleted students assessed their work higher (M = 37.39) than the remaining sample (M = 36.97). Cohen's d (Cohen, 1977) is an effect size measure that expresses the difference between the means of two groups in standard deviation units. As rough guidelines, an effect size of 0.2 is small, an effect size of 0.5 is medium and would be apparent to the researcher, and an effect size of 0.8 is large. The effect sizes for the differences in means between the final study sample and the students omitted from the study were 0.31, 0.25, 0.27, and 0.06 for performance, linguistic explanations, total self-efficacy, and total self-assessment, respectively. All of these effect sizes are small or negligible by these guidelines, suggesting that the omission of these students does not pose a serious threat to the validity of this study.

As predicted from the categorization of the small sample in the pilot study, the cell sizes were unbalanced. Not surprisingly, no students in the final sample were categorized as having high ability in one area and low ability in the other. A total of 170 students (72%) fell in the diagonal cells representing identical categorization in both subject areas. Table 9 shows that there is a disproportionately high number of boys in the high/high cell and more girls than boys in the low/low cell. Also of note is that only two girls were categorized higher in mathematics than in reading, whereas thirteen boys fell into that category. Conversely, more girls (28) than boys (24) were categorized at a higher reading level than mathematics level.
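For the comparison of omitted and retained students reported above, Cohen's d for two independent groups is commonly computed as the difference in means divided by the pooled standard deviation. The sketch below follows that common pooled-SD definition; the text does not state which standard deviation the study used, so treat that choice as an assumption. The names `cohens_d` and `size_label` are hypothetical, and any scores passed to them here are toy values, not study data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def size_label(d: float) -> str:
    """Cohen's rough benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"
```

Applied to the values reported in the text, `size_label` classifies 0.31, 0.25, and 0.27 as small and 0.06 as negligible.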
Table 9
Cell Distribution of Sample Students

| Gender | Reading ability | Math high: n | Math high: % | Math middle: n | Math middle: % | Math low: n | Math low: % |
|---|---|---|---|---|---|---|---|
| Girls (n = 108) | High | 17 | 15.7 | 13 | 12.0 | 0 | 0.0 |
| | Middle | 2 | 1.9 | 46 | 42.6 | 15 | 13.9 |
| | Low | 0 | 0.0 | 0 | 0.0 | 15 | 13.9 |
| Boys (n = 129) | High | 31 | 24.0 | 12 | 9.3 | 0 | 0.0 |
| | Middle | 7 | 5.4 | 52 | 40.3 | 12 | 9.3 |
| | Low | 0 | 0.0 | 6 | 4.7 | 9 | 7.0 |

Note. Rows give reading ability and columns give mathematics ability. Percentages were computed within gender.

Table 10 shows the mean mathematics and reading FCAT scale scores and levels for the sample, grouped by school, by gender, and for the total sample. The data indicate that students from School 3 had the highest scores in both mathematics and reading; according to Table 8, this is also the school with the highest percentage of participation.

Table 10
Mean FCAT Scores for Sample Grouped by School and by Gender

| Group | Mathematics scale | Mathematics level | Reading scale | Reading level |
|---|---|---|---|---|
| By school | | | | |
| School 1 | 331.3 | 3.18 | 342.8 | 3.50 |
| School 2 | 323.1 | 2.97 | 340.3 | 3.45 |
| School 3 | 337.5 | 3.29 | 347.8 | 3.65 |
| By gender | | | | |
| Boys | 338.6 | 3.33 | 342.4 | 3.52 |
| Girls | 322.1 | 2.94 | 345.8 | 3.56 |
| Total | 331.1 | 3.16 | 344.0 | 3.54 |

Note. FCAT scale scores range from 100 to 500. FCAT level scores range from 1 to 5. N = 237 total: 129 boys and 108 girls; 68 students from School 1, 76 from School 2, and 93 from School 3.

School 2 students scored the lowest among the three schools, and this school had the lowest percentage of participation. The boys in the sample had higher mathematics scores but lower reading scores than the girls, mirroring the distribution of the sample in the ability-level cells.

Instrumentation

Description of the Testing Instrument

The test administered to the participants included 12 constructed-response mathematical problem-solving items. A copy of the test is located in Appendix H. The student work space shown in the appendix is smaller than the work space on the actual test; it was reduced to accommodate the appendix heading.
The cover page included spaces for students to enter their name, teacher, school, and gender. After the test was administered, each student was assigned an alphanumeric code that was entered on the cover page and at the top of the first test page. The cover page was then removed and stored separately from the test documents so that each student's identity was protected throughout the data analysis process. The test was formatted so that only one question appeared on each page. The header area of each page contained a small box labeled "DO NOT WRITE IN THIS AREA," which was used to record the student's alphanumeric identification code together with his or her scores on that question. The problem statement was presented at the top of the page, followed by a box designed to obtain the student's self-reported measure of self-efficacy. After the work space at the bottom of the page was another box designed to obtain the student's self-assessment score.

Measuring Self-Efficacy and Self-Assessment

The self-efficacy directions stated, "Before you begin working, circle the face that best shows how well you think you can solve this problem." The instructions were followed by a series of five faces ranging from very happy to very sad. The self-assessment directions stated, "When you are done working, circle the face that best shows how well you think you did on this problem." These instructions were followed by the same series of five faces. The number of faces used in these measurements was determined through both a review of similar research and the results of the pilot study. Boekaerts, Seegers, and Vermeer (1995) and Vermeer, Boekaerts, and Seegers (2000) are two studies that measured students' confidence during the process of solving individual problems.
Their testing was conducted via computer, and students were prompted to select one of five faces to rate their confidence level after reading the problem (self-efficacy), at 40-second intervals while solving the problem, and when they had finished solving the problem (self-assessment). Because this study used the same method to measure self-efficacy and self-assessment, its results can be compared with, and add to, their findings. Only four faces were used in the pilot study, in an effort to keep students from overselecting a neutral rating. However, follow-up interviews conducted with a small sample of the pilot students revealed some ambiguity in their selection of the slightly happy face versus the slightly sad face. Although students' explanations for their selection of the happiest and saddest faces were clear and consistent, their explanations for selecting the two middle faces yielded similar and overlapping descriptions. For example, students described their selection of the slightly happy face for the self-efficacy question as "I wasn't sure that I was going to do well or not" and "I might do well on it, might not." For the slightly sad face, responses included "I thought that I wasn't going to do that well but I had a chance to get it right" and "Not too well but not sure." Similar responses were given for the two middle faces in the self-assessment ratings. Adding a fifth face in the middle position, with a neutral, straight-line mouth, increased the likelihood that students would view the slightly happy face as a primarily positive response and the slightly sad face as a primarily negative response.

Development of the Test

The sources for potential test items were the fourth-grade released items from the 1992, 1996, and 2003 administrations of the National Assessment of Educational Progress (National Center for Education Statistics [NAEP], 2003) and the fifth-grade released items from the 2001 FCAT (State of Florida DOE, 2001).
Questions from earlier administrations of the NAEP were not considered because they were written before publication of the Curriculum and Evaluation Standards for School Mathematics (NCTM, 1989) and may or may not reflect the influence of mathematics reform. The goal was to select problems from released items of tests that had undergone extensive piloting and reliability and validity checks, thereby enhancing the validity and reliability of the study's test instrument. Because the study participants were fifth-grade students in the first half of the school year, the fourth-grade NAEP questions (generally administered in the second half of fourth grade) were judged to be at an appropriate level of difficulty. Also, because the study sample consisted of children in Florida schools who were learning mathematics under the guidelines of the Sunshine State Standards, questions from the FCAT (Florida Comprehensive Assessment Test) that were designed to assess students' mastery of the state standards were considered for inclusion. The selection process began with the identification of all questions that met the following four criteria:

1. The problem was set in a real-world context.
2. The problem was already formatted as a constructed-response item, or was structured such that simply removing the multiple-choice answers made it suitable for constructed-response format.
3. No manipulatives or tools were required to solve the problem.
4. The problem lent itself to student explanation of the solution process.

The 53 problems that met all four criteria were then categorized as assessing one of five content standards: number and operations, algebra, geometry, measurement, and data analysis and probability (NCTM, 2000). Not only did a majority of the selected test questions focus on number and operations (29 of 53), but this standard is also the dominant focus of elementary mathematics.
Number and operations is also a vital part of students' problem-solving efforts throughout elementary school; therefore, students' number and operations problem-solving performance in fifth grade is less a matter of specific instruction from a particular teacher than a reflection of their entire mathematics education experience. Geometry would not have been a good choice for this study because student performance in geometry is directly related to specific instruction and might have produced a teacher effect that would have confounded the results (Crowley, 1990). For these reasons, the list of potential problems was narrowed to include only number and operations questions. The next goal was to select a subset of the remaining 29 problems that would represent a diverse range of topics within the number and operations strand, span a wide range of difficulty, and provide a variety of contexts, yet be small enough that the test could be completed in about one hour. Twelve problems were selected that represented the breadth of the number and operations standard, ranged in difficulty from 8% correct to 60% correct based on NAEP and FCAT data, had readability levels from 2.02 to 3.99, and were fairly well divided among the possible types of gendered labeling. Allowing an average of five minutes per problem for twelve problems also meant that the length of the test would likely be appropriate. In the pilot administration, students took an average of 33 minutes to complete the test, with a median of 29 minutes and a range of 16 to 56 minutes. Based on this information, the test length was deemed appropriate. Many of the questions required slight modification. Multiple-choice items were converted to constructed-response format by removing the answer choices and rewording the problem text to eliminate references to them.
All problems were amended to include the statement, "Show your work, then explain how you know your answer is correct," following the original problem text. No other changes were made to problem context, numbers, or original wording.

Table 11 lists each problem by name and includes the concept it assesses, the source from which the problem was taken, the percentage of students who originally answered the problem correctly, the readability level of the question (Homan et al., 1994), the gendered labeling of the problem (Walsh et al., 1999), and the original format of the problem before it was modified for this study. Problems involving a male character were considered male-labeled, problems involving a female character were considered female-labeled, items mentioning both males and females were considered both male- and female-labeled, and problems with no reference to males or females were considered neutral. The problems are listed in the table in the order of their placement on the pilot test. To increase the likelihood that students would become engaged in the test and persist through all twelve items, a few of the easier problems (according to historical percent-correct data) were purposefully placed at the beginning of the test (D. R. Thompson, personal communication, November 17, 2003). Although random ordering could have been applied, the researcher determined that the potential negative impact on student engagement of having the most difficult items appear at the beginning of the test outweighed the potential benefits of random ordering.
The remaining problems were arranged in an effort to separate those that were particularly difficult (based on both percent correct and readability), those that contained a graphic, and those that tested similar content or required similar skills. A review of students' performance on individual items in the pilot study confirmed the ordering of the items for the full study.

Table 11
Description of Test Questions Selected for the Study

| Question name | Content | Source | % Correct | Readability (a) | GL (b) | OF (c) |
|---|---|---|---|---|---|---|
| Flower pot | Multiplication with interpretation of a graphic | NAEP, 1996 | 50 | 2.77 | M | MC |
| Ms. Hernandez | Division and interpretation of the remainder | NAEP, 1996 | 39 | 2.35 | F | CR |
| Calories | Estimation with rounding and addition | NAEP, 1992 | 45 | 3.61 | N | CR |
| Jean's class | Proportional reasoning and addition | NAEP, 2003 | 34 | 3.05 | B | MC |
| Pizza | Conceptual understanding of fractions | NAEP, 1992 | 24 | 2.54 | B | CR |
| String | Division of fractions | NAEP, 2003 | 27 | 2.70 | M | MC |
| Balloons | Estimation using interpretation of a graphic | FCAT, 2001 | 60 | 3.61 | N | CR |
| Flour | Multiplication of fractions | NAEP, 1992 | 21 | 3.01 | N | MC |
| Calculator | Place value | NAEP, 1992 | 20 | 2.23 | F | CR |
| School lunch | Decimal multiplication and addition | NAEP, 1996 | 17 | 3.99 | M | CR |
| Pencils | Fraction sense and interpretation of graphic | NAEP, 2003 | 11 | 2.02 | B | CR |
| Birdseed | Decimal division and proportional reasoning | NAEP, 1992 | 8 | 2.92 | N | MC |

Note. NAEP = National Assessment of Educational Progress; FCAT = Florida Comprehensive Assessment Test.
(a) Readability is a grade-level estimate based on Homan, Hewitt, and Linder (1994).
(b) GL = gendered labeling: M = male, F = female, N = neutral, B = both.
(c) OF = original format: MC = multiple choice, CR = constructed response.

Validity

The validity of a test is generally concerned with the extent to which an instrument measures what it is supposed to measure. More specifically, validity is defined as "the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (Ary, Jacobs, & Razavieh, 1996, p. 262). Validity is always specific to the particular purpose for which the instrument is being used, and three types of evidence can be gathered to support the inferences made from the scores of a measuring instrument: content-related, criterion-related, and construct-related (American Psychological Association, 1985). Content-related evidence speaks to the extent to which the sample of items on a test is representative of some defined domain of content (Ary et al., 1996). The fact that the questions used in the study test were drawn from the National Assessment of Educational Progress and the Florida Comprehensive Assessment Test, together with the range of content covered by the twelve selected items (Table 11), provides evidence of content validity. Criterion-related evidence shows "the extent to which the scores on a measuring instrument are related to an independent external variable (criterion) believed to measure directly the behavior or characteristic in question" (Ary et al., 1996, p. 265). A criterion measure must demonstrate relevance and freedom from bias. Evidence of relevance was obtained through a careful comparison of the test items with the content appropriate for fifth-grade students (NCTM, 2000). Freedom from bias is achieved if the scoring of a criterion measure is not influenced by any factors other than actual performance on the criterion. Because the test items were scored according to an established rubric, and the test papers did not contain any information about the students that could potentially influence the scoring, the resulting scores are deemed free from bias. Construct-related evidence focuses on the performance, linguistic explanation, self-efficacy, and self-assessment scores as measures of a trait or construct (Ary et al., 1996).
In this instance, the measurement of problem-solving ability, quality of linguistic explanations, self-efficacy, and self-assessment should be as independent as possible from the measurement of any other construct. The measures of self-efficacy and self-assessment follow the work of other researchers and are believed to possess construct validity. However, as the research literature reviewed in Chapter 2 demonstrates, problem-solving ability cannot be measured independently of reading ability. Additionally, when linguistic explanations are required, as with this and other problem-solving tests, and scoring rubrics include the explanation as a criterion, the measurement of problem-solving ability is not independent of the measurement of writing ability. The testing and scoring instruments used in this study were intended to be representative of the types of instruments currently used to assess students for the purpose of making high-stakes decisions about their academic futures. Given that this research was designed to imitate the current assessment situation using instruments that admittedly present threats to construct validity, the researcher included the two confounding variables as variables of interest in this study: reading ability as an independent variable, and quality of linguistic explanations as a dependent variable. This research attempted to determine the extent to which reading ability moderates students' problem-solving scores. Also, because the problem-solving rubric included students' linguistic explanations as a criterion, the linguistic explanations were scored independently of the problem-solving score to determine the extent to which students' ability to explain their thinking may have affected their problem-solving scores.

Reliability of the Data

Cronbach Alpha

The reliability of a measuring instrument is "the degree of consistency with which it measures whatever it is measuring" (Ary et al., 1996, p. 273).
One way to measure reliability involves assessing a test's internal consistency, the extent to which all test items measure the same thing. The Cronbach alpha coefficient, named after Lee Cronbach (1951), is used to measure internal consistency when test items are not scored simply as right or wrong but are given a range of scores. Because the items on the study test were scored on a scale of zero to four for performance and zero to three for linguistic explanations, Cronbach's alpha is an appropriate measure of reliability. For the pilot administration of the test (n = 23), the Cronbach alpha coefficient was .89 for mathematical performance and .75 for linguistic explanations. These values are considered satisfactory according to the guidelines of Guilford and Fruchter (1978). For the full study, the Cronbach alpha coefficient was computed for mathematical performance, linguistic explanations, mean bias of self-efficacy, and mean bias of self-assessment. The raw coefficients for these variables were .79, .87, .83, and .79, respectively. These values are also considered satisfactory according to the guidelines of Guilford and Fruchter (1978) and indicate that, for these four variables, the test instrument produced scores with an acceptable level of internal consistency.

Interrater Reliability

Training and Scoring

Another reliability issue is the consistency of the scoring of test items. To measure the extent to which the researcher accurately and reliably applied the rubrics for scoring mathematics performance and linguistic explanations, a stratified random sample of twenty-five test papers was double-scored. Before the second scorer did any work, the researcher conducted two training sessions. The second scorer was a mathematics education professional with extensive experience in elementary mathematics content and pedagogy.
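The raw Cronbach alpha coefficients described above follow the standard formula α = k/(k − 1) × (1 − Σ s_i² / s_T²), where k is the number of items, s_i² is the variance of item i, and s_T² is the variance of students' total scores. The sketch below computes raw (unstandardized) alpha from a matrix of item scores with one row per student and one column per item. The function name `cronbach_alpha` is hypothetical, and the study itself used SAS rather than this code; sample (n − 1) variances are used, which gives the same ratio as population variances as long as one convention is applied throughout.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Raw Cronbach's alpha for a matrix of item scores.

    scores[s][i] is student s's score on item i (for example, a 0-4 rubric score).
    """
    n_students = len(scores)
    if n_students < 2:
        raise ValueError("need at least two students")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")

    # Variance of each item across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Items that rise and fall together push alpha toward 1, while unrelated items push it toward 0, which is why alpha is read as an index of internal consistency.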
The training process began with a general discussion of the anchor papers selected from student work in the pilot study, the general rubrics for performance and linguistic explanations (Appendices A and C), and the item-specific rubrics for performance and linguistic explanations (Appendices K and L). Next, one complete student test from the pilot study was selected at random for review and discussion. The researcher and second scorer discussed the student's work for each test item and how each response would be scored according to each of the rubrics. The researcher then purposefully selected two additional pilot test papers whose scores, based on the pilot test results, varied widely across items. These papers were judged likely to provide a more complete practice experience in the scoring process than papers on which the student scored similarly on all test items. Between the two training sessions, the researcher and the scorer each independently scored all twelve items on both of these tests for performance and for quality of linguistic explanations. The second training session was used to discuss all disagreements and arrive at a consensus. The researcher and scorer then proceeded with the independent scoring of 25 test papers selected from the full study. All interrater reliability scoring and discussion of scoring results was completed before the researcher scored any of the remaining test papers for the study. The 25 double-scored test papers were selected using a stratified random sampling process (Patton, 2002) to ensure equitable representation of each school and classroom in the study sample. One test paper was selected at random from each classroom included in the study, resulting in a total of 16 tests. The remaining nine tests were selected by randomly choosing three tests from each of the three schools.
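The two-stage selection of the 25 double-scored papers described above (one paper per classroom, then three more per school) can be sketched in Python as follows. This is an illustrative sketch only: the tuple layout, the function name, and the assumption that the extra papers were drawn only from papers not already chosen are hypothetical, not taken from the study's procedures.

```python
import random
from collections import defaultdict

def stratified_double_score_sample(papers, extra_per_school=3, seed=None):
    """Pick papers for double scoring: one per classroom, then extras per school.

    papers: list of (paper_id, school, classroom) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for paper_id, school, classroom in papers:
        by_class[(school, classroom)].append(paper_id)

    # Stage 1: one paper drawn at random from every classroom.
    chosen = {rng.choice(ids) for ids in by_class.values()}

    # Stage 2: extra papers per school, drawn only from papers not yet chosen.
    remaining = defaultdict(list)
    for paper_id, school, _ in papers:
        if paper_id not in chosen:
            remaining[school].append(paper_id)
    for school in sorted(remaining):
        chosen.update(rng.sample(remaining[school], extra_per_school))
    return sorted(chosen)
```

With 16 classrooms spread over three schools, the function returns 16 + 9 = 25 papers; how the classrooms were divided among the schools is not stated here, so any split used to try it out is an assumption.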
All twelve test items on all twenty-five tests were independently scored on both measures by the researcher and the second scorer. Any disagreements between the numerical scores the two scorers assigned to a response were tallied, and a percentage of agreement was calculated. Disagreements that were a direct result of unclear wording in either the general rubric or the item-specific rubrics were tallied separately. The discussion between the researcher and the scorer that resulted from these disagreements led to clarification of the language of the rubrics, thereby enhancing the reliability of the researcher's subsequent scoring of all test papers. The reliability of the categorization of solution processes was enhanced by having the second scorer assess the solution process of every item for which the process was unclear to the researcher. Any differences of opinion were negotiated, resulting in complete agreement on the categorization of all processes in question. For the mathematics performance score on the 25 double-scored tests, there was preliminary agreement on 243 of the 300 items (81%). Of the 57 items without initial agreement, 63% of the disagreements were the result of the researcher assigning a score one point higher than the second scorer. Further discussion of these 57 items identified 31 items for which the disagreement was a direct result of ambiguity in the language of the item-specific rubrics. When the language of the rubrics was clarified, all 31 of these disagreements were resolved, raising overall agreement to 91% (274 of 300). The remaining 26 disagreements were discussed until consensus was reached. For the linguistic explanation scores, there was preliminary agreement on 213 of the 300 items (71%).
Of the 87 items without initial agreement, fully 76% of the disagreements were the result of the researcher assigning a score one point higher than the second scorer. Further discussion of the 87 items identified 30 items for which the disagreement was a direct result of ambiguity in the language of the general and item-specific rubrics. When the language of the rubrics was clarified, all 30 of these disagreements were resolved, raising overall agreement to 81% (243 of 300). The remaining 57 disagreements were discussed until consensus was reached. The clarifications to the item-specific rubrics and to the general rubric for scoring linguistic explanations that resulted from the interrater reliability process were applied in the scoring of all remaining test papers. One of these clarifications was made in the item-specific rubric for the question about Jean's class. Throughout the scoring for mathematical performance, a top score of 4 was reserved for a model response that demonstrated a clear understanding of the concept, showed all necessary work, and provided a contextualized explanation. A score of 3 represented a successful response and differed from a score of 4 by, among other things, either omitting the explanation or providing an explanation that lacked context. In the original version of the rubric for this item, the descriptor for a score of 3 read "Correct solution with correct work but no explanation OR Minor flaw in solution process but explanation clear and correct." Neither of these possibilities covered a correct solution with correct work and a correct explanation that lacked context. An example of an explanation without context for the Ms. Hernandez problem would be "I divided thirty-four by eight and got four with a remainder of two."
Although this explanation details the steps of the computation, it does not relate the computation to the context of the problem (teams and substitutes) and does not satisfy the requirement of the question, which states "Explain how you know you are correct." Therefore, an answer with this explanation is not considered a model response and should receive a score of 3. Because neither the descriptor for a score of 4 nor the descriptor for a score of 3 covered this scenario, the scorer was left uncertain how to score such a response. To resolve this, language was added to the descriptor for a score of 3 to include an explanation that lacked context.
Member-Check Interviews
The purpose of the member-check interviews was to verify the reliability of the researcher's categorization of problem-solving processes and of the students' choices of faces to denote their levels of self-efficacy and self-assessment. Four boys and four girls were purposefully selected for the interviews based on a preliminary scoring of their test papers. The goal was to talk with students who performed reasonably well on the test but provided very little linguistic explanation for their work. Although there may have been a greater potential for error in categorizing the problem-solving processes of students who scored very low on the test, those students were also less likely to be able to identify and successfully verbalize their processes. Students who scored well and provided a high level of linguistic explanation were not selected, because their written work combined with their linguistic explanations would likely yield a reliable categorization of their problem-solving processes.
To reduce the time between test administration and the interviews, the selection process began with a cursory review of all student work to gauge the quantity of linguistic explanation provided, eliminating from consideration all students who explained most of their problem-solving processes. Every student whose test paper contained very little linguistic explanation was considered for an interview. These tests were scored using the performance rubric, and only students who scored 24 or above remained eligible. A score of 24 represents an average of 2 points on each of the twelve questions and indicates some level of problem-solving proficiency. The remaining students' tests were then scored for linguistic explanations, and the final selection was based on those scores. To ensure equitable representation of interviewees across the three schools, three students were chosen from each of the two larger schools (two boys and one girl from one school, and one boy and two girls from the other), and two students (one boy and one girl) were chosen from the smaller school. In total, four boys and four girls with a performance score of at least 24 and the lowest scores for linguistic explanations were selected. An alternate boy and girl were also selected at each school in case a first-choice student could not or would not be interviewed. Seven of the eight original choices were interviewed; one male alternate was substituted at a teacher's request. To reduce the likelihood that memory failures would affect the results of the interviews, the students' own work served as a means to stimulate recall, and students were allowed time to review their work before providing a verbal explanation.
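The interviewee screening rules above (a performance total of at least 24, then the lowest linguistic-explanation totals, with school-specific quotas and one alternate of each gender) can be expressed as a short filter-and-sort routine. The Python sketch below assumes a hypothetical record layout and that the cursory first pass has already removed students who explained most of their work; it illustrates the selection logic rather than reproducing the researcher's procedure.

```python
def select_interviewees(students, n_boys, n_girls, min_performance=24):
    """Pick interviewees for one school from pre-screened students.

    students: list of dicts with hypothetical keys "id", "gender" ("M"/"F"),
    "performance" (0-48 total) and "explanation" (0-36 total).
    Returns (boys, girls, alternates); alternates are the next boy and girl in line.
    """
    eligible = [s for s in students if s["performance"] >= min_performance]
    eligible.sort(key=lambda s: s["explanation"])  # lowest explanation score first
    boys = [s["id"] for s in eligible if s["gender"] == "M"]
    girls = [s["id"] for s in eligible if s["gender"] == "F"]
    alternates = boys[n_boys:n_boys + 1] + girls[n_girls:n_girls + 1]
    return boys[:n_boys], girls[:n_girls], alternates
```

Under this sketch, a larger school would be called with quotas such as (2, 1) or (1, 2) and the smaller school with (1, 1), matching the allocation described above.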
The time lapse between test administration and interviews was approximately three weeks for the pilot study, and those students were successful in recalling their thought processes when given time to think. For the full study, the time lapse averaged two and a half weeks. During each individual interview, the student was shown her/his original test booklet and asked to read the question aloud, explain why he/she chose to circle the given face for self-efficacy, explain his/her work on the question, and then explain why he/she chose to circle the given face for self-assessment. The interview protocol was repeated for each of the twelve test items, and the interviews were audiotaped and videotaped. The researcher intended to use the audiotape for transcription and the videotape to add details that could only be perceived visually. However, because of the poor quality of the audiotape, the videotape was used for both purposes. The interviews lasted approximately 20 minutes. After all interviews were completed and transcribed, and all tests were scored, the researcher reviewed the edited transcripts alongside the students' work to determine percentages of agreement on three measures. The first was whether each student's verbal explanation of his/her problem-solving processes matched the descriptions the researcher had assigned to the written work. The second and third were the agreement between students' verbal explanations for their choice of faces for self-efficacy and self-assessment and the language used during test training to describe the meaning of each face.
Problem-Solving Processes
The transcripts of the interviewed students' verbal explanations of their problem-solving processes were compared with the problem-solving process under which their written work had been categorized.
Eight students were interviewed, and thirteen process descriptions were reviewed for each student. Although there were only twelve questions, the calculator question required students to provide two different methods for solving the problem, resulting in two separate codings for that question and a total of thirteen responses per student. Of the 104 total responses (8 students x 13 responses each), 99 were coded accurately, an accuracy rate of 95%. The five coding errors were then analyzed to determine how many resulted in an error in the coding of the problem-solving theme. Three of the five did not affect the theme coding; only two resulted in an incorrect theme, yielding an accuracy rate of 98% (102 of 104) for the coding of themes. This high level of accuracy gives the researcher confidence that the system for coding process descriptions and themes yielded a reliable representation of students' actual written work.
Self-Efficacy
The next stage of the member-check interview review was to compare students' verbal explanations for their selected level of self-efficacy with the range of descriptions of each level that students provided during the whole-class training sessions. These training sessions occurred immediately before each testing event and were documented through field notes taken by the researcher while the students were testing. The researcher compiled the field notes recorded after each session and organized the students' comments about why they would circle each face for self-efficacy and self-assessment. The review indicated that students' verbal explanations for the face they circled matched the descriptions of that face provided by students during training in 89 of 96 cases (93% agreement).
An example of agreement would be a student who said he/she circled the very happy face after reading the question "Because I thought it was easy." This response was in line with responses provided by students during the pretest training, such as "It seems really easy" or "I know how to do the problem." Of the seven instances of disagreement, six were situations in which the student circled a face indicating a level of self-efficacy one level higher than his/her verbal description of the reasoning for circling that face. An example is a student who explained that he/she circled the slightly happy face "'Cause I wasn't sure if, if um I could really do it." This verbal description would align more closely with the middle face, which is neither happy nor sad and indicates uncertainty about one's ability to complete the problem successfully. The high level of agreement, together with the consistency in the direction and size of the disagreements, indicates that the five-face scale was a reliable measure of students' self-efficacy during the testing event.
Self-Assessment
The last stage of the member-check interview review was to compare students' verbal explanations for their selection of the face that best depicted how well they thought they did on each question (self-assessment) with the range of descriptions of the levels of self-assessment that students volunteered during the whole-class training sessions. The review indicated that students' verbal explanations for the face they circled matched the descriptions provided by students during training in 88 of 96 cases (92% agreement). An example of agreement would be a student who said he/she circled the very happy face after completing work on the problem "'Cause I was pretty confident that I got it correct."
This response was in line with responses provided by students during the pretest training, such as "I figured it out and I know I got it right" or "I'm confident I got it right." Of the eight instances of disagreement, six were situations in which the student circled a face indicating a self-assessment rating one level higher than his/her verbal description of the reasoning for circling that face. An example is a student who explained that he/she circled the slightly happy face "'Cause I wasn't so sure if it was the right answer." This verbal description would align more closely with the middle face, which is neither happy nor sad and indicates uncertainty as to whether the problem was successfully completed. Examples of student explanations of the middle face for self-assessment from the pretest training include "I'm not sure if I got it right" and "I might have gotten it right but I'm not sure." The high level of agreement, together with the consistency in the direction and size of the disagreements, indicates that the five-face scale was a reliable measure of students' self-assessment during the testing event.
Data Collection
To answer the research questions, the study test was administered to the fifth-grade students participating in the study. Each answer was evaluated for the quality of the work using a four-point rubric (see Appendix A for the scoring guide) and for the quality of the linguistic explanation using a three-point rubric (see Appendix B for the scoring guide). Students' indications of self-efficacy and self-assessment on the five-face scale were converted to numerical values ranging from zero to four. Participation in this study was voluntary, required parental permission, and was conducted in accordance with the requirements of the University of South Florida Office of Research, Division of Research Compliance.
All fifth-grade students in the three study schools, except those who had participated in the pilot study as fourth graders, were invited to participate. Informed consent forms (Appendix H), covering both the test administration and the follow-up interview, were completed before test administration. The informed consent documents were sent home with a cover letter (Appendix I) that briefly explained the purpose and procedures of the study, provided a place for a parent to sign if he or she was not willing to have the child participate, and explained how to complete the attached informed consent document if parental permission was granted. Students were given a small, nonedible incentive for returning either the cover letter indicating refusal of parental permission or a signed informed consent document indicating that permission had been granted. Students who did not initially return either document were given another copy of the cover letter and informed consent forms to take home.
Organization of Testing
Testing was performed at three elementary schools in Florida and was conducted with one group of students at a time. Students who were present on the initial testing day were tested in their regular classroom, in a neighboring teacher's room, or in the media center, depending on individual teacher preference. Students who were absent on the initial testing day were tested individually or in groups in either the media center or a private work room. Before testing began, each student was given a test booklet and asked to complete the cover page with his/her name, teacher, school, and gender. When all students had completed that information, they were directed to turn to the first page of the test booklet to review a practice problem.
The purpose of the practice problem page was to familiarize the students with the test format and the self-efficacy and self-assessment scales. The researcher read aloud and explained the instructions for both scales. Students were then led in an open discussion of the conditions under which they might choose each of the five faces on each scale, and the researcher kept a journal record of student comments during this discussion. Students were told that each question asked them to show their work and explain their thinking with the phrase "Explain how you know your answer is correct." They were reminded of the importance of providing complete explanations of their problem-solving processes and were encouraged to do their best work and answer each question completely. They were told that during the test administration the researcher would circulate around the room and point to their papers whenever they had neglected to circle a face on one of the self-efficacy or self-assessment scales. They were then instructed to raise their hands when they had finished the test. When all students indicated that they understood the test format, they were instructed to begin. During the test administration, each teacher was provided with an amended copy of the test. Its cover page explained that, because the study was being conducted early in the school year and the teacher might not yet have covered all the material on the test, the researcher was asking each teacher to rate each question for the extent to which his/her students were prepared to answer it. In the body of the test, a five-face scale was placed in the work space for each question with the text "Circle the face that best shows how well prepared your students are to answer this question successfully."
All teachers were asked to answer the preparedness question for each test item. Throughout each test administration, the researcher circulated about the room to remind students to circle the self-efficacy and self-assessment faces as they worked. All work was completed individually, and the researcher answered only student questions related to the test format or the self-efficacy and self-assessment scales. Only one student, whose Individual Education Plan permitted test questions to be read aloud, requested that accommodation. He was seated away from the other students during the test administration so that any items read aloud would not be heard by others. Because the intent of this study was to determine the relationship between reading ability and the dependent variables, and a read-aloud accommodation would have introduced a confounding variable, this student was removed from the sample. The researcher noted the elapsed time on each student's test booklet when it was completed and handed in. Based on data from the pilot study, the researcher estimated that test administration would take approximately one hour; however, no time limit was imposed on students except in one situation. At one of the study schools, students changed classes for most subjects, and during one testing session the change of classes took place while two students were still working on the test. As these were the only two students in the study who did not finish, their test results were not included in the data analysis.
Analysis of Data
After data had been collected from all participating students, the data were analyzed to answer each of the five research questions. A summary of the data sources and data analysis for each research question is presented in Table 12.
Research Question One
Analysis of students' problem-solving processes was performed on a per-item basis.
For each test item, all student responses were sorted according to the process by which the student solved the item (McCoy, 1994; Zhang, Wilson, and Manon, 1999). By way of illustration, some process descriptions for a correct response to the flour problem were:
- 1 1/3 x 3 with correct answer
- Converted fractions to decimals (correctly) then did computation
- Solved strictly with picture/diagram correct answer
- 1 1/3 + 1 1/3 + 1 1/3 correct answer
A detailed list of all the processes identified for each question is provided in Appendix M. Before the analysis of problem-solving processes commenced, the researcher intended to conduct a chi-square statistical test for each item to determine whether gender differences in the students' problem-solving processes were statistically significant.
Table 12
Summary of Data Sources and Methods
1. Processes. Data source: student work on the 12-item test. Data analysis: qualitative analysis of student work on each item, followed by chi-square test.
2. Performance. Data source: student work on the 12-item test. Data analysis: factorial ANOVA using students' total test score as measured by the FCAT 4-point rubric (0-48 points possible), and multiple regression analysis.
3. Linguistic Explanations. Data source: students' explanations of their problem-solving processes. Data analysis: factorial ANOVA using students' total score for linguistic explanation as measured by a 3-point rubric (0-36 points possible), and multiple regression analysis.
4. Self-Efficacy. Data source: students' mean bias score, the average of students' self-reported score of self-efficacy minus their performance score on each item. Data analysis: factorial ANOVA using students' mean bias scores for self-efficacy, and multiple regression analysis.
5. Self-Assessment. Data source: students' mean bias score, the average of students' self-reported score for self-assessment minus their performance score on each item. Data analysis: factorial ANOVA using students' mean bias score for self-assessment, together with multiple regression analysis.
For the pilot study, the largest number of processes identified for a single item was twelve. In addition, with the small amount of data generated from testing twenty-three students, the researcher was unable to identify clear themes that cut across all questions. For the full study, the smallest number of processes identified for any one question was twelve, and the total number of processes across all twelve questions was 232. Conducting chi-square tests for 232 different processes would produce a large volume of data from which no meaningful conclusions would likely be drawn. Moreover, as the data analysis for the full study proceeded, emergent analysis (Patton, 2002) across all items led to the identification of three broad categories of solution processes based on success in solving the problem: Successful Response (S), Partially Successful Response (P), and Unsuccessful Response (U). Under each of the three broad categories, the researcher identified five to six solution process themes, for a total of seventeen themes. Six of the seventeen themes were similar to themes that fell under a different broad category. For example, Computational problems appeared under both Partially Successful Response and Unsuccessful Response. Each of the two iterations of Computational problems was counted as a separate theme, even though the two represented different levels of severity of the same error. The broad categories and solution process themes that emerged from the analysis are discussed in more detail in Chapter 4. These broad categories and themes were then used as an organizing mechanism for coding the 232 individual solution processes students used to solve the twelve questions. A list of all solution processes for each of the twelve test items, along with the category and theme used to code each process, is included in Appendix M.
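Coding each solution process into a broad category and theme amounts to a lookup from process description to a (category, theme) pair, after which responses can be tallied by gender. In the Python sketch below, the four process descriptions are the flour-problem examples quoted earlier in this section, but the theme labels attached to them are hypothetical placeholders rather than the themes listed in Appendix M, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical excerpt of a coding table: process description -> (category, theme).
# Category codes follow the text (S, P, U); the theme labels are placeholders,
# not the actual themes listed in Appendix M.
CODES = {
    "1 1/3 x 3 with correct answer": ("S", "multiplication"),
    "Converted fractions to decimals (correctly) then did computation": ("S", "decimal conversion"),
    "Solved strictly with picture/diagram correct answer": ("S", "picture or diagram"),
    "1 1/3 + 1 1/3 + 1 1/3 correct answer": ("S", "repeated addition"),
}

def tally_themes(responses):
    """Count coded responses by gender, broad category, and theme.

    responses: iterable of (gender, process_description) pairs.
    """
    counts = Counter()
    for gender, process in responses:
        category, theme = CODES[process]
        counts[(gender, category, theme)] += 1
    return counts

def theme_percentages(counts, gender):
    """Percentage of one gender's responses in each (category, theme).

    Assumes the gender has at least one coded response.
    """
    total = sum(n for (g, _, _), n in counts.items() if g == gender)
    return {(c, t): 100 * n / total for (g, c, t), n in counts.items() if g == gender}
```

Keying the tally by theme rather than by item is what allows boys' and girls' processes to be compared across all twelve questions at once.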
Viewing the solution processes through the lens of overarching themes allowed the solution processes of boys and girls to be analyzed across all questions rather than strictly item by item. This change in methodology resulted in a deeper and richer analysis of the data and allowed more meaningful conclusions to be drawn about the problem-solving processes used by boys and girls. The researcher assessed for convergence and divergence throughout the process of identifying categories and themes and assigning solution processes to themes. To deal with the challenge of convergence, deciding which things fit together, the researcher looked for recurring regularities in the data that revealed patterns that could be sorted into categories. The placement of solution processes into categories and themes was judged according to two criteria: internal homogeneity and external heterogeneity. The first criterion concerns the extent to which the responses within a category hold together in a meaningful way; the second concerns the extent to which the differences between categories are clear. The researcher then worked back and forth between the data and the classification system to verify the meaningfulness and accuracy of the categories and themes and the placement of solution processes within them (Patton, 2002). Because the researcher categorized strictly on the basis of student work, such as whether the student used repeated addition or multiplication to solve the problem, double coding for interrater reliability was not necessary. However, for many responses the student's work was not clear. These responses were flagged, and their categorization was decided jointly by the researcher and the same second scorer who assisted with the other interrater reliability checks.
Upon completion of the categorization process, simple descriptive statistics (frequencies and percentages) were computed to summarize the data for the analysis of gender differences. Chi-square tests were then conducted for each question to determine whether gender differences in students' problem-solving processes were statistically significant. Additional chi-square tests were conducted after collapsing similar themes. For example, three process themes focused on students' use of pictures or diagrams to solve the problem: one for a successful response, one for a partially successful response, and one for an unsuccessful response. Chi-square tests were conducted first on each of these three themes individually and then on the data from all three themes combined.
Research Questions Two and Three
The data analysis processes for the questions of performance and linguistic explanations were conducted in similar fashion. Each test item was scored according to a rubric, but to ensure that the scoring of the two criteria was not confused, all papers for one test item were scored first for mathematical performance and then for linguistic explanation. The performance score was obtained by applying the FCAT mathematics four-point rubric (Appendix A) and item-specific rubrics (Appendix K), together with anchor papers, a sampling of which is provided in Appendix D. The researcher created the item-specific rubrics using student responses from the pilot study, following the format of the item-specific rubrics used by the State of Florida in its training for FCAT 5th-grade mathematics scorers (State of Florida Department of State, 2001). The anchor papers were selected from the data collected in the pilot study and exemplified the qualities and characteristics of each possible score (0-4) for each item.
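The chi-square tests described for Research Question One compare how often girls and boys fall into a theme. The Python sketch below computes the Pearson statistic and its degrees of freedom for a gender-by-theme table of counts, and shows one way to test a single theme and a set of collapsed themes: each row is reduced to counts inside versus outside the theme(s). That reduction, the column layout, and the counts are assumptions made for illustration. A p-value can be obtained from the statistic with `scipy.stats.chi2.sf(stat, dof)`, or the whole test run with `scipy.stats.chi2_contingency`, which by default applies Yates' continuity correction to 2 x 2 tables (the sketch does not).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows are groups (e.g. girls, boys); columns are themes. No continuity
    correction is applied, and every row and column total must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def theme_vs_rest(table, columns):
    """Collapse each row to [count in the given theme columns, count elsewhere]."""
    return [[sum(row[j] for j in columns), sum(row) - sum(row[j] for j in columns)]
            for row in table]

# Hypothetical counts; columns: picture (S), picture (P), picture (U), all other themes.
observed = [[12, 5, 3, 80],   # girls
            [6, 4, 2, 88]]    # boys
print(chi_square(theme_vs_rest(observed, [0])))        # successful picture theme alone
print(chi_square(theme_vs_rest(observed, [0, 1, 2])))  # three picture themes combined
```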
Anchor papers were identified for most, but not all, scores on every item, because the small pilot sample did not yield the full range of scores for every item. To provide additional data for later analysis, questions for which the student made no attempt were flagged using the numeral . All items scored as were treated as zero scores in the data analysis for the research questions. The linguistic explanations were scored using the researcher-created three-point general rubric (Appendix B) together with item-specific rubrics (Appendix L). The three-point general rubric was based on criteria established by Ciochine and Polivka (1997), and the item-specific rubrics were created using student work from the pilot study. For the purposes of scoring, symbols and numerals embedded in the textual explanation were regarded as text. Students' total scores for performance and writing were used as data for the analyses, but descriptive statistics for performance and writing on a per-item basis are also reported. Performance scores could range from 0 to 48, based on 12 questions scored on a 4-point rubric. Scores for linguistic explanations could range from 0 to 36, based on 12 questions scored on a 3-point rubric. Samples of student work from the pilot study for a range of scores are provided in Appendix D. Criterion scoring reliability was determined from a sample of 25 randomly selected student tests. All twelve items on these 25 tests were independently scored for performance and linguistic explanations by another mathematics education professional. Any disagreements between the numerical score assigned to a response by the researcher and that assigned by the second scorer were tallied, and all disagreements were then resolved. A percentage of agreement for interrater reliability was computed from the total of 300 scores (12 items on each of 25 tests) for each of the two criteria.
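The interrater figures reported for this study are simple percentages of agreement over the 300 double-scored responses per criterion. The Python sketch below (hypothetical function names) computes percent agreement from two scorers' item scores and the share of disagreements in which the first scorer was exactly one point higher, the pattern reported for both criteria; the final lines only re-derive the reported performance percentages from the stated counts.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of responses on which two scorers assigned the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    agree = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * agree / len(scores_a)

def disagreement_profile(scores_a, scores_b):
    """Number of disagreements, and the percentage of them in which
    scorer A's score was exactly one point higher than scorer B's."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    one_higher = sum(d == 1 for d in diffs)
    return len(diffs), (100 * one_higher / len(diffs) if diffs else 0.0)

# Re-deriving the reported performance figures from the stated counts (300 responses):
print(100 * 243 / 300)          # preliminary agreement
print(100 * (243 + 31) / 300)   # after the 31 rubric-wording disagreements were resolved
```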
Internal consistency reliability was also computed for performance and writing scores using the Cronbach alpha formula (Ary et al., 1996). For both questions, the data were first examined to determine measures of central tendency and dispersion. The assumption of independence was met by having all students complete their tests individually. The distribution of scores was analyzed to ensure that the assumption of normality for the factorial ANOVA model was not violated, and Levene's test of homogeneity of variances (Stevens, 1986) was used to ensure that the assumption of equal variances was not violated. The three-way factorial ANOVA is a robust procedure that examines the effects of three independent categorical variables on one continuous dependent variable (Stevens, 1999). A 2 x 3 x 3 factorial ANOVA was applied to each question, using students' total performance score as the continuous dependent variable for question two and students' total linguistic explanation score as the continuous dependent variable for question three. For both ANOVAs, the independent categorical variables were student gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). Given this design, there were potentially eight sources of variation: three main effects, three two-way interactions, one three-way interaction, and within-cells error. Post hoc analysis included the Tukey procedure for pairwise group comparisons of all significant main effects with nonsignificant interaction effects (Stevens, 1999). This procedure is appropriate when the cell sizes are unequal and the researcher is interested in more than half of all pairwise comparisons. Planned comparisons were conducted for any significant interaction effects.
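The Cronbach alpha computation mentioned above follows the standard formula alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores). The sketch below assumes a students-by-items matrix of rubric scores; the function name and the tiny data set are hypothetical. Population variances are used throughout, which gives the same alpha as sample variances because the n / (n - 1) factor cancels in the ratio.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))                      # one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores: three students (rows) on two items (columns).
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 4))
```

With the study's data, each row would hold one student's twelve item scores for a single criterion (performance or linguistic explanation).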
As was the case in the pilot study, the cell sizes for this study were disproportional: only a small percentage of students fell into the low category for reading and mathematics, and several cells had no members. If the disproportionate cell sizes were not indicative of the distribution of the population, the effects could become correlated, and unless these correlations were taken into account, the results could be misinterpreted (Stevens, 1999). However, based on an understanding of the population, the disproportionate cell sizes were expected. Because the review of literature supported an ordering of the independent variables, the hierarchical method was used. Chapter four includes a detailed explanation of the rationale for the method selection and the ordering of the variables.
The data for these two questions were also analyzed independently using multiple regression. The independent variables for both questions were student gender, student mathematics FCAT scale scores (ranging from 100 to 500), student reading FCAT scale scores (ranging from 100 to 500), the gender-by-reading-ability interaction, and the gender-by-mathematics-ability interaction. This analysis allowed the researcher to draw conclusions about the relationship between the independent variables and the dependent variables of performance and linguistic explanations, and about the extent to which an increase in students' FCAT scores yielded an increase in the dependent variables.
Research Questions Four and Five
The statistical procedures for research questions four and five, dealing with self-efficacy and self-assessment, were nearly identical. To begin, students' self-reports of self-efficacy and self-assessment were converted from the smiley-face Likert scale to a numerical score from zero to four: a score of zero was assigned to the saddest face, a score of four to the happiest face, and the intermediate faces were scored with the values one through three.
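Returning briefly to the multiple regression described above for questions two and three, the predictor set (gender, the two FCAT scale scores, and the two gender interactions) can be illustrated with a small least-squares sketch. This is an assumption-laden illustration, not the study's analysis: gender is coded 1 = boy, 0 = girl; the function names are hypothetical; and the normal equations are solved with a hand-written Gauss-Jordan routine only to keep the example dependency-free. A statistics package using more numerically stable decompositions would normally be used.

```python
def design_row(is_boy, math_scale, reading_scale):
    """Predictors: intercept, gender, math FCAT, reading FCAT,
    gender x reading, gender x math (gender coded 1 = boy, 0 = girl)."""
    g = 1.0 if is_boy else 0.0
    return [1.0, g, float(math_scale), float(reading_scale),
            g * reading_scale, g * math_scale]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]          # partial pivoting
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data built from known coefficients so the fit can
# be checked; a real analysis would use students' scores and outcomes.
true_b = [2.0, 1.5, 0.05, 0.03, -0.01, 0.02]
rows = [design_row(boy, m, r) for boy in (False, True)
        for m in (300, 350, 400) for r in (280, 330, 390)]
y = [sum(b * x for b, x in zip(true_b, row)) for row in rows]
fitted = ols(rows, y)
print([round(b, 4) for b in fitted])
```

In this coding, the gender x math and gender x reading coefficients measure how much the slope for boys differs from the slope for girls.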
The dependent variable used in the three-way factorial ANOVA for each question was a measure of calibration called the mean bias score, as described by Schraw (1995) and Yates (1990). Bias reveals the direction of errors in judgment and is computed by subtracting actual performance from predicted performance. To obtain the bias scores for self-efficacy, each student's average performance score was subtracted from his or her average self-efficacy score, resulting in a mean bias score between -4 and +4. Expressing no self-efficacy (score of 0) and providing a completely incorrect answer (score of 0) reflected zero bias. The same lack of confidence with a completely correct answer (score of 4) resulted in a bias score of -4 (zero minus four), indicating extreme underconfidence. Expressing complete confidence (score of 4) with a completely incorrect response resulted in a bias score of +4 (four minus zero), indicating extreme overconfidence. Hence, bias scores greater than zero correspond to overconfidence and scores less than zero correspond to underconfidence. The same procedure was used to compute a mean bias score for each student regarding self-assessment. Descriptive statistics were computed for students' ratings of self-efficacy, self-assessment, and mean bias; descriptive statistics for self-efficacy and self-assessment on a per-item basis are also reported. The mean bias score served as the single continuous dependent variable for each of the three-way ANOVAs used to analyze the data for questions four and five. For both analyses, the three categorical independent variables were gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). The experimental method of computing sums of squares was used for both questions (Stevens, 1999); chapter four includes a full explanation of the method selection process. Post hoc Tukey tests were conducted for all significant main effects with more than two levels.
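The mean bias computation can be sketched as follows, assuming each student's per-item self-efficacy (or self-assessment) ratings and performance scores are already on the 0-4 scale described above. The function name and the four-item example are hypothetical.

```python
from statistics import mean


def mean_bias(predicted, actual):
    """Mean bias = average predicted score minus average actual score.

    With both scales running 0-4, the result lies in [-4, +4]:
    positive values indicate overconfidence, negative values underconfidence.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty score lists")
    return mean(predicted) - mean(actual)


# The single-item extremes described in the text:
print(mean_bias([0], [0]))   # no confidence, completely incorrect answer
print(mean_bias([0], [4]))   # no confidence, completely correct answer
print(mean_bias([4], [0]))   # full confidence, completely incorrect answer

# A hypothetical student over four items: somewhat overconfident.
print(mean_bias([4, 3, 4, 2], [3, 3, 2, 2]))
```

Averaging before subtracting gives the same result as averaging the per-item differences, because the mean is linear.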
Criterion scoring reliability and internal consistency were established through the interrater reliability work and the Cronbach alpha computation conducted in the analysis of questions two and three. Because the only other data included in the analysis of these questions were scores self-reported by the participants on a scale that has been used reliably in other studies (Boekarts et al., 1995; Vermeer et al., 2000), no further reliability work was required.
The data for these two questions were also analyzed independently using multiple regression. The independent variables for both questions were student gender, student mathematics FCAT scale scores (ranging from 100 to 500), student reading FCAT scale scores (ranging from 100 to 500), and the interaction of gender with both the mathematics and reading scores. This analysis allowed the researcher to draw conclusions about the relationship between the independent variables and the dependent variables of mean bias for self-efficacy and mean bias for self-assessment, and about the extent to which an increase in students' FCAT scores yielded an increase in the dependent variables.
Summary
A nonexperimental causal-comparative design was used to study the relationship between the independent variables of gender, reading ability, and mathematics ability and the dependent variables of students' problem-solving processes, self-efficacy, self-assessment, performance, and linguistic explanations. The participants were all fifth-grade students from three Florida elementary schools who obtained parental permission. Participant measurements were obtained through administration of a twelve-item problem-solving test. Member checks were performed with a selected sample of students based on performance and linguistic explanation scores. Statistical procedures for data analysis included descriptive statistics for all five questions.
Qualitative analysis and chi-square tests were used to analyze the data for the first research question, involving problem-solving processes. Three-way factorial ANOVAs were used for each of the remaining four questions, with the independent categorical variables of gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). Post hoc Tukey tests were performed to identify which group means differed significantly when the F test indicated a significant main effect. Total performance scores based on a four-point rubric were the continuous dependent variable for question two, and total scores for linguistic explanations based on a three-point rubric were the continuous dependent variable for question three. Questions four and five, involving self-efficacy and self-assessment, required the computation of mean bias scores for use as the continuous dependent variable. Interrater reliability was established through the independent scoring of a random sample of test papers by a mathematics education professional. Cronbach alpha coefficients were obtained to estimate the internal consistency of test items for performance, linguistic explanation, self-efficacy, and self-assessment.
CHAPTER 4
FINDINGS
The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter presents the results of the statistical tests for each of the five research questions. The descriptive and inferential statistical results, together with their interpretation, are organized according to the question they address.
Question One: Findings for Problem-Solving Processes
The findings in this section address the following research question: To what extent are students' choices of mathematical problem-solving processes related to gender? The data for problem-solving processes, the variable of interest for this question, were obtained via a qualitative analysis of students' work on the study test instrument.
Coding and Emergent Themes
To answer this question, each student's work on each question was first coded to indicate the specific problem-solving process used to solve the problem. Appendix M lists all identified processes for all test questions and shows the frequencies and percentages of students' use of each process. The number of processes identified per test item ranged from a low of twelve for the pencil problem to a high of twenty-eight for the string problem. After all processes were coded, the researcher identified three broad categories to serve as a preliminary organizing framework for all responses: Successful response (S), Partially Successful response (P), and Unsuccessful response (U). Through a process of emergent analysis (Patton, 2002), five to six solution process themes were identified for each of the three broad categories, yielding a total of seventeen themes. All responses to all test items were then coded again, this time according to the theme that encompassed the problem-solving process used. Because the calculator question required students to solve the problem in two different ways, the responses for each of the two ways were coded separately; each student's test paper therefore received thirteen theme codes. The list of process descriptions in Appendix M also includes the category and theme to which each process was assigned.
The following examples represent a subset of the problem-solving processes identified for the school lunch problem and illustrate the relationship between processes, categories, and themes. The school lunch problem states: "Sam can purchase his lunch at school. Each day he wants to have juice that costs 50¢, a sandwich that costs 90¢, and fruit that costs 35¢. His mother has only $1.00 bills. What is the least number of $1.00 bills that his mother should give him so he will have enough money to buy lunch for 5 days?"
Process: $9 answer found by adding the 3 items, multiplying by 5, then rounding (minor computational error okay). Category: Successful response. Theme: Solved using a traditional method, not a drawing/diagram.
Process: $8.75 answer found by adding the 3 items and multiplying by 5 (minor computational error okay). Category: Partially Successful response (student did not answer the question). Theme: Either difficulty understanding question or inattention to question.
Process: Rounded total for one day x 5, but with major computational error. Category: Partially Successful. Theme: Computational problems.
Table 13 presents all the categories and themes of problem-solving processes that emerged in this study.
Table 13
Categories and Themes of Problem-Solving Processes
Category               Theme  Description
Successful               1    Solved using traditional method, not a drawing/diagram
Successful               2    Solved using novel method, not a drawing/diagram
Successful               3    Solved via drawing/diagram
Successful               4    Solved using simplified method, not a drawing/diagram (e.g., repeated addition vs. multiplication, trial and error, etc.)
Successful               5    Solved correctly without written work
Partially Successful     6    Computational problems
Partially Successful     7    Either difficulty understanding question or inattention to question
Partially Successful     8    Problems with mathematical logic/concept (e.g., incorrect order of operations)
Partially Successful     9    Problems with drawing/diagram (either creation of own or use of given)
Partially Successful    10    Novel approach to problem that shows good thinking, but does not produce a solution within the confines of the problem
Partially Successful    11    Reason for difficulty cannot be determined, but answer shows some entry into problem
Unsuccessful            12    Computational problems
Unsuccessful            13    Either difficulty understanding problem or inattention to question
Unsuccessful            14    Problems with mathematical logic/concept
Unsuccessful            15    Problems with drawing/diagram
Unsuccessful            16    Nonsense answer
Unsuccessful            17    Source of error cannot be determined
To clarify some of the terms used in the coding process, a traditional method is defined for this study as a method that a textbook curriculum or a classroom teacher might present as a means of solving the problem. The researcher's knowledge of what constitutes a traditional method derives from personal experience teaching mathematics at the fifth-grade level. Some examples of traditional methods follow:
Flowerpot item: Multiplied 20 pots x 3 seeds per pot = 60 seeds needed.
String item: Divided by and answered 6 pieces.
Pizza item: Answered that Jose's pizza was bigger, therefore the half he ate was bigger.
Calculator item: Added 100 to the incorrect number of 8275 to get the desired number of 8375.
A novel method is one that incorporates creative thinking with sound mathematical logic. Some examples of responses judged to be novel follow:
Flowerpot item: Grouped pots in tens: 10 x 3 seeds = 30 seeds; 30 seeds x 2 groups = 60 seeds.
String item: Converted all measures to inches and solved correctly.
Pizza item: Jose is right because his half had more toppings and was therefore more pizza.
Calculator item: Multiplied the display by zero to get zero, then entered the desired number.
A simplified method is one that employs, for example, repeated addition of five items instead of multiplying by five, or trial-and-error addition as a substitute for division.
The term simplified does not indicate that the computation required by the chosen method is easier, but that the operation chosen is taught at an earlier age and is considered a more rudimentary method of completing the computation. For example, children are taught addition before multiplication. For a child who is competent in both operations, multiplying a single number by five would be faster than adding the number five times; however, because repeated addition requires a lower level of computational knowledge, it is considered a simplified method for this study. Items coded with the theme of solving via a drawing/diagram are those that used the drawing/diagram as the actual solution method, not simply as a way to illustrate computational work.
Some themes were duplicated under more than one category because they represented varying levels of success with a particular method. An example is the pair of themes 6 and 12, computational problems. A response that used productive mathematical logic and computation indicating a complete understanding of the computational process, but that contained a careless error, was categorized as a successful response. For a response to be coded with the partially successful theme 6, computational problems, the work had to show some level of sound mathematical logic but indicate a misunderstanding of the computational process. An example would be a student who knew that solving the flour problem required multiplying 3 x 1 1/3, but multiplied the fraction incorrectly and reported the answer as 3 3/9. If that same student had reported the answer as 1 3/9, the response would have been coded with the unsuccessful theme 12, computational problems: in this instance the student erred in the computation for both the whole number and the fraction, and provided an answer that was unreasonable given the context of the problem.
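The arithmetic behind the successful answers to the two items used as examples above can be checked with a short sketch. The cent prices come from the school lunch item and the product 3 x 1 1/3 from the flour example; the variable names are illustrative only. The sketch also shows why $8.75 was coded as only partially successful: the question asks for a count of whole $1.00 bills, so the total has to be rounded up.

```python
import math
from fractions import Fraction

# School lunch item: prices in cents (juice 50, sandwich 90, fruit 35).
daily_cost = 50 + 90 + 35            # 175 cents per day
five_days = 5 * daily_cost           # 875 cents, i.e. $8.75
bills = math.ceil(five_days / 100)   # round up: only $1.00 bills are available
print(five_days, bills)

# Flour item: 3 x 1 1/3, computed exactly with rational arithmetic.
flour = 3 * (1 + Fraction(1, 3))
print(flour)
```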
The theme "Either difficulty understanding question or inattention to question" appeared in both the Partially Successful and Unsuccessful categories. An example of a partially successful response coded with this theme comes from the Ms. Hernandez problem. This item required students to divide the total number of students by the size of each team, then interpret the remainder as the number of substitutes. A response that completed the division but reported the number of teams instead of the number of substitutes was coded with the partially successful version of this theme: the student showed an understanding of part of the problem but was inattentive to the actual question. An example of an unsuccessful response for this theme comes from the school lunch problem. The item required students to find the total cost of purchasing three school lunch items for five days, and then to round the total up to the next dollar because the mother had only one-dollar bills to give the child. A few responses indicated that the students interpreted the question as asking, "Given that the three lunch items cost this much each, what could the child buy if he only had one dollar to spend?" Such a response shows a complete misunderstanding of the problem and was coded with the unsuccessful version of this theme.
Descriptive Statistics
The first statistical measures used to understand the data for this question were descriptive. Table 14 organizes the data according to theme and indicates the number and percentage of girls' and boys' responses whose problem-solving processes were coded with each theme.
The percentages in the Total column are based on the sample of 3,081 responses from 237 students (129 boys (54%) and 108 girls (46%)) to thirteen coded items (the calculator question was coded with two themes).
Table 14
Descriptive Statistics for Coding of Problem-Solving Process Themes
                                       Girls          Boys           Total
Theme  Description                   n      %       n      %       n      %
Successful Responses
 1     Traditional                 605   43.1     793   47.3    1398   45.4
 2     Novel                        67    4.8     100    6.0     167    5.4
 3     Drawing/diagram              31    2.2      25    1.5      56    1.8
 4     Simplified                   46    3.3      51    3.0      97    3.2
 5     No work                       9    0.6      13    0.8      22    0.7
Partially Successful Responses
 6     Computational problems       89    6.3      83    5.0     172    5.6
 7     Question difficulty         142   10.1     174   10.4     316   10.3
 8     Math logic problems          31    2.2      38    2.3      69    2.2
 9     Drawing/diagram problems     25    1.8      22    1.3      47    1.5
10     Novel with problems          10    0.7      11    0.7      21    0.7
11     Cannot be determined         50    3.6      73    4.4     123    4.0
Unsuccessful Responses
12     Computational problems       15    1.1      19    1.1      34    1.1
13     Question difficulty          39    2.8      26    1.6      65    2.1
14     Math logic problems          27    1.9      37    2.2      64    2.1
15     Drawing/diagram problems     27    1.9      21    1.3      48    1.6
16     Nonsense answer              40    2.9      45    2.7      85    2.8
17     Cannot be determined        151   10.8     146    8.7     297    9.6
Note. N = 3,081 responses. Percentages for girls and boys are based on the 1,404 responses from 108 girls and the 1,677 responses from 129 boys (13 coded responses per student). Minor errors in percentage totals are due to rounding.
Inferential Statistics
The chi-square test was used for this question to determine whether the proportions of girls and boys whose problem-solving processes were coded with a particular theme differed significantly from the theoretically expected proportions. The test was run for each question individually, and the chi-square results, p-values, and Cohen's w (Cohen, 1992) effect size estimates for each test are reported in Table 15. The guidelines for interpreting Cohen's w are that small, medium, and large effect sizes are 0.10, 0.30, and 0.50, respectively.
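The per-question chi-square test and Cohen's w described above can be sketched in a few lines of Python. This is a minimal illustration of a Pearson chi-square test of independence (no continuity correction) on a gender-by-theme table; the function names are hypothetical. Applied to the pencil-problem counts from Table 16, it should reproduce the Table 15 values for that item (chi-square of about 16.98 with 8 degrees of freedom, w of about 0.268) to rounding. The last line computes the Bonferroni per-test threshold for thirteen tests.

```python
import math

# Pencil-problem counts from Table 16, in theme order 1, 2, 3, 6, 7, 12, 13, 15, 17.
PENCIL = [
    [35, 3, 8, 5, 13, 6, 12, 2, 24],   # girls (n = 108)
    [50, 14, 5, 5, 25, 4, 7, 4, 15],   # boys (n = 129)
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Columns (themes) with no responses in any group are dropped first,
    because their expected counts would be zero.
    """
    keep = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    table = [[row[j] for j in keep] for row in table]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def cohens_w(chi2, n):
    """Cohen's w effect size for a chi-square test: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


chi2, df = chi_square_independence(PENCIL)
w = cohens_w(chi2, sum(map(sum, PENCIL)))
per_test_alpha = 0.05 / 13   # Bonferroni: family-wise .05 spread over 13 tests
print(f"chi2 = {chi2:.4f}, df = {df}, w = {w:.4f}, alpha = {per_test_alpha:.4f}")
```

The p-value itself comes from the upper tail of the chi-square distribution with `df` degrees of freedom (for example, `scipy.stats.chi2.sf(chi2, df)` if SciPy is available); it is left out here to keep the sketch dependency-free.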
To keep the overall Type I error rate at α = .05, the Bonferroni adjustment was applied, resulting in α = .004 for each question (.05/13). The null hypothesis for this question was that the distribution of process themes did not differ between boys and girls. The chi-square results indicate that none of the differences were statistically significant, so the null hypothesis was not rejected. There is no way to be sure whether this failure to reject reflects a lack of gender differences in the population or the study being underpowered because of the size of the sample. If there is in fact a gender difference in the population but the effect size is small, the sample for this study would have been too small to detect it.
Table 15
Chi-Square Results for Problem-Solving Process Themes
Question            χ²         p         w
Flowerpot         10.6523   0.2222   0.2120
Ms. Hernandez      5.6407   0.6874   0.1543
Calories           9.9248   0.0774   0.2046
Jeans Class       12.4521   0.0866   0.2292
Pizza             10.5211   0.2303   0.2107
String            11.5281   0.4843   0.2205
Balloons           7.3760   0.4967   0.1764
Flour             12.5717   0.1275   0.2303
Calculator1        3.0895   0.7975   0.1142
Calculator2        7.4091   0.3876   0.1768
Lunch              2.0554   0.8414   0.0931
Pencils           16.9784   0.0303   0.2677
Birdseed          14.9828   0.1833   0.2415
Note. N = 237. χ² = chi-square; w = Cohen's w measure of effect size.
None of the effect sizes were classified as large, but the effect size for the pencil question was the highest of all questions and approached the medium benchmark. It is, therefore, the only one that bears further discussion. Table 16 presents the frequencies and percentages of girls and boys whose problem-solving processes for this question were coded with each theme. Missing theme numbers indicate that no student responses were coded with that theme for this problem.
Table 16
Problem-Solving Process Themes for the Pencil Problem
                                       Girls         Boys          Total
Theme  Description                   n      %      n      %      n      %
Successful Responses
 1     Traditional                  35   14.8     50   21.1     85   35.9
 2     Novel                         3    1.3     14    5.9     17    7.2
 3     Drawing/diagram               8    3.4      5    2.1     13    5.5
Partially Successful Responses
 6     Computational problems        5    2.1      5    2.1     10    4.2
 7     Question difficulty          13    5.5     25   10.6     38   16.0
Unsuccessful Responses
12     Computational problems        6    2.5      4    1.7     10    4.2
13     Question difficulty          12    5.1      7    3.0     19    8.0
15     Drawing/diagram problems      2    0.8      4    1.7      6    2.5
17     Cannot be determined         24   10.1     15    6.3     39   16.5
Note. Percentages are based on the total sample (N = 237: 108 girls and 129 boys). Minor errors in percentage totals are due to rounding.
Because boys constituted 54% of the sample for this question, the number of boys' responses for each theme would be expected to exceed the number of girls' responses. However, girls' responses outnumbered boys' responses in three of the four themes categorized as unsuccessful: themes 12, 13, and 17. Over four times as many boys as girls solved the problem successfully via a novel method, and more girls than boys solved the problem successfully using a drawing or diagram. Nearly twice as many boys as girls were coded with theme 7, indicating that they were only partially successful because they either had difficulty understanding the question or were inattentive to it. The lack of significant findings led to the decision to rerun the chi-square tests after the themes were collapsed: the data for themes that appeared under more than one major category were combined, such as the successful, partially successful, and unsuccessful themes for solving via a drawing or diagram. Again, no significant results were found.
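The bookkeeping behind collapsing themes can be sketched as follows. The grouping of theme numbers is an assumption inferred from the theme descriptions in Table 13 (the text names only the drawing/diagram group explicitly), and the pooled counts from Table 14 are used purely to illustrate the summing; the study's collapsed chi-square tests were run per question, with each collapsed table then tested in the same way as the uncollapsed ones.

```python
# Girls' and boys' counts per theme, pooled over all items (Table 14).
COUNTS = {
    1: (605, 793), 2: (67, 100), 3: (31, 25), 4: (46, 51), 5: (9, 13),
    6: (89, 83), 7: (142, 174), 8: (31, 38), 9: (25, 22), 10: (10, 11),
    11: (50, 73), 12: (15, 19), 13: (39, 26), 14: (27, 37), 15: (27, 21),
    16: (40, 45), 17: (151, 146),
}

# Assumed grouping of themes that recur across the Successful, Partially
# Successful, and Unsuccessful categories (hypothetical, based on Table 13).
COLLAPSED = {
    "drawing/diagram": (3, 9, 15),
    "computational problems": (6, 12),
    "question difficulty": (7, 13),
    "math logic problems": (8, 14),
    "cannot be determined": (11, 17),
}


def collapse(counts, groups):
    """Sum girls' and boys' counts over each group of theme numbers."""
    return {
        name: tuple(sum(counts[t][g] for t in themes) for g in (0, 1))
        for name, themes in groups.items()
    }


for name, (girls, boys) in collapse(COUNTS, COLLAPSED).items():
    print(f"{name:24s} girls = {girls:4d}  boys = {boys:4d}")
```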
Question Two: Findings for Mathematical Performance
The findings in this section address the following research question: To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for mathematical performance, the dependent variable for this question, were obtained through the administration of the study test instrument, which consisted of twelve constructed-response mathematical problem-solving items. Each item was scored according to a four-point holistic rubric (Appendix A), resulting in a possible total score ranging from 0 to 48 points.
Descriptive Statistics
Descriptive Analysis by Question
Analysis of overall sample. Table 17 shows the mean, standard deviation, skewness, and kurtosis for students' total mathematics rubric scores (maximum possible score of 48), as well as for each of the twelve questions (maximum possible score of 4). Students' total mathematics performance scores had a mean of 26.76, a standard deviation of 7.24, and a relatively normal distribution, with skewness and kurtosis values of 0.04 and 0.61. As expected, the distributions for individual questions on which students scored relatively high were negatively skewed. Several of the distributions for individual questions were noticeably leptokurtic, a characteristic indicative of a sharply peaked curve with more extreme scores than would be found in a normal distribution. As with skewness, this level of kurtosis is not unexpected for the scores on a single item. Because the total mathematics performance scores were nearly normal, their distribution does not violate the assumption of normality required by the factorial ANOVA.
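The descriptive measures reported in Table 17 can be sketched as follows. This minimal version uses moment-based estimates (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3, so a normal distribution has kurtosis near 0 and a leptokurtic one has positive kurtosis). Many statistical packages, SPSS among them, report small-sample-adjusted versions, so their values differ slightly; which estimator produced Table 17 is not stated, and the function name and example scores here are hypothetical.

```python
def moments_summary(scores):
    """Return mean, sample SD, skewness, and excess kurtosis (moment-based)."""
    n = len(scores)
    mu = sum(scores) / n
    dev = [x - mu for x in scores]
    m2 = sum(d ** 2 for d in dev) / n     # second central moment
    m3 = sum(d ** 3 for d in dev) / n     # third central moment
    m4 = sum(d ** 4 for d in dev) / n     # fourth central moment
    sd = (sum(d ** 2 for d in dev) / (n - 1)) ** 0.5
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return mu, sd, skewness, excess_kurtosis


# Hypothetical scores on an easy 4-point item: most students score high,
# so the long tail is on the low side and skewness comes out negative.
print(moments_summary([4, 4, 4, 3, 0]))
```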
The highest and lowest scoring questions were the flowerpot and string problems, respectively. The flowerpot item required students to determine the number of pots pictured and multiply that number by three to determine how many seeds would be needed to plant three seeds in each pot. The computation required for this question was below the fifth-grade level, so high scores were expected.
Table 17
Descriptive Statistics for Mathematics Performance Scores
Question          M      SD    Skewness   Kurtosis
Flowerpots       3.08   0.83     1.38       2.88
Ms. Hernandez    2.68   0.89     1.31       1.25
Calories         2.83   0.88     0.85       0.37
Jeans Class      2.68   0.98     0.51       0.20
Pizza            2.31   1.30     0.15       1.12
String           0.90   1.06     1.07       0.06
Balloons         2.91   0.88     1.46       2.75
Flour            2.04   1.02     0.48       0.22
Calculator       1.79   1.41     0.02       1.36
School Lunch     2.22   1.03     0.79       0.31
Pencils          1.87   1.48     0.06       1.43
Birdseed         1.46   1.26     0.46       1.04
Total           26.76   7.24     0.04       0.61
Note. N = 237; maximum score for individual questions = 4; maximum score for total = 48.
The string problem required students to divide fractions to determine how many pieces of string, each of a yard long, could be cut from a piece of string of a yard long. Division of fractions is not usually taught explicitly until the end of the first half of fifth grade; therefore, it is not surprising that few students were able to reason out a successful response. The results for the easiest and hardest items mirror the findings of the pilot study and confirm the decisions about the placement of the questions according to difficulty. The test began with what turned out to be the easiest question, allowing students to become engaged in taking the test. The hardest question (string) was placed midway through the test and was followed by the second easiest question (balloons), so that any discouragement resulting from a difficult question could be overcome by an item on which students could be successful.
The pencil and calculator questions had the greatest dispersion of scores, and the flowerpot, balloons, calories, and Ms. Hernandez questions, ranking 1, 2, 3, and 5 in order of difficulty, had the smallest standard deviations.
Analysis by group. Tables showing the mean and standard deviation of students' scores for each question, organized by group, are provided in Appendix N. The means and standard deviations of boys' and girls' scores on individual questions yielded some interesting findings. Boys outscored girls on eight of the twelve questions. The questions on which girls' mean score was higher than boys' were the flowerpot, balloon, pizza, and lunch questions, whose respective ranks in order of difficulty according to overall means were 1, 2, 6, and 7. A rank of one indicates that the question garnered the highest mean score of all questions and was therefore the easiest question on the test for the study participants. Generally speaking, this result signifies that girls outscored boys on some of the easiest questions, and boys outscored girls on all of the hardest questions. There did not appear to be a relationship between how well boys and girls performed on individual items and the gendered wording of those items. The four items on which girls scored higher than boys had a variety of gendered-wording classifications: two had male wording, one was neutrally worded, and one included both male and female wording. The standard deviation of girls' scores was higher than that of boys' scores for eight of the twelve questions and the same as boys' for two items. Boys' standard deviations were higher than girls' only for the flowerpot and string questions, ranked 1 and 12, respectively, in order of difficulty. In other words, there was more variability in boys' scores than in girls' scores only for the easiest and most difficult questions on the test.
A comparison of the mean scores of high, middle, and low mathematics ability students reveals that, for each group, the rank order of the means for individual questions did not differ substantially from the rank order of the means for the entire sample. The rank order of the scores for the middle and low ability groups varied by no more than one place from the ranking found in the overall sample. For the high ability group, the rank order of all items was within one place of the order for the overall sample, except for the school lunch and pencil problems, whose order differed by two places. When the students were grouped according to reading ability, the rank order of the scores for all three groups differed by no more than one place for all questions.
Descriptive Analysis of Total Mathematics Performance Scores
Table 18 shows the mean and standard deviation of students' total mathematics performance scores for each level of gender, mathematics ability, and reading ability. The figures indicate that boys as a group scored 1.19 total points higher on the test than girls, and the standard deviation of boys' scores was slightly lower than that of girls' scores. When the scores are viewed through the lens of mathematics ability, students of higher ability predictably scored higher.
Table 18
Total Mathematics Performance Scores by Group
Group                      n       M       SD
By Gender
  Boys                   129    27.30    7.12
  Girls                  108    26.11    7.37
By Mathematics Ability
  High                    57    34.19    5.27
  Middle                 129    26.11    5.64
  Low                     51    20.10    5.06
By Reading Ability
  High                    73    32.44    5.97
  Middle                 134    25.24    5.91
  Low                     30    19.73    5.92
Note. N = 237; maximum performance score = 48 points.
The mean difference between the high and middle groups (8.08) exceeded the difference between the middle and low groups (6.01). The standard deviations of the scores for all three mathematics ability groups were similar, ranging from 5.06 to 5.64 points. When the sample was grouped according to reading ability, similar results were found.
The students of higher reading ability received higher total mathematics performance scores than students of lower reading ability. The difference between the high and middle groups was 7.20 points, and the difference between the middle and low groups was 5.51 points. The dispersion of scores within the high, middle, and low ability groups was nearly identical.

The box and whisker plot shown in Figure 1 illustrates the total performance scores for the study sample organized according to three grouping variables: gender, mathematics ability, and reading ability. For the grouping variable of gender, the boxplot illustrates the wider range of scores for girls, but shows that the girls' H-spread (Glass & Hopkins, 1996), the distance between the first and third quartiles, is smaller than that of boys. Because the H-spread spans the middle-scoring half of a group, it can be concluded that the middle half of all girls scored within a narrower range than the middle half of all boys. The boxplots for the grouping variable of mathematics ability illustrate that the sample students' total scores behaved in a predictable fashion. The maximum, third quartile, median, first quartile, and minimum scores were all ordered by ability, with the high ability group having the highest scores, the middle ability group the second highest, and the low ability group the lowest. The H-spread for the middle ability students was slightly larger than for the high and low ability students. The entire H-spread of the high ability group lay above that of the middle ability group. The boxplots for the grouping variable of reading ability show the same pattern as that found for mathematics ability, with students of higher ability having higher scores at every quartile.

Figure 1. Box and whisker plot of total performance scores grouped by gender, mathematics ability, and reading ability. [Groups: Boys, Girls, H Math, M Math, L Math, H Rdg, M Rdg, L Rdg; y-axis: total performance score, 0 to 50; markers: minimum, Q1, median, Q3, maximum.]

Correlations Between Mathematics Performance and Other Variables

As described in the methods chapter, the teachers whose students participated in the study were asked to rate how well prepared their students were to answer each of the questions. The teachers were therefore rating each test item's difficulty relative to a generalized notion of their students' abilities. The ratings were recorded using a five-face Likert scale, and the teachers' responses were converted to numerical scores ranging from 4 (very well prepared) to 0 (not well prepared). Table 19 shows the teacher ratings of their students' preparedness for each of the twelve test items. The average of the teachers' responses for each question was correlated with the mean score for that question, yielding a Pearson correlation coefficient of 0.89. This indicates a strong positive relationship between teachers' predictions of their students' performance and the students' actual performance. When the teachers' ratings were partitioned according to the gender of the teacher, the results showed a stronger positive correlation for the five male teachers' responses (r = 0.90) than for the nine female teachers' responses (r = 0.74). Additional correlations were computed separately for each school to determine the strength of the relationship between the teachers' assessments of their students' preparedness and the students' mean scores. The correlation coefficients were 0.85, 0.72, and 0.89 for schools one, two, and three, respectively. These results indicate that the teachers at school three were the most accurate in their assessment of how well prepared their students were to succeed on the test items. Students' mean mathematics performance scores were also correlated with the reading level of each question.
The reading level was determined using the Homan-Hewitt Readability Formula (Homan, Hewitt, & Linder, 1994), whose procedure is described in Chapter Three. The resulting Pearson correlation coefficient of 0.33 indicates a weak positive relationship.

Table 19
Teacher Rating of Student Preparedness by Question

School  Teacher  Gender  Flowerpot  Ms. Hernandez  Calories  Jeans Class  Pizza  String
1       A        M       4          3              4         4            2      1
1       B        M       4          4              3         4            3      0
1       C        M       4          4              4         4            2      3
1       D        F       4          4              4         4            3      1
2       A        F       4          4              4         3            2      1
2       B        F       3          2              3         3            4      3
2       C        F       4          3              4         4            1      2
2       D        M       4          4              4         3            3      2
2       E        F       4          3              2         4            2      0
2       F        M       4          3              4         4            1      1
3       A        F       3          2              4         3            3      2
3       B        F       4          4              4         2            1      0
3       C        F       4          4              3         4            2      1
3       D        F       4          3              3         3            1      0
3       E        F       4          4              4         4            2      0
3       F        F       4          3              4         3            2      1
Note. Teacher ratings range from 0 to 4 points. A score of four indicates the highest level of student preparedness.

Table 19 (continued)
Teacher Rating of Student Preparedness by Question

School  Teacher  Gender  Balloons  Flour  Calculator  Lunch  Pencils  Birdseed
1       A        M       4         1      4           4      2        2
1       B        M       4         0      2           3      1        2
1       C        M       4         4      2           4      2        3
1       D        F       4         3      3           3      1        1
2       A        F       3         3      3           3      3        2
2       B        F       1         2      3           3      3        3
2       C        F       4         0      4           2      3        4
2       D        M       3         4      2           2      3        2
2       E        F       4         2      2           1      1        3
2       F        M       3         1      .           .
3       A        F       4         2      3           3      2        2
3       B        F       3         2      3           2      1        2
3       C        F       4         2      2           2      1        2
3       D        F       3         2      3           1      2        3
3       E        F       2         2      3           3      0        1
3       F        F       2         2      3           3      2        1
Note. Teacher ratings range from 0 to 4 points. A score of four indicates the highest level of preparedness. Missing data indicated by ".".

The amount of time each student used to take the test was recorded as each student submitted his/her finished work. The mean elapsed time for the sample was 39.41 minutes, with a standard deviation of 12 minutes. Girls' mean time was almost exactly two minutes longer than boys', but the dispersion of boys' elapsed times was greater. Elapsed time was correlated with students' total mathematics performance scores to determine whether there was a direct relationship between the two. The Pearson correlation coefficient was 0.20, indicating a weak positive relationship.
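The item-level correlation described above pairs each question's average teacher rating with that question's mean score. The following minimal Python sketch (not the study's code) shows the first step for the six items in the first part of Table 19, plus a plain Pearson r function. The per-question mean scores are reported elsewhere in the study and are not reproduced here, so the sketch does not recompute r = 0.89.

```python
import math

# First six items of Table 19, one row per teacher (school 1 A-D, school 2 A-F, school 3 A-F).
ITEMS = ["Flowerpot", "Ms. Hernandez", "Calories", "Jeans Class", "Pizza", "String"]
RATINGS = [
    [4, 3, 4, 4, 2, 1], [4, 4, 3, 4, 3, 0], [4, 4, 4, 4, 2, 3], [4, 4, 4, 4, 3, 1],
    [4, 4, 4, 3, 2, 1], [3, 2, 3, 3, 4, 3], [4, 3, 4, 4, 1, 2], [4, 4, 4, 3, 3, 2],
    [4, 3, 2, 4, 2, 0], [4, 3, 4, 4, 1, 1], [3, 2, 4, 3, 3, 2], [4, 4, 4, 2, 1, 0],
    [4, 4, 3, 4, 2, 1], [4, 3, 3, 3, 1, 0], [4, 4, 4, 4, 2, 0], [4, 3, 4, 3, 2, 1],
]


def mean_rating_per_item(rows, items):
    """Average teacher rating for each question (column means over teachers)."""
    return {item: sum(row[j] for row in rows) / len(rows) for j, item in enumerate(items)}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


avg = mean_rating_per_item(RATINGS, ITEMS)
print(avg["Flowerpot"], avg["String"])  # 3.875 1.125
```

To reproduce the reported analysis, the twelve average ratings would be passed as `x`, and the twelve per-question mean scores as `y`. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.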
The final correlations studied related students' mathematics performance scores to the accuracy of their self-efficacy and self-assessment. The Pearson correlation coefficient for self-efficacy mean bias was -0.47, indicating a moderately strong negative relationship. This can be interpreted to mean that higher performance scores were associated with lower self-efficacy mean bias scores; the students who performed better on the test were more accurate in their feelings of self-efficacy. The Pearson correlation coefficient for self-assessment mean bias was -0.43, also indicating a moderately strong negative relationship. Students who achieved higher scores on the test were more accurate assessors of their own work than students with lower scores.

Inferential Statistics

Factorial ANOVA Results

To infer that differences in gender, mathematics ability, and reading ability would result in differences in mathematical performance scores in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the tenability of a chance explanation, a 2x3x3 factorial ANOVA was conducted with an alpha level set at .05 for each effect. The degree to which the Type I error rates were actually controlled at the specified alpha level depended on how adequately the data met the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different individuals. The study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the assumption of normality was not violated. To assess the assumption of equal variances, Levene's test for homogeneity of variances was used. This procedure tests the null hypothesis that the population variances of the groups (cells) are equal.
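Levene's test, in its classic mean-centered form, is a one-way ANOVA F computed on each score's absolute deviation from its own group's mean. The following minimal Python sketch is an illustration, not the software routine used in the study, and the study does not state which Levene variant was used. With the 13 non-empty cells of the design and N = 237, the resulting statistic would be referred to an F distribution with (12, 224) degrees of freedom, matching the degrees of freedom reported in the next paragraph.

```python
def levene_w(groups):
    """Levene's W (mean-centered): one-way ANOVA F on |x - group mean|.

    Compare against an F distribution with (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Absolute deviation of every score from its own group's mean.
    deviations = [[abs(x - m) for x in g] for g, m in zip(groups, means)]
    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, dev_means))
    within = sum((z - m) ** 2 for d, m in zip(deviations, dev_means) for z in d)
    return (n_total - k) / (k - 1) * between / within


print(levene_w([[1, 2, 3], [11, 12, 13]]))  # identical spreads -> 0.0
print(levene_w([[1, 2, 3], [0, 2, 4]]))     # second group twice as spread, about 0.8
```

A small W (and a large p-value) means the data give no evidence that the group variances differ. SciPy's `scipy.stats.levene(..., center='mean')` computes this same classic form; its default, `center='median'`, is the Brown-Forsythe variant.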
The results of Levene's test, F(12, 224) = 0.76, p = .6954, indicated that the null hypothesis could not be rejected. The group variances were therefore not found to be heterogeneous, providing no evidence that the assumption of equal variances was violated. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA.

The cell sizes for this study were not equal. If the disparity in cell sizes had not been deemed to reflect differences in the population, the regression approach (Type III) to computing the sums of squares would have been in order. However, it was expected that, in accordance with the population, most of the sample would cluster along the diagonal that represented identical categorization for both mathematics and reading ability. It was also expected that instances of a student being classified as high in one subject area and low in the other would be rare. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares was appropriate. The hierarchical method is reserved for situations in which there is an established order for the effects, and it was judged to be the most appropriate for answering this question. For the hierarchical method, each effect is adjusted only for those preceding it in the order. For this question, gender was placed first in the ordering because it can be argued that gender may affect mathematics ability and reading ability, but not vice versa. Mathematics ability was placed second because it is more closely tied to the outcome variable of mathematics performance than is reading ability (Ethington, 1992; Fierros, 1999). The obtained F(12, 224) = 18.18, p < .0001, for the overall F-test indicates that one or more of the independent variables was significantly related to the mathematics performance scores.
The R-square value of 0.4933 shows that nearly 50% of the variance in the scores can be explained by the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 20. The factorial ANOVA results indicate that the main effect for mathematics ability, F(2, 224) = 96.39, p < .0001, is statistically significant using a predetermined Type I error rate of .05. Omega squared is a conservative population estimate of the proportion of variance in the dependent variable associated with the independent variable (Stevens, 1999). It shows that an estimated 43% of the variance in population means for mathematical performance was related to mathematics ability.

Table 20
Factorial ANOVA Results for Total Mathematics Performance Scores

Source                 df        MS        F         p      ω²      f
Gender                  1     83.42    2.98**    0.0857   0.00   0.11
Mathematics             2   2698.39   96.39     <.0001    0.43   0.89
Gender*Math             2     11.95    0.43      0.6530   0.00   0.06
Reading                 2    150.71    5.38      0.0052   0.02   0.21
Gender*Reading          2    122.17    4.36      0.0138   0.02   0.19
Math*Reading            3      7.37    0.26      0.7688   0.01   0.06
Gender*Math*Reading     1     41.92    1.50      0.2223   0.00   0.08
Note. N = 237. ω² = omega squared; f = Cohen's f measure of effect size. *p < .05. **p < .10.

The Tukey test can be used as a post hoc procedure to determine where the significant differences lie while maintaining the overall alpha rate at .05. The Tukey test for mathematics ability indicated that the means of the three ability groups all differed from one another by a significant amount. This result was expected and is not of particular interest for further discussion. Of greater interest was the result that reading ability was statistically significant, F(2, 224) = 5.38, p = .0052. This finding indicates that if the null hypothesis were true, that is, if there were no relationship between reading ability and mathematical performance on the problem-solving test, the probability of obtaining an F as large as or larger than the one obtained would be .0052.
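The ω² column and the p-values in Table 20 can be checked from the reported figures alone. The following minimal Python sketch assumes the error degrees of freedom are 224, which is N - 12 - 1 and matches the overall F(12, 224). It backs out the error mean square from F = MS_effect / MS_error and then applies ω² = (SS_effect - df_effect * MS_error) / (SS_total + MS_error). For an F statistic with numerator df = 2, the upper-tail probability has the closed form (1 + 2F/df_error)^(-df_error/2). Inputs are rounded published values, so the outputs agree only to the reported precision.

```python
# Values reported in Table 20 and the text (total mathematics performance scores).
N = 237
DF_MODEL, F_MODEL = 12, 18.18        # overall F(12, 224)
DF_ERROR = N - DF_MODEL - 1          # 224 error degrees of freedom
MS_MATH, F_MATH = 2698.39, 96.39     # mathematics ability row, df = 2

# F = MS_effect / MS_error, so the error mean square can be backed out of any row.
MSE = MS_MATH / F_MATH
SS_MODEL = DF_MODEL * F_MODEL * MSE
SS_TOTAL = SS_MODEL + DF_ERROR * MSE


def omega_squared(df_effect, ms_effect):
    """omega^2 = (SS_effect - df_effect * MSE) / (SS_total + MSE)."""
    ss_effect = df_effect * ms_effect
    return (ss_effect - df_effect * MSE) / (SS_TOTAL + MSE)


def p_value_df1_2(f_value, df_error):
    """Upper-tail p for an F statistic with numerator df = 2 (exact closed form)."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


print(round(SS_MODEL / SS_TOTAL, 4))            # R-square, about 0.4934 (reported 0.4933)
print(round(omega_squared(2, MS_MATH), 2))      # mathematics ability: 0.43
print(round(omega_squared(2, 150.71), 2))       # reading ability: 0.02
print(round(p_value_df1_2(5.38, DF_ERROR), 4))  # reading ability: about 0.0052
print(round(p_value_df1_2(4.36, DF_ERROR), 4))  # gender*reading: about 0.0139 (reported 0.0138)
```

The closed form applies only when the numerator df is 2. For other effects, such as the df = 1 contrasts, a general F-distribution routine (for example, `scipy.stats.f.sf`) would be needed.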
Because p = .0052 is less than .05, the null hypothesis is rejected in favor of the alternative hypothesis that at least one pair of population group means differs. Cohen's f effect size measures how far the group means typically deviate from the grand mean in standard deviation units. The guidelines for evaluating Cohen's f are that a value of 0.10 indicates a small effect, 0.25 a medium effect, and 0.40 a large effect (Stevens, 1999). The Cohen's f value of 0.21 for reading ability indicates a small-to-medium effect, close to the medium benchmark. The omega squared value of .02 indicates that an estimated 2% of the population variance in mathematics performance is related to reading ability. The follow-up Tukey test indicated that the group means for all three reading ability levels differed from one another by a significant amount.

The results also showed that the interaction effect for gender and reading ability was significant, F(2, 224) = 4.36, p = .0138. This means that the gap between the group means of boys and girls of high reading ability differs enough from the corresponding gaps for boys and girls of middle or low reading ability to conclude that such differences are likely to exist in the population. To better understand this interaction, a graph showing the group means is provided in Figure 2. As a result of these findings, the null hypothesis that the interaction of gender and reading ability does not affect students' mathematics performance is rejected. The graph shows that girls of high reading ability outscored boys of high reading ability by an average of nearly two points. The reverse is true for boys and girls of middle reading ability, with boys outscoring girls by slightly over two points. For boys and girls of low reading ability, there was only a tenth of a point difference in average scores, in favor of girls.
This means that possessing high reading ability gives girls more of an advantage than it gives boys for this type of mathematical problem solving.

Figure 2. Total mathematics score interaction of reading ability and gender. N = 237. [Group means of mathematics score (y-axis, 0 to 40) by reading ability (x-axis: High, Middle, Low), plotted separately for girls and boys.]

The F test for the interaction of reading ability and gender shows that the interaction is significant, but it does not indicate at which reading ability levels the differences between boys' and girls' performance scores were significant. Planned contrasts are a means of determining whether the gender differences were significant at the high, middle, or low reading ability level. The results show that the contrast between boys and girls of high reading ability was not significant, F(1, 224) = 1.87, p = .1726, nor was the difference in group means of mathematics performance scores for boys and girls of low reading ability, F(1, 224) = 0.00, p = .9506. The only contrast that reached significance was the one comparing boys and girls of middle reading ability, F(1, 224) = 4.74, p = .0305.

The remaining null hypotheses are retained because the factorial ANOVA failed to find significant differences. The main effect of gender on mathematical performance in the sample was not significant, F(1, 224) = 2.98, p = .0857. Therefore, no conclusions can be drawn about the relationship between gender and mathematics scores for the population. This does not necessarily mean that gender does not moderate mathematics performance in the population. It means only that this study, as conducted, did not have sufficient power to rule out chance as the cause of the differences.

Multiple Regression Analysis

Introduction. The sample for this analysis consisted of 237 students, 108 girls and 129 boys.
The three main effect predictor variables considered for the analysis were gender (dummy coded with girl = 1, boy = 0), FCAT mathematics scale score (abbreviated as FCATM and ranging from 100 to 500), and FCAT reading scale score (abbreviated as FCATR and ranging from 100 to 500). Because one of the interaction effects was significant in the factorial ANOVA, the interactions of gender with reading ability and with mathematics ability were also considered as predictors. The outcome variable for this analysis was students' total mathematics performance score, abbreviated as TOTALM, with a possible range of 0 to 48. The multiple regression was first conducted using the three main effects and the two interaction effects. Neither interaction effect was found to be significant. Including interaction effects in multiple regression analyses makes interpretation of the regression coefficients and standardized regression coefficients considerably more complex. For this reason, and because the interactions were not significant, the multiple regression was run again without the interactions. The results and discussion presented in the text are based on the model without interactions; a table presenting the results with the interactions is included in Appendix T.

Descriptive statistics. The first step in the analysis was to examine the descriptive statistics for each continuous variable involved in the study. The mean of the FCAT mathematics scores was 331.09, with a standard deviation of 43.31. The distribution was not skewed (sk = 0.04) but was noticeably leptokurtic (ku = 1.22). The mean of the FCAT reading scores was slightly higher at 343.96, with a smaller standard deviation of 42.00. This distribution was also not skewed (sk = 0.05) but was slightly leptokurtic (ku = 0.97).

Multiple regression results. A summary of the multiple regression results is presented in Table 21.
Students' FCAT mathematics and reading scores were both found to be significant predictors of their mathematics performance. Gender was not a significant contributor to the predictive utility of the regression equation, and neither of the interactions in the initial model was significant. The obtained R² value for the model was .5358, indicating that about 54% of the variability in the mathematics performance scores was explained by the set of predictors.

Table 21
Summary of Multiple Regression Analysis for Mathematics Performance

Variable                b          β          SE         t        p
Intercept          -18.8046     0.0       2.87926    -6.53*   <.0001
Gender              -0.0562    -0.0039    0.6767     -0.08     0.9338
FCAT Mathematics     0.0798     0.4775    0.0103      7.72*   <.0001
FCAT Reading         0.0557     0.3230    0.0105      5.31*   <.0001
Note. N = 237. b = regression coefficient; β = standardized regression coefficient; SE = standard error; * indicates significance at the α = .05 level.

Because R² tends to be an overly optimistic estimate of the proportion of variance explained in the population, the adjusted R² value is used as a more conservative alternative. The adjusted R² for this model was 0.5299, indicating very little shrinkage. Cohen's effect size, f² = R²/(1 - R²), was computed to be 1.1542, which is interpreted as a very large effect size using Cohen's rough guidelines (.02 small, .15 medium, .35 large). These results indicate that this combination of variables serves as a good predictor of total mathematics performance scores. The standard error of estimate was 4.96557, which means that a typical prediction error of the equation is about 5 points on the total mathematics performance score. The obtained prediction equation was:

TOTALM = -18.8046 - 0.0562*GENDER + 0.0798*FCATM + 0.0557*FCATR

To better interpret the equation, consider two students of the same gender who have identical FCAT reading scores. If one student had an FCAT mathematics score 100 points higher, he/she would be predicted to have a total performance score a little less than eight points higher.
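The summary statistics in this regression analysis follow from a few standard formulas, which the following minimal Python sketch reproduces from the reported values. It is not the study's code. The sketch assumes the intercept and the gender coefficient are negative: with a negative intercept, the equation evaluated at the sample means (FCATM 331.09, FCATR 343.96, 108 of 237 students girls) gives about 26.75, close to the overall mean implied by Table 18. The squared semipartial correlations use the identity sr² = t²(1 - R²)/(n - k - 1).

```python
# Statistics reported for the model without interactions (Table 21 and text).
N, K = 237, 3  # sample size and number of predictors
R2 = 0.5358


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cohens_f2(r2):
    """Cohen's f^2 for the whole model: R^2 / (1 - R^2)."""
    return r2 / (1 - r2)


def squared_semipartial(t, r2, n, k):
    """Unique variance of one predictor recovered from its t: t^2 (1 - R^2)/(n - k - 1)."""
    return t ** 2 * (1 - r2) / (n - k - 1)


def predict_totalm(gender, fcatm, fcatr):
    """Prediction equation; gender coded girl = 1, boy = 0.

    ASSUMPTION: intercept and gender coefficient taken as negative.
    """
    return -18.8046 - 0.0562 * gender + 0.0798 * fcatm + 0.0557 * fcatr


print(round(adjusted_r2(R2, N, K), 4))                       # 0.5298 (text: 0.5299)
print(round(cohens_f2(R2), 4))                               # 1.1542
print(round(100 * squared_semipartial(7.72, R2, N, K), 1))   # FCATM: 11.9
print(round(100 * squared_semipartial(5.31, R2, N, K), 1))   # FCATR: 5.6
# Two same-gender students with equal FCATR whose FCATM scores differ by 100 points:
print(round(predict_totalm(1, 400, 340) - predict_totalm(1, 300, 340), 2))  # 7.98
```

The small gap between 0.5298 and the reported 0.5299 comes from starting with R² rounded to four decimals.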
Similarly, if two students of the same gender had identical FCAT mathematics scores, but one had an FCAT reading score 100 points higher, he/she would be predicted to have a total mathematics performance score almost six points higher. The standardized regression coefficient of 0.4775 for FCATM indicates that a 1 standard deviation increase in a student's FCAT mathematics score would lead to a predicted 0.4775 standard deviation increase in her/his total mathematics performance score. The standardized regression coefficient of 0.3230 for FCATR means that a 1 standard deviation increase in a student's FCAT reading score would result in a predicted 0.3230 standard deviation increase in her/his total mathematics performance score. The regression coefficient of -0.0562 for gender indicates that, for students with identical FCAT scores, girls' predicted total performance score would be about 0.06 points lower than boys', a negligible difference. Squared semipartial correlations for each predictor were also examined. FCATM was found to account uniquely for 11.9% of the variability in the performance scores, whereas FCATR accounted for only 5.6% and gender for only 0.001%.

Assumptions. Regression analyses are based on a host of assumptions. The data were screened, and each assumption was assessed. The assumption of homoscedasticity of errors states that the variance of the errors stays the same across the range of predicted values. In other words, the assumption is that the prediction errors made when the regression equation predicts low outcome values are similar to those made when it predicts high outcome values. To determine whether this assumption was violated, the residuals were plotted against the predicted values. The resulting plot indicated that this assumption was not violated. The residuals were also found to be normally distributed, with skewness of 0.02 and kurtosis of 0.53.
A further examination of the residual plot indicated a linear relationship between the variables, and nothing in the design of the study suggests that the residuals are not independent. Three outliers had been removed before the factorial ANOVAs were run, and a review of students' Cook's D values identified no additional outliers. Because students' FCAT scores were obtained under stringent testing conditions, the assumption that the predictors were measured without error was considered reasonably satisfied. Although the predictors cannot be considered fixed, multiple regression is considered robust to violations of this assumption. In conclusion, the screening of the data suggests that multiple regression was an appropriate procedure for analyzing the data for this question.

Conclusion. The fact that students' FCAT mathematics scores were significant predictors of their mathematics performance on the study test is not surprising. The results for FCAT reading scores were also significant and lend further support to previous research findings regarding the relationship between reading ability and problem-solving performance. Gender did not contribute significantly to the ability of the regression equation to predict students' total performance. Because the gender data were easily obtained, however, there is no harm in including gender in the equation.

Question Three: Findings for Linguistic Explanations

The findings in this section address the following research question: To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for the dependent variable for this question, linguistic explanations, were obtained through administration of the study test instrument.
The test consisted of twelve constructed-response mathematical problem-solving items. The linguistic explanations students provided of their problem-solving processes were scored according to a three-point rubric (Appendix B). The sum of the scores for all twelve items, ranging from 0 to 36, was used as the dependent variable for this question.

Descriptive Statistics

Descriptive Analysis by Question

Analysis of overall sample. Table 22 shows the mean, standard deviation, skewness, and kurtosis for students' total linguistic explanation rubric scores, with a total possible score of 36. It also shows these statistics for each of the twelve questions, each with a possible score of 3 points.

Table 22
Descriptive Statistics for Linguistic Explanation Scores

Question          M       SD     Skewness   Kurtosis
Flowerpots        1.37    1.11   0.13       1.35
Ms. Hernandez     1.30    0.96   0.14       1.21
Calories          1.29    0.99   0.13       1.07
Jeans Class       1.32    1.00   0.12       1.08
Pizza             1.91    0.79   0.55       0.11
String            0.67    0.74   0.74       0.39
Balloons          1.21    1.12   0.26       1.36
Flour             0.91    0.88   0.36       1.18
Calculator        1.30    0.95   0.04       1.02
School Lunch      0.84    0.89   0.54       1.02
Pencils           1.31    1.02   0.15       1.14
Birdseed          0.78    0.76   0.52       0.70
Total            14.20    7.32   0.09      -0.97
Note. N = 237; maximum score for individual questions = 3 points; maximum total score = 36 points.

The data for students' total linguistic explanation scores yielded a mean of 14.20 and a standard deviation of 7.32. The skewness value of 0.09 is nearly normal, but the kurtosis value of -0.97 indicates a noticeably platykurtic distribution. This indicates a flat or broad curve, which is consistent with the large standard deviation relative to the mean. Although the kurtosis departs noticeably from zero, factorial ANOVA is believed to be relatively robust to violations of the normality assumption (Glass, Peckham, & Sanders, 1972). Most of the distributions of scores for individual questions were also platykurtic, but not skewed.
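Skewness and kurtosis in this discussion are read against a normal distribution, for which both equal zero when kurtosis is expressed as excess kurtosis. Negative excess kurtosis means a flatter (platykurtic) shape, and positive means a more peaked, heavy-tailed (leptokurtic) shape. The following minimal Python sketch computes the simple moment-based versions g1 and g2. Many statistical packages report small-sample-adjusted versions, so their values can differ slightly from these for the same data. The example score sets are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (a normal distribution gives 0, 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Evenly spread scores: symmetric and flat, so platykurtic (negative g2).
print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))          # (0.0, about -1.3)
# Scores piled at the center with two extreme values: leptokurtic (positive g2).
print(skewness_and_excess_kurtosis([0] * 8 + [-10, 10]))      # (0.0, 2.0)
```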
Because the total linguistic explanation score distribution is not skewed and not extremely platykurtic, and because the sample is relatively large, factorial ANOVA is believed to be fairly robust in this situation. The mean of 14.20 points represents an average score of just under 40%, compared to an average score for mathematics performance of 56%. These low scores should not be interpreted as meaning that students failed the test on both measures. Holistic rubric scores are not an interval measure and are not intended to be converted to percentage scores. However, the percentage scores do provide a means of comparing student performance on multiple tasks scored with a holistic rubric. The percentages suggest that, overall, students did much better with the mathematics than with the linguistic explanations.

The pizza question garnered the most complete linguistic explanations of any question, scoring more than a half-point higher on average than any other question. This problem did not call for computation; it asked students to make sense of the fact that the size of a fractional part depends on the size of the whole. Given this, the result is not surprising. The string and birdseed problems, by far the most difficult items on the test, resulted in the lowest mean scores for linguistic explanations. In many instances, students were not able to make an informed attempt at solving these two problems; it therefore follows that they would be ill-equipped to explain their problem-solving processes.

Analysis by group. Descriptive statistics of students' by-group performance on each question are provided in Appendix O. A descriptive analysis of the by-question data for boys and girls reveals that girls achieved higher scores for their linguistic explanations on nine of the twelve test items.
Boys' linguistic explanations received higher scores for the calories, string, and pencil questions, ranked 3, 12, and 9 in order of difficulty, with 12 being the most difficult. The standard deviations of girls' scores were higher than boys' for all questions except the calories and pizza items. All three mathematics ability groups had the highest linguistic explanation scores for the pizza question and the lowest scores for the string question. For the questions with an overall difficulty ranking of 4th through 9th, the rank order of the linguistic explanation scores varied widely by mathematics ability level. The same is true for the sample when it was grouped according to reading ability.

Descriptive Analysis of Total Linguistic Explanation Scores

Table 23 shows the mean and standard deviation of students' total linguistic explanation scores grouped by gender, reading ability, and mathematics ability. The data show that girls' total linguistic explanation scores were slightly less than one point higher than boys' and slightly more dispersed. Students classified with higher mathematics ability scored higher than students of lower ability. The standard deviations of the scores were also ordered according to ability, with the highest ability group having the highest standard deviation and the lowest ability group having the lowest.

Table 23
Total Linguistic Explanation Scores by Group

Group                       n       M      SD
By Gender
  Boys                    129   13.82    7.16
  Girls                   108   14.66    7.51
By Mathematics Ability
  High                     57   18.75    7.53
  Middle                  129   13.41    6.81
  Low                      51   11.12    5.97
By Reading Ability
  High                     73   17.58    7.62
  Middle                  134   13.30    6.68
  Low                      30   10.03    6.06
Note. N = 237; maximum score = 36 points.

Similarly, students of higher reading ability received higher linguistic explanation scores than students of lower reading ability. The scores in the low reading ability group had the least dispersion.
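The overall means behind the percentages compared earlier in this section (about 56% for mathematics performance and just under 40% for linguistic explanations) can be recovered from the subgroup tables. Each overall mean is the n-weighted average of its group means. The following minimal Python sketch applies this to the gender rows of Table 18 and Table 23. Any of the three groupings in a table should give essentially the same overall mean, which also serves as a consistency check on the tables.

```python
def pooled_mean(groups):
    """n-weighted mean of subgroup means: sum(n_i * M_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (n, M) pairs by gender from Table 18 (mathematics, max 48) and Table 23 (linguistic, max 36).
math_mean = pooled_mean([(129, 27.30), (108, 26.11)])  # about 26.76
ling_mean = pooled_mean([(129, 13.82), (108, 14.66)])  # about 14.20
print(round(100 * math_mean / 48), round(100 * ling_mean / 36, 1))  # 56 39.5
```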
Figure 3 presents the box and whisker plots for linguistic explanation scores grouped by the three independent variables. The plots for boys' and girls' linguistic explanation scores are similar, with a larger H-spread and a higher median score for girls. When grouped by mathematics ability, the plots reveal that the H-spread of the low ability group is smaller than that of the other two groups. This indicates that the middle half of the low ability group had a narrower range of scores than the middle half of either of the other two groups. The maximum, third quartile, and median scores followed a pattern whereby the higher ability groups had higher scores. Both the middle and low mathematics ability groups had minimum scores of zero and similar first quartile scores. The plots for the three levels of reading ability follow a pattern similar to that formed by using mathematics ability as the grouping variable, except that the minimum and first quartile marks also follow a pattern according to ability.

Figure 3. Box and whisker plot of total linguistic explanation scores grouped by gender, mathematics ability, and reading ability. [Groups: Boys, Girls, H Math, M Math, L Math, H Rdg, M Rdg, L Rdg; y-axis: total linguistic explanation score, 0 to 35; markers: minimum, Q1, median, Q3, maximum.]

Inferential Statistics

Factorial ANOVA Results

To infer that differences in gender, mathematics ability, and reading ability would result in differences in the level of students' linguistic explanations in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the tenability of a chance explanation, a 2x3x3 factorial ANOVA was conducted with an alpha level set at .05 for each effect. The degree to which the Type I error rates are actually controlled at the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances.
The assumption of independence was met by ensuring that different observations came from different individuals. The study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the distribution of scores was noticeably platykurtic, but given the large sample size, factorial ANOVA is believed to be relatively robust to this violation of the normality assumption. To assess the assumption of equal variances, Levene's test for homogeneity of variances was conducted, with the result F(12, 224) = 1.30, p = .2173. This p-value signifies that the null hypothesis of homogeneous variances cannot be rejected, providing no evidence that the assumption of equal variances was violated. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA for linguistic explanations.

The cell sizes for this study were not equal. If the disparity in cell sizes had not been deemed to reflect differences in the population, the regression approach (Type III) to computing the sums of squares would have been in order. However, in accordance with the population, it was expected that most of the sample would cluster along the diagonal that represented identical categorization for both mathematics and reading ability. It was also expected that instances of a student being classified as high in one subject area and low in the other would be rare. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares was deemed appropriate. The hierarchical method is reserved for situations in which there is an established order for the effects, as is the case for this question. For the hierarchical method, each effect is adjusted only for those preceding it in the order.
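The phrase "each effect is adjusted only for those preceding it" has a concrete computational meaning. The Type I sum of squares for a term is the increase in the model (regression) sum of squares when that term is added to the terms already entered. The following minimal, dependency-free Python sketch illustrates this with hypothetical unbalanced data for two two-level factors coded as 0/1 dummies (not study data). A three-level factor, as in this study, would contribute two dummy columns entered together, which this simplified sketch does not handle. Swapping the entry order changes each factor's sum of squares but not their total. That order dependence is why an established order is required.

```python
def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def model_ss(y, predictors):
    """Regression SS from OLS of y on an intercept plus the given predictor columns."""
    cols = [[1.0] * len(y)] + [list(map(float, p)) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = _solve(xtx, xty)
    y_bar = sum(y) / len(y)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    return sum((f - y_bar) ** 2 for f in fitted)


def sequential_ss(y, ordered_terms):
    """Type I (hierarchical) SS: each term adjusted only for the terms entered before it."""
    entered, previous, result = [], 0.0, {}
    for name, column in ordered_terms:
        entered.append(column)
        current = model_ss(y, entered)
        result[name] = current - previous
        previous = current
    return result


# Hypothetical unbalanced 2x2 data (not study data).
y = [10, 12, 11, 15, 14, 18, 17, 20]
a = [0, 0, 0, 1, 1, 1, 1, 1]
b = [0, 1, 1, 0, 0, 1, 1, 1]
print(sequential_ss(y, [("A", a), ("B", b)]))  # A about 63.075, B about 16.8
print(sequential_ss(y, [("B", b), ("A", a)]))  # B about 12.675, A about 67.2
```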
For this question, gender, as a demographic variable, was placed first in the ordering because, although gender may affect students' mathematics and reading abilities, the reverse argument is not viable. Reading ability was placed second because it is more closely tied to the outcome variable of linguistic explanations than is mathematics ability. Reading and writing are two facets of literacy that are inextricably linked, and a student's ability in one area is directly related to her/his ability in the other (Routman, 1994; Spivey & King, 1994). Because the relationship between mathematics ability and linguistic explanations of problem solving has not been clearly established in the literature, this variable was placed third in the ordering.

The obtained F(12,224)=4.02, p<.0001 for the overall F-test indicates that one or more of the independent variables is significantly related to the linguistic explanation scores. The R-square value of 0.1772 indicates that about 18% of the variance in the scores can be explained by the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 24. The results of the factorial ANOVA for linguistic explanations indicate that both reading ability and mathematics ability are statistically significant, with F(2,236)=16.14, p<.0001 for reading ability, and F(2,236)=6.16, p=.0025 for mathematics ability. As a result, the null hypotheses that reading ability and mathematics ability are not related to the quality of students' linguistic explanations can be rejected. This signifies that the observed differences among the mean linguistic explanation scores of the high, middle, and low reading ability groups, and among those of the high, middle, and low mathematics ability groups, are large enough to conclude that differences exist in the population. None of the interaction effects were significant.
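The overall F-test and the R-square value reported above are two views of the same model fit. As a quick consistency check (a sketch, using only the standard relation between R-square and the overall F), the function below recovers F from R-square and the model and error degrees of freedom. The same relation reproduces the overall F reported later for the self-efficacy mean bias model (R-square = 0.1857, F(12,224)=4.26).

```python
def overall_f(r_squared, df_model, df_error):
    """Overall F statistic recovered from R-square and its degrees of freedom."""
    explained_per_df = r_squared / df_model
    unexplained_per_df = (1 - r_squared) / df_error
    return explained_per_df / unexplained_per_df

# Linguistic explanations: R-square = 0.1772, F(12, 224)
print(round(overall_f(0.1772, 12, 224), 2))  # 4.02

# Self-efficacy mean bias: R-square = 0.1857, F(12, 224)
print(round(overall_f(0.1857, 12, 224), 2))  # 4.26
```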
The omega squared value for the main effect of reading ability shows that an estimated 11% of the variance in the population is associated with students' reading ability level. The Cohen's f value of 0.37 indicates that the group means typically deviate from the grand mean by 0.37 standard deviation units, a medium-to-large effect size by conventional benchmarks. The omega squared value for mathematics ability signifies that an estimated 4% of the variance in the linguistic explanation scores of the population is related to students' mathematics ability. The Cohen's f estimate of 0.23 reveals that the means of students grouped by mathematics ability deviate from the grand mean by 0.23 standard deviation units. Although the gender*reading ability interaction has a small effect size of 0.09 according to Cohen's f, the differences in group means were not found to be statistically significant. The sample size may have resulted in the study being underpowered and unable to rule out chance as the cause of the differences that do exist.

Table 24
Factorial ANOVA Results for Total Linguistic Explanation Scores

Source                df   MS       F       p        ω²      f
Gender                 1   41.06    0.88    0.3479    0.00   0.06
Reading                2   749.01   16.14   <.0001    0.11   0.37
Gender*Reading         2   47.21    1.02    0.3633    0.00   0.09
Mathematics            2   285.97   6.16    0.0025    0.04   0.23
Gender*Math            2   8.06     0.17    0.8407   -0.01   0.04
Reading*Math           3   1.41     0.03    0.9701   -0.01   0.02
Gender*Reading*Math    1   14.59    0.31    0.5756    0.00   0.04

Note. N=237; ω² = omega squared; f = Cohen's f measure of effect size. *p<.05.

The F test indicates whether an effect is significant, but it does not tell whether all differences among the means of all levels of the variable are significant. Because the main effect of reading ability was significant but the interaction effect was not, the Tukey test was used in the post hoc analysis.
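The omega squared values in Table 24 can be approximately reconstructed from the table itself. The sketch below assumes the standard between-subjects formula ω² = (SS_effect − df_effect·MS_error)/(SS_total + MS_error), takes SS_effect = df × MS for each row, backs out MS_error as MS/F for the reading row, and takes SS_total as the sum of the listed effect sums of squares plus SS_error with 224 error df. Because all inputs are rounded, the results are approximate. The sketch also shows why effects with F < 1 yield slightly negative omega squared values.

```python
# (df, MS) for each effect, taken from Table 24
effects = {
    "Gender": (1, 41.06),
    "Reading": (2, 749.01),
    "Gender*Reading": (2, 47.21),
    "Mathematics": (2, 285.97),
    "Gender*Math": (2, 8.06),
    "Reading*Math": (3, 1.41),
    "Gender*Reading*Math": (1, 14.59),
}

# Assumption: MS_error backed out as MS / F for the reading effect (749.01 / 16.14)
ms_error = 749.01 / 16.14
df_error = 224  # error df from the overall F(12, 224)
ss_error = df_error * ms_error
ss_total = sum(df * ms for df, ms in effects.values()) + ss_error

def omega_squared(df_effect, ms_effect):
    """Estimated proportion of population variance associated with an effect."""
    ss_effect = df_effect * ms_effect
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

for name, (df, ms) in effects.items():
    print(f"{name:20s} {omega_squared(df, ms):6.2f}")
```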
The Tukey test helps clarify which pairwise group differences account for the significant relationship between reading ability and the quality of students' linguistic explanations found in the F-test, while maintaining the overall Type I error rate at .05. The results of the follow-up test indicated that the differences between the means of all reading ability groups were significant. On average, high reading ability students earned total linguistic explanation scores that were 4.28 points higher than those of middle ability students. In turn, middle ability students' scores were an average of 3.27 points higher than those of low ability students. These results indicate that students of high reading ability provided significantly more complete and accurate linguistic explanations of their problem-solving processes than did students of middle or low reading ability, and that the middle ability students' explanations were significantly more complete and accurate than those of low ability students.

The Tukey follow-up tests for mathematics ability yielded results that differed from those found for reading ability. The difference in group means between the high and middle mathematics ability groups (5.34) was significant, as was the difference in means between the high and low ability groups (7.64). Therefore, the observed mean differences in linguistic explanation scores between the high and middle and between the high and low ability groups were large enough to conclude that these differences exist in the population. The difference in means between the middle and low ability groups (2.29) was too small to reach significance.

Multiple Regression Analysis

Introduction. The sample for this analysis consisted of 108 girls and 129 boys, for a total of 237 students.
The three main effect predictor variables considered for the analysis were gender (dummy coded with girl=1 and boy=0), FCAT mathematics scale score (abbreviated as FCATM and ranging from 100-500), and FCAT reading scale score (abbreviated as FCATR and ranging from 100-500). The outcome variable for this analysis was students' total linguistic explanation score (abbreviated as TOTALLE), with a range of 0-36 points. The descriptive statistics for the predictor variables were analyzed for question two; because the same variables were used for this question, that analysis is not repeated here. To more closely follow the format of the factorial ANOVA, the multiple regression was first run with the three main effects and two interaction effects (gender*FCATM and gender*FCATR). Because the interaction effects were not significant, and their inclusion unduly complicates the interpretation of the results, they were excluded from the analysis. The following results and discussion are based solely on the three main effects; a table summarizing the results of the model with the interaction effects is provided in Appendix T.

Multiple regression results. A summary of the multiple regression results is presented in Table 25. Students' FCAT mathematics and reading scores were found to contribute significantly to the predictive utility of the regression equation. The obtained R² value for the model was .1826, indicating that about 18% of the variability in students' total linguistic explanation scores was explained by this set of predictors. Because R² tends to be an overly optimistic estimate of the variability in the population that would be accounted for by this set of predictors, the adjusted R² was examined. At a value of .1720, it showed a small amount of shrinkage.
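The shrinkage from R² to adjusted R² follows from the usual adjustment for the number of predictors. A minimal sketch, assuming the standard formula with n = 237 students and k = 3 predictors (gender, FCATM, FCATR):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(0.1826, n=237, k=3)
print(round(adj, 3))  # 0.172
print(round(0.1826 - adj, 4))  # amount of shrinkage
```

With a large sample and only three predictors, the penalty term (n - 1)/(n - k - 1) is close to 1, which is why the shrinkage is small.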
Table 25
Summary of Multiple Regression Analysis for Linguistic Explanations

Variable                 b          β        SE       t        p
Intercept                -13.4190   0.0      3.8604   -3.48*   0.0006
Gender                   1.2113     0.0826   0.9073   1.34     0.1832
FCAT Mathematics Score   0.0323     0.1910   0.0139   2.33*    0.0209
FCAT Reading Score       0.0476     0.2735   0.0141   3.39*    0.0008

Note. N=237. b = regression coefficient, β = standardized regression coefficient, SE = standard error, * indicates significance at the α=.05 level.

Although gender did not contribute significantly to the prediction equation, the data were easily obtained and would be gathered if a study of this type were repeated, so there is no disadvantage to leaving the variable in the equation. Cohen's effect size f² = R²/(1 - R²) was computed to be 0.2234, a medium to large effect size. The standard error of estimate was 6.6577, indicating that the prediction equation will typically yield a predicted total linguistic explanation score that errs by a little more than 6.6 points. The obtained prediction equation was:

TOTALLE = -13.4190 + 1.2113*GENDER + 0.0323*FCATM + 0.0476*FCATR

To interpret the regression equation, consider two students of the same gender who have identical FCATR scores. A 100-point increase in the FCATM score would result in a predicted linguistic explanation score that is 3.23 points higher. Similarly, for two students of the same gender with identical FCATM scores, a 100-point increase in the FCATR score would result in a predicted increase in the linguistic explanation score of 4.76 points. Because the dummy coding for gender assigned girls a score of 1 and boys a score of 0, the prediction equation will yield a predicted TOTALLE score 1.21 points higher for girls, holding FCATM and FCATR constant. The standardized regression coefficient of 0.0826 for gender indicates that girls' predicted TOTALLE score will be a small 0.0826 standard deviations higher than boys'.
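Several quantities reported for this regression follow directly from Table 25 and the model R²: the effect of a 100-point score difference, Cohen's f², and the squared semipartial correlations discussed with the standardized coefficients. The sketch below assumes a negative intercept (-13.4190): its t value of 3.48 equals |b|/SE, and a positive intercept would push predicted scores above the 36-point maximum for typical FCAT scale scores. The squared semipartial correlation of each predictor is recovered from its t statistic with the standard identity sr² = t²(1 - R²)/(n - k - 1). The student profiles used below are hypothetical.

```python
# Coefficients from Table 25 (intercept assumed negative; see lead-in)
B0, B_GENDER, B_FCATM, B_FCATR = -13.4190, 1.2113, 0.0323, 0.0476
R2, N, K = 0.1826, 237, 3  # model R-square, sample size, number of predictors

def predict_totalle(gender, fcatm, fcatr):
    """Predicted TOTALLE; gender is dummy coded girl = 1, boy = 0."""
    return B0 + B_GENDER * gender + B_FCATM * fcatm + B_FCATR * fcatr

# Hypothetical boys with equal FCATR scores whose FCATM scores differ by 100 points
diff_m = predict_totalle(0, 350, 300) - predict_totalle(0, 250, 300)

# Cohen's f-squared for the full model
f2 = R2 / (1 - R2)

def sr2_from_t(t, r2=R2, df_error=N - K - 1):
    """Squared semipartial correlation (unique variance) of a predictor from its t statistic."""
    return t ** 2 * (1 - r2) / df_error

print(round(diff_m, 2))  # 3.23
print(round(f2, 4))  # 0.2234
for name, t in {"Gender": 1.34, "FCATM": 2.33, "FCATR": 3.39}.items():
    print(f"{name}: {sr2_from_t(t):.3f}")
```

The printed squared semipartials (about .006, .019, and .040) correspond to the 0.6%, 1.9%, and 4.0% of unique variance reported for gender, FCATM, and FCATR.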
The standardized regression coefficient for FCATM of 0.1910 means that a 1 standard deviation increase in FCATM would result in a predicted 0.1910 standard deviation increase in TOTALLE. The coefficient of 0.2735 for FCATR indicates that a 1 standard deviation increase in FCATR would result in a predicted 0.2735 standard deviation increase in TOTALLE. Standardizing the coefficients allows for a direct comparison of the strength of each predictor's relationship with the outcome variable, and shows that FCATR was more strongly related to TOTALLE than FCATM. The squared semipartial correlations for each predictor reveal that gender uniquely accounted for only 0.6% of the variability in TOTALLE, whereas FCATM and FCATR accounted for 1.9% and 4.0%, respectively.

Assumptions. Regression analyses rest on several assumptions that should not be violated. The data were screened and each assumption was evaluated. To assess the assumption of homoscedasticity of errors, the residuals were plotted against the predicted values; the resulting plot indicated that the assumption was not violated. The residuals were also found to be approximately normal, with skewness and kurtosis values of 0.14 and 0.92, respectively. The residual plot also indicates a linear relationship, and based on the design of the study, there is no reason to believe that the residuals are not independent. Because outliers had been removed on the basis of the descriptive statistics prior to the analysis, the Cook's D values did not detect any additional outliers. Another assumption is that the predictors were measured without error. The variable of gender was self-reported, and no incidence of error has been detected. Given the rigorous testing procedures associated with the FCAT, the measurement error associated with the FCAT variables is not deemed to violate this assumption.
The predictors FCATM and FCATR cannot be considered fixed, but this is the case in many studies, and multiple regression is considered robust to violations of this assumption. In conclusion, based on the analysis of the assumptions, multiple regression was an appropriate procedure for analyzing the data for this question.

Conclusion. Students' FCAT mathematics and reading scores were found to be significant predictors of the quality and completeness of students' linguistic explanations of their problem-solving processes on the study test. Their inclusion in the regression equation significantly increased the equation's ability to predict the outcome variable. Although gender was not a significant contributor to the predictive ability of the regression equation, it did uniquely account for some of the variability in the outcome and will therefore remain in the model. If there were a significant cost or time factor in gathering these data, it would not be advisable to leave gender in the regression equation. However, because gender data are key to other elements of the study and would be collected if this study were replicated, there is a small benefit, and certainly no harm, in leaving it in the regression equation. The results of the regression analysis confirm the findings of the factorial ANOVA for mathematics ability and for reading ability, even though the reading and mathematics ability variables were defined slightly differently in the two analyses. For the factorial ANOVA, students' ability level was a categorical variable determined through a combination of their FCAT level scores and the teachers' ratings of their ability. For the multiple regression, students' FCAT scale scores, a continuous variable, were used as the measure of their ability.
Both analyses confirm that students' reading ability and mathematics ability are significantly related to the quality of the linguistic explanations of their problem-solving processes. Reading ability was also shown to be more strongly related to the outcome variable than mathematics ability.

Qualitative Analysis of Students' Linguistic Explanations

Every item on the study test ended with a statement similar to "Show your work, then explain how you know you are correct." During the pretest training discussions, students were told that the explanations of their mathematical problem-solving strategies and procedures were as important as their computational work. Whenever mathematical symbols and numerals were included in the text of an explanation, they were regarded as text for evaluation and scoring. Students' linguistic explanations revealed a wide range of perceptions as to what it means to "explain how you know you are correct." The purpose of this section is to use unedited samples of student work to illustrate the various approaches students took to explaining their work.

Some students were very conscientious about explaining every step of the problem-solving process and relating their explanations to the problem context. Each of the following examples received the highest possible score of 3 points for linguistic explanation. For each example, the name of the problem precedes the student's explanation, which is followed by the child's gender.

Ms. Hernandez Problem: There were two substitutes. I know I'm right because if there are eight people in each team and 34 students you think of how many time eight goes into 34 without going over. Which is four times. Eight times four equals thirty-two. Plus two equals thirty-four. That's two substitutes. Girl.

Pencil Problem: I knew of twenty was 5 so I knew Brendan had 15 pencils so add them together and you get 20. Boy.
Flour Problem: I got this answer by first multiplying 1 whole cup x 3 which equals 3 cups. Then I did 1/3 times 3 which equals one whole. 3+1=4. There where 4 cups needed to make 3 batches. Girl.

Some students attempted to explain their work but did not relate the explanation back to the context of the problem. A linguistic explanation that described all the computational steps but lacked context received a linguistic explanation score of 0-2 points. The first two examples below received a score of two points, but the third example did not: this student attempted to explain what he had done, but his explanation used incorrect mathematical terminology and did not address the problem context.

String Problem: It is right because 3/4 is equal to 6/8 and 1/8 goes into 6/8 6 times. Girl.

Ms. Hernandez Problem: I divided 8 out of 34 and I got 4r 2 ! Boy.

Ms. Hernandez Problem: I subtrated [sic] the factor of 34 and 8 and I got the sum of 26. Boy.

Other examples of student work illustrate that some children were very confident about the work they had produced even though the work was incorrect. For both of the next examples, the students gave themselves the highest possible self-assessment score. The first sample contains some correct work, even though the work does not contribute to solving the problem; making sense of the explanation proves to be as difficult a task as writing it must have been. The second example shows that the child completely misunderstood the question, believing it to ask for the difference in the number of pounds instead of the difference in the cost per pound.

String Problem: I figured this in a difficult way. First, I found 3/4 of a yard is 2 1/4. I took the 1/4 and made it 2/8. I found that 8 1/8 was 1. Double that makes 16 1/8 which is 2. 16+2=18. Boy.

Birdseed Problem: There 1 pound difference. Boy.

For some students, explaining how they know they are correct evokes a statement about their competence in the computational skill required to solve the problem. Statements such as "I know my answer is correct because I'm very good at mutulbling [sic]," or "I know it's right because I'm verry [sic] good with division and subtraction" bear this out. Others explained their problem-solving strategies with generic statements such as "I looked for keywords and got an idea of what I was supposed to do," or "I know I'm right I did the math." Some of the most interesting responses were those that provided a glimpse of a child's personality, such as his or her poetic nature, unusual food tastes, or notions about the inherent differences between boys and girls.

Ms. Hernandez Problem: How I got this answer was I divided 8 and got to [sic] so then I decided to write it out for myself and you. Girl.

Pizza Problem: Jose coule [sic] be right because if he has onions, broccoli and cheese but Ella has just cheese than [sic] Jose is right. Boy.

Pizza Problem: One of there pizzas could be big or small. In this case I make Ella's small because she is a girl and girls some times don't eat as much as boys. So Jose could have eaten more then Ella. Girl.

The last examples illustrate an unusual approach to explaining how one knows an answer is correct. For these students, an overall feeling of confidence in their abilities, a positive attitude, or faith in a higher power shows them that they are correct in their thinking.

Calculator Problem: My two ways are correct because I can belive [sic] in myself to get it right. Girl.

Balloon Problem: I know I did this right because I had a really good feeling about this. Boy.

Birdseed Problem: The power of math gives me the correct answer I know is right. But also the power of my brain that can work the problem. Girl.
Question Four: Findings for Self-Efficacy

The findings in this section address the following research question: To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem-solving test, as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?

The data for this question, students' self-efficacy mean bias scores, were obtained through administration of the study test instrument, which consisted of twelve constructed-response mathematical problem-solving items. Mean bias is a measure of calibration, that is, of the accuracy of students' self-efficacy judgments. The mean bias was calculated as the average difference between students' self-efficacy self-ratings and their actual performance. A positive mean bias indicates that a student's self-efficacy exceeded his/her actual performance on the test overall; a negative mean bias indicates that the student's self-efficacy was, on average, lower than her/his actual performance. To obtain the self-efficacy data, each item contained a self-reporting Likert-type scale immediately following the question. The scale, formatted as a range of five faces from very happy to very sad, asked students to rate how well they thought they could solve the problem before they began working on it. After the test administration, the students' responses were converted to numerical scores ranging from zero to four, with four indicating the highest level of self-efficacy. To obtain each student's mean bias score, the average of all twelve mathematical performance scores was subtracted from the average of all twelve self-efficacy scores. The self-efficacy mean bias scores had a potential range of -4 to +4.
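The mean bias computation described above can be sketched as follows, assuming each of the twelve items yields one self-efficacy rating and one rubric score, both on the 0-4 scale. The ratings and scores shown are hypothetical, not data from the study.

```python
from statistics import fmean

def mean_bias(self_efficacy, performance):
    """Mean self-efficacy rating minus mean performance score (both on a 0-4 scale).

    Positive values indicate overconfidence; negative values indicate underconfidence.
    """
    if len(self_efficacy) != len(performance):
        raise ValueError("one self-efficacy rating is needed per scored item")
    return fmean(self_efficacy) - fmean(performance)

# Hypothetical student: ratings and rubric scores for twelve items
ratings = [4, 4, 3, 4, 3, 2, 4, 3, 3, 4, 3, 3]
scores = [4, 3, 3, 4, 2, 0, 3, 2, 3, 3, 2, 1]
print(round(mean_bias(ratings, scores), 2))  # 0.83, i.e., overconfident
```

Because both averages lie between 0 and 4, the difference is bounded by -4 and +4, matching the potential range stated above.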
Descriptive Statistics

Descriptive Analysis of Total Self-Efficacy

Although the variable of interest for this research question was self-efficacy mean bias, a clear picture of the data cannot be achieved without an understanding of students' total self-efficacy scores. The first of the following three sections details how students' total self-efficacy scores varied according to the grouping variables. The second section explains how the total sample's scores varied by test question. The third section looks at group data for each of the twelve questions.

Overall analysis of groups. Table 26 shows the mean and standard deviation of students' total self-efficacy scores organized by gender, by reading ability, and by mathematics ability. Boys' total scores surpassed girls' by 4.89 points, and the standard deviation of their scores was lower, at 6.70 compared with 8.02 for girls. When mathematics ability was used as the grouping variable, the results show that the students of highest ability had the highest average total self-efficacy scores, followed by students of middle and low ability. The variability of the group scores was highest for the low ability group and lowest for the high ability group. When the sample was grouped according to reading ability, the mean scores decreased as student ability decreased. The variability of scores was highest for the middle reading ability group.

Table 26
Total Self-Efficacy Scores by Group

Group                     n     M       SD
By Gender
  Boys                    129   37.95   6.70
  Girls                   108   33.06   8.02
By Mathematics Ability
  High                    57    40.02   6.32
  Middle                  129   35.30   7.32
  Low                     51    32.00   7.93
By Reading Ability
  High                    73    39.11   7.14
  Middle                  134   34.58   7.62
  Low                     30    32.60   6.79

Note. N=237, max. score = 48 points.

Analysis of overall sample by question. Although the data being analyzed inferentially in this question are students' self-efficacy mean bias scores, a look at students' actual self-efficacy ratings for each question provides information
that clarifies the overall self-efficacy picture. Table 27 shows the mean, standard deviation, skewness, and kurtosis for students' total self-efficacy scores (possible range of 0-48) and students' self-efficacy scores by question (possible range of 0-4). The data for students' total self-efficacy scores yielded a mean of 35.73, a standard deviation of 7.71, and a relatively normal distribution, with skewness and kurtosis values of 0.38 and 0.51, respectively. As expected, the distributions for individual questions for which students indicated relatively high feelings of self-efficacy were negatively skewed. Several of the distributions for individual questions were noticeably leptokurtic, a characteristic indicative of a peaked distribution with heavier tails, and thus more extreme scores, than would be found in a normal distribution.

Table 27
Descriptive Statistics for Self-Efficacy Scores by Question

Question        M       SD     Skewness   Kurtosis
Flowerpots      3.75    0.52   1.99       3.09
Ms. Hernandez   3.23    1.00   1.46       1.79
Calories        3.41    0.76   1.14       0.68
Jean's Class    3.65    0.73   2.23       4.89
Pizza           2.65    1.27   0.63       0.58
String          1.73    1.30   0.14       1.08
Balloons        3.19    0.83   0.98       1.04
Flour           2.96    1.13   0.92       0.01
Calculator      2.67    1.39   0.79       0.65
School Lunch    3.15    1.05   1.26       0.96
Pencils         2.54    1.42   0.58       0.97
Birdseed        2.79    1.27   0.90       0.17
Total           35.73   7.71   0.38       0.51

Note. N=237, max. score for individual questions = 4 points, max. total score = 48 points.

Students' level of self-efficacy for individual questions was a fairly accurate indicator of their success on the problems. The items for which students felt the most and least self-efficacious were the flowerpot and string problems, respectively. These are also the items on which students' actual scores were the highest and lowest. For six of the twelve items, the mean self-efficacy score was over 3.0, indicating that students felt fairly confident in their ability to be successful on half of the items.

Analysis of groups by question.
Descriptive statistics for students' by-group scores on each question are presented in Appendix P. Boys' feelings of self-efficacy exceeded those of girls for all twelve test items. There was more variability in girls' self-efficacy scores for ten of the twelve items; the only exceptions were the balloon and string questions, the second easiest and the most difficult items, respectively. When the students were grouped by mathematics and reading ability, all groups were consistently the least self-efficacious about the string question. The questions were ranked according to level of self-efficacy for each group, with a rank of 1 given to the question for which the students in that group had the highest level of self-efficacy. With mathematics ability as the grouping variable, the question that showed the largest rank-order difference was the pizza item: the high, middle, and low ability students ranked it 10th, 3rd, and 9th, respectively. With reading ability as the grouping variable, the question that showed the greatest rank-order differences was the Ms. Hernandez item: high, middle, and low ability students ranked it 3rd, 5th, and 8th, respectively.

Descriptive Analysis of Self-Efficacy Mean Bias

Overall analysis of groups. Table 28 illustrates the relationships between students' self-efficacy mean bias scores and their group membership. The bias score for boys, at 0.89, was over three-tenths of a point higher than that of girls. The variability of boys' and girls' scores was nearly equal, with standard deviations of 0.60 and 0.64.

Table 28
Self-Efficacy Mean Bias Scores by Group

Group                     n     M      SD
By Gender
  Boys                    129   0.89   0.60
  Girls                   108   0.58   0.64
By Mathematics Ability
  High                    57    0.49   0.46
  Middle                  129   0.77   0.63
  Low                     51    0.99   0.73
By Reading Ability
  High                    73    0.56   0.54
  Middle                  134   0.78   0.64
  Low                     30    1.07   0.71

Note. N=237, self-efficacy mean bias scores range from -4 to +4.
For this sample, although both boys' and girls' average mean bias scores were positive, indicating overconfidence in their ability to complete the study test items, girls were more accurate in judging their ability to complete the questions successfully. When the sample was organized according to mathematics ability, the students in the high ability group stood out as the most accurate assessors of self-efficacy, with an average mean bias of 0.49. The low ability group rated their own self-efficacy an average of 0.99 points higher than their actual performance on each test item. The standard deviations of the groups were ordered by ability level, with the highest group having the least variability (SD=0.46) and the lowest group having the most (SD=0.73). When the grouping variable was changed to reading ability, students of high ability were shown to be much more accurate assessors of their own ability to complete a task than students of middle or low ability. The average mean bias score in the low reading ability group was 1.07, indicating that poor readers tended to rate their ability to complete a task successfully more than one point higher than their actual performance on that task. The standard deviations of the scores among the reading ability groups ranged from a low of 0.54 for the high group to a high of 0.71 for the low group. Although these patterns appear to be strong in the sample data, descriptive statistics alone cannot be used to infer that these differences are also present in the population.

The box and whisker plots presented in Figure 4 illustrate that when the students were grouped by gender, the shapes of the score distributions were similar. The H-spread was slightly larger for girls, but the first quartile represented a wider range for boys.

Figure 4. Box and whisker plot of self-efficacy mean bias scores grouped by gender, mathematics ability, and reading ability. (Vertical axis: self-efficacy mean bias score, -1 to 3; groups: Boys, Girls, H Math, M Math, L Math, H Rdg, M Rdg, L Rdg; markers: minimum, Q1, median, Q3, maximum.)

The shape of the plot for the high mathematics ability group was very different from those for the other two levels. Although the H-spreads of the three plots are similar, the upper whisker for the high ability group is only a fraction of the length of those of the other two plots. Even the most overconfident of the high ability students was not nearly as overconfident as some of the members of the middle and low ability groups. Because the distribution of scores for the high ability group is centered closer to zero, these students were much more accurate in their feelings of self-efficacy than the lower ability students. The plots for the grouping variable of reading ability show that the best readers were less likely to be extremely over- or underconfident than students of middle or low ability.

Analysis of overall sample by question. Table 29 shows the self-efficacy mean bias score averages for the entire test as well as for each individual test item. The mean bias was found by subtracting the mean mathematical performance score from the mean self-efficacy score. As a result, a positive bias score indicates students' overconfidence in their ability to complete the item successfully, and a negative score shows that students were underconfident. As the results show, students were overconfident in their ability to complete all of the test items successfully. Students were most accurate in their self-efficacy assessments for the balloons and pizza questions, which ranked second and sixth, respectively, in difficulty according to the mathematical performance scores. The birdseed problem, ranked as the second most difficult problem, represented the most extreme level of overconfidence among all the test items.
This item asked students to determine the difference in price per pound between two types of seed. To solve it, students had to divide each of the two bulk prices by its number of pounds to determine a price per pound, then find the difference between the two per-pound prices. The analysis of problem-solving processes determined that fully 43% of all students erroneously solved the problem simply by subtracting the two bulk prices. If these students believed this to be the correct method, it would explain the extremely high level of overconfidence.

Table 29
Self-Efficacy Mean Bias Scores by Question

Question        M      SD
Flowerpots      0.67   0.93
Ms. Hernandez   0.55   1.11
Calories        0.58   1.05
Jean's Class    0.96   1.11
Pizza           0.35   1.60
String          0.84   1.38
Balloons        0.28   1.16
Flour           0.92   1.19
Calculator      0.88   1.47
School Lunch    0.94   1.28
Pencils         0.68   1.41
Birdseed        1.33   1.54
Total           0.75   0.64

Note. N=237, self-efficacy mean bias scores range from -4 to +4.

Analysis of groups by question. Appendix Q presents the mean and standard deviation of students' scores for individual questions organized by group. A review of the means and standard deviations of boys' and girls' self-efficacy mean bias scores by question yields some interesting patterns. Boys' average mean bias scores were higher than girls' on all twelve of the study test items. The standard deviations of girls' scores were higher for nine of the twelve items; boys' standard deviations were higher for the flour, birdseed, and string questions, ranked 8th, 11th, and 12th in overall order of difficulty. These results indicate that boys were consistently more overconfident than girls across test items of varying difficulty and context, and that boys' bias scores were more tightly gathered around the group mean. Although girls were more accurate in their judgments of self-efficacy for all questions, their larger deviations from the mean indicate less conformity within their group.
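Because every student belongs to exactly one group under each grouping variable, the sample-size-weighted average of the group means in Table 28 should reproduce the overall mean bias of 0.75 reported in Table 29, up to rounding. A short consistency check, using only the n and M values from Table 28:

```python
def pooled_mean(groups):
    """Sample-size-weighted mean of group means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# (n, mean bias) pairs from Table 28
by_gender = [(129, 0.89), (108, 0.58)]
by_math = [(57, 0.49), (129, 0.77), (51, 0.99)]
by_reading = [(73, 0.56), (134, 0.78), (30, 1.07)]

for label, groups in [("gender", by_gender), ("mathematics", by_math), ("reading", by_reading)]:
    print(label, round(pooled_mean(groups), 2))  # each pools to 0.75
```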
Inferential Statistics

Factorial ANOVA Results

To infer that differences in gender, mathematics ability, and reading ability are associated with differences in students' self-efficacy mean bias scores in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the possibility that the differences are merely a matter of chance, a 2x3x3 factorial ANOVA was conducted with the alpha level set at .05 for each effect. The degree to which Type I error rates are actually controlled at the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different students: the study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the data were distributed approximately normally, so the assumption of normality was not violated. To assess the assumption of equal variances, Levene's test was conducted, with a result of F(13, 226) = 1.32, p = .2091. The null hypothesis that the group variances are equal therefore cannot be rejected; the data provide no evidence that the variances differed. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA. The cell sizes for this study were not equal. If the disparity in cell sizes had not been judged to reflect differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been appropriate. However, consistent with the population, it was expected that most students in the sample would be categorized similarly for mathematics and reading, leaving few students in the cells representing a high rating for one subject and a low rating for the other.
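Levene's test, used above to check the equal-variance assumption, is itself a one-way ANOVA on each score's absolute deviation from its group mean. The sketch below computes that W statistic with the standard library only; it assumes the classic mean-centred variant (the text does not say which variant was used, and some packages default to median-centring or squared deviations, which give different values). It returns W and its degrees of freedom; the p-value would come from an F distribution, for example via SciPy's `scipy.stats.levene(..., center='mean')`. The group data in the usage line are hypothetical.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W for a list of groups (each a list of scores).

    Uses absolute deviations from each group's mean. Returns (W, df1, df2);
    W is referred to an F(df1, df2) distribution to obtain a p-value.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    group_means = [mean(g) for g in groups]
    # Z_ij = |x_ij - mean of group i|
    z = [[abs(x - m) for x in g] for g, m in zip(groups, group_means)]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    # Between-group and within-group variation of the absolute deviations.
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    if within == 0:
        raise ValueError("absolute deviations are constant within every group")
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical mean bias scores for three small groups.
w, df1, df2 = levene_w([[0.2, 0.5, 0.9, 1.1], [0.4, 0.6, 0.7, 0.8], [-0.5, 0.3, 1.4, 2.0]])
print(round(w, 3), df1, df2)
```

A small W (and a large p-value, as in the study's p = .2091) means the spreads of the groups are not detectably different.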
Because the sample cell distribution was judged to be indicative of the distribution in the population, and not the result of this study's methodology, either the Type I (hierarchical) or the Type II (experimental) method of computing sums of squares was in order. The hierarchical method is reserved for situations in which there is an established order for the effects, which was not the case for this question. Gender, as a demographic variable, would clearly be placed first in any ordering because, although gender may affect students' mathematics and reading abilities, the reverse argument is not viable. The difficulty is that the literature on the relationships among reading ability, mathematics ability, and self-efficacy is nearly nonexistent, so there is no theoretical basis for ordering the remaining variables. For this reason, the Type II (experimental) method of computing sums of squares was used for the factorial ANOVA for this question. The obtained F(12, 224) = 4.26, p < .0001, for the overall F-test indicates that one or more of the independent variables is significantly related to students' self-efficacy mean bias scores. The R-square value of 0.1857 indicates that almost 19% of the variance in the scores can be explained by one or more of the independent variables. The results for the main and interaction effects of the three-way factorial ANOVA are presented in Table 30. The main effects of gender, F(1, 236) = 23.59, p < .0001, and mathematics ability, F(2, 236) = 5.13, p = .0066, were both statistically significant. As a result, the null hypotheses that gender and mathematics ability are not related to students' self-efficacy mean bias scores are both rejected. The F-tests failed to reject any of the remaining null hypotheses for main and interaction effects.
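The overall F-test and the R-square value reported above are two views of the same quantity: since F = (R²/df_model) / ((1 - R²)/df_error), R² can be recovered from F and its degrees of freedom. The sketch below is a consistency check rather than part of the original analysis; the small discrepancy in the fourth decimal comes from F being reported to two decimals.

```python
def r_squared_from_f(f_stat, df_model, df_error):
    """Recover R^2 from an overall regression/ANOVA F statistic.

    Rearranges F = (R^2 / df_model) / ((1 - R^2) / df_error).
    """
    return f_stat * df_model / (f_stat * df_model + df_error)


# Overall test for self-efficacy mean bias: F(12, 224) = 4.26, reported R^2 = .1857.
print(round(r_squared_from_f(4.26, 12, 224), 4))  # 0.1858 (F itself is rounded)
```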
The omega squared values indicate that these two effects together account for an estimated 11% of the variance in self-efficacy mean bias in the population. The Cohen's f value for gender is medium-to-large at 0.31, and the Cohen's f for mathematics ability, at 0.21, indicates a medium-sized effect. Two of the interaction effects (gender*reading ability and reading*mathematics ability) showed small effect sizes according to Cohen's f; however, the F-tests did not find these effects to be statistically significant. Whether the observed differences in self-efficacy mean bias are related to these interactions therefore cannot be determined in this study.

Table 30
Factorial ANOVA Results for Self-Efficacy Mean Bias Scores

Source               df  MS    F      p       ω²    f
Gender               1   8.23  23.59  <.0001  0.08  0.31
Reading              2   0.29  0.85   0.4309  0.00  0.08
Gender*Reading       2   0.54  1.55   0.2143  0.00  0.11
Mathematics          2   1.79  5.13   0.0066  0.03  0.21
Gender*Math          2   0.05  0.15   0.8640  0.01  0.04
Reading*Math         3   0.47  1.36   0.2587  0.00  0.13
Gender*Reading*Math  1   0.24  0.68   0.4094  0.00  0.05

Note. N = 237. ω² = omega squared; f = Cohen's f measure of effect size. *p < .05.

In the context of this study, there is no way to know whether a larger sample, with its greater statistical power, would have produced statistically significant interactions. Because two of the main effects were significant but none of the interaction effects were, the Tukey follow-up procedure is an appropriate means of determining whether the differences between the levels of mathematics ability were significant. As with the results for linguistic explanations, the follow-up test for self-efficacy indicates that the differences in means between the high and low mathematics ability students (0.51) and between the high and middle ability students (0.28) were significant.
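The omega squared values in Table 30 can be reconstructed from the table itself, which makes the "estimated 11% of the variance" figure easier to follow. The sketch below is a check, not the author's procedure: it assumes the error df is 224 (from the overall F(12, 224)), takes MS_error from any row as MS_effect / F, and derives the total sum of squares from the overall R² = .1857. With unbalanced cells and Type II sums of squares, omega squared computed this way is an approximation.

```python
# Values read from Table 30 and the overall model (assumed error df = 224).
DF_ERROR = 224
R_SQUARED = 0.1857
effects = {  # effect: (df, MS, F)
    "Gender": (1, 8.23, 23.59),
    "Mathematics": (2, 1.79, 5.13),
}


def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega squared: (SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# MS_error is implied by any row, because F = MS_effect / MS_error.
ms_error = effects["Gender"][1] / effects["Gender"][2]
ss_error = DF_ERROR * ms_error
# SS_total follows from R^2 = 1 - SS_error / SS_total.
ss_total = ss_error / (1 - R_SQUARED)

for name, (df, ms, _f) in effects.items():
    # SS_effect = df * MS for each row.
    print(name, round(omega_squared(df * ms, df, ms_error, ss_total), 2))
# Gender 0.08
# Mathematics 0.03
```

The two values sum to about 0.11, matching the 11% quoted in the text.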
The group means for self-efficacy mean bias of students of high, middle, and low mathematics ability were 0.49, 0.77, and 0.99, respectively. Taken together, the Tukey test and the group means indicate that students of high mathematics ability were significantly more accurate in their feelings of self-efficacy than students of middle or low mathematics ability. Although students in all ability groups were overconfident, students of lower mathematics ability were more overconfident than students of high ability. Because the gender variable has only two levels, a review of the means for boys' and girls' self-efficacy mean bias is the only follow-up to the F-test that is required. The average mean bias for boys was 0.89, indicating that, on average, boys rated their self-efficacy 0.89 points higher than their actual performance. In contrast, girls' average self-efficacy mean bias of 0.58 indicates that, on average, girls rated their self-efficacy 0.58 points higher than their actual performance. These results show that both boys and girls were overconfident in their ability to complete the test items successfully, but boys were significantly more overconfident than girls. Although reading ability was related to students' self-efficacy mean bias scores in the sample, the main effect of reading ability in the factorial ANOVA was not significant, F(2, 236) = 0.85, p = .4309. The sample may have been too small for the effect to reach significance. As a result, the null hypothesis that reading ability is not related to self-efficacy mean bias cannot be rejected.

Multiple Regression Analysis

Introduction. The predictors for this analysis of 237 students were gender, FCAT reading scores, and FCAT mathematics scores. Both FCAT scores range from 100 to 500, and the abbreviations used for this analysis are FCATR and FCATM for reading and mathematics, respectively.
The outcome variable was self-efficacy mean bias, abbreviated SEMBIAS, with a potential range of -4 to +4. Students' mean bias scores were found by subtracting their average mathematics performance score from their average self-efficacy rating. A negative score indicates underconfidence and a positive score indicates overconfidence; the closer a mean bias score is to zero, the more closely the child's self-efficacy ratings matched her/his actual performance. Because the predictor variables for this analysis are the same as those for question one, the descriptive data are not repeated here. The multiple regression was initially run with the three main effects and two interaction effects, gender*FCATM and gender*FCATR. Because the interactions were not significant and do not illuminate the relationships between the main effects and the outcome variable, they were not included in the final analysis or discussion. A table showing the results of the multiple regression with the interaction effects is provided in Appendix T.

Multiple regression results. Table 31 presents a summary of the multiple regression analysis. All three variables (students' gender, FCAT mathematics scores, and FCAT reading scores) contributed significantly to the regression equation's ability to predict students' self-efficacy mean bias. The R² value for the model was 0.2068, indicating that nearly 21% of the variability in students' self-efficacy mean bias was explained by the combination of the three predictors. Because R² tends to be an overly optimistic estimate of the model's ability to predict the outcome variable in the population, the adjusted R² was also examined. The adjusted value, 0.1966, represents a decrease of about one percentage point in the proportion of SEMBIAS variability the regression equation is expected to account for in the population.
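Both the adjusted R² above and the Cohen's f² reported with Table 31 follow directly from R² = 0.2068, n = 237 students, and k = 3 predictors. The sketch below uses the standard formulas; the function names are illustrative only.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cohens_f2(r2):
    """Cohen's f^2 effect size for a whole regression model: R^2 / (1 - R^2)."""
    return r2 / (1 - r2)


print(round(adjusted_r_squared(0.2068, 237, 3), 4))  # 0.1966
print(round(cohens_f2(0.2068), 4))  # 0.2607
```

The adjustment penalizes each added predictor through the (n - 1)/(n - k - 1) factor; with 237 students and only three predictors, the penalty is small.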
Table 31
Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias

Variable                 b        β        SE      t       p
Intercept                3.0503   0.0      0.3315  9.20*   <.0001
Gender                   0.3381   0.2646   0.0779  4.34*   <.0001
FCAT Mathematics Score   -0.0026  -0.1746  0.0012  -2.16*  0.0319
FCAT Reading Score       -0.0038  -0.2484  0.0012  -3.13*  0.0020

Note. N = 237. b = regression coefficient; β = standardized regression coefficient; SE = standard error; * indicates significance at the α = .05 level.

Cohen's effect size f² = R²/(1 - R²) was computed to be 0.2607, indicating a medium-to-large effect. The standard error of estimate was 0.5717, which means that when the three predictors are used to predict a student's self-efficacy mean bias, the predictions will typically be off by about 0.57 points.

The obtained prediction equation was:

SEMBIAS = 3.0503 + 0.3381(Gender) - 0.0026(FCATM) - 0.0038(FCATR)

To interpret the equation, it is useful to examine the change in the predicted outcome when one variable changes and the other two remain constant. For example, for a boy and a girl with identical FCATM and FCATR scores, the girl's predicted SEMBIAS score will be 0.3381 points lower than the boy's. Because predicted SEMBIAS values are generally positive (the intercept is a large positive number), the girl's lower prediction lies closer to zero, indicating a more accurate level of self-efficacy than the boy's. If gender and FCATR are held constant, a 100-point increase in a student's FCAT mathematics score results in a 0.26-point decrease in the predicted self-efficacy mean bias score, which represents an improvement in the accuracy of the child's self-efficacy ratings. If gender and FCATM are held constant, a 100-point increase in a student's FCAT reading score results in a predicted 0.38-point decrease in his/her SEMBIAS score. The standardized regression coefficients provide a means of comparing the predictors' relative impact on the outcome variable by expressing the coefficients in standard deviation units.
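The interpretations above can be reproduced by evaluating the prediction equation directly. This is a minimal sketch under two assumptions: the FCAT coefficients are negative, as the text's description of decreases implies, and gender is coded 1 for boys and 0 for girls (the coding is not stated, but this choice matches the reported boy-girl difference). The FCAT scores of 350 are hypothetical.

```python
def predict_sembias(gender, fcat_math, fcat_reading):
    """Predicted self-efficacy mean bias from the reported regression equation.

    gender: 1 for boys, 0 for girls (assumed coding).
    fcat_math, fcat_reading: FCAT scale scores (100-500).
    """
    return 3.0503 + 0.3381 * gender - 0.0026 * fcat_math - 0.0038 * fcat_reading


# Hypothetical students with identical FCAT scores of 350.
girl = predict_sembias(0, 350, 350)
boy = predict_sembias(1, 350, 350)
print(round(girl, 4), round(boy, 4))  # 0.8103 1.1484
print(round(boy - girl, 4))  # 0.3381
# Holding gender and reading fixed, +100 FCAT mathematics points:
print(round(predict_sembias(0, 450, 350) - girl, 2))  # -0.26
```

Both predictions are positive (overconfident), so the lower prediction for the girl is also the one closer to perfect calibration, which is the reasoning the text relies on.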
The variable with the largest standardized coefficient was gender (β = 0.2646), meaning that, other things being equal, a one standard deviation change in the gender variable corresponds to a 0.2646 standard deviation difference in predicted SEMBIAS, with boys higher; because gender is dichotomous, the unstandardized difference of 0.3381 points is the more direct interpretation. The second largest coefficient in magnitude was that for FCATR (β = -0.2484), indicating that a 1 standard deviation increase in the FCAT reading score results in a 0.2484 standard deviation decrease in the student's predicted SEMBIAS score, holding the other variables constant. This decrease also represents an improvement in the accuracy of self-efficacy. The smallest coefficient in magnitude was found for FCATM (β = -0.1746), meaning that a 1 standard deviation increase in a student's FCAT mathematics score results in a predicted 0.1746 standard deviation decrease in the student's SEMBIAS score. The squared semipartial correlations were also examined to determine each predictor's unique contribution to explaining the variability in the outcome variable. These values were 0.06412 for gender, 0.01586 for FCATM, and 0.03328 for FCATR, meaning that the three variables uniquely accounted for 6.4%, 1.6%, and 3.3% of the variability in SEMBIAS scores, respectively.

Assumptions. The data were screened to ensure that none of the assumptions underlying multiple regression analysis were violated. The assumption of homoscedasticity requires that the errors produced by the prediction equation be no larger, on average, at one end of the regression line than at the other; a plot of the residuals against the predicted values confirms that this assumption was not violated. The residuals were normally distributed, with very low skewness and kurtosis values (sk = 0.01, ku = 0.08). Because the residual plot did not form a curve, the assumption of linearity was not violated. Nothing in the design of the study suggests that the residuals were not independent.
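The squared semipartial correlations reported above can be cross-checked against the t statistics in Table 31 using the identity sr² = t²(1 - R²)/(n - k - 1), with R² = 0.2068 and n - k - 1 = 237 - 3 - 1 = 233 residual degrees of freedom. This is an illustrative check, not the author's computation; because the tabled t values are rounded to two decimals, the results agree with the reported values only to about the fourth decimal place.

```python
def squared_semipartial(t, r2_full, df_residual):
    """Unique variance of one predictor: sr^2 = t^2 * (1 - R^2) / df_residual."""
    return t ** 2 * (1 - r2_full) / df_residual


# t values from Table 31; the sign of t does not matter once it is squared.
for name, t in [("Gender", 4.34), ("FCATM", -2.16), ("FCATR", -3.13)]:
    print(name, round(squared_semipartial(t, 0.2068, 233), 4))
# Gender 0.0641
# FCATM 0.0159
# FCATR 0.0334
```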
A review of the Cook's D values did not lead to the removal of any outliers beyond those already eliminated during the descriptive statistics analysis. The predictor variables are not believed to contain enough measurement error to violate that assumption. Finally, although the predictors may not be fixed, regression analysis is robust to violations of this assumption. In conclusion, the review of assumptions indicates that multiple regression was an appropriate means of analyzing the data for this question.

Conclusion. All three predictor variables were found to contribute significantly to the regression equation's predictive ability, with gender making the largest contribution, followed by FCAT reading scores and FCAT mathematics scores, in that order. The findings related to gender and mathematics ability are not surprising, as they confirm the results of the factorial ANOVA. The findings related to students' FCAT reading scores were unexpected and raise further questions. Because the measure of reading ability for the factorial ANOVA was based on a combination of FCAT scores and teacher ratings, and because students were forced into three discrete ability levels regardless of the size of their differences, the single continuous variable of FCAT reading scores may have shown a stronger relationship to SEMBIAS than the categorical variable of reading ability level.

Qualitative Analysis of Students' Rationale for Self-Efficacy Ratings

The student interviews revealed some interesting observations about the sources of some students' self-efficacy. The verbal explanations of one female student indicated that her confidence in her ability to solve a problem was related more strongly to her familiarity with the context of the problem than to the mathematics involved in solving it. Several excerpts from her interview illustrate this point.
When asked why she circled the happy face for self-efficacy on the pizza problem, she explained: "I mean, like, almost like all pizzas are different so like they could have ate at a different restaurant or one could have had more toppings than the other so that's how I kinda know that I might do really well on this one cause I eat pizza a lot." The flour problem involves calculating the amount of flour required for three batches of cookies, given the amount used for one batch. She explained her choice of the middle face for self-efficacy by saying: "Well, one, one and one third cups of flour are needed in each batch of cookies. Well, in one and one third I think that like cause I don't do like a lot of cooking and sometimes it's like a little bit taller than like what it should be so for like each batch of cookies" The calculator problem tests students' understanding of place value more than their use of the calculator, but this student lacked confidence in her ability to solve the problem "because I don't normally use a calculator so I didn't really know that much about it." Her lack of confidence did not reflect a lack of understanding of the underlying mathematical concept, as shown by her correct solution of adding one hundred. She also lacked confidence in her ability to solve the school lunch money test item "because I don't normally do money with my mom like for lunch so like she just puts it on a check like or something and like how could a mother have only one dollar bills?" Mathematically, this problem involved multiplication and addition, and the student had no difficulty completing all the work correctly. According to her verbal responses, her low level of self-efficacy was based not on her ability to do the math but on her comfort with and familiarity with the context of the problem. Another female interview subject's self-efficacy was primarily tied to her experiences of working on math with her father.
Explaining why she circled a happy face for self-efficacy, she answered, "I just thought it was easy and I usually do math at home with my dad all the time." For another problem, for which she circled the kind-of-sad face, she explained, "Because when I used to, when I told you I used to do my math with my dad, this one got a little trickier and then I knew I was going to have trouble on this one." Her remarks indicate that her level of self-efficacy was somehow tied to an experience of doing a similar math problem with her father. One male interview subject expressed overconfidence before, during, and after providing explanations of faulty problem-solving processes. For example, he circled the slightly happy face for the string problem and provided this explanation of his work: "I did three fourths minus one eighth which would equal two fourths because eight minus four is four and three minus one is two. And it says like he has Jim has three fourths of a yard of string and he wishes to divide it into one eighth of a yard of string so I just subtracted those two and got two fourths." He circled a happy face for self-assessment because, "After I did it I was pretty sure I knew what I did."

For the flour problem, the same male student circled the kind-of-happy face for self-efficacy "because I wasn't really sure what I was doing." His explanation of his problem-solving processes proved that reply to be an understatement: "Well it says she needs three batches of cookies and one and three, one third, one whole and one third, and it's saying what, what number you have to add it by so I did if you have to make it into a whole it would have to be nine and seven, because if you put like eight and six it would come up to be nine sevenths, and it said she wanted to make three batches." Based on his written work and verbal explanation, it appears that he perceived this as a missing-addend addition problem. Not only was his strategy incorrect, but his computational methods were also flawed.
His self-assessment was another kind-of-happy face "because I wasn't sure if I got it exactly right." This same student circled the really happy face for both self-efficacy and self-assessment on the birdseed problem, even though his strategy of subtracting the two prices was inaccurate. Explaining his self-assessment, he responded, "I was positive I knew what I was doing on this and I was positive I got the answer correct." The other five interview subjects gave explanations for their self-efficacy ratings that were closely tied to their perceptions of their ability to answer the specific problems successfully.

Question Five: Findings for Self-Assessment

The findings in this section address the following research question: To what extent is students' ability to assess their own performance on mathematical problem-solving tasks, as measured by a self-reported rating scale, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for this question, students' self-assessment mean bias scores, were obtained through administration of the study test instrument, which consisted of twelve constructed-response mathematical problem-solving items. Students reported their self-assessment by circling one of five faces at the bottom of the work space after they finished working each problem. Mean bias is a measure of calibration that shows the accuracy of students' self-assessments as the average difference between their self-assessment ratings and their actual performance. A positive mean bias indicates that a student's average rating of his/her own performance exceeds his/her actual performance average. A negative mean bias score indicates that the student's self-assessment was, on average, lower than her/his actual performance on the test.
Each item contained a self-reporting Likert-type scale at the bottom of the page, following the work space, to obtain self-assessment data. The scale, formatted as a range of five faces from very happy to very sad, asked students to rate how well they thought they had solved the problem after their work was complete. After the test administration, the students' responses were converted to numerical scores ranging from zero to four, with four indicating the highest level of self-assessment. To obtain each student's mean bias score, the average of all twelve mathematical performance scores was subtracted from the average of all twelve self-assessment scores. The self-assessment mean bias scores had a potential range of -4 to +4.

Descriptive Statistics

Descriptive Analysis of Total Self-Assessment Scores

Although the variable of interest for this research question was self-assessment mean bias, a clear picture of the data requires an understanding of students' total self-assessment scores. The first of the following three sections details how students' total self-assessment scores varied according to the grouping variables. The second section explains how the total sample's scores varied by test question. The third section looks at group data for each of the twelve questions.

Overall analysis of groups. The means and standard deviations of students' total self-assessment scores, organized by gender, reading ability, and mathematics ability, are presented in Table 32. With twelve items and a maximum self-assessment score of four points per item, a student who judged all of his/her work to be perfect would have a total score of 48. The data show that boys' total scores were, on average, more than four points higher than girls'. Boys' lower standard deviation also means that their scores were less dispersed than girls'.
When the students were grouped according to mathematics ability, the data show that students categorized as having high mathematics ability scored their work higher than students of lower ability. The variability of the scores follows the same trend, with the lowest standard deviation in the high ability group and the highest in the low ability group. When the grouping variable was changed to reading ability, the data show that better readers rated their work higher than poor readers; however, the dispersion of scores was highest for the middle-ability readers.

Table 32
Total Self-Assessment Scores by Group

Group                    n     M      SD
By Gender
  Boys                   129   38.91  6.48
  Girls                  108   34.67  8.20
By Mathematics Ability
  High                   57    41.40  5.57
  Middle                 129   36.78  7.24
  Low                    51    32.53  7.80
By Reading Ability
  High                   73    40.42  6.36
  Middle                 134   35.83  7.77
  Low                    30    33.70  6.73

Note. N = 237; maximum score = 48 points.

Analysis of overall sample by question. Table 33 shows the mean, standard deviation, skewness, and kurtosis for students' total self-assessment scores (possible range of 0-48) and for their self-assessment scores on each question (possible range of 0-4).

Table 33
Descriptive Statistics for Self-Assessment Scores by Question

Question        M      SD    Skewness  Kurtosis
Flowerpots      3.87   0.45  4.41      21.87
Ms. Hernandez   3.51   0.87  2.13      4.78
Calories        3.41   1.07  2.00      3.17
Jeans Class     3.69   0.71  2.84      8.78
Pizza           2.90   1.21  0.99      0.11
String          1.70   1.52  0.25      1.41
Balloons        3.29   0.88  1.29      1.46
Flour           2.94   1.27  1.07      0.05
Calculator      2.67   1.47  0.79      0.79
School Lunch    3.40   0.98  1.83      3.02
Pencils         2.64   1.60  0.72      1.14
Birdseed        2.96   1.33  1.07      0.05
Total           36.97  7.60  0.63      0.12

Note. N = 237; maximum score for individual items = 4 points; maximum total score = 48 points.

The distribution of students' total self-assessment scores is roughly normal, with a skewness value of 0.63 and a kurtosis value of 0.12.
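The skewness and kurtosis columns of Table 33 describe the shape of each item's score distribution: negative skewness means scores pile up at the top of the 0-4 scale with a tail toward low scores, and positive (excess) kurtosis means a sharper peak than a normal curve. The text does not say which software or formulas produced these values; the sketch below assumes the common bias-adjusted definitions (G1 and excess kurtosis G2, as used by spreadsheet functions such as Excel's SKEW and KURT). The item ratings in the usage lines are hypothetical.

```python
from math import sqrt


def _central_moment(xs, power):
    """Population central moment of the given order."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** power for x in xs) / len(xs)


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (requires n >= 3 and nonzero variance)."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(xs):
    """Sample excess kurtosis G2 (requires n >= 4); about 0 for a normal shape."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


# Hypothetical self-assessment ratings for one item, clustered at the top of the scale.
ratings = [4, 4, 4, 4, 4, 4, 4, 3, 3, 2, 0]
print(round(sample_skewness(ratings), 2), round(sample_excess_kurtosis(ratings), 2))
```

With most ratings at 4 and a few low ones, the skewness comes out negative and the kurtosis positive, which is the negatively skewed, leptokurtic pattern the text describes for most items.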
Most of the distributions for individual questions are negatively skewed and leptokurtic, indicating a preponderance of high scores and a sharply peaked distribution. This is especially true for the items on which students rated their work highest. The items with the highest (flowerpot) and lowest (string) self-assessment scores were the same items on which students received the highest and lowest mathematical performance scores. The flowerpot item also had the smallest variability, while the pencil item showed the greatest variability of self-assessment scores.

Analysis of groups by question. Descriptive statistics for students' by-group scores for each question are presented in Appendix R. Boys were more confident than girls about the success of their problem-solving efforts on eleven of the twelve questions and had the same self-assessment mean as girls on the pizza item. In addition, the variability of girls' scores was higher for ten of the twelve test items, the only exceptions being the pizza and string questions. Both boys and girls felt most confident about their work on the flowerpot question and least confident about their work on the string question. When students were grouped according to mathematics ability, rank orders varied most for the balloon question: with a rank of one representing the question on which a group felt most confident about its work, the high, middle, and low ability groups ranked the balloon question 9th, 6th, and 3rd, respectively. When students were grouped according to reading ability, the greatest variability in rank order was found for the calories question, which the high, middle, and low ability groups ranked 5th, 3rd, and 7th, respectively.

Descriptive Analysis of Self-Assessment Mean Bias

Overall analysis of groups. Students' self-assessment mean bias scores followed a consistent pattern when sorted by groups (Table 34).
As with the self-efficacy mean bias results, boys were more overconfident in their self-assessments than girls. Girls were more accurate self-assessors, but their bias scores had about the same dispersion about the mean as the boys' scores. When the sample was grouped according to mathematics ability, the mean bias for the high ability students was only slightly more than half that of the low ability students, and the variability of the scores was highest for the low ability group. These results show that students of higher mathematics ability were more accurate in their self-assessments than low ability students. When viewed through the lens of reading ability, the data show that better readers were more accurate in their self-assessments of their mathematical problem-solving efforts than were middle or low readers. Low readers, on average, scored their work more than one point higher than their actual performance, and the variability of low readers' scores was higher than that of middle or high ability readers. Whether these differences are large enough to support inferences about the population is addressed in the inferential analysis.

Table 34
Self-Assessment Mean Bias Scores by Group

Group                    n     M     SD
By Gender
  Boys                   129   0.97  0.56
  Girls                  108   0.71  0.58
By Mathematics Ability
  High                   57    0.60  0.39
  Middle                 129   0.89  0.59
  Low                    51    1.04  0.66
By Reading Ability
  High                   73    0.67  0.49
  Middle                 134   0.88  0.58
  Low                    30    1.16  0.67

Note. N = 237. Mean bias scores range from -4 to +4.

The box and whisker plots shown in Figure 5 illustrate the differences in the distributions of the self-assessment mean bias scores when the sample was grouped according to three different grouping variables: gender, mathematics ability, and reading ability. The distributions for boys and girls have very similar shapes, with the girls' distribution situated closer to zero (representing perfectly accurate mean bias) than the boys'.
The plots for the sample grouped by mathematics ability show a consistent pattern, with the H-spread of the high ability students closer to zero than that of either the middle or the low ability group. The whiskers for the three groups show the same phenomenon observed in the self-efficacy plots: the whiskers for the middle and low ability groups reach farther into the under- and overconfidence ranges than those for the high ability students. When the sample was regrouped according to reading ability, the H-spreads for the three groups followed the same pattern found for mathematics ability; the higher ability students were more accurate in their self-assessments. In addition, the box representing the H-spread of the high ability students was narrower than the other two boxes. The most underconfident student was found in the middle ability group, and the most overconfident student was in the lowest ability group. It is interesting to note that none of the students in the low reading ability group were underconfident.

[Figure 5 plot: minimum, Q1, median, Q3, and maximum of self-assessment mean bias scores for the Boys, Girls, H Math, M Math, L Math, H Rdg, M Rdg, and L Rdg groups.]
Figure 5. Box and whisker plots of self-assessment mean bias scores grouped by gender, mathematics ability, and reading ability.

Analysis of overall sample by question. Table 35 shows the self-assessment mean bias score average and standard deviation for the entire test as well as for each individual test item.

Table 35
Self-Assessment Mean Bias Scores by Question

Question        M     SD
Flowerpots      0.80  0.86
Ms. Hernandez   0.83  0.93
Calories        0.58  1.15
Jeans Class     1.01  1.10
Pizza           0.59  1.36
String          0.80  1.38
Balloons        0.38  1.14
Flour           0.90  1.26
Calculator      0.88  1.40
Lunch           1.18  1.08
Pencils         0.76  1.29
Birdseed        1.50  1.47
Total           0.85  0.58

Note. N = 237; mean bias scores can range from -4 to +4.
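Tables 29 and 35 report per-item mean bias for self-efficacy (before working) and self-assessment (after working) on the same twelve items, so an item-by-item comparison of the two tables shows where students' confidence rose or fell once they had finished a problem. The sketch below copies the values from the two tables (the "Lunch" row of Table 35 is keyed as "School Lunch" to match Table 29) and classifies each item; differences are rounded to two decimals, matching the tables' precision and avoiding floating-point noise.

```python
# Per-item mean bias values copied from Table 29 (self-efficacy) and Table 35 (self-assessment).
self_efficacy = {
    "Flowerpots": 0.67, "Ms. Hernandez": 0.55, "Calories": 0.58, "Jeans Class": 0.96,
    "Pizza": 0.35, "String": 0.84, "Balloons": 0.28, "Flour": 0.92,
    "Calculator": 0.88, "School Lunch": 0.94, "Pencils": 0.68, "Birdseed": 1.33,
}
self_assessment = {
    "Flowerpots": 0.80, "Ms. Hernandez": 0.83, "Calories": 0.58, "Jeans Class": 1.01,
    "Pizza": 0.59, "String": 0.80, "Balloons": 0.38, "Flour": 0.90,
    "Calculator": 0.88, "School Lunch": 1.18, "Pencils": 0.76, "Birdseed": 1.50,
}

# Self-assessment bias minus self-efficacy bias for each item.
diff = {item: round(self_assessment[item] - self_efficacy[item], 2) for item in self_efficacy}

higher = [item for item, d in diff.items() if d > 0]   # more overconfident afterwards
same = [item for item, d in diff.items() if d == 0]
lower = [item for item, d in diff.items() if d < 0]    # less overconfident afterwards
big_gap = [item for item, d in diff.items() if d > 0.20]

print(len(higher), same, lower)  # 8 ['Calories', 'Calculator'] ['String', 'Flour']
print(big_gap)  # ['Ms. Hernandez', 'Pizza', 'School Lunch']
```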
The mean bias was found by subtracting the mean mathematical performance score from the mean self-assessment score. As a result, a positive bias score indicates students' overconfidence in how successfully they solved an item, and a negative score indicates underconfidence. As with the self-efficacy data, which measured students' task-specific confidence before they began work on each problem, students were overconfident in their self-assessments of how well they had performed on all twelve test items. Also as with the self-efficacy results, students were most overconfident about their performance on the birdseed problem, with a mean self-assessment score a point and a half higher than the mean mathematical performance score. Students were most accurate in their self-assessments on the balloon item, followed by the calories and pizza items. Interestingly, the self-assessment mean bias was higher than the self-efficacy mean bias for eight of the items. The two measures were equal for the calories and calculator items, and the self-efficacy mean bias was higher than the self-assessment mean bias for the string and flour items, both of which involved fractions. These results indicate that, on average, students felt more confident about the outcome of their problem-solving efforts than they had felt about their ability to solve the problem before they began working. This is especially true for the Ms. Hernandez, pizza, and school lunch items, whose average self-assessment mean bias scores were more than 0.20 points higher than their self-efficacy mean bias scores.

Analysis of groups by question. Appendix S presents the mean and standard deviation of students' self-assessment mean bias scores organized by group. Girls were overconfident about their performance on all twelve test items. Girls were most overconfident about their work on the birdseed item, followed by the school lunch and Ms.
Hernandez questions. Boys were also overconfident on all test items, but their self-assessments were most accurate for the balloon item. Like girls, boys exhibited their highest levels of overconfidence on the birdseed and school lunch items.

Inferential Statistics

Factorial ANOVA Results

In order to suggest that differences in gender, mathematics ability, and reading ability would result in differences in students' self-assessment mean bias scores in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the tenability of a chance explanation, a 2 × 3 × 3 factorial ANOVA was conducted with an alpha level of .05 for each effect. The degree to which the Type I error rates are actually controlled at the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different individuals: the study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the assumption of normality was not violated. To assess the assumption of equal variances, Levene's test for equal variance was conducted, with a resulting F(13, 224) = 1.35, p = .1904. Because the p-value is above the alpha level of .05, the null hypothesis that the variances of the group scores are equal cannot be rejected, and there is no evidence that the assumption of equal variances was violated. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA. The cell sizes for this study were not equal. If the disparity in cell sizes had not been judged to reflect differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been appropriate.
However, as in the population, it was expected that most of the students in the sample would be categorized similarly for mathematics and reading, resulting in few students in the cells representing a high rating for one subject and a low rating for the other. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or the Type II (experimental) method of computing sums of squares would be more accurate. The hierarchical method is used only when there is an established order for the effects, and that is not the case for this question. Gender, as a demographic variable, would clearly be placed first in any ordering, because although gender may affect students' mathematics and reading abilities, the reverse does not hold. However, the literature on the relationship between reading ability, mathematics ability, and self-assessment is so undeveloped that there is no theoretical basis for ordering the two ability variables. For this reason, the Type II (experimental) method for computing sums of squares was used for the factorial ANOVA for this question. The obtained F(13, 224) = 4.22, p < .0001, for the overall F test indicates that one or more of the independent variables was significantly related to the self-assessment mean bias scores. The R-square value of 0.1843 indicates that just over 18% of the variance in the self-assessment mean bias scores can be explained by the independent variables and their interactions. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 36. The findings of the factorial ANOVA for self-assessment mean bias indicate that the main effects of gender, F(1, 236) = 18.92, p < .0001, and mathematics ability, F(2, 236) = 3.85, p = .0226, were statistically significant.
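Cohen's f for an ANOVA effect can be recovered from its F ratio, because f² = SS_effect / SS_error = F × df_effect / df_error (the partial-eta-squared form of f). The Python sketch below applies that identity to the two significant main effects, using the error degrees of freedom printed with those F statistics (236). The text does not say which formula the study used, so the agreement with the reported values for gender (0.28) and mathematics ability (0.18) should be read only as a consistency check.

```python
import math


def cohens_f_from_F(F: float, df_effect: int, df_error: int) -> float:
    """Cohen's f implied by an ANOVA F ratio.

    Uses f**2 = SS_effect / SS_error = F * df_effect / df_error,
    i.e. the partial-eta-squared version of Cohen's f.
    """
    return math.sqrt(F * df_effect / df_error)


# F ratios and degrees of freedom as reported for the significant effects.
effects = {
    "Gender": (18.92, 1, 236),
    "Mathematics ability": (3.85, 2, 236),
}

for name, (F, df1, df2) in effects.items():
    print(f"{name}: f = {cohens_f_from_F(F, df1, df2):.2f}")
# Gender: f = 0.28
# Mathematics ability: f = 0.18
```

With the 224 error degrees of freedom of the overall test, the gender value would be about 0.29 instead. Cohen's conventional benchmarks for f are .10 (small), .25 (medium), and .40 (large).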
Table 36
Factorial ANOVA Results for Self-Assessment Mean Bias Scores

Source                df    MS      F        p        ω²     f
Gender                 1    5.51   18.92*   <.0001   0.07   0.28
Reading                2    0.65    2.22    0.1111   0.02   0.18
Gender*Reading         2    0.18    0.63    0.5345   0.00   0.11
Mathematics            2    1.12    3.85*   0.0226   0.01   0.14
Gender*Math            2    0.41    1.42    0.2428   0.00   0.07
Reading*Math           3    0.21    0.63    0.4877   0.01   0.09
Gender*Reading*Math    1    0.01    0.03    0.8644   0.00   0.01
Note. N = 237. ω² = omega squared; f = Cohen's f measure of effect size. *p < .05.

The low p-values of < .0001 and .0226 for gender and mathematics ability mean that, if the null hypotheses were true, the probability of obtaining an F as large as or larger than the one obtained would be < .0001 for gender and .0226 for mathematics ability. Because these probabilities are small (less than .05), the null hypotheses are rejected in favor of the alternative hypotheses. Because gender has only two levels, it can be concluded that the population group means differ. Mathematics ability is a three-level variable; therefore, further analysis is required to determine which pair or pairs of group means differ by a significant amount. The omega squared values of 0.07 and 0.02 for gender and mathematics ability, respectively, indicate that an estimated combined total of 9% of the variation in the population self-assessment mean bias is associated with these two variables. Gender exhibits a medium effect size, with a Cohen's f value of 0.28, and the Cohen's f value for mathematics ability is small to medium at 0.18. Because the main effect for mathematics ability was significant but none of its interactions were, the Tukey follow-up test is an appropriate means of identifying significant differences in group means. The group means for self-assessment mean bias for the high, middle, and low mathematics ability groups were 0.60, 0.89, and 1.04, respectively.
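Tukey's procedure compares every pair of group means. The Python sketch below computes those pairwise differences from the three reported group means. It deliberately stops there, because the honestly significant difference threshold also depends on the group sizes, which are not given in this section.

```python
from itertools import combinations

# Self-assessment mean bias by mathematics ability group (as reported).
group_means = {"High": 0.60, "Middle": 0.89, "Low": 1.04}

# Absolute difference for every pair of groups; in Tukey's test each of
# these is compared against a single HSD critical value.
diffs = {
    (a, b): round(abs(mean_a - mean_b), 2)
    for (a, mean_a), (b, mean_b) in combinations(group_means.items(), 2)
}

for (a, b), d in diffs.items():
    print(f"{a} vs {b}: {d:.2f}")
# High vs Middle: 0.29
# High vs Low: 0.44
# Middle vs Low: 0.15
```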
The 0.44-point difference between the means of the high and low groups was significant, as was the 0.29-point difference between the means of the high and middle mathematics ability groups. The 0.15-point difference between the means of the middle and low groups was too small to be significant. These results show that students in the high mathematics ability group were significantly more accurate assessors of their own work than students in the middle and low ability groups. Accordingly, students of lower mathematics ability were more likely to be overconfident in their mathematical performance. The descriptive data for the main effect of reading ability indicated a large difference in group means for the sample. Although the factorial ANOVA did not find reading ability to be significant at an alpha of .05, its effect size was larger than that of mathematics ability. A larger sample would have increased the power of the F test to detect this effect, which may or may not have resulted in it being classified as significant.

Multiple Regression Analysis

Introduction. The three predictors for this analysis were gender, FCAT reading scores, and FCAT mathematics scores. The FCAT scores ranged from 100 to 500 points and were abbreviated as FCATR and FCATM for reading and mathematics scores, respectively. The sample consisted of 237 students, 108 girls and 129 boys. The purpose of the analysis was to determine which combination of the predictors would provide the most accurate prediction of students' self-assessment mean bias scores. This outcome variable was abbreviated as SAMBIAS and had a potential range of −4 to +4. A score of −4 indicates a student who scored her/his own work an average of four points lower (on the 0–4 scale) than her/his actual performance. A score of +4 could be produced only if a student scored her/his work a perfect 4 on every problem but had an actual score of zero on every problem.
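A minimal Python sketch of the SAMBIAS calculation, assuming one self-assessment rating and one rubric score (both on the 0–4 scale) per item, listed in the same item order. The function name is hypothetical and the four-item example is invented, not study data; the last two calls reproduce the theoretical extremes of +4 and −4 described above.

```python
from statistics import mean


def self_assessment_mean_bias(self_ratings, performance):
    """Average of (self-assessment - performance) across items.

    Both sequences hold per-item scores on the 0-4 scale, in the same
    item order. Positive results indicate overconfidence, negative
    results underconfidence.
    """
    if len(self_ratings) != len(performance):
        raise ValueError("need one self-rating per scored item")
    return mean(s - p for s, p in zip(self_ratings, performance))


# Hypothetical four-item example (not study data).
print(self_assessment_mean_bias([4, 3, 4, 2], [3, 3, 2, 2]))  # 0.75

# The theoretical extremes on a twelve-item test.
print(self_assessment_mean_bias([4] * 12, [0] * 12))  # 4
print(self_assessment_mean_bias([0] * 12, [4] * 12))  # -4
```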
The most accurate self-assessors had self-assessment mean bias scores close to zero, with a positive score indicating overconfidence and a negative score signifying underconfidence. Because the descriptive data for these three predictors were analyzed in the multiple regression analysis for question two, that information is not repeated here. The multiple regression was initially run with the three main-effect predictors and two additional interaction predictors, gender*FCATM and gender*FCATR. Because the interactions were not significant, and their inclusion obscured the relationships among the remaining variables, they were excluded from the analysis and discussion that follows. A table summarizing the multiple regression results with the interactions is provided in Appendix T.

Multiple regression results. A summary of the multiple regression results is presented in Table 37. Gender and FCAT reading scores were found to contribute significantly to the predictive utility of the regression equation; students' FCAT mathematics scores did not. If obtaining the FCATM data required any significant cost or time, that predictor should be eliminated from the model. However, collecting FCATM in addition to FCATR requires little additional effort, and because FCATM has a slight impact on the equation's predictive utility, it remains in the model. The R² value for the model was 0.2017, indicating that about 20% of the variability in students' self-assessment mean bias scores is accounted for by the combination of the three predictor variables. Because R² tends to overestimate the regression equation's ability to predict the outcome variable in the population, the adjusted R² was also examined. The adjusted value of 0.1915 represents shrinkage of about one percentage point.
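The adjustment uses the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), here with n = 237 students and k = 3 predictors. The Python sketch below recomputes it, together with Cohen's f² = R²/(1 − R²), from the rounded R² of 0.2017. A difference in the fourth decimal (0.1914 here versus the reported 0.1915) is expected from that rounding.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cohens_f2(r2: float) -> float:
    """Cohen's f-squared effect size for a whole regression model."""
    return r2 / (1 - r2)


r2, n, k = 0.2017, 237, 3  # values reported for the SAMBIAS model
adj = adjusted_r2(r2, n, k)

print(f"adjusted R2 = {adj:.4f}")             # 0.1914
print(f"shrinkage   = {r2 - adj:.4f}")        # 0.0103
print(f"f2          = {cohens_f2(r2):.4f}")   # 0.2527
```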
Cohen's effect size, f² = R²/(1 − R²), was calculated to be 0.2527, indicating a medium to large effect. The standard error of estimate, a measure of the typical prediction error, was 0.52325 points.

Table 37
Summary of Multiple Regression Analysis for Self-Assessment Mean Bias

Variable                  b         β         SE       t        p
Intercept                 2.9636    0.0       0.3034    9.77*   <.0001
Gender                    0.2720    0.2333    0.0713    3.81*   0.0002
FCAT Mathematics Score   −0.0019   −0.1398    0.0011   −1.72    0.0863
FCAT Reading Score       −0.0040   −0.2868    0.0011   −3.60*   0.0004
Note. N = 237. b = regression coefficient; β = standardized regression coefficient; SE = standard error. * indicates significance at the α = .05 level.

The obtained prediction equation was:

SAMBIAS = 2.9636 + .2720*Gender − .0019*FCATM − .0040*FCATR

To aid in interpreting the equation, it is helpful to examine the impact on SAMBIAS of changing one predictor while holding the other two constant. To understand the impact of gender on the outcome, compare the SAMBIAS scores of two students, one boy and one girl, whose FCATM and FCATR scores are identical. The girl's predicted SAMBIAS would be .2720 points lower than the boy's. Because the intercept is a large positive number, and because SAMBIAS scores closer to zero indicate a greater level of self-assessment accuracy, this shows that, other things being equal, girls will have predicted SAMBIAS scores that indicate more accurate self-assessment than boys. If gender and FCATR are held constant, a 100-point increase in FCATM results in a predicted 0.19-point decrease in SAMBIAS, again indicating more accurate self-assessment. If gender and FCATM are held constant, a 100-point increase in FCATR results in a predicted 0.40-point decrease in SAMBIAS. The standardized regression coefficients are expressed in standard deviation units and therefore allow a direct comparison of each predictor's impact on the predicted outcome.
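The holding-constant comparisons in the preceding paragraph can be reproduced by coding the prediction equation directly. In this Python sketch the FCAT coefficients are negative, as the text's "decrease" interpretations require. Gender is assumed, hypothetically, to be coded 1 = boy and 0 = girl, because the coding is not stated; the FCAT scores of 300 are arbitrary illustrative values within the 100–500 scale.

```python
# Coefficients of the SAMBIAS prediction equation.
INTERCEPT = 2.9636
B_GENDER = 0.2720
B_FCATM = -0.0019
B_FCATR = -0.0040


def predict_sambias(gender: int, fcatm: float, fcatr: float) -> float:
    """Predicted self-assessment mean bias.

    ASSUMPTION: gender is coded 1 = boy, 0 = girl. The source does not
    state the coding; this choice makes a girl's prediction .2720 lower
    than an otherwise identical boy's, as the text describes.
    """
    return INTERCEPT + B_GENDER * gender + B_FCATM * fcatm + B_FCATR * fcatr


base = predict_sambias(1, 300, 300)  # hypothetical boy, FCATM = FCATR = 300
print(round(predict_sambias(0, 300, 300) - base, 4))  # -0.272 (girl vs. boy)
print(round(predict_sambias(1, 400, 300) - base, 4))  # -0.19 (+100 FCATM)
print(round(predict_sambias(1, 300, 400) - base, 4))  # -0.4 (+100 FCATR)
```

Because the model is linear with no interaction terms, these differences are the same at any starting FCAT scores.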
The variable with the largest standardized coefficient (in absolute value) was FCATR, indicating that a 1 standard deviation increase in FCATR results in a predicted 0.2868 standard deviation decrease in SAMBIAS. The variable with the next greatest impact was gender, with a standardized regression coefficient of 0.2333. The variable with the least impact was FCATM, with a standardized regression coefficient of −0.1398. This means that a 1 standard deviation increase in FCATM would result in a predicted 0.1398 standard deviation decrease in SAMBIAS. The squared semipartial correlations were examined to determine each predictor variable's unique contribution to the model's ability to account for the variability in the outcome variable. The squared semipartial correlations were .04984 for gender, .01017 for FCATM, and .04433 for FCATR. These figures can be interpreted to mean that the variables uniquely accounted for 5.0%, 1.0%, and 4.4% of the variability in SAMBIAS, respectively.

Assumptions. Regression analyses rest on several assumptions that should not be violated. The data were screened, and the tenability of each assumption was evaluated. To assess the assumption of homoscedasticity of errors, the residuals were plotted against the predicted values. No patterns appeared in the plot, indicating that this assumption was not violated. The residuals were also approximately normally distributed, with skewness and kurtosis values of 0.06 and 0.18, respectively. The residual plot also supported a linear relationship between the predictors and the outcome, and based on the design of the study, there is no reason to believe that the residuals are not independent. Because outliers were removed prior to the regression analysis on the basis of the descriptive data, an evaluation of students' Cook's D values did not identify any additional outliers. Another assumption is that the predictors were measured without error.
Gender was self-reported, and no reporting errors have been detected. Because of the stringent testing conditions for the FCAT, the measurement error associated with the two FCAT predictors is not deemed to violate the assumption. The FCATM and FCATR predictors cannot be considered fixed, because their values may not be the same if the study is replicated. However, this is the case in many studies, and multiple regression is considered robust to violations of this assumption. In conclusion, based on the analysis of the assumptions, multiple regression was an appropriate procedure for analyzing the data for this question.

Conclusion. Students' gender and FCAT reading scores were found to be significant predictors of their self-assessment mean bias scores. Their inclusion in the regression equation significantly increased the equation's ability to predict the outcome variable. Although FCAT mathematics scores were not a significant contributor, they did produce a slight increase in the overall R² value. Because there was no significant cost or time involved in gathering these data, there is no harm and a small benefit in leaving them in the model. The regression analysis confirms the finding of the factorial ANOVA for gender. However, the factorial ANOVA found mathematics ability level to be significantly related to SAMBIAS and reading ability level not to be, whereas the regression analysis found FCAT reading scores, but not FCAT mathematics scores, to be significant predictors. The differences in the outcomes may be due to the different ways that ability levels were defined for the two analyses. For the factorial ANOVA, students' FCAT level scores were combined with their teachers' ratings of their ability to arrive at a categorical ability level of high, middle, or low. For the regression analysis, the continuous FCAT scale scores were used as a measure of ability level.
Categorizing ability into three discrete levels likely placed some students whose FCAT scale scores were nearly identical, but fell on opposite sides of a dividing line, in different ability levels.

Teachers' Assessment of Their Students' Performance

The data gathered from the teachers regarding their students' preparedness to be successful on the test items were used to calculate a mean bias score. The actual mean of students' mathematical performance scores for each question (0–4 scale) was subtracted from the mean of teachers' assessments of student preparedness (0–4 scale). The results indicate that the teachers, as well as the students, were overconfident, but not to the same degree: the students' average mean bias score was 0.85, compared with the teachers' average mean bias score of 0.40. Just as partitioning the student sample by gender revealed that boys were more overconfident than girls about their work, partitioning the teachers by gender showed that male teachers' overconfidence in their students' work (mean bias score of 0.57) exceeded that of female teachers (mean bias score of 0.32).

CHAPTER 5

SUMMARY, CONCLUSIONS, IMPLICATIONS AND RECOMMENDATIONS

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter contains four sections. The first section presents a summary of the study. The second section describes the conclusions and implications derived from the research findings and is organized according to research question. The third section discusses the recommendations for practice based on the study conclusions and implications.
The fourth and final section offers recommendations for future research.

Summary of the Study

Mathematical problem solving is a complex process that involves much more than simple calculation. For a child to complete a problem-solving task successfully, he/she must read and understand the problem situation, evaluate what the problem is asking, make a plan for what mathematical procedure(s) must be used to solve the problem, carry out the plan, assess the reasonableness of the answer, and then communicate the results. Previous research shows that the entire process is affected by how well students understand the problem they read, whether their mathematical knowledge provides them with the necessary tools to solve the problem, their confidence in their own ability to solve the problem, their ability to evaluate their own work, and their ability to communicate to the assessor what they have done. This study used a twelve-item constructed-response problem-solving test to gather data about students' problem-solving processes, their written linguistic communication of those processes, their self-efficacy when commencing work on a problem, and their accuracy in assessing their own work. The intent of the study was to answer the following research questions:

1) To what extent are students' choices of mathematical problem-solving processes related to gender?
2) To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
3) To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
4) To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem-solving test, as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
5) To what extent is students' ability to assess their own performance on mathematical problem-solving tasks, as measured by a self-reported rating scale, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?

The study test questions were drawn from released items of the FCAT and NAEP exams. Because problem-solving tasks span such a wide range of mathematical content, the scope of the test instrument was narrowed by limiting the items to those that assessed the Number and Operations standard. Problems that were not initially in constructed-response format were edited, and each question was followed by a request for students to show their work and then explain how they knew they were correct. This request provided an inducement for students to write down their computational work and provide a linguistic explanation of their problem-solving processes. Each problem statement was followed by a five-face smiley-face Likert scale that measured students' level of self-efficacy before their work on the item commenced. Each problem was presented on a separate page, leaving a large work space for children to show their work and explain their thinking. Following the work space was a second smiley-face scale that asked students to rate how well they thought they did on the problem. The processes students chose to solve the problems were analyzed primarily through qualitative means.
The first step was to sort the student responses for each test item according to the process used. A total of 232 processes were identified for the whole test, with individual questions having from twelve to twenty-eight unique processes. A review of all processes led to a broad categorization scheme of successful, partially successful, and unsuccessful responses. Within each of these categories, five to six themes emerged to serve as an organizing framework for the 232 processes. Table 13 (page 191) lists all seventeen of the themes and notes which of the three broad categories each one falls under. Descriptive statistics were used to paint a broad picture of the themes of the processes used by boys and girls. Chi-square tests were conducted with the aid of Statistical Analysis Software (SAS) to determine whether any gender differences existed in the themes of the processes students used to solve the problems. An alpha level of .05 was used to test for significance for all of the inferential statistical tests. Students' mathematical performance was scored using the FCAT four-point holistic rubric, resulting in a possible score range of 0–48 points for the test. Item-specific rubrics were also created for all test questions and are presented in Appendix K. The quality of students' linguistic explanations was rated according to a researcher-created three-point holistic rubric, resulting in a 0–36 point score range. Item-specific rubrics were created for the test questions to aid in scoring the linguistic explanations and are included in Appendix L. Student responses on the self-efficacy and self-assessment scales were converted to numerical scores ranging from 0 to 4, resulting in a possible score range of 0–48 points. Self-efficacy mean bias, as a measure of the accuracy of students' feelings of self-efficacy, was calculated as the average difference between students' self-efficacy scores and their performance scores (self-efficacy minus performance).
A positive mean bias indicated overconfidence, and a negative mean bias signified underconfidence. Self-assessment mean bias scores were calculated in the same manner. All four variables (performance, linguistic explanations, self-efficacy mean bias, and self-assessment mean bias) were analyzed through descriptive statistics, factorial ANOVAs, and multiple regression analyses. The study sample was limited to the fifth-grade students in three schools in a west-central Florida school district. The schools had relatively homogeneous demographic profiles and were in a part of the county populated by upper-middle-class families. Eligibility for participation was extended to all 396 students in the sixteen sample classrooms, but only those students whose parents provided signed permission were allowed to participate. The test was administered in the schools to 286 students, and a total of 284 students completed the study test. Information about the participants' reading ability, mathematics ability, and gender was obtained through teacher ratings, previous FCAT scores, and, for gender, students' self-report. Reading and mathematics ability were treated as categorical variables for the factorial ANOVAs and were determined through a combination of students' FCAT level scores and teacher ratings (Table 7). For the multiple regression analyses, reading ability and mathematics ability were continuous variables represented by students' FCAT scale scores. FCAT level scores were available for a total of 242 students, and FCAT scale scores were obtained for 239 students. An analysis of the descriptive data led to the exclusion of three outliers, one of whom would already have been eliminated due to missing test scores; this resulted in a total sample of 237 students, 129 boys and 108 girls, for the factorial ANOVA and multiple regression analyses.
Previous research has examined various pieces of the problem-solving process, but there are gaps in the literature for every element of the process that this study attempted to address. Most of the research related to problem-solving processes has been at the early elementary level, with a limited amount at the high school level. The research conducted with young children (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998) found significant gender differences in their strategies, but the problems were limited to single-step, noncontextualized addition and subtraction problems. The research conducted with high school students (Gallagher & DeLisi, 1994; Gallagher et al., 2000) used complex problems, but the sample was limited to high ability students. Prior to this research, there was virtually no information as to how upper-elementary boys and girls differed in their problem-solving processes. There is extensive research that verifies the connection between children's reading ability and mathematical performance, but most of that research focuses on children with reading disabilities and how those disabilities can best be accommodated in testing situations. Because problem solving is becoming an increasingly important focus of mathematics, and because problem solving depends on textual communication, further research into the connection between reading and problem solving was still needed. The body of literature on gender differences in mathematical performance is extensive. Gender differences in performance in this country have declined over time and are now rarely found below the middle grades. However, the relationship between reading ability and boys' and girls' mathematical problem-solving performance is uncharted territory and was of interest in this study.
Previous research demonstrates that children benefit from the process of writing about their mathematical endeavors (Johnson et al., 1998; Pugalee, 2001; Rudnitsky et al., 1995), and girls have consistently outperformed boys in measures of writing performance (Gabrielson et al., 1995; Gambell & Hunter, 1999; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999; Peterson, 1998; Peterson, 2000). However, although the research shows the benefits of writing about mathematics, none of the aforementioned studies attempted to link the quality of that writing to students' ability. The studies of gender differences in writing quality did not focus on writing in the context of mathematical problem solving. Also, although reading ability is known to be intrinsically linked to writing, students' reading ability was not included as a factor in any of these analyses. This research sought to add to the body of knowledge about the relationships among gender, reading ability, mathematics ability, and the quality of boys' and girls' linguistic explanations of their problem-solving processes. Self-regulated learning is linked to strong performance in mathematics, and self-efficacy, as an element of self-regulation, mediates the relationship between students' attitudes about mathematics and their mathematical achievement across age ranges. Most of the self-efficacy research in the literature was conducted with high school and undergraduate college students. Only two studies dealt with the population that this study endeavored to understand (Schunk & Gunn, 1986; Zimmerman & Martinez-Pons, 1990). Both of these studies are relatively old, and both examined only total self-efficacy, not the accuracy of that self-efficacy. In addition, neither study included a measure of student ability as an independent variable.
To date, no research has clarified the relationship between students' reading ability and the accuracy of their feelings of self-efficacy. This research used reading ability as an independent variable in the study of children's mathematical problem-solving performance, the quality of their written linguistic explanations of their problem-solving processes, and their self-efficacy and self-assessment. Previous research on self-assessment shows that, regardless of the accuracy of students' self-assessment, the simple act of assessing their own work improved students' performance. Self-assessment is a step in the problem-solving process that, when omitted, can lead a student to submit a final answer that he/she would have known to be incorrect had he/she performed even a cursory check for reasonableness. For these reasons, self-assessment should be, but has not been, an active area of research. Only one study has examined self-assessment at the elementary level, and its only independent variable was gender (van Kraayenoord & Paris, 1997). The remainder of the scant research has been conducted with high school and college students. This research endeavored to add to the existing body of knowledge on gender differences in the accuracy of children's self-assessments, and to generate new understandings about the relationships between reading and mathematics ability and children's self-assessment of their problem-solving efforts. In conclusion, this research sought to use the data collected through the study test of mathematical problem solving to add to the body of knowledge on various elements of children's problem-solving behaviors. The analysis of students' problem-solving processes used gender as the only independent variable. The remaining analyses of performance, linguistic explanations, self-efficacy, and self-assessment examined the data through the lens of children's gender, reading ability, and mathematics ability.
Conclusions and Implications

Research Question One: Problem-Solving Processes

Problem-Solving Process Conclusions

The data for this research question were generated through a qualitative analysis of students' problem-solving processes. The processes for each of the test questions were categorized into seventeen emergent themes and are presented in Appendix M. The data were analyzed descriptively (Table 14), and chi-square tests were used to determine whether there were any significant gender differences in the processes that students chose to solve each of the questions. The chi-square results are presented in Table 15 (page 197). The descriptive data in Table 14 (page 195) suggest that there were gender differences in the themes of the processes students in the sample used to solve the problems. To test the significance of these differences while maintaining the overall alpha level at .05 across the thirteen chi-square tests, the Bonferroni adjustment was used to lower the per-item alpha level to .004. Table 15 (page 197) shows that none of the differences were significant at the .004 alpha level. As a result, even though the data for the sample show gender differences, those differences were not large enough to conclude that they represent differences in the population. The pencil problem processes, as presented in Table 16 (page 198), had the largest effect size of all the problems. For this item, more girls than boys were coded with themes categorized as unsuccessful. Many more boys than girls used a novel approach to solve the problem, and more girls than boys solved the problem using a drawing or diagram. The results for this question mirrored the overall finding that the girls in the sample were more prone to solving the problems through the use of drawings or diagrams. Because none of the inferential results were significant, no conclusions can be drawn about the population as a whole.
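The Bonferroni adjustment divides the family-wise alpha by the number of tests: .05/13 ≈ .0038, which the study rounded to .004. The Python sketch below also shows why the adjustment matters, under the simplifying assumption that the thirteen tests are independent; the helper name is illustrative.

```python
FAMILY_ALPHA = 0.05
N_TESTS = 13  # number of chi-square tests reported in the text

per_test_alpha = FAMILY_ALPHA / N_TESTS
print(f"per-test alpha: {per_test_alpha:.4f}")  # 0.0038


def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection when every null
    hypothesis is true, assuming the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


print(f"unadjusted: {familywise_error(FAMILY_ALPHA, N_TESTS):.3f}")    # 0.487
print(f"Bonferroni: {familywise_error(per_test_alpha, N_TESTS):.3f}")  # 0.049
```

Without the adjustment, roughly half of all such studies would report at least one spurious gender difference; with it, the family-wise rate stays just under .05.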
In the earlier work of Carr et al. (1999), Carr and Davis (2001), and Fennema et al. (1998) with first- through third-grade students, significant differences were found with much smaller samples. There are many potential reasons why the results of this study did not confirm the work of previous researchers. To begin, the aforementioned studies used noncontextualized single-step addition and subtraction computation problems with a dichotomous coding scheme. The complex nature of the problem-solving items on this test led to a much more detailed coding scheme that, given the large number of themes, was not able to detect differences that may have existed. A larger sample, and the increased statistical power associated with it, might have led to different results. It is also possible that no differences reached significance because no real differences exist in the population.
Despite the lack of significant findings regarding gender, the descriptive statistics still provide valuable information about children's problem-solving processes. Less than half of all student responses fell under the theme of successful solutions using a traditional method. Collapsing the data for the remaining themes across categories creates a clearer picture. For example, the data for theme #2 (Novel method, successful) can be combined with those for theme #10 (Novel method, partially successful). In all, 6.1% of the students used a novel method, 4.9% of the responses were solved with a drawing or diagram, 6.7% of the responses were not successful due to computational problems, 4.3% had problems with mathematical logic, and fully 12.4% were unsuccessful due to a misunderstanding of, or inattention to, the question.
Problem-Solving Process Implications
The descriptive statistics for the collapsed themes provide information vital for teacher educators and those who teach mathematics.
The data show that students often chose a novel solution method, and those who sought a novel solution process were successful most of the time. Students also frequently solved, or attempted to solve, problems through the use of a drawing or diagram. In other words, many students were creative in their computational approach to problem solving, whether or not they arrived at a correct solution.
The mathematics education system of this country is sometimes characterized as being "a mile wide and an inch deep." Teachers are too often satisfied with finding and discussing one productive means of solving a problem. These results show that children inherently choose a variety of methods for solving problems, and they should be supported in pursuing, and becoming successful with, those methods. In some countries where children consistently score well on international tests, a distinguishing characteristic of classroom instruction is the depth of coverage of material (Fierros, 1999). Instead of doing ten problems one way in a session, students may do one problem ten different ways. This instructional style acknowledges children's multiple ways of thinking and encourages creative approaches to problem solving. The results of this study suggest that an approach of less breadth and more depth may better equip children to use a variety of problem-solving methods successfully.
Of even greater consequence is the finding that one of every eight responses (381 total) was unsuccessful because the student misunderstood the question or did not attend to it. This situation may result from reading comprehension difficulties or from a lack of self-discipline in reading the entire problem carefully. Many more responses (553 total) were coded with unsuccessful themes for flawed mathematical logic or incoherence (Cannot be determined).
The data for this study do not allow any conclusions about how many of these responses were directly related to reading difficulty, but reading difficulty cannot be ruled out as a precursor to the failure. In any case, these numbers are too large to be ignored. Regardless of how teachers of mathematics view their responsibility vis-à-vis reading, they must acknowledge, and be willing to act on, the fact that students who are poor or careless readers cannot be successful problem solvers.
Research Question Two: Mathematical Performance
Mathematical Performance Conclusions
The data for this research question were generated through holistic assessment of student work on the twelve-item problem-solving test. All responses were scored using the FCAT four-point rubric along with item-specific rubrics that were created using data from the pilot administration of the test. The data were analyzed descriptively and through factorial ANOVA and multiple regression. The descriptive results are presented in Tables 17 (page 201) and 18 (page 204), the factorial ANOVA results in Table 20 (page 213), and the multiple regression results in Table 21 (page 218).
The independent variables for the inferential statistical tests were gender, reading ability, and mathematics ability. For the factorial ANOVA, reading and mathematics ability were categorical variables. Their values were determined using a matrix (Table 7, page 142) based on teacher ratings and students' FCAT level scores from the previous year. For the multiple regression, reading and mathematics ability were measured solely by the continuous variable of students' FCAT scale scores from the previous year. The gender-related results of this study confirm earlier research findings of no significant differences in the mathematics performance of boys and girls at the elementary level.
The factorial ANOVA did not find gender to be significantly related to performance, and the predictor of gender did not contribute significantly to the predictive utility of the regression equation. An interesting result of the factorial ANOVA was the significant interaction between gender and reading ability for mathematical performance (see Figure 2). Gender alone was not significant, yet its interaction with reading ability was. Girls appear to have profited more than boys from high reading ability, and to have suffered more than boys from having only medium reading ability. Among low-ability readers, boys' and girls' scores were nearly identical.
As expected, all statistical tests showed a strong relationship between mathematics ability and performance on the study test. These results do not add to the body of knowledge and therefore will not be discussed here. In both the factorial ANOVA and the multiple regression analyses, reading ability was significantly related to mathematical performance. Together, the descriptive and inferential results lead to the following conclusion. Among the study variables, mathematics ability had by far the strongest relationship to mathematical performance, but reading ability also plays a significant role in students' mathematical problem-solving performance.
Mathematical Performance Implications
Both the descriptive and inferential results indicate a strong relationship between mathematics ability and mathematics performance. That better mathematics students score higher on a mathematics test is not noteworthy. Because there were no significant findings about the relationship between gender and mathematical performance, no implications can be drawn from the results for gender. The relationship between reading ability and mathematical performance bears further discussion.
Together with the findings for question one, the results indicate that mathematical competence alone does not determine students' success in problem-solving endeavors. Mathematical story problems are communicated through print. Before a student can begin the mathematical part of the problem-solving process, he or she must be able to read and understand the problem situation and what the problem is asking. The text plays a mediating role between the student and the mathematics, a role that only competent reading comprehension can activate. This reality does not receive enough attention in the elementary classroom or in the mathematics methods courses of preservice teacher programs.
The factorial ANOVA result indicating a significant interaction between gender and reading ability at the middle-ability level is also important. Gender alone was not significantly related to performance. However, when gender interacted with reading ability, significant differences in performance appeared, especially among middle-ability readers. Unlike the pattern among high-ability readers, boys of middle reading ability had a distinct advantage over girls in mathematical problem solving. The strength of the results supports the conclusion that this interaction exists in the population. In an age when equal treatment of all children, regardless of demographic characteristics, is so important, this result deserves consideration in the classroom. Teachers must be aware that their female students of middle reading ability may struggle more with comprehension of mathematical story problems than male students of similar ability.
Research Question Three: Linguistic Explanations
Linguistic Explanation Conclusions
The data for this research question were gathered through analysis of students' explanations of their problem-solving processes on the twelve-item problem-solving test. Each student response was scored using the researcher-created holistic three-point rubric, supported by item-specific rubrics.
The item-specific rubrics used student responses from the pilot administration of the test to help identify the qualities of answers to each question that would earn each score. The data were first analyzed descriptively (Table 22, Table 23, and Figure 3, pages 222, 225, and 226, respectively), and then inferentially via a factorial ANOVA (Table 24, page 230) and multiple regression (Table 25, page 233). The dependent variable for this question was students' total linguistic explanation score (ranging from 0 to 36), and the independent variables were gender, reading ability, and mathematics ability.
The quality and completeness of girls' linguistic explanations were judged superior to boys' for three-fourths of the questions. However, the differences were not large enough for the factorial ANOVA to conclude that they exist in the population. In the multiple regression, gender showed the lowest correlation with the outcome variable of the three predictors. Including it in the model did improve the predictive power of the regression equation, but not by a significant amount.
The results from the factorial ANOVA and the multiple regression show that reading ability and mathematics ability were significantly related to students' linguistic explanations. In both cases, the relationship was strong enough to conclude that it holds in the population as well. The factorial ANOVA p-value for reading ability was lower than the p-value for mathematics ability. Also, the squared semipartial regression coefficients showed that reading ability accounted for more than twice as much of the variability in the outcome as mathematics ability. In essence, of the three independent variables, the one with the strongest relationship to the quality of students' linguistic explanations was reading ability.
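As general background, and not as the study's own analysis code, a predictor's squared semipartial correlation is the drop in R² when that predictor is removed from the full regression model. In other words, it is the share of outcome variance that the predictor explains uniquely. Dividing that R² change by the full model's residual variance gives the usual F test for whether adding the predictor "significantly" improves the equation. This is one standard way to check a claim like "including gender improved the regression, but not by a significant amount." The pure-Python sketch below assumes ordinary least squares with an intercept. The helper names (`solve`, `r_squared`, `squared_semipartial`, `f_change`) and the eight-student data are hypothetical.

```python
# Hypothetical illustration: squared semipartial correlation (sr^2) as the drop
# in R^2 when one predictor is removed from an OLS model, plus the F test for
# that R^2 change. Pure Python so it runs without third-party packages.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 of an ordinary least squares fit of y on the predictor columns plus an intercept."""
    cols = [[1.0] * len(y)] + [list(p) for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def squared_semipartial(y, predictors, index):
    """Unique variance in y explained by predictors[index]: R^2(full) - R^2(without it)."""
    reduced = [p for j, p in enumerate(predictors) if j != index]
    return r_squared(y, predictors) - r_squared(y, reduced)


def f_change(r2_full, r2_reduced, n, k, q=1):
    """F statistic for the R^2 gain from adding q predictors; k = predictors in the full model.

    Compare against an F distribution with (q, n - k - 1) degrees of freedom.
    """
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k - 1))


# Made-up scores for eight students (not data from this study).
reading = [300, 320, 280, 350, 310, 290, 340, 330]
math = [310, 300, 290, 360, 330, 280, 320, 350]
gender = [0, 1, 0, 1, 0, 1, 0, 1]
explanation = [20, 24, 15, 30, 22, 18, 27, 28]
predictors = [reading, math, gender]
for name, i in [("reading", 0), ("math", 1), ("gender", 2)]:
    print(name, round(squared_semipartial(explanation, predictors, i), 3))
```

With a single added predictor (q = 1), the numerator of `f_change` is exactly that predictor's squared semipartial correlation. A predictor can therefore raise R² slightly without the increase being significant.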
Linguistic Explanation Implications
The finding that gender was not significantly related to linguistic explanation scores runs counter to previous research findings that girls are consistently better writers than boys (Gabrielson et al., 1995; Gambell & Hunter, 1999; Gormely, 1993; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999; Peterson, 1998; Peterson, 2000). Each of these studies of gender differences in writing ability found girls' writing skills to surpass those of boys. What distinguishes this research from those studies is that it alone measured writing in the context of mathematical problem solving. There were no significant gender differences in performance. Yet, given girls' well-documented advantage over boys in written communication, it is surprising that the mathematical context of the writing would strip away the gender differences in linguistic explanation outcomes. The descriptive statistics showed that, for both genders, scores were much higher for mathematics performance than for linguistic explanations. Taken together, these findings suggest that both girls and boys would benefit from instruction and practice in writing that is specifically targeted to mathematics.
The notion that reading ability and writing ability are two sides of the same coin has gained widespread acceptance in recent years. However, most of the discussion about the relationship between these two literacy skills has not moved beyond the field of language arts. What makes the results of this study interesting is that the scores for students' linguistic explanations were based on students' writing about their mathematical problem solving. A student who was unable to work the problem mathematically would not have been able to explain his/her problem-solving processes.
Consequently, the child's mathematical difficulty would have prevented a successful linguistic explanation score. The finding that mathematics ability was significantly related to students' linguistic explanation scores therefore makes logical sense. However, the results also show that reading ability was significantly related to students' linguistic explanation scores, and that its relationship was even stronger than that of mathematics ability. These findings indicate that improving students' ability to explain their mathematical problem-solving processes is more a matter of enhancing their mathematical reading ability than of enhancing their mathematics skills.
Research Question Four: Self-Efficacy
Self-Efficacy Conclusions
The outcome variable for this research question was students' self-efficacy mean bias score, a measure of calibration: the accuracy of their feelings of self-efficacy. Students self-reported their self-efficacy for each of the study's twelve test items by circling one of five faces on a Likert-type smiley-face scale. Each selection was then converted to a numerical score ranging from four for the highest level of self-efficacy to zero for the lowest. Each student's average mathematical performance score was subtracted from his or her average self-efficacy score to arrive at a mean bias score. A positive score indicated overconfidence, and a negative score indicated underconfidence. The independent variables for this analysis were gender, reading ability, and mathematics ability.
The descriptive statistics for students' self-efficacy scores show that boys were more self-efficacious than girls on all twelve test items. The self-efficacy mean bias scores followed a similar pattern, with boys' mean bias of 0.88 well above girls' 0.58. Both genders were overconfident, but boys' overconfidence exceeded that of girls.
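The mean bias calculation described above can be sketched in a few lines of Python. The function name and the four-item example are hypothetical. The sketch assumes, as the text implies, that the converted face ratings (0 to 4) and the rubric performance scores share the same 0 to 4 scale, so subtracting their averages is meaningful.

```python
from statistics import mean


def mean_bias(confidence_ratings, performance_scores):
    """Average confidence rating minus average performance score for one student.

    Assumes one rating and one score per test item, both on a 0-4 scale.
    Positive values mean overconfidence; negative values mean underconfidence.
    """
    if len(confidence_ratings) != len(performance_scores):
        raise ValueError("need exactly one rating and one score per item")
    if not confidence_ratings:
        raise ValueError("need at least one item")
    return mean(confidence_ratings) - mean(performance_scores)


# Made-up four-item example (not study data): face ratings converted to 0-4
ratings = [4, 4, 3, 4]
scores = [3, 2, 3, 4]  # rubric scores on the same 0-4 scale
print(mean_bias(ratings, scores))  # 0.75 -> overconfident
```

The same function applies to the self-assessment mean bias used for Research Question Five, with post-problem ratings in place of pre-problem ratings.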
The factorial ANOVA results indicate that the gender differences in mean bias for the sample were large enough to conclude that the differences truly exist in the population. The multiple regression findings confirm this result, with gender being the most significant predictor of self-efficacy mean bias among the study variables.
For the independent variable of reading ability, the descriptive statistics showed that students of higher reading ability were generally more accurate in their self-assessments than lower-ability readers. However, the factorial ANOVA results were not significant, meaning that chance cannot be ruled out as the explanation for the differences. In contrast, the multiple regression found that reading ability predicted more of the variability in students' self-efficacy mean bias scores than either gender or mathematics ability. These differing results may reflect the difference in how reading ability was defined for the two statistical tests: as a categorical variable for the factorial ANOVA and as continuous FCAT scale scores for the multiple regression. Because of the strength of the relationship that the multiple regression found between reading ability and self-efficacy mean bias, the study concludes that reading ability is a significant predictor of self-efficacy mean bias. Better readers exhibited more accurate feelings of self-efficacy than lower-ability readers.
The factorial ANOVA and multiple regression results for mathematics ability were significant. This finding leads to the conclusion that the sample students of higher mathematics ability were more accurate in their feelings of self-efficacy than students of lower ability, and that this relationship exists in the population as well. There was also a moderately strong negative correlation between self-efficacy mean bias and total performance, indicating that students with lower self-efficacy mean bias scores achieved higher mathematics performance scores.
Self-Efficacy Implications
No previous research has studied the accuracy of students' self-efficacy in the elementary grades. The only two studies dealing with this age group examined total self-efficacy as it related to gender (Zimmerman & Martinez-Pons, 1990) and the relationship between self-efficacy and performance (Pajares & Miller, 1995). Zimmerman and Martinez-Pons (1990) found no significant difference between the self-efficacy of boys and girls, contradicting the results of this study. However, three studies with middle- and high-school students that included gender as a variable (Laveault et al., 1999; Pietsch et al., 2003; Randhawa et al., 1993) found males to have higher levels of self-efficacy than females. All of the studies that examined the relationship between self-efficacy and performance found a significant positive relationship (Cooper & Robinson, 1991; Hackett & Betz, 1989; Lent, Brown, & Gore, 1997; Multon et al., 1991; Pajares & Miller, 1994; Pietsch et al., 2003; Randhawa et al., 1993; Schunk & Gunn, 1986; Williams, 1994). A student's level of self-efficacy is positively related to her or his use of self-regulatory strategies and to the persistence necessary for successful problem solving (Multon et al., 1991).
Combining the results of this study with the findings of previous research leads to the conclusion that boys' higher self-efficacy gives them a performance advantage over girls. Boys' and girls' performance did not differ significantly in this study, yet boys were consistently more self-efficacious. It could therefore be argued that, had there been no gender differences in self-efficacy, boys might not have scored as high as girls.
The interpretation of the results for self-efficacy mean bias is less clear. In this study, both genders were overconfident, but boys were more overconfident than girls.
There has been no research in the field of mathematics that addresses the relationship between differing levels of overconfidence and performance. Therefore, although the results of this study add to what is known about how accurately boys and girls can judge their own self-efficacy, no implications can be drawn from these findings.
No previous research has attempted to link reading ability to self-efficacy mean bias. The finding that reading ability is a significant predictor of students' mathematical problem-solving self-efficacy mean bias is an important one. It implies that the students who were better able to read and comprehend the test items were also better able to judge how well they could perform on the problems. These results can be combined with the correlation results. Students of higher reading ability are more accurate in their self-assessment, and students with more accurate self-assessments scored higher on the study test. It can therefore be concluded that improved reading ability leads to improved performance. This conclusion is supported by the mathematical performance results discussed earlier.
The factorial ANOVA and multiple regression results for mathematics ability were significant, confirming that higher-ability students were more accurate in their feelings of self-efficacy and that this relationship exists in the population. This finding implies that students who are better equipped to handle the mathematics involved in a story problem are also better able to judge how well they will perform on that problem.
Research Question Five: Self-Assessment
Self-Assessment Conclusions
The data for this question were gathered in much the same way as the self-efficacy data. Self-efficacy was measured after students had read a problem but before they began working. Self-assessment, by contrast, was measured once students had completed their work on each problem.
For self-assessment, students were asked to indicate how well they thought they had done on each problem by circling one of five faces. The responses were converted to numerical scores, with four representing the highest level of self-assessment and zero the lowest. The data were analyzed descriptively and inferentially through factorial ANOVA and multiple regression analyses. The independent variables for the question were gender, reading ability, and mathematics ability, and the dependent variable was self-assessment mean bias. The mean bias score was found by subtracting each student's average mathematics performance score from his or her average self-assessment score. A positive mean bias indicated overconfidence, and a negative mean bias indicated underconfidence.
The descriptive results show that boys rated their work higher than girls did, but that their ratings were less accurate. For the sample, both reading ability and mathematics ability were negatively related to self-assessment mean bias. Students with higher ability had lower mean bias scores, indicating more accurate self-assessment.
The factorial ANOVA results show that the relationship between gender and self-assessment mean bias was strong enough to conclude that the differences exist in the population. Mathematics ability was found to be significantly related to self-assessment mean bias, with higher-ability students being more accurate in their self-assessments. The relationship between reading ability and self-assessment mean bias was not strong enough to infer that it holds in the population. As with self-efficacy mean bias, the multiple regression results for self-assessment mean bias appear to contradict the findings of the factorial ANOVA. As discussed earlier, this seeming contradiction may reflect the different ways that ability was measured for the two inferential tests.
Reading ability was the strongest single predictor of self-assessment mean bias. Gender was also found to increase the ability of the regression equation to predict self-assessment mean bias. Mathematics ability made a small, but not significant, contribution to the equation.
Self-Assessment Implications
Just as with self-efficacy, both boys and girls were overconfident in assessing their work. The findings about the relationship between gender and self-assessment also mirrored those for self-efficacy, with boys being more overconfident than girls. Both inferential tests, the factorial ANOVA and the multiple regression, agreed that gender was significantly related to self-assessment mean bias. These findings confirm earlier research on the relationship between gender and self-assessment (Pallier, 2003; van Kraayenoord & Paris, 1997; Vermeer et al., 1994; Wright & Houck, 1995), although not all of these studies focused on mathematics. Only one small study dealt with elementary-aged students (van Kraayenoord & Paris, 1997), and it was not focused on mathematics. For these reasons, this study contributes to the body of knowledge by focusing on self-assessment in mathematical problem solving and by researching an understudied population.
The findings related to gender imply that, regardless of ability or actual performance, boys will walk away from a mathematical problem-solving activity with more confidence than will girls. A teacher who wishes to predict how accurately his/her students will assess their own mathematical problem-solving work should look first to students' reading ability. This is an interesting finding, because teachers of mathematics would likely assume that the accuracy of self-assessment varies primarily with mathematics ability.
However, the multiple regression analysis did not find a significant relationship between mathematics ability and self-assessment mean bias.
An important phase of the problem-solving process is looking back to assess the reasonableness of the answer. In fact, the most enduring problem-solving heuristic of all time (Polya, 1957) includes self-assessment as the fourth of its four steps. The only two studies that have examined the relationship between self-assessment and performance found that simply asking students to assess their own work led to improved performance (Hassmen & Hunt, 1994; Koivula et al., 2001). During the self-assessment process, students can compare their answer to the result they expected, which allows them to detect potential computational or logical errors. Self-assessment is important, and reading ability is a strong predictor of how accurately students assess their own work. These results therefore further highlight the importance of reading in the mathematics classroom.
Recommendations for Practice
The results of this study, coupled with the understandings provided by existing research, lead to several recommendations for teachers and teacher educators. As discussed in the review of literature, the language, vocabulary, and symbols used in mathematics are unique among the content areas. For this reason, teachers of mathematics are better equipped to teach the reading of mathematics than are regular reading teachers. The descriptive results for problem-solving processes show that students' lack of comprehension of the problem statement was negatively related to performance. The results for mathematical performance, self-efficacy, and self-assessment all confirm the strength of the relationship between students' reading ability and their mathematical problem-solving behaviors.
Students who are better readers score higher, are more accurate in their feelings of self-efficacy, and are better assessors of their own work. These results should prompt mathematics teachers to expand their view of their role to include teacher of mathematics reading, and to devote more classroom instructional time to helping students become more competent readers of mathematics.
Lest all the blame be placed on the shoulders of teachers, these results further suggest that undergraduate programs in mathematics teacher education must ensure that their preservice teachers are fully prepared to be teachers of mathematics reading. This applies to programs at both the elementary and the secondary level. Full preparation requires more than supplying the requisite pedagogical content knowledge. It also requires that teacher educators adopt new attitudes and beliefs about their role as mathematics reading teachers. Some secondary mathematics education programs have added a content area reading course, which is a positive sign of change. However, adding a single course may lull program designers into believing they have addressed the issue. Because the reading of mathematics is critical to all problem-solving endeavors, regardless of the content strand, instruction in reading mathematics should likewise be woven through all mathematical content in preservice teacher preparation programs. Lortie's (1975) theory of the apprenticeship of observation serves as a caution about teachers' tendency to teach as they were taught. The cycle will not be broken until teacher educators take it upon themselves to learn about the role of reading in their content area. They must then make that new knowledge, and the resulting beliefs, a central tenet of all courses in their teacher education programs.
The current system of reporting a student's high-stakes mathematics test score as a single number may be misleading.
The results of this study show that when mathematical problems are presented in context, a student's reading ability is significantly related to his/her ability to complete the problem successfully. For this question format, reading ability is a confounding variable and may lead to spurious conclusions about a student's remediation needs. Contextualized problems are an important element of mathematics assessment and should remain a part of high-stakes tests. However, reporting students' performance on contextualized story problems separately might increase the utility of score reports and improve teachers' ability to remediate students having difficulty. Suppose a student's scores for this problem type are lower than her/his scores for noncontextualized problems, and he/she also scored low in reading comprehension. In that case, intensive support for reading comprehension may serve to improve both areas, and time spent on remediating mathematical computation may be wasted effort.
The results of this study related to gender confirm that gender differences in overall performance are not significant in the elementary grades. However, gender differences exist in other facets of problem solving. The inferential tests confirmed that, regardless of ability or actual performance, boys begin problem-solving activities with much stronger feelings of self-efficacy than girls. Because self-efficacy is positively correlated with persistence (Multon et al., 1991), boys' high level of self-efficacy may give them an advantage over girls on difficult tasks. This may further explain why boys outscored girls on the most difficult items. Armed with this knowledge, teachers should monitor girls' self-efficacy during problem-solving tasks and seek ways to encourage them to persist through challenges. These differences in self-efficacy may also be an important focus for teachers when preparing students to take high-stakes tests.
These tests tend to be composed of questions covering a wide range of difficulty. All students, but especially girls, should be coached in the test format and testing strategies, including ways to handle difficult questions. If girls are reminded that a difficult question may be followed by an easy one, they may be more inclined to persist through failure. Knowing that girls tend to be less confident, teachers should regularly seek ways to reaffirm girls' confidence in what they know and can do.
The gender results for self-assessment indicate that girls are more accurate in their self-assessments than boys. Previous research shows that self-assessment is a vital element of the problem-solving process and is positively related to performance. Yet self-assessment has received very little research attention and is not prevalent in mathematics curricula. In the researcher's experience, it is also not a major topic of discussion in either the elementary classroom or the preservice teacher classroom. This study found that boys and girls differ greatly in the accuracy of their self-assessments. It also found a moderately strong correlation between the accuracy of students' self-assessments and their mathematical problem-solving performance. These results indicate that more effort is needed to provide opportunities for self-assessment in the classroom. If students are frequently asked to think about the quality of their work, they may become more reflective in their thinking and more apt to uncover and remedy any errors they have made. To prepare for high-stakes testing, students should practice using the actual scoring rubrics to assess their own work and the work of others, until they fully understand the qualities of superior problem-solving work. They will then be equipped to monitor, assess, and edit their own work in a testing environment.
The results related to mathematics ability were largely expected.
Higher-ability students performed significantly better, provided more accurate and complete linguistic explanations of their problem-solving processes, and were more accurate in their self-efficacy and self-assessment ratings. These results mirror previous research findings and do not contribute to the body of knowledge other than to reaffirm that students' general mathematics ability is reflected in all areas of their problem-solving performance. Teachers should be reminded that their lower-level mathematics students will need greater support not only with the computational aspects of problem solving, but also with clearly explaining their thinking and accurately gauging their self-efficacy and self-assessment.

The recommendations directed to mathematics teacher educators about the inclusion of literacy in mathematics instruction and the nurturing of a variety of problem-solving processes also apply to regular classroom teachers. Admittedly, classroom teachers may receive these recommendations as unrealistic requests to add even more material to an already overcrowded curriculum. In today's high-stakes testing environment, many teachers are completely overwhelmed by the standards they are to cover and the material their students must know. Also, in some school districts, teachers have little or no flexibility in determining the mathematics content they must teach each day or the materials that must be used. How, then, are they to integrate more literacy into mathematics or encourage a variety of problem-solving processes?

A potential answer lies in the issue of breadth vs. depth in the mathematics curriculum. Without realizing the impact of their traditional pedagogy, many teachers begin their math lessons by teaching all of the sample problems, demonstrating how to solve each type, and then hoping that their students will remember the steps in their proper sequence.
The teaching portion of the mathematics period is often followed by time for students to practice what they have learned. At no point in this type of lesson are students required to read any mathematical text to comprehend a new concept or to determine what type of problem they are being asked to solve. At no point are students asked to think about how they might use their prior knowledge in an attempt to solve a new problem. Finally, this lesson format does not give students an opportunity to communicate their thinking, either orally or in writing. The outcome of this pedagogy may be a classroom of mathematical imitators rather than mathematical thinkers and problem solvers. The risk is that this pedagogy sacrifices depth for breadth: students gain a wide range of low-level knowledge that is fragile, in that it is potentially damaged rather than strengthened by the acquisition of new knowledge, and that cannot be applied to novel problem-solving tasks. A more conceptually based pedagogical alternative sacrifices breadth for depth, but it can result in students securing a narrower range of high-level understanding that provides a strong foundation for future learning and can be flexibly applied to novel problem situations.

Teachers who lament that they cannot possibly teach everything that is on a high-stakes test are justified in their concerns. It is not feasible for any teacher to teach students all the possible problems they may encounter on a comprehensive test. Also, if students are not routinely expected to tackle mathematical text, new concepts, and mathematical communication on their own, they will not be equipped to handle these tasks on a high-stakes test. As hopeless as the situation may appear, there is a potential solution.
Even in the most strictly controlled environment, where teachers are told which pages of the mathematics text to teach each day of the week, they can create their own space by using pedagogy that scaffolds conceptual development of the required content, along with students' ability to read mathematical text and write about their mathematical thinking. A conceptual approach to mathematics instruction integrates all of the literacy skills of reading, writing, listening, and speaking, and nurtures the development of mathematical thinkers.

A conceptually based lesson might begin by having students brainstorm what they already know that may help them understand the day's new concept. They could then read the mathematics text that explains the new concept, either alone, in pairs, or in groups. Students may then be given a few minutes to discuss the reading in groups or as a class, relating the new concept(s) back to their prior knowledge. The teacher could then pose a problem that involves the new concept and give students time to work on it. As the teacher circulates around the room, he/she can note which students have made significant headway in solving the problem and ask a few students who have used different solution methods to present their work on the board. As each student presents and explains his/her work, the teacher can guide the discussion through questioning strategies to ensure that the student has explained not just the "how" of the work, but also the "why." All solution methods that use sound mathematical logic should be valued equally. Students can then practice with a few new problems using whatever method(s) support their conceptual understanding. The lesson can conclude with a journal writing assignment in which each student either explains whose solution method he/she liked best and why, or explains the new concept in everyday language.
As described, this lesson covers the same content as the more traditional approach, with less breadth but more depth. Students do not have time to solve as many problems, but they have the opportunity to link their new understanding to prior knowledge, reducing the amount of new knowledge and practice required. The conceptual approach allows students to communicate about the mathematics, integrating reading, writing, listening, and speaking into the lesson. It promotes and values children's different ways of thinking instead of expecting all children to think and solve problems the same way. Finally, by teaching conceptually, teachers prepare their students to independently read a novel problem, think about how their prior knowledge can be used to solve it, then complete the work and communicate their answer. Essentially, conceptual teaching allows teachers to work within the confines of their school district's curriculum while helping to prepare students to be mathematical thinkers and problem solvers who can be successful on high-stakes tests.

Recommendations for Future Research

As much as this study answers some questions about children's problem-solving processes, it leads to new questions. Previous research shows distinct differences between boys and girls in the early elementary years that were detected with samples only a third the size of the sample for this study. If the lack of significant findings in the area of problem-solving processes in this study was the result of the study being underpowered, no conclusions can be drawn. However, if the results indicate that the differences that exist among young students decline over time and are no longer detectable by the fifth grade, the implications would be enormous.
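The retrospective power analysis described next relies on the Bonferroni adjustment. As a worked check of that arithmetic (the symbols here are illustrative notation, not the study's own), the per-test alpha is the overall alpha level divided by the number of tests, m = 13 chi-square tests:

```latex
\alpha_{\text{per test}} = \frac{\alpha_{\text{overall}}}{m} = \frac{.05}{13} \approx .0038
```

Because .05/13 = .003846..., rounding to four decimal places gives the .0038 reported for each test; the probability of at least one false positive across all 13 tests is then at most 13 × .0038 = .0494, which stays below .05.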
A retrospective power analysis was conducted to determine whether the nonsignificant findings were more likely the result of the study being underpowered or a reflection of an absence of differences in the population. Because 13 chi-square tests were conducted, the Bonferroni adjustment was applied to maintain the overall alpha level at .05, resulting in an alpha of .0038 for each test. For the study to have a power of .80 to detect differences at this alpha level, it would have needed 1523 boy and 1279 girl participants (Fleiss, 1981). The lack of significant differences was therefore likely the result of the sample size being too small. Although the differences found in the primary grades by other researchers may no longer exist at the fifth-grade level, further research is needed to draw any conclusions. A longitudinal study could follow students through their elementary years and monitor the strength of their gender-related differences in problem-solving process preferences as they progress through the system. A replication of this study with a much larger sample would also help to clarify the extent to which gender differences in problem-solving processes exist in the fifth grade.

Although the data from this study could not clarify the extent to which the unsuccessful response themes of faulty mathematical logic or incoherent responses were related to reading, the data showed that reading issues were strongly related to students' performance on the problem-solving test. Further research designed to incorporate follow-up interviews with all students whose responses are coded with these themes may help to illuminate the full extent of the relationship between reading difficulties and problem-solving performance.

This study found that among the three independent variables, reading ability had the strongest relationship to students' linguistic explanation scores.
This research cannot, however, delineate which elements of the reading process are most strongly related to students' ability to explain their mathematical problem-solving processes. Because this study was not experimental in nature, it cannot shed any light on which pedagogical practices for reading would have the greatest impact on students' linguistic explanations. Additional research is needed in both of these areas.

The students in this sample tended to be overconfident in their feelings of self-efficacy, with boys being more overconfident than girls. Previous research has shown that self-efficacy is positively related to performance, yet this study found that students who scored higher mathematically tended to be more accurate in their feelings of self-efficacy. Future research is needed to determine whether there is a causal relationship between self-efficacy bias and performance. Is there a point at which a student's level of overconfidence begins to impede performance? Is it better for a student to be overly optimistic about his/her ability to succeed on a task than to be more realistic? Would students benefit from routinely reporting their self-efficacy at the onset of mathematical tasks and checking it against their actual performance?

Similar questions remain unanswered for self-assessment. The act of self-assessment has previously been shown to improve performance, but no studies have clarified the relationship between the accuracy of self-assessments and their impact on performance. The moderate correlation between the two variables found in this study indicates that a relationship exists, but the correlation alone does not clearly define that relationship. Also, because this study was nonexperimental, it does not provide evidence as to which classroom activities would be most beneficial for improving the accuracy of students' self-assessments.
Self-assessment is an understudied phase of the problem-solving process and deserves more attention so that teachers can be sufficiently equipped to guide their students toward becoming competent assessors of their own work.

For some of the outcome variables, the descriptive results show differences between groups that the inferential tests did not find to be significant. A replication of this study with a similar but larger sample would increase the power of the study to detect smaller differences and smaller effect sizes. If the new results were combined with the results of this study, the researcher could more clearly interpret whether nonsignificant results were indicative of a lack of differences in the population.

Finally, the test items used for data collection in this study were limited to the mathematics standard of number and operations. Therefore, the results of this study cannot be generalized across all mathematics content. A replication of this study using test items that cover other mathematics standards would add to these findings about the relationships among the independent and dependent variables.

REFERENCES

Ackerman, P. T., & Dykman, R. A. (1995). Reading-disabled students with and without comorbid arithmetic disability. Developmental Neuropsychology, 11, 351-371.
Adams, T. L. (2003). Reading mathematics: More than words can say. The Reading Teacher, 56, 786-795.
Ai, X. (2002). Gender differences in growth in mathematics achievement: Three-level longitudinal and multilevel analyses of individual, home, and school influences. Mathematical Thinking and Learning, 4, 1-22.
Alexander, P. (1995). Superimposing a situation-specific and domain-specific perspective on an account of self-regulated learning. Educational Psychologist, 30, 189-193.
Alloway, N., & Gilbert, P. (1997). Boys and literacy: Lessons from Australia. Gender and Education, 9, 49-58.
American Association of University Women (2002). Course-taking patterns.
In The Jossey-Bass reader on gender in education (pp. 277-300). San Francisco, CA: Jossey-Bass.
American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
Anderson, R. C., Pichert, J. W., Goetz, E. T., Schallert, D. L., Stevens, K. V., & Trollip, S. R. (1976). Instantiation of general terms. Journal of Verbal Learning and Verbal Behavior, 15, 667-679.
Artzt, A. F., & Armour-Thomas, E. (1992). Development of a cognitive-metacognitive framework for protocol analysis of mathematical problem solving in small groups. Cognition and Instruction, 9, 137-175.
Ary, D., Jacobs, L. C., & Razavieh, A. (1996). Introduction to research in education (5th ed.). Fort Worth, TX: Harcourt Brace College Publishers.
Baker, D. P., & Jones, D. P. (1993). Creating gender equality: Cross-national gender stratification and mathematical performance. Sociology of Education, 66, 1103.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Barrs, M. (2000). Gendered literacy? Language Arts, 77, 287-293.
Barton, M. L., Heidema, C., & Jordan, D. (2002). Teaching reading in mathematics and science. Educational Leadership, 60(3), 24-28.
Basurto, I. (1999). Conditions of reading comprehension which facilitate word problems for second language learners. Reading Improvement, 36, 142-148.
Bernardo, A. B. (1999). Overcoming obstacles to understanding and solving word problems in mathematics. Educational Psychology, 19, 19163.
Betz, N. E., & Hackett, D. (1983). The relationship of mathematics career-related self-efficacy expectations to the selection of science-based college majors. Journal of Vocational Behavior, 23, 329-345.
Boekaerts, M., Seegers, G., & Vermeer, H. J. (1995). Solving math problems: Where and why does the solution process go astray?
Educational Studies in Mathematics, 28, 241-262.
Boggs, M. L. (1999). Reading boys: A poststructuralist inquiry into first-grade literacy and masculinity during the language arts block. Unpublished doctoral dissertation, University of South Florida, Tampa.
Bong, M. (2002). Predictive utility of subject-, task-, and problem-specific self-efficacy judgments for immediate and delayed academic performances. The Journal of Experimental Education, 70, 133-162.
Bottge, B. A. (1999). Effects of contextualized math instruction on problem solving of average and below-average achieving students. The Journal of Special Education, 33, 81-92.
Borasi, R., Siegel, M., & Fonzi, J. M. (1998). Using transactional reading strategies to support sense-making and discussion in mathematics classrooms: An exploratory study. Journal for Research in Mathematics Education, 29, 275-305.
Braten, I. (1998). Cognitive strategies in mathematics, part 1: On children's strategies for solving simple addition problems. Scandinavian Journal of Educational Research, 42, 5-24.
Bratina, T. A., & Lipkin, L. J. (2003). Watch your language! Recommendations to help students communicate mathematically. Reading Improvement, 40, 3-12.
Breland, H. M., Danos, D. O., Kahn, H. D., Kubota, M. Y., & Bonner, M. W. (1994). Performance versus objective testing and gender: An exploratory study of an advanced placement history examination. Journal of Educational Measurement, 31, 275-293.
Burns, M. (1995). Writing in math class? Absolutely! Instructor, 104, 40-47.
Capps, L. R., & Pickreign, J. (1993). Language connections in mathematics: A critical part of mathematics instruction. Arithmetic Teacher, 41, 8-12.
Cardelle-Elawar, M. (1994). Effects of metacognitive instruction on low achievers in mathematics problems. Teaching and Teacher Education, 11, 81-95.
Carr, M., & Davis, H. (2001). Gender differences in arithmetic strategy use: A function of skill and preference. Contemporary Educational Psychology, 26, 330-347.
Carr, M., Jessup, D. L., & Fuller, D. (1999). Gender differences in first-grade mathematics strategy use: Parent and teacher contributions. Journal for Research in Mathematics Education, 30, 20-46.
Charles, R., Lester, F., & O'Daffer, P. O. (1988). How to evaluate progress in problem solving. Reston, VA: National Council of Teachers of Mathematics.
Charmaz, K. (2000). Grounded theory: Objectivist and constructivist methods. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 509-535). Thousand Oaks, CA: Sage.
Chipman, S. F. (1996). Female participation in the study of mathematics, the US situation. In G. Hanna (Ed.), Towards gender equity in mathematics education (pp. 285-296). Netherlands: Kluwer Academic Publishers.
Chipman, S. F., Marshall, S. P., & Scott, P. A. (1991). Content effects on word problem performance: A possible source of test bias? American Educational Research Journal, 28, 897-915.
Choi, J., & Hannafin, M. (1997). The effects of instructional context and reasoning complexity on mathematics problem-solving. Educational Technology Research & Development, 45, 43-55.
Ciochine, J. G., & Polivka, G. (1997). The missing link? Writing in mathematics class! Mathematics Teaching in the Middle School, 2, 316-320.
Clarkson, S. P., & Williams, W. H. (1994, November). Are you assessing reading or mathematics? Paper presented at the Annual Meeting of the American Mathematics Association of Two-Year Colleges, Tulsa, OK.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Commeyras, M. (1999). How interested are literacy educators in gender issues? Survey results from the United States. Journal of Adolescent and Adult Literacy, 42, 352-362.
Conlin, M. (2003). The new gender gap: From kindergarten to grad school, boys are becoming the second sex. Business Week, May 26, 74-82.
Cook, J. (1995).
Integrating math and writing. Teaching PreK-8, 25, 22-23.
Cooper, S. E., & Robinson, D. A. G. (1991). The relationship of mathematics self-efficacy beliefs to mathematics anxiety and performance. Measurement and Evaluation in Counseling and Development, 24, 4-11.
County School Board (2004). Elementary school FCAT scores: 2002-2003 school accountability report. Retrieved May 7, 2004, from county school board website.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Croom, L. (1997). Mathematics for all students: Access, excellence, and equity. In J. Trentacosta & M. J. Kenney (Eds.), Multicultural and gender equity in the mathematics classroom: 1997 yearbook (pp. 1-9). Reston, VA: National Council of Teachers of Mathematics.
Crowley, M. L. (1990). Criterion-referenced reliability indices associated with the van Hiele geometry test. Journal for Research in Mathematics Education, 21, 238-241.
Dale, E., & Chall, J. L. (1949). A formula for predicting readability. Educational Research Bulletin, 27, 11-20, 28.
Desoete, A., Roeyers, H., & Buysse, A. (2001). Metacognition and mathematical problem solving in grade 3. Journal of Learning Disabilities, 34, 435-449.
Dobson, J. (2001). Bringing up boys. Wheaton, IL: Tyndale.
Dowling, D. M. (1978). The development of a mathematics confidence scale and its application in the study of confidence in women college students. Unpublished doctoral dissertation, Ohio State University, Columbus.
Drake, B. M., & Amspaugh, L. B. (1994). What writing reveals in mathematics. Focus on Learning Problems in Mathematics, 16, 43-50.
Draper, R. J. (2002). School mathematics reform, constructivism, and literacy: A case for literacy instruction in the reform-oriented math classroom. Journal of Adolescent and Adult Literacy, 45, 520-529.
Dusterhoff, M. (1995). Why write in mathematics? Teaching PreK-8, 25, 48-49.
Earp, N. W., & Tanner, F. W. (1980). Mathematics and language.
Arithmetic Teacher, 28, 32-34.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in children's self- and task perceptions during elementary school. Child Development, 64, 830-847.
Ethington, C. A. (1992). Gender differences in a psychological model of mathematics achievement. Journal for Research in Mathematics Education, 23, 166-181.
Fan, N., Mueller, J. H., & Marini, A. E. (1994). Solving difference problems: Wording primes coordination. Cognition and Instruction, 12, 355-369.
Farrand, M. R. (2002). Gender and achievement in mathematical problem solving and attributions for mathematical achievements in grade three students. Unpublished doctoral dissertation, University of Windsor, Canada.
FCAT results: Ranking local schools. (2004, May 11). The Bradenton Herald, p. A8.
Feinberg, A. B., & Shapiro, E. S. (2003). Accuracy of teacher judgments in predicting oral reading fluency. School Psychology Quarterly, 18, 52-65.
Fennema, E. (1974). Mathematics learning and the sexes: A review. Journal for Research in Mathematics Education, 5, 126-139.
Fennema, E., & Carpenter, T. P. (1981). Sex-related differences in mathematics: Results from the National Assessment. Mathematics Teacher, 74, 554-559.
Fennema, E., Carpenter, T. P., Jacobs, V. R., Franke, M. L., & Levi, L. W. (1998). A longitudinal study of gender differences in young children's mathematical thinking. Educational Researcher, 27(5), 6-11.
Fierros, E. G. (1999, April). Examining gender differences in mathematics achievement on the Third International Mathematics and Science Study. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Flem, A., & Sovik, N. (1993). Children with motor coordination problems and learning disabilities in reading, spelling, writing and arithmetic.
European Journal of Special Needs Education, 8, 81-98.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 30, 221-233.
Forgasz, H. J., & Leder, G. C. (1996). Mathematics classrooms, gender and affect. Mathematics Education Research Journal, 8, 153-173.
Fortescue, C. M. (1994). Using oral and written language to increase understanding of math concepts. Language Arts, 71, 576-580.
Fouad, N. A., Smith, P. L., & Enochs, L. (1997). Reliability and validity evidence for the middle school self-efficacy scale. Measurement and Evaluation in Counseling and Development, 30, 17-31.
Fry, E. B. (1969). The readability graph validated at primary levels. Reading Teacher, 22, 534-538.
Fry, E. (1990). A readability formula for short passages. Journal of Reading, 33, 594-597.
Fuchs, L. S., & Fuchs, D. (2002). Mathematical problem-solving profiles of students with mathematics disabilities with and without comorbid reading disabilities. Journal of Learning Disabilities, 35, 563-573.
Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C., & Karns, K. (2000). Supplementing teacher judgments of mathematics test accommodations with objective data sources. School Psychology Review, 29, 65-85.
Fuchs, L. S., Fuchs, D., Prentice, K., Burch, M., Hamlett, C. L., Owen, R., & Schroeder, K. (2003). Enhancing third-grade students' mathematical problem solving with self-regulated learning strategies. Journal of Educational Psychology, 95, 306-315.
Fuentes, P. (1998). Reading comprehension in mathematics. The Clearing House, 72, 81-88.
Gabrielson, S., Gordon, B., & Engelhard, G. (1995). The effects of task choice on the quality of writing obtained in a statewide assessment. Applied Measurement in Education, 8, 273-290.
Gallagher, A. M., & DeLisi, R. (1994). Gender differences in scholastic aptitude test: Mathematics problem solving among high-ability students. Journal of Educational Psychology, 86, 204-211.
Gallagher, A. M., DeLisi, R., Holst, P.
C., McGillicuddy-DeLisi, A. V., Morely, M., & Cahalan, C. (2000). Gender differences in advanced mathematical problem solving. Journal of Experimental Child Psychology, 75, 165-190.
Gambell, T. J., & Hunter, D. M. (1999). Rethinking gender differences in literacy. Canadian Journal of Education, 24, 1-16.
Gambell, T. J., & Hunter, D. M. (2000). Surveying gender differences in Canadian school literacy. Journal of Curriculum Studies, 32, 689-719.
Gee, J. P. (1996). Social linguistics and literacies: Ideology in discourses. Philadelphia: Falmer Press.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology. Boston: Allyn and Bacon.
Glass, G., Peckham, P., & Sanders, J. (1972). Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42, 237-288.
Goldman, S. R., & Hasselbring, T. S. (1997). Achieving meaningful mathematics literacy for students with learning disabilities. Journal of Learning Disabilities, 30, 198-208.
Goldsby, D. S., & Cozza, B. (2002). Writing samples to understand mathematical thinking. Mathematics Teaching in the Middle School, 7, 517-520.
Gormley, K. A. (1993, April). Gender and ability differences in children's writing. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA.
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Gurganus, S., & Del Mastro, M. (1998). Mainstreaming kids with reading and writing problems: Special challenges of the mathematics classroom. Reading and Writing Quarterly, 14, 117-125.
Guthrie, J. T. (1981). Reading in New Zealand: Achievement and volume. Reading Research Quarterly, 15, 337-357.
Hackett, G. (1985). The role of mathematics self-efficacy in the choice of math-related majors of college women and men: A path analysis. Journal of Counseling Psychology, 32, 47-56.
Hackett, G., & Betz, N. E. (1989).
An exploration of the mathematics self-efficacy/mathematics performance correspondence. Journal for Research in Mathematics Education, 20, 261-273.
Halpern, D. (1992). Sex differences in cognitive abilities. Hillsdale, NJ: Lawrence Erlbaum.
Halpern, D. F., & LaMay, M. L. (2000). The smarter sex: A critical review of sex differences in intelligence. Educational Psychology Review, 12, 229-246.
Hanlon, E. H., & Schneider, Y. (1999, April). Improving math proficiency through self-efficacy training. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal.
Hartman, M. (1994). Making sense of math through writing. Perspectives in Education and Deafness, 12, 6-9.
Hassmen, P., & Hunt, D. P. (1994). Human self-assessment in multiple-choice testing. Journal of Educational Measurement, 31, 149-160.
Hater, M. A., & Kane, R. B. (1975). The cloze procedure as a measure of mathematical English. Journal for Research in Mathematics Education, 6, 121-127.
Hebert, J. C., Carpenter, T. P., Fennema, E., Fuson, K. C., Human, P. G., Murray, H. G., Olivier, A. I., & Wearne, D. (1996). Problem solving as a basis for reform in curriculum and instruction: The case of mathematics. Educational Researcher, 25, 12-21.
Hecht, S. A., Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (2001). The relations between phonological processing abilities and emerging individual differences in mathematical computation skills: A longitudinal study from second to fifth grades. Journal of Experimental Child Psychology, 79, 192-227.
Hegarty, M., Mayer, R. E., & Monk, C. A. (1995). Comprehension of arithmetic word problems: A comparison of successful and unsuccessful problem solvers. Journal of Educational Psychology, 87, 18-32.
Helwig, R., Anderson, L., & Tindal, G. (2001). Influence of elementary student gender on teachers' perceptions of mathematics achievement. The Journal of Educational Research, 95, 93-102.
Helwig, R., Rozek-Tedesco, M.
A., Tindal, G., Heath, B., & Almond, P. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. Journal of Educational Research, 93, 113-125.
Helwig, R., Rozek-Tedesco, M. A., & Tindal, G. (2002). An oral versus a standard administration of a large-scale mathematics test. The Journal of Special Education, 36, 39-47.
Hildebrand, C., Ludeman, C. J., & Mullin, J. (1999). Integrating mathematics with problem solving using the mathematician's chair. Teaching Children Mathematics, 5, 434-441.
Hill, P. W., & Rowe, K. J. (1998). Modeling student progress in studies of educational effectiveness. School Effectiveness and School Improvement, 7, 310-333.
Homan, S., Hewitt, M., & Linder, J. (1994). The development and validation of a formula for measuring single-sentence test item readability. Journal of Educational Measurement, 31, 349-358.
Hong, E. (1995). Mental models in word problem-solving: A comparison between American and Korean sixth-grade students. Applied Cognitive Psychology, 9, 123-142.
Hunsader, P. D. (2002a). Mathematical language: Comprehension, readability, and testing. Florida Reading Quarterly, 39(2), 6-10.
Hunsader, P. D. (2002b). Why boys fail and what we can do about it. Principal, 82(2), 52-54.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107, 139-155.
Jackson, D. (1998). Breaking out of the binary trap: Boys' underachievement, schooling, and gender relations. In E. Epstein, J. Elwood, V. Hey, & J. Maw (Eds.), Failing boys? Issues in gender and achievement (pp. 77-95). Philadelphia: Open University Press.
Jitendra, A. K., & Xin, Y. P. (1997). Mathematical word-problem-solving instruction for students with mild disabilities and students at risk for math failure: A research synthesis. The Journal of Special Education, 30, 412-438.
Johnson, E. S. (2000). The effects of accommodations on performance assessments.
Remedial and Special Education, 21, 261-267.
Johnson, T. M., Jones, G. A., Thornton, C. A., Langrall, C. W., & Rous, A. (1998). Students' thinking and writing in the context of probability. Written Communication, 15, 203-229.
John-Steiner, V. (1989). Is mathematics a language? In P. Connolly & T. Vilardi (Eds.), Writing to learn mathematics and science (pp. 285-289). New York: Teachers College Press.
Jones, C. J. (2001). CBAs that work: Assessing students' math content-reading levels. Teaching Exceptional Children, 34, 24-28.
Jordan, N. C., & Hanich, L. B. (2000). Mathematical thinking in second-grade children with different forms of LD. Journal of Learning Disabilities, 33, 567-578.
Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with learning difficulties in mathematics: Findings of a two-year longitudinal study. Journal of Educational Psychology, 94, 586-597.
Jordan, N. C., & Montani, T. O. (1997). Cognitive arithmetic and problem solving: A comparison of children with specific and general mathematics difficulties. Journal of Learning Disabilities, 30, 624-634.
Junge, M. E., & Dretzke, B. J. (1995). Mathematical self-efficacy gender differences in gifted/talented adolescents. Gifted Child Quarterly, 39, 22-28.
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582-591.
Kane, R. B. (1970). The readability of mathematics textbooks revisited. Mathematics Teacher, 63, 579-581.
Kang, H., & Pham, K. T. (1995, March). From 1 to Z: Integrating math and language learning. Paper presented at the 29th annual convention of the Teachers of English to Speakers of Other Languages, Long Beach, CA.
Kelly, R. R., & Mousley, K. (2001). Solving word problems: More than reading issues for deaf students. American Annals of the Deaf, 146, 251-262.
Kenyon, R. W. (1989). Writing is problem solving. In P. Connolly & T. Vilardi (Eds.), Writing to learn mathematics and science (pp. 73-94).
Teachers College Press: New York. King, J. R. (in press). What can he want? Male teachers, young children, and teaching desire. In W. Martino and M. Mills (Eds.), Masculinities and schooling. Mahwah, NJ: Earlbaum. Kiplinger, V. L., Haug, C. A., & Abedi, J. (2000, April). Measuring mathnot readingon a math assessment: A language accommodations study of English language learning and other special populations. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Knudson, R. E. (1995). Writing experiences, attitudes, and achievement of first to sixth graders. Journal of Educational Research, 89 (2), 9097. Koivula, N., Hassmen, P., & Hunt, D. P. (2001). Performance on the Swedish scholastic aptitude test: Effects of selfassessment and gender. Sex Roles, 44, 629633. Kramarski, B., Mevarech, Z. R., & Arami, M. (2002). The effects of metacognitive instruction on solving mathematical authentic tasks. Educational Studies in Mathematics, 49, 225250. Kroll, L., & Halaby, M. (1997). Writing to learn mathematics in the primary school. Young Children, 52, 5460. Krulik, S., & Rudnick, J. A. (1987). Problem solving: A handbook for teachers. (2 nd ed.). Boston: Allyn and Bacon. 337 PAGE 353 Krulik, S., & Rudnick, J. A. (1994). Reflectfor better problem solving and reasoning. Arithmetic Teacher, 41, 334335. Krulik, S., & Rudnick, J. A. (1996). The new sourcebook for teaching reasoning and problem solving in junior and senior high school. Boston: Allyn and Bacon. Landi, M. A. G. (2001). Helping students with learning disabilities make sense of word problems. Intervention in School and Clinic, 37, 1318, 30. Lane, S., Wang, N., & Magone, M. (1996). Genderrelated differential item functioning on a middleschool mathematics performance assessment. Educational Measurement, 15 (4), 2127. Laveault, D., Leblanc, R., & Leroux, J. (1999, August). Selfregulated learning of young adolescents in a mathematics activity. 
Paper presented at the Annual Conference of the American Psychological Association, Boston, MA. Leahey, E., & Guo, G. (2001) Gender differences in mathematical trajectories. Social Forces, 80, 713732. LeBlanc, J. F., Proudfit, L., & Putt, I. J. (1980). Teaching problem solving in the elementary school. In S. Krulik & R. E. Reys (Eds.), Problem solving in school mathematics: 1980 NCTM yearbook, (pp. 104116). Reston, VA: National Council of Teachers of Mathematics. Leder, G. C. (1990). Gender differences in mathematics: An overview. In E. Fennema & G. C. Leder (Eds.), Mathematics and gender (pp. 1026). New York: Teachers College Press. 338 PAGE 354 Leinhardt, G., & Schwartz, B. B. (1997). Seeing the problem: An explanation from Polya. Cognition and Instruction, 15, 395434. Lemke, M., Calsyn, C., Lippman, L., Jocelyn, L., Kastberg, D., Liu, T., Roey, S., Williams, T., Kruger, T., & Bairu, G. (2001). Outcomes of learning: Results from the 2000 Program for International Student Assessment of 15yearolds in reading, mathematics, and science literacy. Washington, DC: U. S. Department of Education, National Center for Education Statistics. Lent, R. W., Brown, S. D., & Gore, P. A. (1997). Discriminant and predictive validity of academic selfconcept, academic selfefficacy, and mathematicsspecific selfefficacy. Journal of Counseling Psychology, 44, 307315. Leong, C. K., & Jerred, W. D. (2001). Effects of consistency and adequacy of language information on understanding elementary mathematics word problems. Annals of Dyslexia, 51, 277298. Lichtenstein, S., & Rischhoff, B. (1977). Do those who know more also know more about how much they know? Organizational Behavior and Human Performance, 20, 159183. Lundeberg, M. A., Fox, P. W., & Puncochar, J. (1994). Highly confident but wrong: Gender differences and similarities in confidence judgments. Journal of Educational Psychology, 86, 114121. 339 PAGE 355 Maccoby, E. E. (1966). Sex differences in intellectual functioning. In E. E. 
Maccoby (Ed.), The development of sex differences. Stanford: Stanford University Press. MacGregor, M. (1990). Reading and writing in mathematics. In J. BickmoreBrand (Ed.), Langage in mathematics (pp. 100108). Portsmouth, NH: Heinemann. Maddux, S. (2002). Selfefficacy: The power of believing you can. In C. R. Synder & S. J. Lopez (Eds.), Handbook of positive psychology (pp. 277287). London: Oxford University Press. Malecki, C. K., & Jewell, J. (2003). Developmental, gender, and practical consideration in scoring curriculumbased measurement writing probes. Psychology in the Schools, 40, 379390. Malpass, J. R. (1996, April). Selfregulation, goal orientation, selfefficacy, and math achievement. Paper presented at the Annual Meeting of the American Educational Research Association. New York. Manning, M. M. (1999). Building reading skills in math. Teaching PreK8, 29, 8586. McCombs, B. L., & Marzano, R. J. (1990). Putting the self in selfregulated learning; The self as agent in integrating will and skill. Educational Psychologist, 25, 5169. McCoy, L. P. (1994). Mathematical problemsolving processes of elementary male and female students. School Science and Mathematics, 94, 266270. 340 PAGE 356 McIntosh, M. E. (1997). Guide students to better comprehension of word problems. Clearing House, 71, 2632. McIntosh, M. E., & Draper, R. J. (1996). Using the questionanswer relationship strategy to improve students reading of mathematics texts. Clearing House, 69, 154162. McLeod, D. B. (1989). The role of affect in mathematical problem solving. In D. B. McLeod & V. M. Adams (Eds.), Affect and mathematical problem solving: A new perspective (pp. 2036). New York: SpringerVerlag. Michaels, R. (2002). The relationships among problem solving performance, gender, confidence, and attributional style in thirdgrade mathematics. Unpublished doctoral dissertation, University of San Francisco. Miller, L. D. (1993). Making the connection with language. Arithmetic teacher, 40, 311316. 
Montague, M. (1997). Cognitive strategy instruction in mathematics for students with learning disabilities. Journal of Learning Disabilities, 30, 164-177.
Morales, R. V. (1998, February). Comprehension and solution patterns of simple math word problems by Mexican-American, bilingual, elementary school students. Paper presented at the annual meeting of the National Association for Bilingual Education, Dallas, TX.
Mosenthal, P. B., & Kirsch, I. S. (1993). Profiling students' quantitative literacy abilities: An approach with precision. Journal of Reading, 36, 668-674.
Mulrine, A. (2001). Are boys the weaker sex? U.S. News and World Report, 131, 40-47.
Multon, K. D., Brown, S. D., & Lent, R. W. (1991). Relation of self-efficacy beliefs to academic outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38, 30-38.
Muth, K. D. (1997). Using cooperative learning to improve reading and writing in mathematical problem solving. Reading and Writing Quarterly, 13, 71-82.
National Assessment of Educational Progress (1986). The mathematics report card: Are we measuring up? (ETS Publication No. 17-M-01). Princeton, NJ: Educational Testing Service.
National Assessment Governing Board (1995). Mathematics framework for the 1996 National Assessment of Educational Progress. Washington, DC: Author.
National Center for Education Statistics (2000). Trends in educational equity of girls and women. Washington, DC: U.S. Department of Education.
National Center for Education Statistics (2003). National Assessment of Educational Progress released items. Retrieved January 16, 2004, from https://www.nces.ed.gov/nationsreportcard/itmrls/printproduce.asp
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (1995). Assessment standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics. Reston, VA: Author.
Neef, N. A., Nelles, D. E., Iwata, B. A., & Page, T. J. (2003). Analysis of precurrent skills in solving mathematics word problems. Journal of Applied Behavior Analysis, 36, 21-33.
Newkirk, T. (2002). Misreading masculinity. Portsmouth, NH: Heinemann.
O'Brien, V., Kopala, M., & Martinez-Pons, M. (1999). Mathematics self-efficacy, ethnic identity, gender, and career interests related to mathematics and science. The Journal of Educational Research, 92, 231-235.
Pajares, F. (1996). Self-efficacy beliefs in academic settings. Review of Educational Research, 66, 543-578.
Pajares, F., & Miller, M. D. (1994). Role of self-efficacy and self-concept beliefs in mathematical problem solving: A path analysis. Journal of Educational Psychology, 86, 193-203.
Pajares, F., & Miller, M. D. (1995). Mathematics self-efficacy and mathematics performance: The need for specificity of assessment. Journal of Counseling Psychology, 42(2), 190-198.
Pajares, F., & Miller, M. D. (1997). Mathematics self-efficacy and mathematical problem solving: Implications of using different forms of assessment. The Journal of Experimental Education, 65, 213-228.
Pajares, F., Miller, M. D., & Johnson, M. J. (1999). Gender differences in writing self-beliefs of elementary school students. Journal of Educational Psychology, 91, 50-61.
Palardy, J. M. (1998). The effects of teachers' expectations on children's literacy development. Reading Improvement, 35, 184-186.
Pallier, G., Wilkinson, R., Danthir, V., Kleitman, S., Knezivic, G., Stankov, L., & Roberts, R. D. (2002). The role of individual differences in the accuracy of confidence judgments. The Journal of General Psychology, 129, 257-299.
Pallier, G. (2003). Gender differences in the self-assessment of accuracy on cognitive tasks. Sex Roles, 48, 265-276.
Paris, S. G., & Paris, A. H. (2001). Classroom applications of research on self-regulated learning. Educational Psychologist, 36, 89-101.
Park, H., Bauer, S. C., & Sullivan, L. M. (1998). Gender differences among top performing elementary school students in mathematical ability. Journal of Research and Development in Education, 31, 133-141.
Passolunghi, M. C., & Siegel, L. S. (2001). Short-term memory, working memory, and inhibitory control in children with difficulties in arithmetic problem solving. Journal of Experimental Child Psychology, 80, 44-57.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage.
Pau, C. S. (1995). The deaf child and solving problems of arithmetic: The importance of comprehensive reading. American Annals of the Deaf, 140, 287-290.
Pengelly, H. (1990). Acquiring the language of mathematics. In J. Bickmore-Brand (Ed.), Language in mathematics (pp. 10-26). Carlton South, Victoria, Australia: Heinemann.
Peressini, D., & Bassett, J. (1996). Mathematical communication in students' responses to a performance-assessment task. In P. S. Elliott & M. J. Kenney (Eds.), Communications in mathematics, K-12 and beyond: 1996 NCTM yearbook (pp. 146-158). Reston, VA: National Council of Teachers of Mathematics.
Peterson, S. (1998). Evaluation and teachers' perceptions of gender in sixth-grade student writing. Research in the Teaching of English, 33, 181-208.
Peterson, S. (2000). Grades four and eight students' and teachers' perceptions of girls' and boys' writing competencies. Reading Horizons, 40, 253-271.
Peterson, S., & Bainbridge, J. (1999). Teachers' gendered expectations and their evaluation of student writing. Reading Research and Instruction, 38, 255-271.
Phillips, L. M., Norris, S. P., Osmond, W. C., & Maynard, A. M. (2002). Relative reading achievement: A longitudinal study of 187 children from first through sixth grades. Journal of Educational Psychology, 94, 3-13.
Pietsch, J., Walker, R., & Chapman, E. (2003). The relationship among self-concept, self-efficacy, and performance in mathematics during secondary school. Journal of Educational Psychology, 95, 589-603.
Pintrich, P. R., & Schrauben, B. (1992). Students' motivational beliefs and their cognitive engagement in classroom academic tasks. In D. H. Schunk & J. L. Meece (Eds.), Student perceptions in the classroom (pp. 149-183). Hillsdale, NJ: Erlbaum.
Polya, G. (1957). How to solve it (2nd ed.). Princeton, NJ: Princeton University Press.
Pomplun, M., & Capps, L. (1999). Gender differences for constructed-response mathematics items. Educational and Psychological Measurement, 59, 597-614.
Pottorff, D. D., Phelps-Zientarski, D., & Skovera, M. E. (1996). Gender perceptions of elementary and middle school students about literacy at school and home. Journal of Research and Development in Education, 29, 203-209.
Powell, A. B. (1997). Capturing, examining, and responding to mathematical thinking through writing. The Clearing House, 71, 21-25.
Pressley, M. (1995). More about the development of self-regulation: Complex, long-term, and thoroughly social. Educational Psychologist, 30, 207-212.
Pugalee, D. (1997). Connecting writing to the mathematics curriculum. Mathematics Teacher, 90, 308-310.
Pugalee, D. (1998). Promoting mathematical learning through writing. Mathematics in School, 27, 20-22.
Pugalee, D. (2001). Writing, mathematics, and metacognition: Looking for connections through students' work in mathematical problem solving. School Science and Mathematics, 101, 236-245.
Quinn, R. J., & Wilson, M. M. (1997). Writing in the mathematics classroom: Teacher beliefs and practices. The Clearing House, 71, 14-20.
Randhawa, B. S., Beamer, J. E., & Lundberg, I. (1993). Role of mathematics self-efficacy in the structural model of mathematics achievement. Journal of Educational Psychology, 85, 41-48.
Reed, S. K. (1999). Word problems: Research and curriculum reform. Mahwah, NJ: Lawrence Erlbaum.
Reehm, S. P., & Long, S. A. (1996). Reading in the mathematics classroom. Middle School Journal, 27(5), 35-41.
Rosenblatt, L. M. (1978). The reader, the text, the poem: The transactional theory of the literary work. Carbondale, IL: Southern Illinois University Press.
Rosenholtz, S. J., & Simpson, C. (1984). The formation of ability conceptions: Developmental trend or social construction? Review of Educational Research, 54, 31-63.
Routman, R. (1994). Invitations: Changing as teachers and learners, K-12. Portsmouth, NH: Heinemann.
Rowe, K. J. (2000, August). Problems in the education of boys and exploring real effects from evidence-based research: Useful findings in teaching and learning for boys and girls. Background paper of keynote address presented at the Teaching Boys Developing Fine Men Conference, Brisbane, Australia.
Ruddell, M. R. (1997). Teaching content area reading and writing (2nd ed.). Boston: Allyn & Bacon.
Rudnitsky, A., Etheredge, S., Freeman, S. J. M., & Gilbert, T. (1995). Learning to solve addition and subtraction word problems through a structure-plus-writing approach. Journal for Research in Mathematics Education, 26, 467-486.
Schell, V. J. (1982). Learning partners: Reading and mathematics. The Reading Teacher, 35, 544-548.
Schoenfeld, A. H. (1983). Episodes and executive decisions in mathematical problem solving. In R. Lesh & M. Landau (Eds.), Acquisition of mathematics concepts and processes (pp. 345-395). New York: Academic.
Schoenfeld, A. H. (1989). Teaching mathematical thinking and problem solving. In L. B. Resnick & L. E. Klopfer (Eds.), Toward the thinking curriculum: Current cognitive research (pp. 83-103). [ASCD Yearbook]. Alexandria, VA: Association for Supervision and Curriculum Development.
Schraw, G. (1995). Measures of feeling-of-knowing accuracy: A new look at an old problem. Applied Cognitive Psychology, 9, 321-332.
Schunk, D. H. (1989). Social cognitive theory and self-regulated learning. In B. J. Zimmerman & D. H. Schunk (Eds.), Self-regulated learning and academic achievement (pp. 83-110). New York: Springer-Verlag.
Schunk, D. H. (1995, April). Self-monitoring and skill acquisition through self-evaluation of capabilities. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Schunk, D. H. (1996, October). Self-evaluation and self-regulated learning. Paper presented at the Graduate School and University Center, City University of New York, New York, NY.
Schunk, D. H., & Gunn, T. P. (1986). Self-efficacy and skill development: Influence of task strategies and attributions. Journal of Educational Research, 79, 238-244.
Seegers, G., & Boekaerts, M. (1996). Gender-related differences in self-referenced cognitions in relation to mathematics. Journal for Research in Mathematics Education, 27, 215-240.
Shepard, R. G. (1993). Writing for conceptual development in mathematics. Journal of Mathematical Behavior, 12, 287-293.
Smith, D. (1973). Yes, American schools are feminized. Phi Delta Kappan, 54, 703-704.
Sovik, N., Frostad, P., & Heggberget, M. (1999). The relation between reading comprehension and task-specific strategies used in arithmetical word problems. Scandinavian Journal of Educational Research, 43, 371-398.
Sovik, N., Frostad, P., & Lie, A. (1994). Can discrepancies between IQ and basic skills be explained by learning strategies? British Journal of Educational Psychology, 64, 389-405.
Spivey, N. N., & King, J. R. (1994). Readers as writers composing from sources. In R. B. Ruddell, M. R. Ruddell, & H. Singer (Eds.), Theoretical models and processes of reading (pp. 668-694). Newark, DE: International Reading Association.
State of Florida Department of Education (1996). Sunshine State Standards: Mathematics. Retrieved July 16, 2004, from http://www.firn.edu/doe/curric/prek12/pdf/math3.pdf
State of Florida Department of Education (2001). FCAT performance task scoring practice for educators: Grade 5 mathematics. Tallahassee, FL: Florida Department of Education.
State of Florida Department of Education (2001). Florida Comprehensive Assessment Test, Grade 5 mathematics, released items. Retrieved January 16, 2004, from http://www.firn.edu/doe/sas/fcat/pdf/fc5mib1r.pdf
State of Florida Department of Education (2004). Understanding FCAT reports 2004. Retrieved May 31, 2004, from http://www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf
State of Florida Department of Education (2005). Evaluation and reporting. Retrieved May 14, 2005, from http://www.firn.edu/doe/evaluation/home0018.htm
Stevens, C. A., & Zaharias, J. A. (1997). Problem-solution relationship instruction: A method for enhancing students' comprehension of word problems. Focus on Learning Problems in Mathematics, 19, 14-42.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Stevens, J. (1999). Intermediate statistics (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Stipek, D. J., & Gralinski, J. H. (1991). Gender differences in children's achievement-related beliefs and emotional responses to success and failure in mathematics. Journal of Educational Psychology, 83(3), 361-371.
Tartre, L. A., & Fennema, E. (1995). Mathematics achievement and gender: A longitudinal study of selected cognitive and affective variables. Educational Studies in Mathematics, 28, 199-217.
Thompson, D. R., & Rubenstein, R. N. (2000). Learning mathematics vocabulary: Potential pitfalls and instructional strategies. Mathematics Teacher, 93, 568-574.
Thompson, G. W., Strackbein, D., & Williams, J. D. (1992). Gender differences in an experimental program on arithmetic problem solving and computation. Mid-Western Educational Researcher, 5, 20-23.
Thorndike, E. L. (1917). Reading as reasoning: A study of mistakes in paragraph reading. Journal of Educational Psychology, 8, 323-332.
Tierney, R. J., & Mosenthal, J. (1985). Discourse comprehension and production: Analyzing text structure and cohesion. In H. Singer & R. B. Ruddell (Eds.), Theoretical models and processes of reading (pp. 55-104). Newark, DE: International Reading Association.
Tindal, G., Heath, B., Hollenbeck, K., Almond, P., & Harniss, M. (1998). Accommodating students with disabilities on large-scale tests: An empirical study of student response and test administration demands. Exceptional Children, 64, 439-450.
Tovani, C. (2000). I read it, but I don't get it: Comprehension strategies for adolescent readers. Portland, ME: Stenhouse Publishers.
Usiskin, Z. (1996). Mathematics as a language. In P. S. Elliott & M. J. Kenney (Eds.), Communications in mathematics, K-12 and beyond: 1996 NCTM yearbook (pp. 231-243). Reston, VA: National Council of Teachers of Mathematics.
Van de Walle, J. A. (2001). Elementary school mathematics. Boston, MA: Pearson Custom Publishing.
Van Dijk, T. A., & Kintsch, W. (1977). Cognitive psychology and discourse: Recalling and summarizing stories. In W. U. Dressler (Ed.), Current trends in textlinguistics (pp. 794-812). New York: De Gruyter.
van Kraayenoord, C. E., & Paris, S. G. (1997). Australian students' self-appraisal of their work samples and academic progress. The Elementary School Journal, 97, 523-537.
Vermeer, H. J., Boekaerts, M., & Seegers, G. (2000). Motivational and gender differences: Sixth-grade students' mathematical problem-solving behavior. Journal of Educational Psychology, 92(2), 308-315.
Vermeer, H., Seegers, G., & Boekaerts, M. (1994). Solving math problems: Degree of association between confidence ratings and effective strategy use. Paper presented at the Annual Conference of the Dutch Educational Research Association, Amsterdam.
Vogel, S. (1990). Gender differences in intelligence, language, visual-motor abilities, and academic achievement in students with learning disabilities: A review of the literature. Journal of Learning Disabilities, 23, 44-52.
Vygotsky, L. S. (1987). Thinking and speech. In A. S. Carton (Ed.), The collected works of L. S. Vygotsky (pp. 39-243). New York: Plenum Press.
Wagemaker, H. (1993). Reading achievement of New Zealand 9- and 14-year-olds. In H. Wagemaker (Ed.), Achievement in reading literacy: New Zealand's performance in a national and international context (pp. 33-58). Wellington, New Zealand: Ministry of Education.
Wakefield, D. V. (2000). Math as a second language. The Educational Forum, 64, 272-279.
Walkerdine, V. (1998). Counting girls out: Girls and mathematics. London: Falmer Press.
Walsh, M., Hickey, C., & Duffy, J. (1999). Influence of item content and stereotype situation on gender differences in mathematical problem solving. Sex Roles, 41, 219-239.
Wardrop, H. (1993). Mathematics language problems. The Australian Mathematics Teacher, 49, 10-13.
Webster, A., & Henriksson, W. (2000). The interaction between item format and gender differences in mathematics performance based on TIMSS data. Studies in Educational Evaluation, 26, 79-90.
Wilkinson, I. A. G. (1998). Dealing with diversity: Achievement gaps in reading literacy among New Zealand students. Reading Research Quarterly, 33, 144-167.
Williams, J. E. (1994). Gender differences in high school students' efficacy-expectation/performance discrepancies across four subject matter domains. Psychology in the Schools, 31, 232-237.
Williams, R. L., & Yore, L. D. (1985). Content, format, gender, and grade level differences in elementary students' ability to read science materials as measured by the cloze procedure. Journal of Research in Science Teaching, 22, 81-88.
Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.
Wilson, L. D., & Zhang, L. (1998, April). A cognitive analysis of gender differences on constructed-response and multiple-choice assessments in mathematics. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
Winne, P. H. (1995). Inherent details in self-regulated learning. Educational Psychologist, 30, 173-187.
Winne, P. H. (1996). A metacognitive view of individual differences in self-regulated learning. Learning and Individual Differences, 8, 327-353.
Wong, R. M., Lawson, M. J., & Reeves, J. (2002). The effects of self-explanation training on students' problem solving in high-school mathematics. Learning and Instruction, 12, 233-262.
Woodward, J., Monroe, K., & Baxter, J. (2001). Enhancing student achievement on performance assessments in mathematics. Learning Disability Quarterly, 24, 33-46.
Wright, C. R., & Houck, J. W. (1995). Gender differences among self-assessments, teacher ratings, grades, and aptitude test scores for a sample of students attending rural secondary schools. Educational and Psychological Measurement, 55, 743-752.
Yates, J. F. (1990). Judgment and decision making. Englewood Cliffs, NJ: Prentice Hall.
Young, J. P., & Brozo, W. G. (2001). Boys will be boys, or will they? Literacy and masculinities. Reading Research Quarterly, 36, 316-325.
Zambo, R. W. (1990). An investigation of possible gender-related differences in the process of solving arithmetic word problems in the sixth and eighth grades. Unpublished doctoral dissertation, University of South Florida, Tampa.
Zambo, R., & Follman, J. (1994). Gender-related differences in problem solving at the 6th and 8th grade levels. Focus on Learning Problems in Mathematics, 16, 20-38.
Zambo, R., & Hess, R. K. (1996). The gender differential effects of a procedural plan for solving mathematical word problems. School Science and Mathematics, 96, 362-370.
Zhang, L., Wilson, L., & Manon, J. (1999, April). An analysis of gender differences on performance assessment in mathematics: A follow-up study. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational Psychology, 81, 329-339.
Zimmerman, B. J. (1990). Self-regulated learning and academic achievement. Educational Psychologist, 25, 3-17.
Zimmerman, B. J., & Martinez-Pons, M. (1990). Student differences in self-regulated learning: Relating grade, sex, and giftedness to self-efficacy and strategy use. Journal of Educational Psychology, 82, 51-59.

APPENDIX A
FCAT MATHEMATICS PERFORMANCE SCORING RUBRIC

4 Points: A score of four is a response in which the student demonstrates a thorough understanding of the mathematics concepts and/or procedures embodied in the task. The student has responded correctly to the task, used mathematically sound procedures, and provided clear and complete explanations and interpretations. The response may contain minor flaws that do not detract from the demonstration of a thorough understanding.

3 Points: A score of three is a response in which the student demonstrates an understanding of the mathematics concepts and/or procedures embodied in the task. The student's response to the task is essentially correct, and the mathematical procedures used and the explanations and interpretations provided demonstrate an essential, but less than thorough, understanding. The response may contain minor flaws that reflect inattentive execution of mathematical procedures or indications of some misunderstanding of the underlying mathematics concepts and/or procedures.

2 Points: A score of two indicates that the student has demonstrated only a partial understanding of the mathematics concepts and/or procedures embodied in the task. Although the student may have used the correct approach to obtaining a solution or may have provided a correct solution, the student's work lacks the essential understanding of the underlying mathematical concept. The response contains errors related to misunderstanding important aspects of the task, misuse of mathematical procedures, or faulty interpretations of results.

1 Point: A score of one indicates that the student has demonstrated a very limited understanding of the mathematics concepts and/or procedures embodied in the task. The student's response is incomplete and exhibits many flaws. Although the student's response has addressed some of the conditions of the task, the student reached an inadequate conclusion and/or provided reasoning that was faulty or incomplete. The response exhibits many flaws or may be incomplete.

0 Points: A score of zero indicates that the student has provided no response at all, or a completely incorrect or uninterpretable response, or demonstrated insufficient understanding of the mathematics concepts and/or procedures embodied in the task. For example, a student may provide some work that is mathematically correct, but the work does not demonstrate even a rudimentary understanding of the primary focus of the task.

Source: State of Florida Department of Education (2004), Understanding FCAT Reports 2004.

APPENDIX B
AMENDED RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS

For all scoring, symbols included in a written explanation are regarded as text.

3 Points: A score of three indicates that the student has provided a thorough written explanation of the problem-solving process that conveys conceptual understanding and includes the context of the problem. The student has explained the strategies and/or steps used to solve the problem in an organized, clear, and complete manner. The writing may contain minor flaws that do not detract from the overall quality of the explanation.

2 Points: A score of two indicates that the student has provided only a partial written explanation of the problem-solving process that conveys some conceptual understanding. The student has explained some of the strategies used to solve the problem but has omitted one or more important steps; the student has explained all of the computation but has failed to include the problem context in the explanation; or the student has provided a reasonably clear explanation of seriously flawed mathematics. The writing may contain flaws that reflect inattention to detail or faulty interpretation of the process.

1 Point: A score of one indicates that the student has provided only a limited amount of accurate written text to explain the problem-solving process or the operation used. This text may take the form of labels, descriptors, or minor explanations, but it fails to convey the essence of the process. The writing may contain flaws.

0 Points: A score of zero indicates that the student has not provided any written text to explain the problem-solving process, or has provided text that is incorrect and/or fails to relate to the problem. Restating the answer in words does not constitute text that explains the process.

APPENDIX C
PILOT STUDY RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS

3 Points: The student provided a linguistic explanation that effectively summarized all steps of the problem in a logical manner.

2 Points: Either the student provided a logical explanation that failed to address one important step of the problem, or the student provided an explanation that addressed all steps of the problem but was not entirely logical.

1 Point: The student provided effective labels and descriptors but no explanation, provided a minimal explanation that failed to address most of the steps of the problem, or provided an explanation that was not logical or did not support the computational work or solution.

0 Points: The student did not provide any text to support the computational work.
APPENDIX D
SAMPLE STUDENT WORK FROM PILOT STUDY TO ILLUSTRATE SCORING

All samples are taken from the Jean's class problem, which follows:

In Jean's class, there are twice as many boys as girls. If there are 10 girls in the class, how many boys and girls are there in the class? Show your work, then use words or pictures to explain how you got your answer.

The following response scored a 3 for performance and a 0 for linguistic explanation. The student clearly understood the problem situation and arrived at a correct answer, but the response could not be given a score of 4 because it lacked an explanation. The total lack of writing resulted in the score of 0 for linguistic explanation.

The following response earned a performance score of 4 because the student demonstrated a thorough understanding of the mathematics concepts embodied in the task, provided a correct response, and provided some text to explain the procedures used. The response scored a 1 for linguistic explanation because the student provided only a limited amount of text in the form of labels and minor descriptors.

The following response earned a performance score of 2. Although the student used a correct approach to determining the solution, the work contains an error related to misunderstanding an important aspect of the task, namely, finding the total number of boys and girls. The linguistic explanation earned a score of 2 because the student explained one of the major strategies used to solve the problem but omitted an important step.

APPENDIX E
FCAT ACHIEVEMENT LEVELS

Achievement levels describe the success a student has achieved on the Florida Sunshine State Standards tested on the FCAT. Achievement levels range from 1 to 5, with Level 1 being the lowest and Level 5 being the highest.
Level 5: This student has success with the most challenging content of the Sunshine State Standards. A student scoring in Level 5 answers most of the test questions correctly, including the most challenging questions.
Level 4: This student has success with the challenging content of the Sunshine State Standards. A student scoring in Level 4 answers most of the test questions correctly but may have only some success with questions that reflect the most challenging content.
Level 3: This student has partial success with the challenging content of the Sunshine State Standards, but performance is inconsistent. A student scoring in Level 3 answers many of the test questions correctly but is generally less successful with the most challenging questions.
Level 2: This student has limited success with the challenging content of the Sunshine State Standards.
Level 1: This student has little success with the challenging content of the Sunshine State Standards.
Source: State of Florida Department of Education (2004), Understanding FCAT Reports 2004.

APPENDIX F
PILOT STUDY
Purpose
The primary purpose of the pilot study was to ensure that the testing instrument, testing procedures, and scoring/coding procedures would produce the data needed to answer the five research questions. The pilot study had three further purposes:
- to ensure that the process of obtaining informed consent would yield a sufficient sample size;
- to ensure that the ability-grouping process would accurately and efficiently categorize the sample students into groups;
- to ensure that the follow-up interviews with selected students would serve as a member check to verify students' problem-solving processes.
Goals for Piloting the Test Instrument, Test Procedures, and Scoring/Coding Procedures
Test Instrument
Regarding the test instrument, the researcher wished to use information garnered from the small-scale pilot administration of the test to determine the following:
1) Appropriateness of the length of the 12-item constructed-response test,
2) Appropriateness of the range of difficulty of the individual test items,
3) Appropriateness of the order of the test items,
4) Quality and quantity of the written work provided by students,
5) Diversity of responses that the test items would generate, and
6) Ability of the self-efficacy and self-assessment scales to capture students' thinking during the testing event.
A maximum test length of one hour was deemed appropriate for upper-elementary students for two reasons. First, mathematics lessons are generally one hour in length. Second, a test longer than one hour may cause student fatigue and lead to spurious results.
Regarding item difficulty, the goal was to include three kinds of items: some that most students would likely solve correctly, some that only some students would solve correctly, and some that only the highest-level students would solve. The easier items would give all students a chance to experience some success, and the more difficult items would allow the scores to differentiate between students.
The test began with some of the easier items to encourage students to become engaged in the test. The remaining items were spaced to provide a variety of mathematical and reading difficulty, content, and context throughout the test.
One research question involves gender differences in children's problem-solving processes, and another involves the quality of students' linguistic explanations. The test items must therefore elicit enough written work and explanation for the researcher to draw conclusions about these two questions. To differentiate between students' problem-solving processes, the items must also elicit a diverse set of clearly differentiated responses.
The first test item, the flower pot problem, was chosen with the knowledge that it was below grade level in difficulty and would not likely produce a wide variety of student responses. Its primary purpose was to give students an entry point into the test. Unlike the other test items, its value will not be judged on the diversity of student responses.
Finally, students must readily understand the self-efficacy and self-assessment scales, and the scales must be effectively integrated into the testing experience. Students must be able to differentiate between the faces and to select the face that correctly represents their feelings at the appropriate time.
Testing Procedures
The desired goals for piloting the testing procedures included the following:
1) Assess whether using a sample test item to introduce the test format, self-efficacy scale, and self-assessment scale would sufficiently prepare students for the test administration, and
2) Determine which testing location (in the classroom or in a separate room) would best serve the test administration.
Scoring/Coding Procedures
The goals for piloting the scoring procedures included the following:
1) Practice using the FCAT holistic four-point rubric for scoring test items,
2) Use the rubric scoring process to identify anchor papers: responses to each test item that exemplify the quality of an answer earning a score of 4, 3, 2, 1, or 0,
3) Use the researcher-created three-point scoring rubric for assessing the quality of students' linguistic explanations to determine the extent to which the rubric promotes both convergence of responses within a score and divergence of responses between scores, and
4) Practice coding students' problem-solving processes to determine the extent to which responses can be classified into meaningful categories.
Design
The intent of the study is to answer the five research questions using a nonexperimental causal-comparative design with both qualitative and quantitative methods. The three independent variables are gender (2 levels), reading ability (2 levels: competent or low), and mathematics ability (2 levels: competent or low). Each participant has five dependent variables:
- the test score, as assessed with the FCAT four-point rubric;
- the overall score for quality of linguistic explanations, as assessed with a researcher-created three-point rubric;
- a categorical score for the student's choice of process for solving each problem;
- a measure of self-efficacy for each test item;
- a measure of self-assessment for each test item.
Population and Sample Selection
The population sampled in this pilot study consisted of the students in one fifth-grade class and one fourth-grade class at an elementary school in Florida. The full study will take place in the first half of the school year with fifth-grade students, whereas the pilot study was conducted at the end of the school year. A pilot limited to fifth-grade students might therefore not be sufficiently representative of the full study sample. The students available at the time of the pilot had benefited from nearly an entire year of fifth-grade mathematics instruction. The students in the full study, by contrast, will have completed only a fourth of the fifth-grade mathematics curriculum.
For that reason, the pilot was conducted with end-of-year students from both the fourth and fifth grades, to obtain a better measure of the responses the full study participants might produce. Parental permission was obtained through informed consent documents approved by the Institutional Review Board of the University of South Florida. Approximately half of the students obtained parental approval. This resulted in 10 fourth-grade participants (7 female and 3 male) and 13 fifth-grade participants (7 female and 6 male).
The two teachers informally evaluated their students' reading and mathematics ability by rating each student as high, middle, or low in each of the two subject areas. Prior-year FCAT scores for mathematics and reading were also obtained. Both measures were converted to numbers:
- Teacher ratings: high = 3, middle = 2, low = 1.
- FCAT scores: 3 (high) for an FCAT score of 5 or 4, 2 (middle) for an FCAT score of 3, and 1 (low) for an FCAT score of 2 or 1.
Each student's two scores (teacher rating and FCAT) were averaged to produce a categorization of high, middle, or low for the purposes of this study. Students whose averages were 1.5 or 2.5 were categorized according to the teacher rating. The categorization system produced a total of 18 groups (2 levels of gender × 3 levels of reading ability × 3 levels of mathematics ability). Table 38 shows the number of students in each group.
Table 38
Categorization of Pilot Students
                     Mathematics Ability
Reading Ability      High                   Middle                 Low
High                 Males 4, Females 6     Males 0, Females 2     Males 0, Females 0
Middle               Males 0, Females 2     Males 4, Females 2     Males 0, Females 1
Low                  Males 0, Females 0     Males 0, Females 0     Males 1, Females 1
Instrumentation
The test to be administered to the participants includes 12 constructed-response mathematics problem-solving items. A copy of the test is provided in Appendix H.
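The ability-grouping rule described above can be sketched in Python. Each measure is mapped to a 3/2/1 value and the two values are averaged. Averages of 1.5 or 2.5 fall back to the teacher rating. This is a minimal illustration under stated assumptions, not the procedure the researcher actually ran. The function and dictionary names are hypothetical, and teacher ratings are assumed to be stored as the lowercase strings "high", "middle", and "low".

```python
# Hypothetical sketch of the pilot study's ability-grouping rule.
# Names (TEACHER_VALUES, fcat_value, categorize) are illustrative only.

TEACHER_VALUES = {"high": 3, "middle": 2, "low": 1}
LABELS = {3: "high", 2: "middle", 1: "low"}


def fcat_value(level: int) -> int:
    """Map an FCAT achievement level (1-5) onto the 3/2/1 scale."""
    if level not in range(1, 6):
        raise ValueError(f"FCAT level must be 1-5, got {level}")
    if level >= 4:
        return 3  # levels 5 and 4 -> high
    if level == 3:
        return 2  # level 3 -> middle
    return 1      # levels 2 and 1 -> low


def categorize(teacher_rating: str, fcat_level: int) -> str:
    """Average the two values; averages of 1.5 or 2.5 follow the teacher rating."""
    average = (TEACHER_VALUES[teacher_rating] + fcat_value(fcat_level)) / 2
    if average in (1.5, 2.5):
        return teacher_rating
    return LABELS[int(average)]  # remaining averages are exactly 1.0, 2.0, or 3.0


print(categorize("high", 5))    # high   ((3 + 3) / 2 = 3.0)
print(categorize("middle", 5))  # middle ((2 + 3) / 2 = 2.5, teacher rating decides)
print(categorize("low", 4))     # middle ((1 + 3) / 2 = 2.0)
```

An average of 2.0 can come from two different combinations: a middle rating paired with a middle FCAT value, or a high and a low that disagree. The rule as described treats both combinations as "middle".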
A detailed description of the test, its development, and its validity is included in Chapter 3, Methods.
Reliability
The reliability of the pilot test was assessed through a measure of internal consistency. The Cronbach alpha coefficient for mathematical performance scores on the test was .89, a value considered satisfactory under the guidelines of Guilford and Fruchter (1978). The Cronbach alpha for linguistic explanations was somewhat lower at .75. The Cronbach alpha coefficients for self-efficacy and self-assessment were lower still, .55 and .68 respectively, primarily because of missing data.
Data Collection
To fulfill the goals of the pilot study, the test was first administered to a group of fourth-grade students (n = 10; 7 female and 3 male). One week later, the test was given to a group of fifth-grade students (n = 13; 7 female and 6 male). The delay allowed time for any changes to the instrument's length or level of difficulty that the first administration showed to be necessary.
The student work for each question was evaluated with the FCAT four-point rubric (see Appendix A for a copy of the scoring rubric). The quality of the linguistic explanation was evaluated with a researcher-developed three-point rubric (see Appendix B for a copy of the scoring rubric). Students' indications of self-efficacy and self-assessment on the four-face scale were converted to numerical values ranging from one to four.
Participation in the study was voluntary, and parental permission through informed consent was obtained in advance for each participant (see Appendix H for consent forms). Informed consent was obtained through a two-step process. First, a brief letter from the principal describing the study was sent home with all students in one fourth-grade class and one fifth-grade class (see Appendix I for a copy of the letter).
At the bottom of the letter, parents could indicate "no" if they were not willing for their child to participate. Alternatively, they could indicate "yes" if they were interested in having their child participate and would like a copy of the informed consent documents sent home. For each parent who indicated "yes," a packet containing a cover letter from the researcher and two copies of the informed consent document was sent home. All fourth graders whose parents initially said "yes" returned a signed copy of the informed consent, as did all but one of the fifth graders whose parents said "yes."
Organization of Testing
The fourth-grade administration of the test was conducted in the regular classroom while nonparticipating students wrote in their journals or read silently. Participants sat at round tables with up to four students at each table. The fifth-grade administration was conducted in a multipurpose room with students sitting at individual desks.
Before the test began, each student was given a copy of the test booklet and asked to enter his or her name, teacher name, school, and gender on the cover page. When all students were finished, they were instructed to turn to the first question of the test, a practice item, and to read the question silently. Next, the researcher explained the format of the test page, indicating that the remaining twelve test items were formatted in the same fashion. The researcher then read the instructions for the self-efficacy scale aloud and explained them. The students were then led in an open discussion about what circumstances would lead them to choose each of the four faces. Students were encouraged to ask questions if they did not understand. Next, the researcher read the instructions for the self-assessment scale aloud and explained them.
The students were then led in an open discussion of what level of performance would lead them to choose each of the four faces. Students were instructed to show all their work on all questions and to turn in their test booklets to the researcher when they were finished. No time limit was discussed or imposed. During the test administration, the researcher answered only questions about the test format or about marking the self-efficacy and self-assessment scales. As each student completed the test, the total testing time was recorded on his or her test booklet.
Analysis of Data
After collecting data from the fourth-grade students, the researcher performed a cursory review of the students' work. One student was highly unsuccessful, but most students were able to attempt most of the test items, indicating an appropriate level of difficulty. In addition, because all of these students completed the test in less than 45 minutes, the test length was confirmed as acceptable. For these reasons, no adjustments were made to the test before the fifth-grade administration.
After the test was administered, each student response was first evaluated using the FCAT four-point rubric. Student responses that were indicative of each possible score were summarized and recorded to ensure consistency in scoring. The linguistic explanations were then scored using the three-point rubric. Students' indications of self-efficacy and self-assessment were converted to numerical scores, and all scores were entered at the top of the page for each question. Descriptive statistics were calculated to summarize the data. The researcher then reviewed the mean performance scores to determine whether the questions were appropriately ordered. The biggest concern was whether students would continue to expend effort after dealing unsuccessfully with a difficult question.
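This appendix reports two kinds of summary: per-item descriptive statistics, and Cronbach's alpha (the internal-consistency measure given under Reliability). The Python sketch below computes both from a students-by-items score matrix. It is a hypothetical illustration, not the researcher's analysis:

- The scores are invented.
- The sample (n - 1) standard deviation is an assumption, because the text does not say which formula was used.
- The function expects a complete matrix. The text notes that missing data lowered the self-efficacy and self-assessment coefficients, and this sketch does not model missing scores.

```python
from statistics import mean, stdev, variance


def item_summary(scores):
    """Return (mean, sample SD) for each item, i.e. each column of a
    students-by-items score matrix."""
    return [(mean(col), stdev(col)) for col in zip(*scores)]


def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(totals)).

    Assumes a complete matrix with no missing scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 students x 3 items on the 0-4 performance rubric.
demo = [
    [4, 3, 4],
    [2, 2, 3],
    [3, 3, 2],
    [1, 0, 1],
]

for i, (m, sd) in enumerate(item_summary(demo), start=1):
    print(f"Item {i}: M = {m:.2f}, SD = {sd:.2f}")
print(f"alpha = {cronbach_alpha(demo):.2f}")  # alpha = 0.93
```

Alpha is a ratio of variances, so it gives the same value whether sample or population variances are used, as long as the same choice is applied to both the items and the totals.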
When all scoring was completed, all student responses to the first question were sorted according to the process used to solve the problem. Categories were continually reviewed to ensure both convergence and divergence. Each process category was summarized and assigned a numerical value, which was entered at the top of each test page, and frequencies for each process category were tallied. No judgments were made as to the superiority of one process over another. This process was repeated for each of the twelve questions.
After preliminary data analysis, four students (2 male and 2 female) were selected for follow-up interviews. The primary purpose of the interview was to confirm that the researcher had correctly interpreted the students' work when categorizing the processes. The most important students to interview were therefore those whose work was most likely to have been miscategorized. The students selected were those who scored an average of at least two points on the FCAT four-point rubric but scored lowest for linguistic explanation. In each of the fourth-grade and fifth-grade groups, the one boy and one girl who best fit these criteria were selected for interview. One additional boy and girl from each grade were also selected in case the first-choice student was absent on the day of the interview.
Student interview data were reviewed for two purposes. The first was to determine how accurately the researcher had categorized the students' work. The second was to determine whether students' explanations for their choice of faces on the self-efficacy and self-assessment scales showed convergence among and divergence between the ratings.
Follow-Up Interviews
Interviews were conducted with four students, one girl and one boy from each of the fourth and fifth grades.
The interviews had two goals: to verify the researcher's interpretation of the process used to solve each problem, and to ensure that the students had correctly interpreted the four-face scales for self-efficacy and self-assessment. All interviews were videotaped and later transcribed. The interviews were held privately in the school's conference room with only the student and the researcher present.
The interview began with the researcher explaining the purpose of the interview and the need for the video camera. Students were asked to give their consent before the actual interview began, and all students consented. During the interview, the student was given his or her own test paper and asked to read the first test question aloud. The student was then asked why he or she had circled that face on the self-efficacy measure. Next, the researcher asked the student to take as much time as needed to become refamiliarized with the work and then to explain what he or she had done. To conclude each question, the researcher asked the child to explain the choice of face on the self-assessment scale. All twelve questions were handled in the same fashion, and each interview lasted 35 to 40 minutes.
Transcription from videotape was a cumbersome process. However, the video images helped the researcher recognize which specific part of the student's work was being discussed. For the full study, the interviews will be both audiotaped and videotaped. The audiotape will be used to produce a transcript of the dialogue, and the videotape will be used to supplement the transcript with descriptions of the action taking place during the interview.
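The interview-selection criteria described under Analysis of Data can be expressed as a short filter-and-rank step:

1. Keep only students with a mean performance score of at least two.
2. Rank the remaining students by lowest linguistic-explanation score.
3. Take a first choice and an alternate for each gender in each grade.

The Python sketch below is illustrative only. The `Student` record, its field names, the roster, and the tie-breaking rule (tied students keep their roster order) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Student:
    code: str           # hypothetical anonymized code
    grade: int          # 4 or 5
    gender: str         # "F" or "M"
    performance: float  # mean FCAT-rubric score (0-4)
    linguistic: float   # mean linguistic-explanation score (0-3)


def select_for_interview(students, min_performance=2.0, per_group=2):
    """Return {(grade, gender): [first choice, alternate, ...]}.

    Keeps students whose mean performance is at least min_performance and
    ranks them by lowest linguistic-explanation mean. sorted() is stable,
    so ties keep roster order (an assumption, not stated in the text)."""
    eligible = [s for s in students if s.performance >= min_performance]
    groups = {}
    for s in sorted(eligible, key=lambda s: s.linguistic):
        picks = groups.setdefault((s.grade, s.gender), [])
        if len(picks) < per_group:
            picks.append(s.code)
    return groups


# Hypothetical roster (not pilot data).
roster = [
    Student("A", 4, "F", 2.8, 0.4),
    Student("B", 4, "F", 3.1, 0.2),
    Student("C", 4, "F", 1.5, 0.0),  # excluded: below the performance cutoff
    Student("D", 4, "M", 2.2, 0.9),
    Student("E", 5, "M", 2.5, 0.3),
    Student("F", 5, "F", 2.0, 1.1),
]

result = select_for_interview(roster)
for (grade, gender), codes in sorted(result.items()):
    print(grade, gender, codes)
```

Student C has the lowest explanation score on this roster but is filtered out first. This mirrors the intent stated in the text: the target students were those whose work was adequate but whose explanations were weakest.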
Results
Results Involving the Test Instrument and Pilot Procedures
Test Length, Item Difficulty, and Item Placement
Students took a mean of 33 minutes to complete the test, with a range of 16 to 56 minutes. This length was considered appropriate, confirming the use of twelve items for the test. The mean performance score for all student responses to all twelve questions was 2.54 on a four-point scale, with a standard deviation of 1.24. Table 39 presents the mean and standard deviation of the scores for each of the twelve questions, in their order of appearance on the test.
NAEP data indicated that the flower pot problem would be the easiest problem on the test. On the pilot test it was indeed the highest-scoring item, with the lowest standard deviation. Students had the most difficulty with the string problem, which involved division of fractions. Surprisingly, NAEP data placed this item only in the midrange of difficulty. The range of mean scores (1.39 to 3.35) was deemed acceptable for the purposes of this study.
Students also performed relatively well (M = 2.70) on the balloon problem, which immediately followed the most difficult problem. This indicates that students were not overly discouraged by the difficult problem and continued to give a strong effort, so the placement of the balloon problem was appropriate. The birdseed problem was the second most difficult problem on the test (M = 1.96). It was placed at the end of the test, after an item on which students performed relatively well (M = 2.65). This indicates that fatigue was not likely a factor in the score on this item, and it confirms the item's placement.
Table 39
Pilot Study Summary of Performance Rubric Scores by Item
Test Question      M      SD
Flower pot         3.35   0.57
Ms. Hernandez      2.87   1.14
Calories           2.30   1.06
Jean's class       2.61   1.08
Pizza              2.48   1.24
String             1.39   1.20
Balloons           2.70   1.29
Flour              2.65   1.30
Calculator         2.70   1.40
School lunch       2.78   1.40
Pencils            2.65   1.30
Birdseed           1.96   1.25
Note. n = 23; the maximum possible score for each item was 4.0.
Students' Written Work
A vast majority of the students showed their work for each problem, which made it possible to categorize their problem-solving processes. The diversity of student processes varied by problem. There were four discernible problem-solving paths for the flower pot problem (the fewest) and ten different paths for the balloon problem (the most). This level of diversity will allow for analysis of gender differences in the full study. However, the larger sample will likely produce more problem-solving processes, so similar processes will need to be combined. The follow-up interviews confirmed the researcher's categorization of students' problem-solving processes.
Self-Efficacy and Self-Assessment Scales
The follow-up interviews were the only true source of information as to whether students understood and correctly used the four-face scales for self-efficacy and self-assessment. Transcripts from the four interviews were reviewed, and students' explanations for choosing each of the four faces were compiled into a list organized by face. Students' explanations for choosing the happiest face were fairly consistent (convergent) and strongly differentiated from explanations for the other faces (divergent). The same was true for the saddest of the four faces. However, the researcher discovered a problem with the middle two faces: students' comments about these two faces were similar. Students viewed both the slightly happy and the slightly sad face as representing partial success and partial failure. The decision to use only four faces had originally been made to avoid students' potential overselection of a neutral, middle face.
As a result of the pilot study, and in keeping with other similar research (Boekarts et al., 1995; Vermeer et al., 2000), a fifth, neutral face will be added to the self-efficacy and self-assessment scales for the full study.
Testing Procedures
In both test administrations, the students appeared to grasp the test format and the purpose and procedure for the self-efficacy and self-assessment scales. Six students neglected to mark one or more self-efficacy or self-assessment scales, resulting in 17 missing scores out of a total of 552 scores (3.08%). Nine of the seventeen missing scores came from one student, the same student who was identified as an outlier during the data analysis. If this student's scores were eliminated from the analysis, the percentage of missing scores would drop to 1.5%. To reduce this percentage in the full study, the researcher will remind students during the test administration to mark both scales for each question.
Regarding test location, both pilot locations were acceptable. However, the quiet, distraction-free multipurpose room was more conducive to student concentration. When possible, students will be tested in a separate room for the full study.
Scoring Procedures
The FCAT four-point rubric proved to be an efficient scoring system that established clear categories for each score value. The qualities of answers given each score were recorded for use in scoring test items in the full study. The researcher-created rubric for linguistic explanations, by contrast, lacked clarity within each score category. Throughout the scoring process, the researcher made anecdotal notes on difficulties caused by the rubric wording. These notes were then used to produce the amended version of the scoring rubric found in Appendix B.
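The missing-score percentages reported above follow from simple counts. Each of the 23 students marked two scales (self-efficacy and self-assessment) on each of the 12 items, which gives 23 × 12 × 2 = 552 possible scores. The Python sketch below reproduces both figures. It assumes that removing the outlier takes that student's 9 missing scores out of the numerator and all 24 of that student's possible scores out of the denominator.

```python
# Counts taken from the pilot results.
STUDENTS = 23
ITEMS = 12
SCALES = 2           # self-efficacy and self-assessment
MISSING = 17         # unmarked scale responses overall
OUTLIER_MISSING = 9  # of those, from the one outlier student

possible = STUDENTS * ITEMS * SCALES
print(f"All students: {MISSING}/{possible} = {MISSING / possible:.2%}")
# All students: 17/552 = 3.08%

# Assumption: dropping the outlier also removes that student's 24 possible scores.
possible_without = (STUDENTS - 1) * ITEMS * SCALES
missing_without = MISSING - OUTLIER_MISSING
print(f"Without outlier: {missing_without}/{possible_without} = "
      f"{missing_without / possible_without:.2%}")
# Without outlier: 8/528 = 1.52%
```

The second figure rounds to the 1.5% given in the text.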
The coding process for categorizing students' problem-solving processes produced meaningful categories.
Informed Consent Process
The process used in the pilot study to obtain parental permission was less than satisfactory. Only half of the potential participants obtained parental permission, a number that may not yield a sample large enough to detect any differences that may exist between groups. For this reason, the researcher has amended the process for obtaining parental permission for the full study.
Most of the parents who responded to the principal's introductory letter indicated their willingness to have their child participate, and nearly all of these parents later signed the informed consent document. However, only slightly more than half of the parents responded to the letter. The researcher concluded that the difficulty was not that parents were unwilling to have their children participate. Rather, children failed to give the letter to their parents and return it to school. For the full study, the introductory letter will be attached to the informed consent documents, so that students must handle paperwork fewer times. The introductory letter will explain the attached informed consent. It will also include a place for a parent to sign to indicate that he or she does not wish the child to participate. The researcher will give each teacher a nonedible incentive (not to exceed 50 cents in value) for every child who returns either of two documents to school: a signed informed consent document granting parental permission, or a signed introductory letter refusing it. The researcher believes that these procedural changes will raise the percentage of participation.
Results of the Data Analysis
Individual student means for performance ranged from 0.42 to 3.50 on a four-point scale, with a standard deviation of 0.80.
The lowest score represents an outlier; the next lowest student mean was 1.25. The student with the lowest score completed the test in 16 minutes, and the teacher indicated that the student had not put forth any effort on the test.
As can be seen in Table 40, girls outscored boys on the measure of performance. An item analysis reveals that girls outscored boys on every question except the birdseed problem, the one that NAEP data show to be the most difficult. Girls also performed better on the measure of linguistic explanations. However, boys' and girls' ratings of their own performance on the self-assessment scale were nearly identical. Girls were slightly more self-efficacious, though the standard deviation of boys' self-efficacy exceeded that of girls.
Table 41 summarizes the pilot results on a per-item basis. The results indicate that students were consistently overconfident on all test questions. The mean self-assessment score was higher than the mean self-efficacy score for ten of the twelve questions. This indicates that students tended to be more confident of their finished product than they were, before beginning, of their ability to solve the problem correctly.
Conclusion
In summary, the pilot study showed that the test instrument can produce the data necessary to answer the research questions. The pilot study also revealed several weaknesses, each of which has led to a procedural change for the full study:
- the rubric for linguistic explanations;
- the process for obtaining informed consent;
- the use of only four faces for the self-efficacy and self-assessment scales;
- the use of three levels each of reading ability and mathematics ability;
- the use of video alone to record the student interviews.
The writing rubric has been amended to clarify each score point. The process for obtaining informed consent has been streamlined and now includes an incentive for students.
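The per-item claims above can be checked directly against the means in Table 41. The Python sketch below transcribes those means and confirms three things. First, the mean of the twelve item means for performance rounds to the reported 2.54. This equals the overall mean only if all 23 students were scored on every item, which is an assumption. Second, the item means range from 1.39 to 3.35. Third, self-assessment exceeded self-efficacy on ten items; of the other two, Flower pot is a tie and Calories is lower.

```python
# Per-item means transcribed from Table 41 (n = 23):
# (item, performance, self-efficacy, self-assessment)
TABLE_41 = [
    ("Flower pot", 3.35, 3.83, 3.83),
    ("Ms. Hernandez", 2.87, 3.61, 3.78),
    ("Calories", 2.30, 3.59, 3.18),
    ("Jean's class", 2.61, 3.83, 3.86),
    ("Pizza", 2.48, 3.62, 3.70),
    ("String", 1.39, 2.73, 2.74),
    ("Balloons", 2.70, 3.36, 3.43),
    ("Flour", 2.65, 3.50, 3.52),
    ("Calculator", 2.70, 3.05, 3.14),
    ("School lunch", 2.78, 3.77, 3.82),
    ("Pencils", 2.65, 3.33, 3.65),
    ("Birdseed", 1.96, 2.72, 2.87),
]

performance = [perf for _, perf, _, _ in TABLE_41]
grand_mean = sum(performance) / len(performance)
print(f"Mean of item means: {grand_mean:.2f}")                    # 2.54
print(f"Range: {min(performance):.2f} to {max(performance):.2f}")  # 1.39 to 3.35

# Items where the self-assessment mean exceeds the self-efficacy mean.
sa_higher = [item for item, _, se, sa in TABLE_41 if sa > se]
not_higher = [item for item, _, se, sa in TABLE_41 if sa <= se]
print(f"SA > SE on {len(sa_higher)} of {len(TABLE_41)} items")  # 10 of 12
print("Not higher:", ", ".join(not_higher))  # Flower pot, Calories
```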
The self-efficacy and self-assessment scales now include a fifth, neutral face in the middle. Both the reading and the mathematics ability groupings have been reduced from three levels to two. With two levels of ability, every cell should still contain a sample large enough for meaningful analysis. A full description of the revised categorization process is included in Chapter Three.
Table 40
Summary of Pilot Data by Gender
           Performance     Linguistic Explanations    Self-Efficacy     Self-Assessment
Gender     M      SD       M      SD                  M      SD         M      SD
Girls      2.67   1.21     1.21   1.16                3.46   0.79       3.45   0.87
Boys       2.33   1.26     0.73   1.09                3.35   0.86       3.47   0.86
Note. n = 23. The maximum score for performance, self-efficacy, and self-assessment was 4.0. The maximum score for linguistic explanations was 3.0.
Table 41
Summary of Pilot Data by Item
Item             Performance   Ling. Ex.   SE     SA
Flower pot       3.35          1.04        3.83   3.83
Ms. Hernandez    2.87          0.74        3.61   3.78
Calories         2.30          0.48        3.59   3.18
Jean's class     2.61          0.96        3.83   3.86
Pizza            2.48          2.09        3.62   3.70
String           1.39          0.30        2.73   2.74
Balloons         2.70          0.61        3.36   3.43
Flour            2.65          0.65        3.50   3.52
Calculator       2.70          2.30        3.05   3.14
School lunch     2.78          0.83        3.77   3.82
Pencils          2.65          1.74        3.33   3.65
Birdseed         1.96          0.57        2.72   2.87
Note. n = 23. The maximum score for performance, self-efficacy, and self-assessment was 4.0. The maximum score for linguistic explanations was 3.0. SE = self-efficacy; SA = self-assessment; Ling. Ex. = linguistic explanations.

APPENDIX G
TEACHER RATING SHEET
Please indicate these students' level of reading and mathematics ability by circling H (high ability), M (average ability), or L (low ability). Base your evaluation strictly on your professional judgment, not on test scores. When evaluating reading, focus on the child's level of reading comprehension.
When evaluating mathematics, focus on the child's general mathematics ability, including computation and problem solving.
SCHOOL ________________________   TEACHER _____________________
STUDENT NAME                     READING ABILITY    MATH ABILITY
______________________________   H   M   L          H   M   L
[rows repeat for additional students]

APPENDIX H
PARENTAL INFORMED CONSENT
Parental Informed Consent
Social and Behavioral Sciences
University of South Florida
Information for People Whose Children Are Being Asked to Take Part in a Research Study
The following information is being presented to help you decide whether or not you want to allow your child to be a part of a minimal-risk research study. Please read this carefully. If you do not understand anything, ask the person in charge of the study.
Title of research study: The Impact of Gender, Reading Ability, and Mathematics Ability on Children's Mathematical Problem Solving
Person in charge of study: Patricia D. "Trish" Hunsader
Where the study will be done: The study will be done at xxx. The study will be done during the time your child is normally in school. Your child is being asked to participate because he/she is a fifth-grade student at xxx.
General Information about the Research Study
The purpose of this research study is to find out how boys and girls differ in three areas: how they solve mathematical problems, how confident they are in their ability to solve problems, and how well they can assess their own work.
Plan of Study
To begin, every child's identity will be kept confidential throughout this study. This fall I will be coming to your child's school to give a twelve-item math problem-solving test to all of the fifth-grade students whose parents agree to have their children participate. All twelve questions require students to show their work.
Before solving each problem, your child will be asked to rate how they feel about the question by circling one of five smiley/sad faces. After they have completed their work on each question, they will rate how well they feel they did on the item by again circling one of five smiley/sad faces. The test should take about one hour to complete. Some time before the test, I will record your child's previous year's FCAT scores for math and reading. If your child did not take the FCAT last year, I will record whatever standardized scores are available. Also, your child's teacher will be asked to provide an informal evaluation of your child's reading and math ability. These measures will allow me to look at the relationship between children's math and reading ability and their problem-solving performance on my test.

After I have had a chance to review all the children's work, I will select 8-12 students to interview based on their pattern of answers on the test. The purpose of the interviews is to better understand what the children were thinking as they completed the test items. Each student will be interviewed separately in a private room, and the interview will last approximately 40 minutes. The interviews will take place during school hours at a time determined by the classroom teacher. During the interview, the student will be shown a copy of his/her original work on the test and will be asked to talk through his/her thinking on each test question. All interviews will be videotaped and audiotaped. The audiotape will be transcribed by a transcriptionist who will not know your child's identity. I will review the videotapes to add details about the interview that cannot be picked up by the audio recording, such as where your child is pointing on the paper, or when your child nods his/her head instead of saying "yes" or "no." Both the video and audio tapes will be stored in a secure location.
Payment for Participation
You and your child will not be paid for your child's participation in this study.

Benefits of Taking Part in this Research Study
By taking part in this study, your child will have the benefit of practicing mathematical problem solving and explaining his/her solutions to problems. The results of this study will add to the knowledge available to educators about mathematical problem solving.

Risks of Being a Part of this Research Study
There are no known risks associated with participation in this study.

Confidentiality of Your Child's Records
Your and your child's privacy and research records will be kept confidential to the full extent required by law. Authorized research personnel, employees of the Department of Health and Human Services, the USF Institutional Review Board and its employees and staff, and any other individuals acting on behalf of USF may inspect the records from this research project. The results of this study may be published. However, the data obtained from your child will be combined with data from other children in the publication. The published results will not include your child's name or any other information that would personally identify your child in any way. Your child will be assigned a code number before the study begins, and that number, not your child's name, will appear on the testing materials that are viewed by other research personnel. I, as principal investigator, am the only person who will have access to the list that matches your child's name to his/her code number. Fictitious names will be used in any articles written about this study. All information about your child gathered from this study will be kept in my private home in a secured cabinet. The only people who will see this information are the research team and the university faculty who are advising on the study.

If your child is selected for an interview, I am the only person who will see the videotape.
The typed record of the interview will refer to your child by his/her code number, not his/her name. The audiotapes will also be labeled with only a code number and will be reviewed only by me, the transcriptionist, and other research personnel. Both tapes will be stored in a secure cabinet at my home.

Volunteering to Take Part in this Research Study
Your decision to allow your child to participate in this research study is completely voluntary. You are free to allow your child to participate in this research study or to withdraw him/her at any time. If you choose not to allow your child to participate, or if you remove your child from the study, there will be no penalty, and your child's school grades will not be affected in any way. Your decision to allow your child to participate (or not to participate) in this research study will in no way affect his/her status at school. If your child is selected to be interviewed, he/she will be told at the beginning of the interview that he/she can decide to stop the interview at any time and for any reason and return to the classroom.

Questions and Contacts
If you have any questions about this research study, contact Patricia Hunsader at 941xxxxxxxx (home), 813-974-1034 (USF), or 941xxxxxxxx (cell), or email me at phunsader@aol.com.

If you have questions about your rights as a person who is taking part in a research study, you may contact the Division of Research Compliance of the University of South Florida at (813) 974-5638.

Consent for Child to Take Part in this Research Study
I freely give my consent to let my child take part in this study. I understand that this is research. I have received a copy of this consent form.
________________________   ________________________   ___________
Signature of Parent          Printed Name of Parent     Date
of Child Taking Part in Study

Investigator Statement: I certify that participants have been provided with an informed consent form that has been approved by the University of South Florida's Institutional Review Board and that explains the nature, demands, risks, and benefits involved in participating in this study. I further certify that a phone number has been provided in the event of additional questions.

_________________________   _____________________________   _____________
Signature of Investigator   Printed Name of Investigator    Date

Child's Assent Statement
Patricia Hunsader has explained to me this research study called "The Impact of Gender, Reading Ability, and Mathematics Ability on Children's Mathematical Problem Solving." I agree to take part in this study.

________________________   ________________________   ___________
Signature of Child           Printed Name of Child      Date
Taking Part in Study

________________________   ________________________   ___________
Signature of Parent          Printed Name of Parent     Date
of Child Taking Part in Study

________________________   ________________________   ___________
Signature of Person          Printed Name of Person     Date
Obtaining Consent            Obtaining Consent

________________________   ________________________   ___________
Signature of Witness         Printed Name of Witness    Date

APPENDIX I
INFORMED CONSENT COVER LETTER

Date

Dear Parent,

My name is Patricia "Trish" Hunsader, and I am a doctoral student at the University of South Florida studying elementary mathematics education. I will be conducting my doctoral dissertation study with the fifth-grade students at XXXX, XXXXXX, and XXXX Elementary schools this fall. The purpose of my study is to determine the impact of gender, reading ability, and mathematics ability on children's mathematical problem solving.
The success of this study, and my ability to draw conclusions that will help teachers in the future, depends largely on the number of children who participate. In order for your child to participate, I must obtain your signed permission on the attached Informed Consent document. Two copies of the Informed Consent are attached so that you can sign one and return it to school, and keep one for your records. The Informed Consent contains a detailed explanation of the study, and I encourage you to read it in its entirety.

As the Informed Consent explains, the study consists of a one-time, twelve-item problem-solving test that will take approximately one hour to complete. The test questions are similar in format to those on the FCAT and will provide your child with an opportunity to practice problem solving. A small number of students (a total of 8-12 from the three schools) will be asked to complete a 30-40 minute follow-up interview to talk about their problem-solving processes. Your child's identity will be strictly protected. No individual children's names, and not even the school names, will appear in print in any published material. Also, rest assured that this study is being conducted with the full knowledge and consent of your child's teacher, the school principal, and the University of South Florida's Division of Research Compliance.

If you are willing to allow your child to participate, please provide signatures on one copy of the attached Informed Consent where highlighted on pages 3 and 4, then have your child return the form to his/her teacher. The second copy of the Informed Consent is for you to keep. If you are not willing to allow your child to participate, please enter your child's name and your signature in the spaces at the bottom of this page.

Warmest regards,
Patricia D. Hunsader

No, I am not willing to allow my child to participate in the study.
_____________________________   ________________________________
Child's name                    Parent/Guardian signature

APPENDIX J
PROBLEM-SOLVING TEST INSTRUMENT

DO NOT WRITE IN THIS AREA
Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Name ___________________   Teacher _________________   School __________________
I am a girl _____ boy _____

Missy and Bob wanted to compare their baseball card collections. Missy had 42 baseball cards, and Bob had 31 cards. How many more cards did Missy have in her collection? Show your work, then explain how you know your answer is correct.
Work space:
Answer: ________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.

The picture shows the flower pots in which Kevin will plant seeds. He needs 3 seeds for each pot. How many seeds will Kevin need for all of the pots? Show your work, then explain how you know your answer is correct.
Work space:
Number of seeds needed ________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Flowerpot Problem

Ms. Hernandez formed teams of 8 students each from the 34 students in her class. She formed as many teams as possible, and the students left over were substitutes. How many students were substitutes?
Show your work, then explain how you know your answer is correct.
Work space:
Answer: _________________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Ms. Hernandez Problem

Cheeseburger: 393 calories   Hot Dog: 298 calories   Yogurt: 214 calories   Cookie: 119 calories
Which two of the items above would provide a total of about 600 calories? Show your work, then explain how you know your answer is correct.
Work space:
Answer: ___________________ and ___________________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Calories Problem

In Jean's class there are twice as many boys as girls. If there are 10 girls in the class, how many boys and girls are there in the class? Show your work, then explain how you know your answer is correct.
Work space:
Answer: ________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Jean's Class Problem

Think carefully about the following question. Write a complete answer. You may use drawings, words, and numbers to explain your answer. Be sure to show all of your work.
Jose ate 1/2 of a pizza. Ella ate 1/2 of another pizza.
Jose said that he ate more pizza than Ella, but Ella said they both ate the same amount. Show how Jose could be right, then explain how you know your answer is correct.
Work space:
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Pizza Problem

Jim has 3/4 of a yard of string, which he wishes to divide into pieces, each 1/8 of a yard long. How many pieces will he have? Show your work, then explain how you know your answer is correct.
Work space:
Answer: __________________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
String Problem

To solve this problem you must ESTIMATE. Do NOT find the exact answer.
This arch of balloons will be placed at the finish line for a local Walk-A-Thon. The section marked on the picture has about 32 balloons. Based on this information, ESTIMATE the total number of balloons in the arch. Show your work, then explain how you know your estimate is correct.
Work space:
Estimate _______________ balloons
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Balloon Problem

If 1 1/3 cups of flour are needed for a batch of cookies, how many cups of flour will be needed for 3 batches? Show your work, then explain how you know your answer is correct.
Work space:
Answer: __________________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Flour Problem

Think carefully about the following question. Write a complete answer. You may use drawings, words, and numbers to explain your answer. Be sure to show all of your work.
Laura wanted to enter the number 8375 into her calculator. By mistake, she entered the number 8275. Without clearing the calculator, how could she correct her mistake? Show your work, then explain how you know your answer is correct.
Without clearing the calculator, how could she correct her mistake another way? Show your work, then explain how you know your answer is correct.
First way:
Second way:
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Calculator Problem

Sam can purchase his lunch at school. Each day he wants to have juice that costs 50¢, a sandwich that costs 90¢, and fruit that costs 35¢. His mother has only $1.00 bills. What is the least number of $1.00 bills that his mother should give him so he will have enough money to buy lunch for 5 days?
Show your work, then explain how you know your answer is correct.
Work space:
Answer: _________________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
School Lunch Problem

Together, Sara and Brendan have 20 pencils. Sara says 1/4 of the pencils are hers. Brendan says 15 of the pencils belong to him. Show how they could both be right, then explain how you know your answer is correct.
Work space:
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Pencil Problem

A package of birdseed costs $2.58 for 2 pounds. A package of sunflower seeds costs $3.72 for 3 pounds. What is the difference in the cost per pound? Show your work, then explain how you know your answer is correct.
Work space:
Answer: ________
Before you begin working, circle the face that best shows how well you think you can solve this problem.
When you are done working, circle the face that best shows how well you think you did on this problem.
Birdseed Problem

APPENDIX K
ITEM-SPECIFIC RUBRICS FOR MATHEMATICS SCORING

Task: FLOWER POT MATH
4 points MODEL RESPONSE: Explanation must include context: 20 pots x 3 seeds per pot = 60 seeds (minimum information needed for 4 points). Must include mention/labeling of 20 pots and 3 seeds/pot (or # seeds per row x rows) OR Accurate counting by 3s with mention of pots and seeds.
3 points SUCCESSFUL RESPONSE: Correct procedure w/o explanation OR Correct procedure w/ explanation, but computational error OR Correct procedure w/ explanation that refers only to the operations used and the numbers involved.
2 points: Correct procedure w/ computational error and no explanation OR Correct computation but incorrect number of pots and no explanation OR Correct procedure with incorrect explanation OR Correct answer w/o supporting work or explanation OR Correct answer w/ flawed explanation and no work.
1 point: Incorrect procedure, incorrect or missing explanation, but some correct work, such as correct number of pots noted.
0 points: Incorrect procedure and incorrect or missing explanation OR Incorrect answer with no work or explanation.

Task: MRS. HERNANDEZ MATH
4 points MODEL RESPONSE: Correct solution of 2 substitutes with a reasonable, contextualized explanation (includes reference to groups/teams and that the remainder is the # of subs), and work shown. Explanation can have a minor flaw if all steps are covered.
3 points SUCCESSFUL RESPONSE: Correct solution with work shown but no explanation OR Correct solution with explanation that lacks context.
2 points: 34 ÷ 8 = 4 R2, no explanation, answer given as "4 R2" OR Answer given as 4 groups or 8 teams of 4 (fails to answer the question regarding substitutes) OR 34 ÷ 8 = 4 with no mention or showing of the remainder, but correct answer of 2 on the blank OR Correct solution w/ seriously flawed work or no work.
1 point: Only partial grouping by 8s OR 34 ÷ 8 = 4, so "4 kids were subs" OR Incorrect division, but some effort to answer the question OR Some correct work.
0 points: Incorrect computation, no explanation, and no correct work.

Task: CALORIES MATH
4 points MODEL RESPONSE: Correct answer of cheeseburger and yogurt with work shown using either exact or rounded numbers.
Includes a contextualized explanation that refers to estimation, rounding, or "close," and to the target sum.
3 points SUCCESSFUL RESPONSE: Added 393 + 214 = 607, correct answer of item names, but explanation lacks mention of rounding or getting close OR 400 + 200 = 600, answer correct, but explanation missing OR Added many combinations, answer correct, but explanation incomplete or missing OR Answer reported as 393 and 214 w/o explanation.
2 points: Correct answer, missing or flawed explanation, and some error(s) in work OR Added pairs of items but said there was no answer (missed the concept of "about") OR Correct answer unsupported by work OR Correct work but failed to answer the question OR Incorrect answer but logical explanation.
1 point: Doubled the calories of items and chose 2 hot dogs OR Added one or more combinations, but no answer given OR Chose 3 items totaling close to 600 calories OR Added 2 wrong items (not close enough) with no explanation, or with explanation but no work.
0 points: No work OR Gives more than 2 items as answer with incorrect total OR Lists two incorrect items w/ no work or no correct work.

Task: JEAN'S CLASS MATH
4 points MODEL RESPONSE: Correct answer of either 30 or 20 boys and 10 girls, with a contextualized explanation that clearly shows how the answer was obtained. Explanation must refer to "double," "twice," or multiplying by 2.
3 points SUCCESSFUL RESPONSE: Correct solution with correct work but no explanation or explanation w/o context OR Minor flaw in solution process but explanation clear and correct.
2 points: Answers # of boys w/ some explanation or supporting work that shows understanding of the relationship between # of boys and # of girls, but ignores question of total OR Correct answer with no supporting work or explanation.
1 point: Answered 20 (# of boys) with no explanation or supporting work OR Answered incorrect # of boys, but some work or explanation correct.
0 points: No response or completely incorrect response.

Task: PIZZA MATH
4 points MODEL RESPONSE: Shows 1/2 of 2 separate pizzas (one big and one small) and says Jose's pizza is bigger, so his half is bigger OR Explains that if Jose's pizza is bigger, the half he ate would be bigger than Ella's half. Response must include 1/2 in text or illustrated in picture.
3 points SUCCESSFUL RESPONSE: Explanation w/o picture and no mention of halves, simply that Jose's pizza could have been bigger OR Picture that shows Jose's as bigger, but explanation focuses on number of pieces in a logical way OR Shows 1/2 of 2 pizzas w/ Jose's bigger w/o explanation OR Correct response and explanation, except picture does not show Jose's pizza bigger OR No conclusion but valid argument not completely supported.
2 points: Concludes they ate the same with either 2 halves of the same pizza or 1/2 of 2 same-size pizzas OR Erroneous conclusion but with some valid support OR Shows 1/2 of 2 pizzas w/ Jose's larger, but flawed explanation OR Concludes that Ella could have had of a half w/ supporting diagram.
1 point: Shows 2 halves of the same pizza w/ labeling but no explanation OR Shows 1/2 of 2 pizzas w/ missing or flawed explanation.
Jose's pizza not clearly bigger OR Ella ate of a w/o supporting diagram OR Says Jose is right if pizza cut wrong, but picture doesn't support argument OR Jose ate more than half OR They ate the same w/o any support.
0 points: No valid work or explanation.

Task: STRING MATH
4 points MODEL RESPONSE: Divided 3/4 by 1/8 w/ correct answer of 6 and explanation w/ context OR Showed 3/4 = 6/8 w/ correct answer of 6 and explanation w/ context.
3 points SUCCESSFUL RESPONSE: Correct answer with work shown but no explanation (may have minor flaw in work) OR Correct answer with explanation but no work shown (may have minor flaw in explanation).
2 points: Correct answer w/ no work or explanation OR Correct process with computation error and insufficient/flawed explanation OR Correct answer with flawed work and flawed explanation OR Converts 3/4 to 6/8 but reports answer as 6/8 or 2/8 (doesn't answer question) OR Finds 3/4 yard = 27 inches and divides that by 8.
1 point: Response shows student understands problem but is unable to do computation OR Correct answer, but work and/or explanation doesn't support answer (lucky guess) OR Picture or work showing 6 out of 8 in some fashion w/ 3 out of 4, but nothing else makes sense OR Guess that is close (5-7) w/o any work shows some understanding of the problem OR Student knows division or repeated subtraction is necessary, but doesn't know what to divide or how to do it OR Shows cutting a length (string) into pieces 1/8 long.
0 points: Work doesn't indicate student understanding of the problem or the work needed to solve it.

Task: BALLOONS MATH
4 points MODEL RESPONSE: Estimate is in range (150-280) based on 5-8 sections of 30-35 balloons.
Work is shown and explanation is complete and in context (refers to sections and # of balloons).
3 points SUCCESSFUL RESPONSE: As above with work shown (# of sections and number in each section are included in computation, but no labeling is required) but w/o explanation. Minor computational error okay.
2 points: Estimate in range without indication of valid strategy for obtaining estimate OR Marked off sections, valid strategy, but major computational error OR Valid strategy but erroneous conclusion OR Estimate out of range because counted either 4 or 9 sections.
1 point: Estimate out of range (< 150 or > 280) without valid strategy or explanation OR Shows evidence of attempt to estimate with a result > 32.
0 points: Result less than or equal to 32. No valid strategy or valid explanation.

Task: FLOUR MATH
4 points MODEL RESPONSE: Correct solution of 4 cups with work shown (multiplication, addition, or pictorial) with explanation that refers to the context of batches and cups.
3 points SUCCESSFUL RESPONSE: Correct solution with some work but no explanation OR Correct solution with no work but with explanation OR Answered 3 3/3 with correct work and explanation.
2 points: Shows understanding of strategy, but error in fraction work (e.g., 3 3/9 for answer) OR Correct answer without valid work or explanation.
1 point: Shows some understanding of question, but erroneous strategy for solving (e.g., 1 1/3 + 1 1/3 + 1 1/3 = 1 3/9) with error in both whole number and fraction.
0 points: Work shows little or no understanding of problem or process needed to solve.

Task: CALCULATOR MATH
4 points MODEL RESPONSE: 2 correct responses clearly explained in drawings, words, or numbers that refer to both what to do (e.g., add 100) and the result of 8375
OR 1 response like above, and one complete response w/o mention of result.
3 points SUCCESSFUL RESPONSE: 2 correct responses that do not explain and/or do not give result OR 1 correct, explained response, and one with minor flaws or only partially valid.
2 points: 1 correct response, 2nd response missing or invalid OR 2 partially valid responses (a response that says to add 100 to the other number or add 100 to the answer at the end is partially valid).
1 point: Both responses show some understanding of question, but lack place value understanding (e.g., add 10, or add 20 then subtract 10, or add 1000, or add 500 + 500) OR Shows some understanding, but methods don't give the correct readout OR 1 partially valid response.
0 points: No valid response or a response that does not indicate understanding of question OR Response such as "times it by a number" OR Silly response such as "do it on paper."

Task: SCHOOL LUNCH MATH
4 points MODEL RESPONSE: Correct answer of 9 with valid work shown and explanation or labeling of all work, with mention of rounding and some context.
3 points SUCCESSFUL RESPONSE: Correct answer of 9 and valid work with no explanation OR Correct answer with labeling, but missing steps OR Minor flaw in computation but work clear and answer rounded up to nearest dollar OR All correct, but rounded single day to $2, then multiplied by 5 days.
2 points: All computation correct, but failed to round up to nearest dollar and did not explain OR Correct answer for one day, rounded up to the nearest dollar OR Little work, mostly correct, rounded up to nearest dollar.
1 point: Shows some understanding of question but work seriously flawed OR Found total for one day but did not round up.
0 points: Little or no correct work. Doesn't show understanding of problem situation.
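To make the arithmetic behind the School Lunch rubric concrete, the following minimal Python sketch computes the answer key and the totals produced by the alternative paths the rubric names. It is an illustration added here, not part of the original scoring materials, and the variable and function names are hypothetical. Amounts are kept in whole cents so the arithmetic stays exact.

```python
# Hypothetical answer-key check for the School Lunch item (illustration only).
# All amounts are in cents so that integer arithmetic stays exact.
JUICE_CENTS = 50
SANDWICH_CENTS = 90
FRUIT_CENTS = 35
DAYS = 5
BILL_CENTS = 100  # Sam's mother has only $1.00 bills


def bills_needed(total_cents: int) -> int:
    """Smallest number of $1.00 bills that covers total_cents (ceiling division)."""
    return -(-total_cents // BILL_CENTS)


daily_cents = JUICE_CENTS + SANDWICH_CENTS + FRUIT_CENTS  # one day's lunch
weekly_cents = daily_cents * DAYS                          # five days of lunch

# Model answer: total the five days first, then round up once.
model_answer = bills_needed(weekly_cents)

# 3-point path: round a single day up to whole dollars, then multiply by 5.
round_daily_first = bills_needed(daily_cents) * DAYS

# 2-point path: correct computation but no rounding up to whole bills.
unrounded_dollars = weekly_cents / 100

print(f"One day: ${daily_cents / 100:.2f}")
print(f"Five days: ${unrounded_dollars:.2f}")
print(f"Least number of $1.00 bills: {model_answer}")
print(f"Rounding each day first: {round_daily_first} bills")
```

Running the sketch prints $1.75 for one day, $8.75 for five days, 9 bills for the model answer, and 10 bills when each day is rounded up first. The last figure is enough money but is not the least number of bills the item asks for.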
Task: PENCILS MATH
4 points MODEL RESPONSE: Shows and explains that Sara's 1/4 is 5 pencils, which, added to Brendan's 15, totals 20 pencils.
3 points SUCCESSFUL RESPONSE: Shows Sara has 5 and Brendan 15, but does not show or explain that 5 pencils = 1/4 of 20 pencils; does show 5 + 15 = 20 OR Accurate computation without explanation or context; shows 1/4 of 20 = 5 and 5 + 15 = 20 OR Accurate computation of 1/4 of 20 = 5 and 3/4 of 20 = 15, but no context OR Sara's 1/4 of 20 = 5 and Brendan's is 15, but doesn't total.
2 points: Sara has 5, Brendan has 15; does not relate 5 to 1/4 or to the total of 20 OR Does relate Sara's 1/4 to 5 but stops there.
1 point: Some understanding shown.
0 points: No correct work OR Only restates information in problem.

Task: BIRDSEED MATH
4 points MODEL RESPONSE: Shows $2.58 ÷ 2 and $3.72 ÷ 3, finds difference of 5 cents, and explains process in context of price per pound.
3 points SUCCESSFUL RESPONSE: Correctly shows division but answers 5 without context OR All work correct and shown but includes a minor computational error OR Shows understanding of finding price per pound and gives explanation but has more than one computational error OR All work correct without explanation.
2 points: Subtracts 3.72 - 2.58 correctly and provides some contextualized explanation OR Finds price per pound on one or two items, but stops there OR Shows some attempt at finding the price per pound.
1 point: Student subtracts total prices and explains (or labels) some, but has computational error(s) OR Student subtracts 3.72 - 2.58 correctly, but no correct explanation or labeling.
0 points: No correct work.

APPENDIX L
ITEM-SPECIFIC RUBRICS FOR LINGUISTIC EXPLANATION SCORING

Task: FLOWER POT WRITING
3 points: Explains computation (multiplication, repeated addition, or counting by threes), referring to 20 pots and 3 seeds per pot.
2 points: Explains computation and refers to either number of pots or # of seeds, but not both OR Labels seeds and pots but doesn't
completely explain computation.
1 point: Refers to operation but not to context OR Labels # of seeds and # of pots OR "I know my answer is correct because 20 x 3 = 60" OR Good explanation that does not support the work.
0 points: No text OR Restates answer but does not explain process (e.g., "Kevin needs 60 seeds") as only text OR Completely incorrect or irrelevant explanation.

Task: MRS. HERNANDEZ WRITING
3 points: Explains division with mention of groups/teams and relates remainder to the number of substitutes OR Explains multiplication and subtraction. May have minor flaw.
2 points: Explains division and remainder w/ moderate flaws OR Explains computation process OR Tells number of teams and # left over OR States that the remainder was the number of substitutes.
1 point: Shows each team being added to total 32, with no words but team numbers labeled OR Tells that divided 8 into 34 and stops there OR States operation used.
0 points: No text OR Restates answer as only text (e.g., "There will be 2 substitutes") OR Completely incorrect explanation.

Task: CALORIES WRITING
3 points: Explanation of using addition and finding a sum or total that is "about" or "close" to 600 calories. Must include context.
2 points: Mention of adding and "about" or "close," but no reference to target sum OR Good explanation of process w/o context.
1 point: Wrote out some possible item combinations OR States that must add OR States to guess and check OR Tells what two items were added for answer OR States rounding.
0 points: No text OR Completely incorrect explanation that in no way supports the work OR Simply restates answer.

Task: JEAN'S CLASS WRITING
3 points: Text that addresses "twice" or "double" the # of girls to get the # of boys and adding them together; must include boys and girls in explanation.
2 points: Addresses twice as many boys, but omits adding the girls or including the girls OR Complete except omits explanation of how got #
of boys OR Complete explanation of computation process (no context) 1 point States boys 10 girls = 30 OR States the operation used OR Labeled boys and girls with words or abbreviations 0 points No text OR Simply restates answer OR Text that in no way supports the work 431 PAGE 447 Appendix L (Continued) Task: PIZZA WRITING 3 points Mentions Joses pizza bigger and that his half would be larger that Ellas half OR Joses pizza bigger, labels halves and both pizzas Must include in words or symbols and must draw conclusion 2 points Mentions Joses pizza bigger or Ellas pizza smaller OR Says that Ella is right, they ate the same amount OR Explains that Ella could have eaten of OR Good explanation but no conclusion OR Jose right if e.g. pizza cut wrong 1 point Labeled pizzas as Joses and Ellas OR Explains, but text is mostly contradictory with itself or picture OR Conclusion w/o explanation OR Poor explanation w/ no conclusion 0 points No text OR Text that in no way supports the work or picture 432 PAGE 448 Appendix L (Continued) Task: STRING WRITING 3 points Explained division and the number of pieces of string OR Explained 3/4 = 6/8 equivalency and the number of pieces of string 2 points Explains equivalency of 3/4 = 6/8, but omits context 1 point Attempted explanation, but seriously flawed OR Labels only 0 points No text OR No correct, relevant text OR Restates answer in words 433 PAGE 449 Appendix L (Continued) Task: BALLOONS WRITING 3 points Explains marking off a number of sections of balloons and the number of balloons in each section OR Explains # of groups and repeated addition 2 points Explanation includes either marking off sections of balloons or the number of balloons in each section, but not both OR Explanation includes the whole process with no context 1 point Labeled sections and balloons in each OR e.g. 
multiply 32 by 7 OR About 0 points No text or labeling OR No text that supports the work OR Restates answer in words 434 PAGE 450 Appendix L (Continued) Task: FLOUR WRITING 3 points Explains multiplication (or addition) with reference to the number of cups per batch and the number of batches 2 points Explains the whole computation process w/o context 1 point Explains only part of the computation process (e.g. 3 x 1/3 = 1) OR States the operation used OR Text attempts to support work that makes no sense OR Labels only 0 points No text OR No text that supports the work OR Labels or restates answer of needing 4 cups 435 PAGE 451 Appendix L (Continued) Task: CALCULATOR WRITING 3 points Both responses explain what to do and show what the result will be OR One response shows what to do and result while second response explains what to do 2 points Both responses explain what to do, but omit reference to the result (obtaining 8375) OR One response explains what to do and the result, but the other is not complete or doesnt make sense 1 point Tells to add OR Tells briefly what to do for one response OR Tells one response, but doesnt make complete sense OR Two responses that are vaguely worded 0 points No text OR No text that supports the work 436 PAGE 452 Appendix L (Continued) Task: SCHOOL LUNCH WRITING 3 points Explains process of solving the problem and includes the context of the items purchased, the number of days, and the number of bills needed OR Explains how got total for week, rounding, and reference to dollar bills 2 points Explains most of the process of solving the problem, but omits most or all of the context 1 point Explains rounding the final answer OR Labels food items OR Tells what operations were used 0 points No text OR No text that supports the work OR Simply restates answer of needing 9 dollars 437 PAGE 453 Appendix L (Continued) Task: PENCILS WRITING 3 points Problem solving process is explained that relates to 5 pencils and either mentions that Sarahs 
and Brendans pencils total 20 or that 15 would be left from Brendan 2 points Explains context or computation, but not both OR Says e.g. that Sarah has 5 pencils and Brendan has 15 pencils OR Good explanation, but error in understanding fractions 1 point Labels only OR Restates part of the problem OR Explains only one point OR Some correct explanation, but contradictory statements included 0 points No text OR No correct relevant text 438 PAGE 454 Appendix L (Continued) 439 Task: BIRDSEED WRITING 3 points Explains process of dividing to obtain cost per pound and subtracting to find the difference 2 points Explains either the process of dividing to obtain cost per pound OR subtracting to find the difference with some context included OR Explains all of the computation, but omits the context OR Labels cost per pound 1 point Labels only OR Minimal explanation that doesnt contribute much OR Restates some of the information given in the problem 0 points No text OR No correct relevant text OR Simply states answer PAGE 455 440 APPENDIX M SOLUTION PROCESSES IDENTIFIED FOR EACH QUESTION PAGE 456 FLOWER POTS Process Category Theme Freq. 
Percent Description 0 U 17 0 0.00 No meaningful process shown 1 S 1 164 78.48 Multiplied 20 x 3 2 S 1 22 9.28 Multiplied 5 x 4 then 20 x 3 3 U 13 6 2.53 Divided 20 by 3 4 S 4 17 7.17 Counted by 3s minor computational error okay 5 S 2 3 1.27 15 seeds per column and 4 columns either repeated addition or multiplication 6 S 2 6 2.53 12 seeds per row and 5 rows either repeated addition or multiplication 7 P 7 3 1.27 5 x 4 = 20, stopped there 8 P 9 4 1.69 Multiplied incorrect numbers (miscounted pots) 441 9 S 2 1 0.42 Grouped pots by 3, nine seeds per group, 6 groups + 6 seeds 10 S 2 1 0.42 Combined 20 3s to get 10 6s, then 5 12s, then 2 24s + 12 12 S 2 1 0.42 Grouped pots by 10 = 30 seeds, 2 groups 14 U 14 1 0.42 Added 20 + 3 15 S 5 3 1.27 Correct answer of 60 but no work 16 P 6 2 0.84 Like #6, but incorrect number of seeds per row or incorrect # rows 17 U 17 3 1.27 Multiplied incorrect numbers major error Note. Category: U = unsuccessful, P = partially successful, S = successful. Themes are defined in Table 13. Missing process numbers indicate processes for which the frequency dropped to zero when students with missing data were removed from the study. PAGE 457 Appendix M (Continued) MS. HERNANDEZ Process Category Theme Freq. 
Percent Description 0 U 17 2 0.84 No meaningful process shown 1 S 1 128 54.01 34 divided by 8 with final answer of 2 2 S 4 10 4.22 Repeated addition to get close to 34, or counted by 8s with final answer of two 3 S 3 10 4.22 Solved with a picture using groups of 8 with final answer of 2 4 S 3 3 1.27 Solved with a picture using 8 groups and final answer of 2 5 U 8 3 1.27 Multiplied 34 x 8 6 S 4 32 13.50 Multiplied 8s to get close to 34 and final answer of 2 7 P 7 7 2.95 Subtracted, 34 8 = 26 8 S 4 4 1.69 Repeated subtraction of 8s and final answer of 2 442 9 P 4 2 0.84 Calculated using wrong numbers but good logic 10 U 15 2 0.84 Unsuccessful attempt at solving through a picture 11 U 17 1 0.42 Divided 34 by 4 12 S 5 2 0.84 Correct answer, no work 13 P 7 21 8.86 Like #1, but either did not answer/misunderstood question 14 P 7 1 0.42 Like #3, but either did not answer/misunderstood question 15 P 7 2 0.84 Like #6, but either did not answer/misunderstood question 16 P 7 1 0.42 Like #8, but either did not answer/misunderstood question 17 P 6 6 2.53 Like #6, but mathematical work seriously flawed PAGE 458 Appendix M (Continued) CALORIES Process Category Theme Freq. 
Percent Description 0 U 17 1 0.42 No meaningful process shown 1 S 1 128 54.01 Added multiple pairs of exact numbers correctly correct answer minor flaws okay 2 S 1 3 1.27 Added multiple pairs of rounded numbers correctly correct answer minor flaws okay 3 S 1 30 12.66 Added only one pair the correct one with exact numbers 4 S 1 22 9.28 Added only one pair the correct one with rounded numbers 5 U 13 4 1.69 Added only one pair the wrong one with exact numbers 7 U 13 3 1.27 Tried to find multiples of one item or combinations of 34 items to total 600 443 8 S 1 2 0.84 Added pairs of exact numbers and rounded numbers with correct answer 9 S 5 2 0.84 Stated a strategy such as estimate or round, no work, correct answer 11 U 17 1 0.42 Explanation implies strategy, no work, answer incorrect 12 U 17 1 0.42 Like #9, but incorrect answer 13 U 17 17 7.17 Like #1, but incorrect answer and major flaws in computation and logic 16 P 6 2 0.84 Like #6, but with good logic, computational error 17 P 7 11 4.64 Like #1, good logic but incorrect answer, error in understanding question 18 P 6 10 4.22 Like #1, good logic but incorrect answer, computational error PAGE 459 Appendix M (Continued) JEANS CLASS Process Category Theme Freq. 
Percent Description 0 U 17 5 2.11 No meaningful process shown 1 S 1 81 34.18 10 x 2 = 20 + 10 = 30 2 P 7 71 29.96 10 x 2 = 20 (boys or kids) may have minor computational error 3 S 1 15 6.33 10 + 10 + 10 = 30 (or 10 + 10 = 20 + 10 = 30) 4 P 7 3 1.27 20 boys with no work shown 5 S 1 25 10.55 20 boys + 10 girls = 30 kids 6 P 8 5 2.11 30 boys or 12 boys misunderstands twice (thinks it means x 3 or +2) 7 P 7 14 5.91 10 + 10 = 20 boys 444 8 S 2 4 1.69 10 x 3 = 30 kids 9 U 13 1 0.42 Doubled girls and boys for 20 + 20 = 40 10 P 7 3 1.27 Misunderstood problem either read twice as many girls as boys or that there are 10 boys and girls in the class 11 S 1 8 3.38 10 x 2 = 20 or 10 + 10 = 20 with answer 20 boys, 10 girls 12 S 5 1 0.42 20 boys, 10 girls, no work 13 S 5 1 0.42 30, no work PAGE 460 Appendix M (Continued) PIZZA Process Category Theme Freq. Percent Description 0 U 17 10 4.22 No meaningful process shown 1 S 1 99 41.77 Joses pizza bigger, or Ellas pizza smaller 2 P 8 39 16.46 Both ate the same amount explanation logical 3 S 2 4 1.69 Jose right because he had more toppings on his pizza 4 S 2 9 3.80 Jose right because he ate more pieces (bigger pizzas have more pieces) 5 P 8 8 3.38 Jose ate more because the slices were bigger (not clear that the pizza was bigger) 6 U 14 4 1.69 Cutting different configuration changes area (e.g. horiz vs. vert or cutting one half into more pieces than another identical half) 445 7 U 16 21 8.86 Situation not as it appears (somebody cheated, ate another bite, didnt finish) 8 P 10 7 2.95 Ella ate of 9 P 10 11 4.64 Pizza cut wrong (not really in half) 10 P 11 7 2.95 and no stated conclusion 11 P 7 3 1.27 Pizzas different sizes, no conclusion 12 P 7 3 1.27 Concluded Jose right because they ate difference pizzas (no mention of size) 14 U 14 1 0.42 Pizzas different shapes (e.g. oval vs. 
round) 16 U 13 1 0.42 Both are right 17 U 14 2 0.84 Like #2 but explanation not logical 18 U 17 8 3.38 Explanation so unclear or contradictory, impossible to determine process PAGE 461 Appendix M (Continued) STRING Process Category Theme Freq. Percent Description 0 U 17 52 21.94 No meaningful process shown 1 U 15 18 7.59 Drew 3/4 of something, no other valid work 2 U 14 6 2.53 Added 3/4 + 1/8 3 P 11 10 4.22 Subtracted 3/4 1/8 and stopped 4 U 15 19 8.02 Drew something and divided it into 8 parts or eighths incorrect answer 5 S 1 13 5.49 Found 3/4 = 6/8 via computation answer correct 6 S 1 2 0.84 Did 3/4 divided by 1/8 and answer correct 8 P 9 26 10.97 Drew 3/4 of something and 1/8 of something incorrect answer 9 S 2 4 1.69 Converted problem to inches and feet and answer correct 446 10 U 17 1 0.42 Added + 11 U 14 16 6.75 Did computation using the numerals in the fractions w/o fractional meaning 12 U 17 1 0.42 Added various fractions (not 3/4 and 1/8) 13 U 17 1 0.42 Subtracted various fractions 16 U 14 1 0.42 1 foot divided by 1/8 yard 18 U 17 2 0.84 3/8 x 1/8 19 S 5 1 0.42 Correct answer, no work 20 S 3 24 10.13 Like #4 or #5, but solved via drawing/diagram 21 U 12 3 1.27 Like #14 but answer incorrect major flaw 22 U 12 10 4.22 Like #7 but answer incorrect major flaw 23 U 12 5 2.11 Incorrect due to conceptual/computational problems with fractions 24 P 7 1 0.42 Misunderstood question, but some valid work PAGE 462 Appendix M (Continued) STRING (Continued) Process Category Theme Freq. Percent Description 25 P 6 11 4.64 Like #6 but answer incorrect moderate flaw 26 U 12 3 1.27 Like #15 but answer incorrect major flaw 27 U 12 1 0.42 Like #9 but answer incorrect major flaw 28 U 15 1 0.42 Like #17, but attempted to solve through drawing/diagram 29 P 7 4 1.69 Like #5, but did not answer question (answered 6/8) 30 P 7 1 0.42 Like #20, but did not answer question (answered 6/8) 447 PAGE 463 Appendix M (Continued) BALLOONS Process Category Theme Freq. 
Percent Description 1 S 1 105 44.30 Multiplied 32 balloons by 58 sections minor computational error okay 2 S 1 34 14.35 Rounded to 30 balloons and multiplied by 58 sections minor computational error okay 3 S 4 18 7.59 Repeated addition of 32, 58 times 4 U 13 9 3.80 Answer of 32 or less without work 5 S 1 28 11.81 Multiplied or added using 32 balloon per section, rounded final answer within range (150280) 6 S 1 6 2.53 Added some sections mentally and did the rest on paper answer within range (150280) 448 7 S 2 5 2.11 Grouped sections to aid in addition answer within range (150280) 8 P 8 5 2.11 Added or multiplied 24 sections of 32 balloons 9 S 5 7 2.95 Reasonable estimate with no work shown 10 P 8 1 0.42 Rounded to 30 and multiplied or added for 810 sections 11 P 8 7 2.95 Added or multiplied by 32 with 810 sections 12 U 13 1 0.42 Manually counted the balloons 13 P 8 2 0.84 Doubled 32 for each of 58 sections to get very large answer 14 U 17 1 0.42 Multiplied numbers not in the problem 15 U 17 5 2.11 Unreasonable estimate not enough work shown to follow process 16 U 14 1 0.42 Divided 32 by number of sections 17 S 1 1 0.42 Same as #2 but rounded final answer answer in range 18 S 4 1 0.42 Combination of repeated addition and estimation final answer in range PAGE 464 Appendix M (Continued) FLOUR Process Category Theme Freq. Percent Description 0 U 17 8 3.38 No meaningful process shown 1 S 1 23 9.70 1 1/3 x 3 correct answer 2 S 1 53 22.36 1 1/3 + 1 1/3 + 1 1/3 correct answer 3 S 3 6 2.53 Solved strictly with a picture/diagram correct answer 4 P 6 20 8.44 1 1/3 x 3 but computational problems (e.g. 
answered 3 3/9) 5 P 6 4 1.69 Converted fractions to decimals (incorrectly) then did computation 6 S 2 5 2.11 Converted fractions to decimals (correctly) then did computation 7 P 6 73 30.80 1 1/3 + 1 1/3 + 1 1/3 with computational problems 449 8 P 6 8 3.38 Only computed the fractions (either answered 1 cup with fraction or ignored 1 cup) 9 U 12 5 2.11 Added 3 + 1 1/3 10 U 17 5 2.11 Computed with numbers not related to the problem 11 U 17 1 0.42 Divided numbers in problem 12 P 6 12 5.06 Only computed the whole numbers (e.g. answered 3 1/3) 13 S 5 3 1.27 Correct answer without work 14 P 6 4 1.69 No work shown, computational problems (e.g. answered 3 3/9) 15 S 5 1 0.42 Correct answer with logic explained but no work 16 P 9 4 1.69 Like #3 but only partially correct 17 U 14 2 0.84 Like #3 but incorrect PAGE 465 Appendix M (Continued) CALCULATOR (Frequencies and Percents represent combined totals for methods one and two) Process Category Theme Freq. Percent Description 0 U 17 92 19.41 No meaningful process shown 1 S 1 138 29.11 Add 100 2 S 1 32 6.75 Add a number and subtract a second number that is 100 less than the number added 3 S 1 47 9.92 Add a combination of numbers that totals 100 4 S 2 12 2.53 Erase back to the 3 and replace with a 2 5 U 12 6 1.27 Add a number that does not give correct result 6 P 8 31 6.54 Partially valid response one that works only with some operations (e.g., add 100 to the answer, add 100 to the other number) 7 U 16 64 13.50 Nonsense response (e.g., do it on paper, get another calculator, ask your parents) 450 8 S 1 8 1.69 Add a x b which equals 100 9 U 12 12 2.53 Subtract a number that does not give correct result 11 S 2 6 1.27 Compute to get zero and add correct number (e.g., multiply by 0, subtract 8375) 12 P 8 12 2.53 Response shows good logic but place value misunderstanding (e.g., add 1000, or add 1) 13 U 12 1 0.21 Combination of + and that does not give correct result 14 U 14 3 0.63 Operational misunderstanding (e.g., multiply by 100) 15 
S 2 3 0.63 Hit = and start over 16 U 14 2 0.42 Enter correct number without clearing the incorrect number 17 U 17 4 0.84 Explanation so unclear impossible to determine 18 P 11 1 0.21 Good logic but does not address problem When student repeats a response, the second one is categorized as . PAGE 466 Appendix M (Continued) SCHOOL LUNCH Process Category Theme Freq. Percent Description 0 U 17 10 4.22 No meaningful process shown 1 S 1 86 36.29 $9 answer found by adding 3 items x 5 then rounding (minor computational error okay) 2 P 7 24 10.13 $8.75 answer found by adding 3 items x 5 (minor comp error okay) 3 P 7 17 7.17 $1.75 answer total for one day without rounding (minor computational error okay) 4 P 7 21 8.86 $2.00 answer rounded total for one day 5 S 1 23 9.70 $10.00 answer found by rounding total for one day x 5 (minor computational error okay) 6 P 7 1 0.42 Found cost of 1 item for 5 days 451 7 U 13 3 1.27 Added 3 items and $1.00 8 P 6 4 1.69 Like #1 with major computational error 9 P 6 2 0.84 Like #2 with major computational error 10 S 1 11 4.64 Multiplied each item by 5, added, and rounded (minor comp error ok) 11 P 7 3 1.27 Added each item five times without final rounding 12 P 7 3 1.27 Added 3 items then added $1.75 five times without final rounding 13 P 7 2 0.84 Multiplied each item by 5, added, but did not round 14 U 13 3 1.27 Interpreted question as what can Sam buy for $1.00 15 P 7 6 2.53 Worked problem with only 2 items or for less than five days 16 P 6 1 0.42 Rounded total for one day x 5 but with major comp error 17 S 4 9 3.80 Like #12, but rounded final answer 18 S 4 3 1.27 Like #11, but rounded final answer 21 P 6 2 0.84 Good logic, incorrect work 22 P 7 3 1.27 Some correct work, but misunderstood question PAGE 467 Appendix M (Continued) PENCILS Process Category Theme Freq. 
Percent Description 0 U 17 39 16.46 No meaningful process shown 1 S 1 76 32.70 of 20= 5 (stated or implied) and one of the following: 5+15=20, 2015=5, 15 left 2 S 3 13 5.49 Solved with picture only showing 20 pencils, Brendan with 15 and Sara with 5 3 P 7 25 10.55 Brendan 15 and Sara 5, or Sara 5 and 20 together 4 S 2 17 7.17 Brendan has (15) and Sara has (5) 5 P 7 11 4.64 2015=5 or 15+5 =20 6 P 7 2 0.84 Finds of 20 = 5 and stops 7 P 6 10 4.22 Incorrect due to minor fraction computational/conceptual problems (e.g. of 20 = 4 with balance of work correct) 452 8 U 13 19 8.02 Incorrect due to misunderstanding of problem 9 S 1 9 3.80 20=5 or 5x4=20, then 5+15=20 10 U 15 6 2.53 Solved with picture only, but picture incorrect or not productive 11 U 12 10 4.22 Like #7, but major problem PAGE 468 Appendix M (Continued) BIRDSEED Process Category Theme Freq. Percent Description 0 U 17 21 8.86 No meaningful process shown 1 S 1 62 26.16 Found price per lb. and difference (minor computational error okay) 2 P 11 102 43.04 Subtracted 3.72 2.58 3 P 7 4 1.69 Found price per pound for one item only 4 U 13 5 2.11 Subtracted 3 pounds 2 pounds = 1 pound 5 U 13 4 1.69 Added 2.58 + 3.72 6 P 6 5 2.11 Erroneous method for changing 3 pounds to 1 or 2 pounds (e.g., subtracting a dollar from the price for each pound less) 7 P 10 3 1.27 Found price per lb for birdseed, multiplied by 3, and found difference btwn price for 3 lbs of birdseed and 3 lbs of sunflower seeds) 8 P 11 2 0.84 Rounded prices then subtracted 9 U 13 3 1.27 Multiplied pounds by price (e.g., 2.58x2 and/or 3.72x3) 10 P 8 1 0.42 Finds price per pound for two items and adds them together 11 U 17 6 2.53 Did computation with numbers not in problem 12 U 14 4 1.69 Subtracted 2.583.72 13 S 4 2 0.84 Found price per pound by trial and error addition then subtracted minor computational error okay 14 S 5 1 0.42 Found price per pound without written work then subtracted 15 U 17 3 1.27 Added 2.58+1.14 or subtracted 2.581.14 16 P 6 1 0.42 
Attempts to compute price per pound but cannot do computation 17 P 7 4 1.69 Found price per pound, but remainder of work not meaningful 18 P 8 1 0.42 Computed 3.722.58 19 P 8 1 0.42 Price per pound minus 1.14 20 P 12 2 0.84 Like #13 but with major error 453 PAGE 469 APPENDIX N DESCRIPTIVE STATISTICS FOR MATHEMATICS PERFORMANCE BY QUESTION BY GROUP 454 PAGE 470 Descriptive Statistics for Mathematics Performance by Question by Gender Boys Girls Question M SD M SD Flowerpots 2.98 0.89 3.19 0.73 Ms. Hernandez 2.72 0.87 2.63 0.91 Calories 2.94 0.77 2.70 0.98 Jeans Class 2.74 0.87 2.61 1.09 Pizza 2.28 1.30 2.34 1.31 String 0.91 1.09 0.88 1.04 Balloons 2.84 0.88 2.99 0.88 Flour 2.18 1.01 1.87 1.01 Calculator 1.81 1.38 1.77 1.46 School Lunch 2.20 1.02 2.23 1.06 Pencils 2.09 1.38 1.61 1.56 Birdseed 1.61 1.21 1.29 1.31 Note. N=237, max. mean score = 4 points. 455 PAGE 471 Appendix N (C ontinued) Descriptive Statistics for Mathematics Performance by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 3.35 0.69 3.03 0.78 2.88 0.99 Ms. Hernandez 3.04 0.53 2.72 0.82 2.18 1.13 Calories 3.19 0.64 2.77 0.93 2.59 0.85 Jeans Class 3.16 0.77 2.67 0.91 2.18 1.09 Pizza 3.02 1.16 2.22 1.26 1.75 1.23 String 1.77 1.32 0.67 0.83 0.51 0.67 Balloons 3.32 0.66 2.86 0.84 2.57 1.04 Flour 2.79 0.80 1.99 0.91 1.31 0.97 Calculator 2.32 1.43 1.87 1.40 1.00 1.08 School Lunch 2.84 0.59 2.24 0.93 1.45 1.19 Pencils 2.88 1.25 1.72 1.47 1.12 1.09 Birdseed 2.53 1.20 1.35 1.14 0.57 0.70 Note. N=237, max. mean score = 4 points. 456 PAGE 472 Appendix N (C ontinued) Descriptive Statistics for Mathematics Performance by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 3.19 0.70 3.02 0.90 3.03 0.76 Ms. 
Hernandez 2.99 0.54 2.62 0.93 2.20 1.10 Calories 3.05 0.76 2.78 0.91 2.50 0.86 Jeans Class 3.05 0.81 2.58 0.96 2.23 1.14 Pizza 2.90 1.22 2.16 1.22 1.50 1.28 String 1.51 1.28 0.63 0.81 0.60 0.89 Balloons 3.19 0.74 2.85 0.89 2.47 0.97 Flour 2.51 0.88 1.96 0.97 1.23 1.01 Calculator 2.25 1.49 1.73 1.35 0.93 1.05 School Lunch 2.78 0.65 2.10 0.98 1.33 1.23 Pencils 2.68 1.35 1.63 1.40 0.97 1.22 Birdseed 2.33 1.31 1.16 1.06 0.73 0.83 Note. N=237, max. mean score = 4 points. 457 PAGE 473 458 APPENDIX O DESCRIPTIVE STATISTICS FOR LINGUISTIC EXPLANATIONS BY QUESTION BY GROUP PAGE 474 Descriptive Statistics for Linguistic Explanations by Question by Gender Boys Girls Question M SD M SD Flowerpots 1.30 1.06 1.44 1.18 Ms. Hernandez 1.25 0.94 1.36 0.98 Calories 1.30 1.00 1.28 0.98 Jeans Class 1.27 0.97 1.38 1.04 Pizza 1.83 0.84 2.01 0.73 String 0.68 0.72 0.66 0.78 Balloons 1.15 1.05 1.28 1.19 Flour 0.88 0.85 0.95 0.92 Calculator 1.29 0.93 1.31 0.98 School Lunch 0.76 0.87 0.93 0.91 Pencils 1.40 1.00 1.20 1.05 Birdseed 0.71 0.75 0.85 0.76 Note. N=237, max. mean score = 4 points. 459 PAGE 475 Appendix O (Continued) Descriptive Statistics for Linguistic Explanations by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 1.77 1.07 1.26 1.08 1.20 1.15 Ms. Hernandez 1.65 0.92 1.26 0.94 1.00 0.96 Calories 1.54 1.05 1.26 1.01 1.10 0.81 Jeans Class 1.61 1.03 1.28 0.99 1.10 0.92 Pizza 2.14 0.72 1.89 0.77 1.71 0.88 String 1.05 0.85 0.57 0.69 0.49 0.58 Balloons 1.75 1.02 1.09 1.11 0.88 1.03 Flour 1.42 0.91 0.78 0.81 0.67 0.82 Calculator 1.51 1.07 1.40 0.91 0.82 0.77 School Lunch 1.26 0.88 0.67 0.87 0.76 0.84 Pencils 1.88 1.00 1.23 1.03 0.86 0.72 Birdseed 1.16 0.84 0.71 0.73 0.53 0.54 Note. N=237, max. mean score = 4 points. 460 PAGE 476 Appendix O (Continued) Descriptive Statistics for Linguistic Explanations by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 1.59 1.04 1.32 1.15 1.03 1.07 Ms. 
Hernandez 1.45 0.96 1.37 0.94 1.07 1.01 Calories 1.38 1.01 1.31 1.00 0.97 0.85 Jeans Class 1.55 0.99 1.25 1.01 1.10 0.92 Pizza 2.15 0.76 1.89 0.73 1.43 0.94 String 0.97 0.83 0.56 0.68 0.43 0.57 Balloons 1.63 1.11 1.06 1.10 0.83 0.91 Flour 1.23 0.89 0.80 0.86 0.63 0.76 Calculator 1.51 1.06 1.34 0.88 0.67 0.71 School Lunch 1.18 0.93 0.69 0.84 0.67 0.84 Pencils 1.77 1.03 1.18 0.96 0.77 0.86 Birdseed 1.16 0.88 0.64 0.63 0.43 0.57 Note. N=237, max. mean score = 4 points. 461 PAGE 477 462 APPENDIX P DESCRIPTIVE STATISTICS FOR SELFEFFICACY BY QUESTION BY GROUP PAGE 478 Descriptive Statistics for SelfEfficacy by Question by Gender Boys Girls Question M SD M SD Flowerpots 3.81 0.43 3.68 0.61 Ms. Hernandez 3.46 0.84 2.95 1.11 Calories 3.55 0.70 3.24 0.80 Jeans Class 3.82 0.46 3.44 0.91 Pizza 2.80 1.23 2.48 1.30 String 1.96 1.32 1.46 1.23 Balloons 3.29 0.83 3.06 0.82 Flour 3.26 0.96 2.61 1.21 Calculator 2.88 1.32 2.43 1.44 School Lunch 3.34 0.94 2.93 1.13 Pencils 2.77 1.34 2.28 1.48 Birdseed 3.02 1.23 2.52 1.26 Note. N=237, max. mean score = 4 points. 463 PAGE 479 Appendix P (Continued) Descriptive Statistics for SelfEfficacy by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 3.89 0.31 3.74 0.52 3.61 0.67 Ms. Hernandez 3.81 0.44 3.21 0.98 2.63 1.15 Calories 3.54 0.71 3.35 0.78 3.41 0.75 Jeans Class 3.91 0.34 3.66 0.69 3.31 0.97 Pizza 3.02 1.14 2.64 1.27 2.29 1.32 String 2.11 1.30 1.67 1.30 1.47 1.24 Balloons 3.35 0.77 3.20 0.81 2.96 0.92 Flour 3.28 1.03 2.95 1.11 2.63 1.20 Calculator 2.91 1.41 2.74 1.27 2.24 1.58 School Lunch 3.56 0.80 3.16 1.03 2.69 1.17 Pencils 3.32 1.15 2.39 1.43 2.08 1.35 Birdseed 3.31 0.98 2.61 1.36 2.67 1.16 Note. N=237, max. mean score = 4 points. 464 PAGE 480 Appendix P (Continued) Descriptive Statistics for SelfEfficacy by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 3.82 0.42 3.70 0.58 3.77 0.50 Ms. 
Hernandez 3.66 0.69 3.12 1.04 2.67 1.09 Calories 3.53 0.67 3.38 0.76 3.23 0.90 Jeans Class 3.92 0.40 3.54 0.81 3.43 0.77 Pizza 2.99 1.12 2.57 1.29 2.20 1.32 String 2.10 1.36 1.55 1.26 1.67 1.21 Balloons 3.30 0.79 3.17 0.87 2.97 0.72 Flour 3.22 1.12 2.87 1.13 2.73 1.05 Calculator 2.75 1.45 2.75 1.32 2.13 1.46 School Lunch 3.38 0.98 3.06 1.10 3.00 0.91 Pencils 3.19 1.22 2.31 1.42 2.00 1.39 Birdseed 3.26 1.01 2.54 1.32 2.77 1.28 Note. N=237, max. mean score = 4 points. 465 PAGE 481 466 APPENDIX Q DESCRIPTIVE STATISTICS FOR SELF=EFFICACY MEAN BIAS BY QUESTION BY GROUP PAGE 482 Descriptive Statistics for SelfEfficacy Mean Bias by Question by Gender Boys Girls Question M SD M SD Flowerpots 0.82 0.90 0.49 0.93 Ms. Hernandez 0.74 0.98 0.32 1.21 Calories 0.61 0.97 0.54 1.13 Jeans Class 1.08 0.97 0.82 1.25 Pizza 0.52 1.48 0.14 1.70 String 1.05 1.45 0.58 1.27 Balloons 0.46 1.15 0.64 1.15 Flour 1.08 1.20 0.74 1.16 Calculator 1.07 1.38 0.66 1.52 School Lunch 1.14 1.25 0.69 1.29 Pencils 0.68 1.29 0.67 1.56 Birdseed 1.41 1.59 1.23 1.48 Note. N=237, max. mean score = 4 points. 467 PAGE 483 Appendix Q (Continued) Descriptive Statistics for SelfEfficacy Mean Bias by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 0.54 0.66 0.71 0.93 0.73 1.15 Ms. Hernandez 0.77 0.68 0.49 1.03 0.45 1.57 Calories 0.35 0.92 0.58 1.04 0.82 1.14 Jeans Class 0.75 0.91 0.98 1.05 1.14 1.40 Pizza 0.00 1.36 0.42 1.65 0.55 1.67 String 0.33 1.20 1.01 1.44 0.96 1.33 Balloons 0.04 1.02 0.34 1.21 0.39 1.18 Flour 0.49 1.04 0.96 1.21 1.31 1.19 Calculator 0.60 1.18 0.87 1.45 1.24 1.70 School Lunch 0.72 0.80 0.91 1.32 1.24 1.69 Pencils 0.44 1.09 0.67 1.44 0.96 1.64 Birdseed 0.79 1.36 1.26 1.63 2.10 1.19 Note. N=237, max. mean score = 4 points. 468 PAGE 484 Appendix Q (Continued) Descriptive Statistics for SelfEfficacy Mean Bias by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 0.63 0.74 0.68 1.04 0.73 0.83 Ms. 
Hernandez 0.67 0.87 0.50 1.08 0.47 1.63 Calories 0.48 0.91 0.60 1.07 0.73 1.23 Jeans Class 0.86 0.77 0.96 1.19 1.20 1.40 Pizza 0.08 1.45 0.41 1.62 0.70 1.76 String 0.59 1.42 0.92 1.33 1.07 1.51 Balloons 0.11 1.09 0.32 1.20 0.50 1.17 Flour 0.71 1.11 0.91 1.19 1.50 1.25 Calculator 0.51 1.36 1.01 1.42 1.20 1.69 School Lunch 0.60 1.09 0.96 1.22 1.67 1.67 Pencils 0.51 1.24 0.69 1.45 1.03 1.65 Birdseed 0.93 1.31 1.39 1.64 2.03 1.38 Note. N=237, max. mean score = 4 points. 469 PAGE 485 470 APPENDIX R DESCRIPTIVE STATISTICS FOR SELFASSESSMENT BY QUESTION BY GROUP PAGE 486 Descriptive Statistics for SelfAssessment by Question by Gender Boys Girls Question M SD M SD Flowerpots 3.95 0.25 3.78 0.60 Ms. Hernandez 3.58 0.80 3.42 0.94 Calories 3.66 0.80 3.11 1.27 Jeans Class 3.84 0.61 3.51 0.87 Pizza 2.90 1.26 2.90 1.15 String 1.95 1.57 1.41 1.41 Balloons 3.32 0.87 3.26 0.88 Flour 3.22 1.11 2.60 1.38 Calculator 2.78 1.43 2.55 1.51 School Lunch 3.61 0.74 3.14 1.15 Pencils 2.95 1.46 2.25 1.69 Birdseed 3.14 1.27 2.75 1.36 Note. N=237, max. mean score = 4 points. 471 PAGE 487 Appendix R (C ontinued) Descriptive Statistics for SelfAssessment by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 3.96 0.19 3.90 0.39 3.71 0.70 Ms. Hernandez 3.86 0.40 3.54 0.78 3.02 1.19 Calories 3.72 0.67 3.44 1.08 2.98 1.27 Jeans Class 3.93 0.42 3.68 0.74 3.45 0.83 Pizza 3.18 1.05 2.88 1.22 2.65 1.31 String 2.33 1.52 1.49 1.48 1.53 1.45 Balloons 3.44 0.66 3.33 0.90 3.04 0.98 Flour 3.53 0.76 2.95 1.30 2.25 1.34 Calculator 2.84 1.45 2.69 1.48 2.43 1.45 School Lunch 3.68 0.74 3.51 0.92 2.78 1.10 Pencils 3.47 1.20 2.50 1.68 2.04 1.44 Birdseed 3.46 1.05 2.86 1.37 2.63 1.36 Note. N=237, max. mean score = 4 points. 472 PAGE 488 Appendix R (Continued) 473 Descriptive Statistics for SelfAssessment by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 3.97 0.16 3.83 0.54 3.83 0.46 Ms. 
Hernandez 3.77 0.61 3.43 0.95 3.23 0.46 Calories 3.56 0.93 3.46 1.02 2.83 1.42 Jeans Class 3.88 0.53 3.63 0.76 3.50 0.82 Pizza 3.23 1.09 2.82 1.19 2.43 1.41 String 2.08 1.59 1.48 1.44 1.77 1.55 Balloons 3.44 0.71 3.25 0.93 3.13 0.97 Flour 3.32 1.18 2.84 1.27 2.50 1.33 Calculator 3.77 1.49 2.65 1.49 2.53 1.33 School Lunch 3.70 0.64 3.28 1.11 3.20 0.85 Pencils 3.33 1.34 2.43 1.64 1.83 1.42 Birdseed 3.40 1.14 2.75 1.39 2.87 1.25 Note. N=237, max. mean score = 4 points. PAGE 489 APPENDIX S DESCRIPTIVE STATISTICS FOR SELFASSESSMENT MEAN BIAS BY QUESTION BY GROUP 474 PAGE 490 Descriptive Statistics for SelfAssessment Mean Bias by Question by Gender Boys Girls Question M SD M SD Flowerpots 0.97 0.84 0.59 0.85 Ms. Hernandez 0.86 0.96 0.79 0.90 Calories 0.72 0.92 0.41 1.37 Jeans Class 1.10 0.95 0.90 1.26 Pizza 0.62 1.35 0.56 1.38 String 1.03 1.41 0.53 1.29 Balloons 0.48 1.14 0.27 1.12 Flour 1.05 1.24 0.73 1.26 Calculator 0.67 1.39 0.78 1.42 School Lunch 1.41 1.04 0.91 1.06 Pencils 0.87 1.06 0.64 1.51 Birdseed 1.53 1.51 1.46 1.42 Note. N=237, max. mean score = 4 points. 475 PAGE 491 Appendix S (Continued) Descriptive Statistics for SelfAssessment Mean Bias by Question by Mathematics Ability High Middle Low Question M SD M SD M SD Flowerpots 0.61 0.73 0.87 0.80 0.82 1.11 Ms. Hernandez 0.82 0.47 0.82 0.93 0.84 1.27 Calories 0.53 0.73 0.67 1.23 0.39 1.31 Jeans Class 0.77 0.93 1.01 1.10 1.27 1.25 Pizza 0.16 1.33 0.66 1.33 0.90 1.39 String 0.54 1.10 0.82 1.44 1.02 1.48 Balloons 0.12 0.95 0.47 1.14 0.47 1.29 Flour 0.74 0.94 0.96 1.31 0.94 1.45 Calculator 0.53 1.15 0.82 1.51 1.43 1.22 School Lunch 0.84 0.68 1.27 1.06 1.33 1.38 Pencils 0.60 0.84 0.78 1.33 0.92 1.56 Birdseed 0.93 1.10 1.53 1.57 2.06 1.35 Note. N=237, max. mean score = 4 points. 476 PAGE 492 Appendix S (Continued) 477 Descriptive Statistics for SelfAssessment Mean Bias by Question by Reading Ability High Middle Low Question M SD M SD M SD Flowerpots 0.78 0.73 0.81 0.94 0.80 0.85 Ms. 
Hernandez 0.78 0.69 0.81 0.97 1.03 1.22 Calories 0.51 0.85 0.67 1.22 0.33 1.45 Jeans Class 0.82 0.81 1.05 1.15 1.27 1.46 Pizza 0.33 0.81 0.66 1.37 0.93 1.31 String 0.58 1.31 0.84 1.36 1.17 1.53 Balloons 0.25 1.00 0.40 1.17 0.67 1.27 Flour 0.81 1.05 0.87 1.31 1.27 1.48 Calculator 0.52 1.21 0.92 1.47 1.60 1.25 School Lunch 0.92 0.66 1.17 1.13 1.87 1.33 Pencils 0.64 0.95 0.81 1.37 0.87 1.61 Birdseed 1.07 1.22 1.59 1.56 2.13 1.31 Note. N=237, max. mean score = 4 points. PAGE 493 APPENDIX T MULTIPLE REGRESSION SUMMARY TABLES INCLUDING INTERACTIONS 478 PAGE 494 Multiple Regression Summary Tables with Interactions Summary of Multiple Regression Analysis for Mathematics Performance Variable b SE t p Intercept 16.9588 0.0 3.9026 4.35* <.0001 Gender 4.1253 0.2843 5.7378 0.72 0.4729 FCAT Mathematics Score 0.0774 0.4629 0.0132 5.86* <.0001 FCAT Reading Score 0.0527 0.3058 0.0145 3.65* 0.0003 Gender*Mathematics 0.0074 0.1662 0.0214 0.34 0.7305 Gender*Reading 0.0048 0.1160 0.0212 0.23 0.8211 Note. N=237. b=regression coefficient, =standardized regression coefficient, SE=standard error, * indicates significance at =.05 level. 479 PAGE 495 Appendix T (Continued) Multiple Regression Summary Tables with Interactions Summary of Multiple Regression Analysis for Linguistic Explanations Variable b SE t p Intercept 9.0291 0.0 5.2142 1.73 0.0847 Gender 7.7868 0.5312 7.6662 1.02 0.3108 FCAT Mathematics Score 0.0375 0.2220 0.0176 2.13* 0.0345 FCAT Reading Score 0.0296 0.1701 0.0193 1.54 0.1260 Gender*Mathematics 0.0109 0.2430 0.0286 0.38 0.7034 Gender*Reading 0.0366 0.8757 0.0283 1.29 0.1975 Note. N=237. b=regression coefficient, =standardized regression coefficient, SE=standard error, * indicates significance at =.05 level. 
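The Appendix T models add two product terms to the main-effects model: Gender x FCAT Mathematics and Gender x FCAT Reading. The Python sketch below shows one way to fit such a model by ordinary least squares. It also shows how each tabled t relates to its coefficient and standard error (t = b / SE).

It is illustrative only and is not the study's analysis code:
- It assumes a 0/1 gender coding.
- It uses simulated scores in a hypothetical 200-400 range rather than the study's data.
- The helper names (`design_row`, `invert`, `ols`, `simulate`) are hypothetical.
- p values are omitted, since they would require the t distribution with N - k degrees of freedom.

```python
import math
import random

# Hypothetical column order: intercept, gender, math, reading, two interactions
NAMES = ["Intercept", "Gender", "FCAT Math", "FCAT Reading",
         "Gender*Math", "Gender*Reading"]


def design_row(gender, math_score, reading_score):
    """One design-matrix row including both Gender x score product terms."""
    return [1.0, gender, math_score, reading_score,
            gender * math_score, gender * reading_score]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Ordinary least squares: returns coefficients b, standard errors, t = b / SE."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)  # residual variance, df = N - k
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    t = [bi / si for bi, si in zip(b, se)]
    return b, se, t


def simulate(n=237, seed=42):
    """Simulated, hypothetical scores; NOT the study's data."""
    rng = random.Random(seed)
    true_b = [5.0, 2.0, 0.05, 0.03, 0.01, -0.01]
    rows, y = [], []
    for _ in range(n):
        g = float(rng.randint(0, 1))   # assumed 0/1 gender coding
        m = rng.uniform(200, 400)      # hypothetical score range
        r = rng.uniform(200, 400)
        row = design_row(g, m, r)
        rows.append(row)
        y.append(sum(c * x for c, x in zip(true_b, row)) + rng.gauss(0, 0.5))
    return rows, y, true_b


rows, y, true_b = simulate()
b, se, t = ols(rows, y)
for name, bi, si, ti in zip(NAMES, b, se, t):
    print(f"{name:15s} b={bi:9.4f}  SE={si:8.4f}  t={ti:8.2f}")
```

Because t = b / SE, the tabled values can be checked by hand up to rounding. For example, the FCAT Mathematics Score row of the mathematics performance model gives 0.0774 / 0.0132 ≈ 5.86, which matches the reported t. Sign information appears to have been lost in extraction, so only magnitudes can be compared.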
Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias
Variable                        b        β       SE      t        p
Intercept                  3.2043      0.0   0.4495   7.13*  <.0001
Gender                     0.6620   0.5180   0.6609   1.00   0.3176
FCAT Mathematics Score     0.0025   0.1711   0.0015   1.66   0.0988
FCAT Reading Score         0.0043   0.2814   0.0017   2.57*  0.0109
Gender*Mathematics         0.0000   0.0100   0.0025   0.02   0.9873
Gender*Reading             0.0010   0.2691   0.0024   0.40   0.6884
Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.

Summary of Multiple Regression Analysis for Self-Assessment Mean Bias
Variable                        b        β       SE      t        p
Intercept                  3.3864      0.0   0.4092   8.28*  <.0001
Gender                     1.2221   1.0481   0.6016   2.03*  0.0434
FCAT Mathematics Score     0.0027   0.2032   0.0014   1.97*  0.0498
FCAT Reading Score         0.0044   0.3151   0.0015   2.88*  0.0043
Gender*Mathematics         0.0025   0.6855   0.0023   1.09   0.2769
Gender*Reading             0.0004   0.1297   0.0022   0.19   0.8464
Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.

ABOUT THE AUTHOR

Patricia Hunsader received her bachelor's degree in Business from the University of South Florida, Sarasota, in 1986. She retired from her business career to raise a family of four boys and one girl. She returned to school at the University of South Florida to enter the master's program in Elementary Education and completed that degree in 2000. She taught high school mathematics and fifth grade while pursuing her master's degree, and remained in teaching until she entered the Ph.D. program in 2001. While in the Ph.D. program at the University of South Florida, Mrs.
Hunsader published six articles in state and national publications, gave presentations at several conferences and workshops, and taught undergraduate mathematics methods courses for preservice elementary teachers.